ChrisGreaves wrote: ↑18 Oct 2021, 15:37
After reading three Helpful Hints pages I built this search string for a Google Search (in FireFox, FWIW):
In the same vein:
I am running an automated search using Chrome & Selenium.
Yesterday I ran ten thousand searches, looping from 0 to 9999, asking for every listed/active number in the 709-468-xxxx exchange (Bonavista, since you ask). Got 6,655 hits. Reasonable? High, I think, for BELL landlines. The population of Bonavista is 3,500 to 3,800 depending on whether you include Spillars Cove etc. There aren't that many businesses in Bonavista to lift us to 5,000, let alone 6,000.
My search reviews each page of hits, and if the first three URLs all belong to an exclusion list of 51 URLs such as "YellowPages.ca", "411.ca", "CHECK-CALLER" etc., then that number is discarded; otherwise the number is assumed to be a real and active telephone number.
Thus of the 10,000 numbers in my For-Loop, 10,000 minus 6,655, or 3,345 numbers, were weeded out because they failed the "number in service" test.
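For anyone following along, here is a rough Python sketch of how the loop's search strings could be built. It assumes the number is zero-padded to four digits and wrapped in double quotes as an exact phrase; the function name `build_query` and the optional `extra_term` parameter are made up for illustration and are not from my actual macro.

```python
def build_query(line_number: int, extra_term: str = "") -> str:
    """Return an exact-phrase query for one number in the 709-468 exchange.

    The number is zero-padded to four digits and quoted; an optional
    extra term (hypothetical parameter) is appended outside the quotes.
    """
    if not 0 <= line_number <= 9999:
        raise ValueError("line number must be between 0 and 9999")
    quoted = f'"709-468-{line_number:04d}"'
    return f"{quoted} {extra_term}" if extra_term else quoted


# One query per candidate number, 0000 through 9999.
queries = [build_query(n) for n in range(10_000)]
```

Each string in `queries` would then be handed to whatever drives the browser.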
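The discard rule above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the three domains stand in for the full list of 51, matching is done on the host name (so subdomains count too), and a page with no hits at all is treated as "not in service". The names `EXCLUDED_DOMAINS`, `is_excluded` and `looks_active` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the 51-entry exclusion list.
EXCLUDED_DOMAINS = {"yellowpages.ca", "411.ca", "tel2na.me"}


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL, without a leading 'www.'.

    Assumes the URL carries a scheme (http:// or https://).
    """
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_excluded(url: str) -> bool:
    """True if the host is an excluded domain or a subdomain of one."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def looks_active(result_urls: list[str], top_n: int = 3) -> bool:
    """Discard a number only when ALL of the first top_n hits are excluded."""
    top = result_urls[:top_n]
    if not top:
        return False  # assumption: no hits means the number is not in service
    return not all(is_excluded(u) for u in top)
```

Because the test is "all of the first three", a single non-excluded URL among them is enough to keep the number.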
Today I re-ran with the search term "709-468-xxxx Bonavista", in the belief that adding the term "Bonavista" would trim out some errant responses from, oh, I don't know what. Adding a term to the search string should reduce the number of hits because it narrows the search.
Got 8,373 items!
That is 1,718 more than yesterday's number-only search (in quotes, of course: "709-468-0000" rather than 709-468-0000).
Why the leap?
Selenium010.png
Above is a manual search with a sample number from yesterday's search strings. Note the "tel2na.me" domain, which is on my exclusion list. The first three URLs are all on my exclusion list, so this number was ignored, and did not contribute towards the 6,655 total for yesterday.
Selenium009.png
Above is a manual search with a sample number from today's search strings.
Presumably some scam line in New Jersey, of all places, has "Bonavista" in its web pages. That firm makes it to the first URL on the page of hits, and so the first three URLs in the hits ("AND" logic) are NOT all on my exclusion list, so this number made it through and became one of the 8,373 items.
I am prepared to add more of these "scam" and "fraud"-like generic URLs to my list as time goes by, for we have a long way to go. But I draw the line at specific web pages.
You just can't win!
This search was meant to be a super-fast (120 minutes) pre-filter to reduce the number of numbers that are fed to my ten-seconds-per-search (eight search engines) task, so cutting the numbers by 1/3 is as effective as reducing the time for the main search from 10 seconds to about 6.67 seconds.
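The arithmetic behind that comparison, as a small Python sketch. It assumes a flat ten seconds per number in the main search and uses the two totals from the runs above; the function names are made up for illustration.

```python
SECONDS_PER_NUMBER = 10   # main search: about ten seconds per number
TOTAL_NUMBERS = 10_000    # 0000 through 9999


def main_search_hours(numbers_kept: int) -> float:
    """Hours the main search needs for the numbers that survive the filter."""
    return numbers_kept * SECONDS_PER_NUMBER / 3600


def equivalent_seconds_per_number(numbers_kept: int) -> float:
    """Per-number time giving the same total if no numbers were filtered out."""
    return SECONDS_PER_NUMBER * numbers_kept / TOTAL_NUMBERS


for label, kept in [("number only", 6_655), ("with Bonavista", 8_373)]:
    print(f"{label}: {main_search_hours(kept):.1f} hours, "
          f"equivalent to {equivalent_seconds_per_number(kept):.3f} s per number")
```

On those assumptions the number-only filter leaves roughly 18.5 hours of main search, against roughly 23.3 hours for the "Bonavista" run and about 27.8 hours with no filter at all.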
Cheers
Chris