Google search (allintext:) (or) (and)

ChrisGreaves
PlutoniumLounger
Posts: 12605
Joined: 24 Jan 2010, 23:23
Location: paused.undefined.exposed

Post by ChrisGreaves »

Forum?

After reading three Helpful Hints pages I built this search string for a Google Search (in Firefox, FWIW):

Code:

allintext: 709-468-7370 breakfast or bed or room
allintext: is new to me, as is intext:

The phone number is that of a Bed&Breakfast right here in Bonavista.
I expected that phone number alone to return me ONLY hits for the HarbourView Bed and Breakfast.
Breakfast/bed/room were just added terms to make me feel good.

"About 17 results (0.56 seconds)" seemed about right to me; these B&Bs have postings on several travel/vacation sites as well as Facebook et al.

The seventh or eighth entries go off the rails (attached image).

How could/should I modify my search string so that the hits returned are exclusively the targeted phone number?
Assume in this case that whoever runs the B&B is not also the president of the Local Legion, nor runs an Electrical Contracting business on the side etc.

There are two reasons for wanting exclusive hits:-
(1) I could use the report on number-of-hits to qualify each search string (with a score of some sort)
(2) I would not need to add a parsing step to the returned results, that is, to examine each reported hit and drop hits which are NOT (in this case) B&B.

Thanks for any tips.
I have several candidate searches lined up, so any adjustment to the general search expression, I can test pretty quickly.

Thanks
Chris
Ignorance is not knowing; Stupid is not asking.

Argus
GoldLounger
Posts: 3062
Joined: 24 Jan 2010, 19:07

Post by Argus »

ChrisGreaves wrote:
18 Oct 2021, 15:37
The seventh or eighth entries go off the rails (attached image).
[...]
How could/should I modify my search string so that the hits returned are exclusively the targeted phone number?
I'd say it's impossible nowadays.
ChrisGreaves wrote:
18 Oct 2021, 15:37
I expected that phone number alone to return me ONLY hits for the HarbourView Bed and Breakfast. Breakfast/bed/room were just added terms to make me feel good.
Well, you did add the OR operator ...
With allintext it should be similar to: "A" "B" "C"; i.e. as is typed, all found in text, but not necessarily in that order, since AND is implied (I still think).
ChrisGreaves wrote:
18 Oct 2021, 15:37
allintext: is new to me, as is intext:
I posted this reply in another place/lounge in 2007 (I believe the links are not working, but I have others), and if it was a bit difficult then, I don't believe it's easier now.

But I think you should look at pages that talk about search engine optimization, SEO.
Odd, I agree. I've seen this, and see it every day.

Understanding search engines can sometimes be hard. It could be that some word (the last added) is in a link from a page, but the other words are not on the page that links. Thus adding a word (I agree it goes against my logic too), could increase the number of hits.

I.e. if a page that links to a page had all the words then it should already be in the result, but if the page that links to a page uses your additional word in the link itself, then it could be added to the second search result, perhaps ...

Aha! Someone says, but I specifically entered these words (and no need for AND in Google) then all should be on the pages in the result. Well, I don't understand search engines. [smile] Sometimes it's very irritating to find a search result that does not contain the word; even the cached version does not.

Many probably know about "site:" (and "link:") as useful operators in Google. You can also use "allintext:" to find only pages with all the words in the body of the page.

http://www.google.com/help/operators.html

Then the search engine must have some limit, searching only X number pages of a web site.

So, if I do a search for:
headlight trunk gem curry, I'll get 14,400 hits
headlight trunk gem curry street, I'll get 15,900 hits

as you have shown. Or 14,500 second time tried, sigh, they updated some index perhaps.

Yep, you will get the same result if, after the first result, you choose Search within results and add street; you will get 15,900.

See discussion here: What Do You Hate About Google?
http://forums.searchenginewatch.com/sho ... 514&page=2

But if I search for:
allintext:headlight trunk gem curry, I'll get 817 hits
allintext:headlight trunk gem curry street, I'll get 804 hits

Allintext generates smaller results etc., but in general I think a "clean" search will get the top-ranked results one is normally looking for.

Allinanchor: will give a result with pages that contain a link with the word(s) in the anchor text of the link.
allinanchor:headlight trunk gem curry, I'll get 480 hits
allinanchor:headlight trunk gem curry street, I'll get 13 hits.

To complicate it, for your word "street" Google also includes pages with the abbreviation "St", sigh. Compare if you search for:
headlight trunk gem curry "street"

One can also use the punctuation operators, as you know: +, -, "", ~ (synonyms of the word. Nifty).

And also, Google only shows the first 1000 results for any query (if anyone is interested in so many).

Robert Scoble on the subject: Why do search engines lie?
http://scobleizer.com/2006/02/08/why-do ... gines-lie/
Byelingual    When you speak two languages but start losing vocabulary in both of them.

GeoffW
PlatinumLounger
Posts: 3575
Joined: 24 Jan 2010, 07:23

Post by GeoffW »

If you enclose the phone number in double quotes, any results returned must contain that phone number.

Post by ChrisGreaves »

GeoffW wrote:
18 Oct 2021, 16:24
If you enclose the phone number in double quotes, any results returned must contain that phone number.

Code:

allintext: "709-468-7370" breakfast or bed or room
Thanks Geoff.
Regarding your first response: I suppose that if ***I*** were Google I'd always want to slip in a few extra results on statistical grounds. Especially if they contained advertisements, the extra pages might induce people to click through (or whatever the term is), making those pages' sponsors happier, which would be good for Google's reputation.
Travel rental sites such as AirBNB add a zillion places within quite a few miles of your intended destination. I suppose the idea is that you might see something ten miles away that induces you to stay there instead of where you first thought, and any booking through the travel site is good for that site. No matter that your sister lives in Bay Bulls, you could end up in Witless Bay where lives your mother-in-law.

Regarding quotes, as shown above: I had already tried combinations with quotes,

Code:

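' lngPhone is assumed to hold the 7-digit local number, e.g. 4687370
strSearch = "allintext: "
strSearch = strSearch & "+""newfoundland"" & +""(709)"""
strSearch = strSearch & " & +"""
' Integer division (\) gives the 3-digit exchange; "/" would let Format round 468.737 up to 469
strSearch = strSearch & Format(lngPhone \ 10000, "000")
strSearch = strSearch & """ & +"""
' Mod 10000 (not Mod 1000) keeps all four digits of the line number
strSearch = strSearch & Format(lngPhone Mod 10000, "0000")
strSearch = strSearch & """ & +"""
strSearch = strSearch & "bed"" & +""breakfast"""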
ending up with an awful piece of code just so that I could ring the changes and see what happened.
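
The same kind of string can be assembled from a list of parts rather than by hand-concatenating quotes. What follows is only a minimal sketch, in Python rather than the VBA above. The helper names `quote` and `build_query` are made up for illustration, and there is no guarantee that Google honours allintext: together with a quoted phrase and an OR group. As far as I know, Google treats OR as an operator only when it is written in capitals, which is why the sketch upper-cases it.

```python
def quote(term):
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return '"' + term.replace('"', "") + '"'


def build_query(exact_terms, any_terms=(), prefix="allintext:"):
    """Assemble a search string from exact phrases plus an optional OR group.

    build_query is a hypothetical helper, not part of any search engine's API.
    """
    parts = [prefix] if prefix else []
    parts.extend(quote(term) for term in exact_terms)
    if any_terms:
        # Group the alternatives so OR applies only to these terms.
        parts.append("(" + " OR ".join(any_terms) + ")")
    return " ".join(parts)


print(build_query(["709-468-7370"], ["breakfast", "bed", "room"]))
# prints: allintext: "709-468-7370" (breakfast OR bed OR room)
```

Keeping the parts in lists makes it easier to ring the changes: dropping the prefix or swapping the optional terms is a one-argument change instead of another round of doubled quotes.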

I have just noticed that introducing DuckDuckGo returns fewer results than a pure Google search, so I suspect the time has come to code a bit more Selenium and rattle through every popular search engine.

I am not against parsing the source results, but thought that if a search string got me close to the results I wanted, there ought to be less coding going on in VBA.
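
If the parsing step does turn out to be necessary, it need not be large. Here is a rough sketch in Python (again, not VBA) of dropping hits whose saved text does not actually contain the phone number. The names `phone_pattern` and `keep_hits`, the `(url, text)` shape of a hit, and the sample hits are all assumptions made for the example.

```python
import re


def phone_pattern(phone):
    """Build a regex that matches the number with common separators."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number")
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    # Allow up to two non-digits between groups, e.g. "(709) 468-7370".
    # The lookarounds stop the match from starting or ending inside a longer digit run.
    return re.compile(
        rf"(?<!\d){area}\D{{0,2}}{exchange}\D{{0,2}}{line}(?!\d)"
    )


def keep_hits(hits, phone):
    """Return only the (url, text) hits whose text contains the number."""
    pattern = phone_pattern(phone)
    return [(url, text) for url, text in hits if pattern.search(text)]


# Made-up hits, purely for illustration.
sample = [
    ("https://example.com/a", "Call (709) 468-7370 to book"),
    ("https://example.com/b", "Legion hall, phone 709-468-0000"),
    ("https://example.com/c", "Tel 709.468.7370"),
]
print([url for url, _ in keep_hits(sample, "709-468-7370")])
```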

FWIW Google protests with a captcha after ten consecutive robotic searches, whereas Yahoo does not. Another reason for trying different search engines.

Thanks again.
Chris

Post by ChrisGreaves »

ChrisGreaves wrote:
18 Oct 2021, 16:46
... so I suspect the time has come to code a bit more Selenium and rattle through every popular search engine.
I went to an alternative search engine and grabbed the first ten alternate sites to test manually.
Untitled.png
The score 5/8 indicates that 8 results were returned on the first page and of these 8, 5 were visibly about “harbourview”, my target.
I know that sometimes the text is in the body and not visible on the laptop screen, but the text is saved to my document, so it can be searched under my program control.

I would be inclined to go for 10/10
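
Since the page text is saved anyway, a score like 5/8 can be computed rather than eyeballed. This is a minimal sketch in Python, assuming each result on the first page is available as a plain string. The function name `first_page_score` and the eight sample strings are invented for the example.

```python
def first_page_score(snippets, target):
    """Return (results mentioning the target, total results) for one page."""
    target = target.lower()
    on_target = sum(1 for text in snippets if target in text.lower())
    return on_target, len(snippets)


# Eight made-up result texts, five of which mention the target.
page = [
    "HarbourView Bed and Breakfast, Bonavista",
    "Harbourview B&B reviews",
    "Local electrical contractor",
    "HARBOURVIEW rooms and rates",
    "Reverse phone lookup",
    "Stay at the HarbourView",
    "Directory of numbers",
    "harbourview on a travel site",
]
hits, total = first_page_score(page, "harbourview")
print(f"{hits}/{total}")  # prints: 5/8
```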

I used the search string from above (enclosed in quotes and code/code).

Eileen’s Lounge members in search of working solutions to problems would gravitate to BoardReader and avoid Microsoft's pages of pap.
Google issues Captchas after ten successive inquiries, Yahoo does not. So I want to build this table back up to ten useable engines, and then test them to see which ones don’t turn humans into robots decoding captchas for big firms.
That will take me a few hours ....
Cheers
Chris

Post by ChrisGreaves »

ChrisGreaves wrote:
18 Oct 2021, 17:18
... So I want to build this table back up to ten useable engines, and then test them to see which ones don’t turn humans into robots decoding captchas for big firms.
Untitled.png

These results were obtained at great speed, and might be treated with caution. For example when I saw “dark web” I panicked; also “meta search” and similar tech terms might indicate that I was using a platform not intended for the general public.

After the twelfth engine I got a bit punch-drunk through typing in the name of a search engine in the previous search engine's text box ...

In the second set of ten, two had a hit that went to my post in Eileen's Lounge, so for me that would not be a true hit, but I smiled paternally and let it in.

I have marked “Y” those engines which I will now add to my Selenium roster and while away the hours thus ... looking for non-captcha engines.

Cheers
Chris

LisaGreen
5StarLounger
Posts: 897
Joined: 08 Nov 2012, 17:54

Post by LisaGreen »

I miss a feature I remember from Alta Vista (no connection with the renowned Bona): that of being able to search within a search. It may actually be possible and I don't know about it. Which is worse: knowing about a feature that's not there, or suspecting a feature is there but not knowing how to use it!?!
I use the + and - features of Google all the time. I didn't know about allintext, which has got me rushing to Google's help pages to try and discover other useful stuff I've missed. I suspect it was introduced because Google employees were fed up with typing quotes and plus signs!

Lisa

Post by ChrisGreaves »

LisaGreen wrote:
22 Oct 2021, 16:25
I miss a feature I remember from Alta Vista. No connection with the renowned Bona. That of being able to search within a search. It may actually be possible and I don't know about it. Which is worse knowing about a feature and it's not there or suspecting a feature is there but not knowing how to use it!?!
My high school teachers made it quite clear, that ignorance is not a bad thing; but putting up with being ignorant is gross misbehaviour.
So I know that I know only about 0.0000001% of all that there is to know, but because of that, I read books. On just about anything.

I think something like Selenium would let you create a super-search engine. After all, if you can express it clearly in English, then it can be programmed. Right?

An example can be found in this post of mine.

This week, with the power of Selenium, has been a heady week for me, and I might post a week-end report on telephone numbers, if I can get some progress on the wind turbine before David arrives back here and starts nagging me.
Again.
Cheers
Chris

Post by ChrisGreaves »

ChrisGreaves wrote:
18 Oct 2021, 15:37
After reading three Helpful Hints pages I built this search string for a Google Search in FireFox FWIW):
In the same vein:
I am running an automated search using Chrome & Selenium.

Yesterday I ran ten thousand searches, looping from 0 to 9999, asking for every listed/active number in the 709-468-xxxx exchange (Bonavista, since you ask). Got 6,655 hits. Reasonable? High, I think, for BELL landlines. The population of Bonavista is 3,500 to 3,800 depending on whether you include Spillars Cove etc. There aren't that many businesses in Bonavista to lift us to 5,000, let alone 6,000.
My search reviews each page of hits, and if the first three URLs belong to an exclusion list of 51 URLs such as "YellowPages.ca", "411.ca", "CHECK-CALLER" etc. then that number is discarded; otherwise the number is assumed to be a real and active telephone number.
Thus of the 10,000 numbers in my For-Loop, 10,000 minus 6,655, that is 3,345 numbers, were weeded out because they failed the "number in service" test.
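
The rule is short enough to sketch. The version below is in Python rather than the VBA/Selenium actually used, and it makes several assumptions: the function names are invented, only three of the 51 excluded domains are listed, the sample URLs are made up, and a search that returns no URLs at all is treated as "not in service".

```python
from urllib.parse import urlparse

# A small sample; the real exclusion list is said to hold 51 entries.
EXCLUDED = {"yellowpages.ca", "411.ca", "tel2na.me"}


def candidate_numbers(prefix="709-468-"):
    """Yield all ten thousand numbers, 709-468-0000 through 709-468-9999."""
    for n in range(10000):
        yield f"{prefix}{n:04d}"


def is_excluded(url, excluded=EXCLUDED):
    """True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in excluded)


def looks_in_service(result_urls, excluded=EXCLUDED, top_n=3):
    """Discard a number only when every one of the first top_n URLs is excluded."""
    top = result_urls[:top_n]
    if not top:
        return False  # assumption: no hits at all counts as not in service
    return not all(is_excluded(url, excluded) for url in top)


# Made-up result pages for two numbers.
only_directories = [
    "https://www.yellowpages.ca/x",
    "https://411.ca/y",
    "https://tel2na.me/z",
]
has_real_site = ["https://example.com/inn", "https://411.ca/y", "https://tel2na.me/z"]
print(looks_in_service(only_directories), looks_in_service(has_real_site))
```

Because of the "all three" test, a single unlisted URL among the first three is enough to let a number through.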

Today I re-ran with the search term "709-468-xxxx Bonavista", in the belief that adding the term "Bonavista" would trim out some errant responses from, oh, I don't know what. Adding a term to the search string ought to reduce the number of hits because it narrows the search.
Got 8,373 items!

1,718 more than yesterday's number-only search (in quotes, of course, "709-468-0000" rather than 709-468-0000)

Why the leap?
Selenium010.png
Above is a manual search with a sample number from yesterday's search strings. Note the "tel2na.me" domain, which is on my exclusion list. The first three URLs are all on my exclusion list, so this number was ignored, and did not contribute towards the 6,655 total for yesterday.
Selenium009.png
Above is a manual search with a sample number from today's search strings.
Presumably some scam line in New Jersey, of all places :grin: has "Bonavista" in its web pages. That firm makes it to the first URL on the page of hits, and so the first three URLs in the hits ("AND" logic) are NOT all on my exclusion list, so this number made it through, and became one of 8,373 items.

I am prepared to add more of these "scam" and "fraud"-like generic URLs to my list as time goes by, for we have a long way to go. But I draw the line at specific web pages.

You just can't win!

This search was meant to be super-fast (120 minutes) to reduce the number of numbers that are fed to my ten-seconds-per-search (eight search engines) task, so cutting the numbers by one third is as effective as reducing the time for the main search from 10 seconds to 6.66 seconds.
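
The equivalence is easy to check with a few lines of Python. Using the 6,655 survivors of the number-only run as the count passed on to the main search is my assumption; the variable names are invented.

```python
ALL_NUMBERS = 10_000      # every number in the 709-468-xxxx exchange
SURVIVORS = 6_655         # numbers that passed the quick pre-filter (assumed)
SECONDS_PER_SEARCH = 10   # cost of the main search per number

# Total time for the main search with and without the pre-filter, in hours.
hours_without = ALL_NUMBERS * SECONDS_PER_SEARCH / 3600
hours_with = SURVIVORS * SECONDS_PER_SEARCH / 3600

# Per-number time that would give the same saving with no pre-filter at all.
equivalent_seconds = SECONDS_PER_SEARCH * SURVIVORS / ALL_NUMBERS

print(round(hours_without, 1), round(hours_with, 1), equivalent_seconds)
```

The last figure comes out at about 6.66 seconds, matching the estimate above; the 120 minutes spent on the pre-filter itself still has to be set against the saving.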

Cheers
Chris

Post by ChrisGreaves »

LisaGreen wrote:
22 Oct 2021, 16:25
... and suspect that allintext.. which I didn't know about and has got me rushing to
Hi Lisa.
This morning I wanted to exclude results for "airbnb" from search results.
This Help page suggested "There are many search operators to choose from, but below is the one you should use to exclude a domain: -inurl:[write URL here]", so I tried searching for

Code:

Exceptional Value in the heart of Bonavista -inurl:airbnb
Didn't work!

Code:

Exceptional Value in the heart of Bonavista -inurl:[https://www.airbnb.ca]
worked for me, because the search results then included no instance of the text "Exceptional Value in the heart of Bonavista", which tells me/my program, that there is no web page containing that term except on airbnb.ca.
(some of the AirBNB data passed on to me uses a description of the property instead of the property name)
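
Another operator often used to exclude a whole domain is -site:, which may be worth trying alongside -inurl:. The sketch below (Python; the helper name `with_exclusions` is made up) only builds the string. Whether a given engine honours it for this particular phrase is untested here.

```python
def with_exclusions(query, domains, operator="-site:"):
    """Append one exclusion term per domain to a search string."""
    return " ".join([query] + [f"{operator}{domain}" for domain in domains])


phrase = '"Exceptional Value in the heart of Bonavista"'
print(with_exclusions(phrase, ["airbnb.ca", "airbnb.com"]))
# prints: "Exceptional Value in the heart of Bonavista" -site:airbnb.ca -site:airbnb.com
```

Passing `operator="-inurl:"` produces the form suggested by the Help page, so both variants can be compared from the same code.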

Cheers
Chris

Post by LisaGreen »

To quote a song from the inimitable Dire Straits .. It's a mystery to me, the game commences.

Post by LisaGreen »

I wonder if () could alter the search priorities.

And as a tangent.. There are already a lot of meta search engines on the net. Maybe putting your searches into one of them will help?

https://www.cryer.co.uk/resources/searc ... s/meta.htm

That is from a simple Google search!

I used to use Copernic and was quite happy with the results.

HTH
Lisa

Post by ChrisGreaves »

LisaGreen wrote:
29 Oct 2021, 16:13
There are already a lot of meta search engines on the net. Maybe putting your searches into one of them will help?
Thank you Lisa; I have added this URL (of SEs) to my Whatfaq.
About two weeks ago I grabbed a similar list and evaluated two dozen SEs to see which produced good results and settled on eight that I run in parallel.

This project is not a simple search; it HAS to be based on a specific custom SE (airBNB) to obtain reference data and then I have to search on other SEs to augment that data.

A simple example:-
If you wanted to get the telephone numbers from all the B&B in your area, you could NOT get them from a B&B web site (which is a specialized search engine). Travel web sites do NOT hand out telephone numbers, because then you could short-circuit the booking process. So you go to the travel site to get Names of businesses, then try to determine the telephone number (street address etc) from other sources.

All the telephone numbers in MY area run from 709-468-0000 to 709-468-9999, ten thousand numbers! I cannot test each number against each B&B name because
(a) the combinatorial load is staggering and
(b) a surprising number of B&B do not use a business name.
Search for B&B in your area and see how many B&B are titled "Quiet room in cosy town" or similar, rather than "Chris's Cosy Cabin".

Hence a cascading series of searches
(i) determine which telephone numbers in my area are assigned/in use/active and use that sub set
(ii) general search engine (eight in parallel) to search for each telephone number coupled with giveaway terms so something like

Code:

"709-468-1234" and (bed or breakfast)
Not what I would choose, but you get the idea.
Right now the theory is that the most common 709-468-xxxx telephone number in the 20 to 30 document pages of text harvested as hits, is probably the telephone number of a specific business.
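
That "most common number wins" theory can be sketched with a regular expression and a counter. This is Python, not the author's VBA. The pattern, the function name `most_common_number` and the three sample pages are assumptions for illustration, and the pattern only recognises a handful of common ways of writing a 709-468 number.

```python
import re
from collections import Counter

# Matches 709-468-1234, (709) 468-1234, 709.468.1234 and 709 468 1234.
PHONE_RE = re.compile(r"(?<!\d)\(?709\)?[\s.-]*468[\s.-]*(\d{4})(?!\d)")


def most_common_number(pages):
    """Return (number, count) for the most frequent 709-468-xxxx, or None."""
    counts = Counter()
    for text in pages:
        for last_four in PHONE_RE.findall(text):
            counts[f"709-468-{last_four}"] += 1
    return counts.most_common(1)[0] if counts else None


# Made-up page texts standing in for the harvested hits.
pages = [
    "Call 709-468-1234 today",
    "Phone (709) 468-1234 or 709.468.5678",
    "Tel: 709 468 1234",
]
print(most_common_number(pages))  # prints: ('709-468-1234', 3)
```

Normalising every match to one format before counting matters, because the same number is rarely written the same way on two sites.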

Finding suitable SEs for this task is a small project in itself!

Cheers
Chris

Post by ChrisGreaves »

I should add that a number of search engines are not good at all when reporting on 10-digit telephone numbers:-
Untitled.png
This first page of results has three entries (I have over a thousand pages on my web site, all with my telephone number) and yet I can assure you that not one of the people named in the second hit lives in or near Bonavista and, unless I was drunk at the time, I have no recollection of visiting San Antone. And I wouldn't have gone to the children's hospital anyway, not with so many Little Free Libraries nearby!
Cheers
Chris

Post by LisaGreen »

LOL!!! Drunk.. yes been there.. and never knew until a package I wasn't expecting arrived from alie thingy 2 months later!!

It does sound though, like basic plodding work!

Lisa

Post by ChrisGreaves »

LisaGreen wrote:
29 Oct 2021, 17:30
It does sound though, like basic plodding work!
Exactly! Exactly the sort of job that a computer should do.
And that's what I make when the day's all wet,
It's a good sort of brake, but it hasn't worked yet!

Cheers
Christopher

Post by ChrisGreaves »

LisaGreen wrote:
29 Oct 2021, 16:13
And as a tangent.. There are already a lot of meta search engines on the net. Maybe putting your searches into one of them will help?
More Work for Chrissy. :groan:

I am thinking ... At the start I wanted to use a variety of SEs, looked up a list and chose ten out of about thirty.
Some of the SEs on your list(grin) bundle a search into ten or more search engines.
This has advantages and dis-advantages.
(1) While these search engines return data, they return a lot of fluff, which may cloud our analysis for contact details (which is a statistical search)
(2) Some metaSEs (e.g. eTools) include Google, which obviates the Captcha!
(3) These metaSE reduce my need to implement several interface drivers
(4) Often, though, I still get, say, only one hit from all those engines.

For the serious scraper we need to set up an acceptance test which goes way beyond seeing "who is best at locating ["the lancaster inn" bonavista newfoundland]".
So far we have available for use 40 search engines: AOL, About, Altavista, Ask, Base, Bing, Brave, Direct, Directory, DuckDuckGo, Exalead, Excite, Fast, Fastbot, FindWhat, Findelio, Google, Goto, Hit, Hotbot, Infoseek, Inktomi, Jeeves, Kartoo, Lilo, Live, LookSmart, Lycos, MSN, Mojeek, Moose, Overture, Qwant, Search, Snap, Tiger, Webcrawler, Wikipedia, Yahoo, and Yandex.
There is more in the attached document.
Cheers
Chris

Post by ChrisGreaves »

The Google search engine is not necessarily the best engine if you have specific search strategies in mind. If you are searching for a vacation rental, TripAdvisor, Expedia, airBNB and the like are far better search engines, although you pay for their services.
I decided to measure search engine suitability, and gathered 51 engines, some basic (like Google, Bing) and some meta (like MonsterCrawler). "Best" in my case was defined as:-
    Finds the sort of data I am looking for.

To that end I set up two tables in an MSWord document.
Table 1 held known data on ten results I was looking for - vacation rentals with the name, town, province and as a result of half a day's manual searching by me, street address, phone, email, website and all the stuff that the vacation rental engines do not want to tell you (for obvious reasons)
Table 2 held the run-time parameters for five of the 51 search engines (element class, element identifier, search engine URL and so on).

Two nested program loops - by the ten properties and by the five engines, and for the fifty combinations I kept score of each engine (attached document)
I used a scoring algorithm that awarded merit and demerit points for matched and mis-matched fields, and then ran again with a simpler scoring algorithm.
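
The attached document holds the real scoring, which is not reproduced here. As a rough illustration of the idea only, a merit/demerit scheme over nested loops might look like the Python sketch below. The field names, the +1/-1 weights, the choice to skip fields an engine did not return, and all the sample data are assumptions, not the author's algorithm.

```python
FIELDS = ("street", "phone", "email", "website")


def normalise(value):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value).lower().split())


def score_fields(known, found, merit=1, demerit=-1):
    """Add merit for each matching field and demerit for each mismatch."""
    total = 0
    for field in FIELDS:
        if field not in known or field not in found:
            continue  # assumption: a missing field scores nothing either way
        same = normalise(known[field]) == normalise(found[field])
        total += merit if same else demerit
    return total


def rank_engines(known_by_property, found_by_engine):
    """Loop over engines, then properties; return totals, highest first."""
    totals = {}
    for engine, found_by_property in found_by_engine.items():
        totals[engine] = sum(
            score_fields(known, found_by_property.get(name, {}))
            for name, known in known_by_property.items()
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up reference data and engine output, purely for illustration.
known = {"Property A": {"phone": "709-468-0000", "website": "example.com"}}
found = {
    "EngineX": {"Property A": {"phone": "709-468-0000", "website": "example.com"}},
    "EngineY": {"Property A": {"phone": "709-468-9999"}},
}
print(rank_engines(known, found))  # prints: [('EngineX', 2), ('EngineY', -1)]
```

Running the same data with different `merit` and `demerit` values is a cheap way to see whether the grouping of engines depends on the weights.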

Both results showed me that four of the five engines grouped together, so I might drop the outlier.

I had not previously thought of grading search engines, but now that I have this neat little application I might use it more often. Some search engines will offer better results on almost whatever you are searching for. It might pay to use the best engine for that particular search.

Cheers
Chris