Can't do it, can we? Well, now we can. This is a function that's been around for... (this pause is important, just to drive home the enormity of what I'm about to say) five or six YEARS according to one blog. It's apparently an undocumented feature. We've had a workaround (that we clearly haven't actually needed) using the asterisk symbol, such as three * mice for a single word between two others, or three ** mice for up to two words between two others, and so on. However, we can use the proximity search operator AROUND(x) to work more effectively. You have to put AROUND in capitals to ensure that Google knows you want to do a proximity search, then add the brackets with a number inside. Why Google has decided to use 'around' as a proximity term is a bit odd - most sensible resources would use something like NEAR instead, but that's Google for you.
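For anyone who wants to generate these searches rather than type them, here is a minimal Python sketch that builds the two query forms described above. The function names around_query and asterisk_query are my own inventions for illustration; they only assemble the search string and don't talk to Google at all.

```python
def around_query(left: str, right: str, distance: int) -> str:
    """Build a proximity query such as 'national AROUND(1) orchestra'.

    Hypothetical helper: it only formats the string described in the post.
    """
    if distance < 0:
        raise ValueError("distance must be zero or greater")
    # The operator has to be in capitals, followed by the number in brackets.
    return f"{left} AROUND({distance}) {right}"


def asterisk_query(left: str, right: str, stars: int = 1) -> str:
    """Build the older asterisk workaround, e.g. 'three ** mice'."""
    if stars < 1:
        raise ValueError("at least one asterisk is needed")
    return f"{left} {'*' * stars} {right}"


print(around_query("national", "orchestra", 1))
print(asterisk_query("three", "mice", 2))
```

Keeping the two forms in separate functions makes it easy to run the same pair of words through both and compare what comes back.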
Let's see how it works:
National Orchestra gives us almost 8 million results. To be clear - Google is finding any page that includes both national and orchestra anywhere on the page, and ranking by many methods including proximity.
If we now do a search so that we've got one word between:
We've chopped down to just over half a million results.
It doesn't actually seem to matter if there's a space after the D and before the opening bracket, as we get the same result.
Moving on:
We're increasing the results as we should, and again:
All makes perfect sense to me. Can you sense a 'but' coming up here though? I'm sure you can. However, let's end this section by saying that it's great we've got proximity search with Google, though why they've never actually bothered to TELL us, or why they're using such an odd term for it is strange.
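To pin down what I'd expect a proximity search to mean, here is a rough local sketch in Python. It assumes that AROUND(x) means "at most x words between the two terms, in either order", which is my reading of the results above and not anything Google has documented. The function name within_words is made up for this illustration.

```python
import re


def within_words(text: str, first: str, second: str, max_gap: int) -> bool:
    """Return True if `first` and `second` appear with at most `max_gap`
    words between them, in either order.

    Assumption: this mirrors one plausible reading of AROUND(x); it is not
    Google's actual algorithm.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    # Words in between = distance between the positions, minus one.
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


sample = "The National Youth Orchestra of Wales"
print(within_words(sample, "national", "orchestra", 1))  # True: one word between
print(within_words(sample, "national", "orchestra", 0))  # False: not adjacent
```

Under this reading, raising the number can only ever add matches, which is consistent with the result counts going up as the number increases.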
Let's try and compare the use of this with a few other functions. "national orchestra" is telling Google that we want the two words next to each other in a phrase.
Fair enough. Now, doing the same search as national AROUND(0) orchestra should give us the same result, shouldn't it - we're asking for the two words to have no words in between:
But no. A completely different figure, much reduced. If anything I would have expected it perhaps to have been the other way, if Google had run my search as either orchestra or national as the first word (given that in the phrase search I'm telling Google I want national first). Let's torture Google a little bit more:
Now - since I'm telling Google that I don't want any spaces between the two words (first part of the search) and to exclude anything that DOES have spaces between them (second part of the search) I should get 0 results. But I don't. I checked a few pages for 'national orchestra' and couldn't find any - there was always a word in between, so either AROUND(0) doesn't work, or -phrase takes precedence.
How about the old asterisk workaround? A search for national * orchestra should give us the same result as national AROUND(1) orchestra, shouldn't it? It doesn't of course - 103,000,000 hits as against 562,000 hits - a difference of about 102,400,000 (not forgetting the Google catch-all escape clause of 'about' so many results).
So...
Asking for a word between national orchestra and excluding any site that contains a word between national orchestra should give us nothing. But we have over 1 million. "Ah," I hear you cry (and as an aside, the fact that you've kept up this far is quite phenomenal!) "perhaps the * symbol just means one or many words between national and orchestra?" Good idea, but no. A search for national * orchestra gives us 103,000,000 results, but national ** orchestra gives us 111,000,000 results. Any more than that - national ****** orchestra still just gives us 111,000,000 hits.
Summary: We have proximity search in Google. I think it works - it kinda seems to work, and I think it's better than what went before it, but as with most Google functionality when you're dealing with big numbers it seems to break down quite quickly. But the idea that it's been around for years and they never told us - if that's actually the case - has just dumbfounded me. I'll do some digging and asking around, see if the other search engine folks know any more than I do.
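Just to put the reported counts side by side, here is a small Python sketch of the arithmetic. The numbers are the ones quoted above (and Google only ever claims 'about' that many results, so treat them as rough); the helper names are mine.

```python
# Result counts as reported in the post; Google's figures are approximate.
counts = {
    "national * orchestra": 103_000_000,
    "national ** orchestra": 111_000_000,
    "national AROUND(1) orchestra": 562_000,
}


def difference(a: str, b: str) -> int:
    """How many more results query `a` reports than query `b`."""
    return counts[a] - counts[b]


def ratio(a: str, b: str) -> float:
    """How many times larger the count for `a` is than the count for `b`."""
    return counts[a] / counts[b]


star, two_stars, around1 = counts.keys()
print(f"{difference(star, around1):,}")    # 102,438,000
print(f"{ratio(star, around1):.0f}x")      # 183x
print(f"{difference(two_stars, star):,}")  # 8,000,000
```

However you slice it, the single asterisk reports roughly 183 times as many results as AROUND(1), so the two are clearly not doing the same job.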
Phil,
I'm not sure why I'm so excited by your discovery - maybe it's a hankering for the days of Dialog.
I tried a couple of searches of my own and first of all found it almost impossible to use AROUND() with Google Instant switched on.
My results were as follows:
privacy internet 1,620,000,000 hits
privacy AROUND(10) internet 272,000,000 hits
all the way down to ...
privacy AROUND(0) internet 250,000,000 hits
not much difference between 10 and adjacency, and much higher than the two string searches below:
"privacy internet" 200,000 hits
picks up lists where the words happen to be adjacent or categorisations of software such as: privacy - Internet etc.
"internet privacy" 1,060,000
So for this search it seems that the quotes narrow the number of hits by a factor of 1,000.
I look forward to trying this feature out on some other topics - it might work better for more unusual topics where searching for a string between quotes is too specific.
Thanks for a thought-provoking blog.
Posted by: David Haynes | January 09, 2011 at 03:13 PM
I always understood the asterisk to be equivalent to AROUND - but without the possibility of specifying the number of words i.e. National * Orchestra was the same as National **** Orchestra. (I read / heard once that the * meant that words could be up to 10 words apart). It's possible that duplicating the * has the same sort of impact as a search for "national orchestra" "national orchestra" i.e. duplicating the search terms gives a different result.
Posted by: Arthur Weiss | January 17, 2011 at 04:21 PM
Arthur - when the * was introduced it was supposed to stand for a word, so ** would mean up to two words in between and ***** would be up to five, but that's gone by the board now. I believe that as a consequence they are two different functions, but with Google as inept as they are, who knows?
Posted by: Phil Bradley | January 17, 2011 at 04:48 PM
The reason they don't publicise the around facility is the same as for a few other Google tricks - it's computationally much more expensive than simple searches.
Posted by: PaulOnBooks | January 18, 2011 at 12:23 AM
My experience is the same as Phil's - the * originally meant one word (and ** meant two words etc) BUT it pretty soon came to mean just NEAR and several of us figured out it was actually doing w/para (within the same paragraph) since we regularly got (and still get) up to 25-30 words between the two terms, including the end of one sentence and start of the next. It's possible that more than one * (e.g. **) defaults to the original "one word per star" meaning but it's more likely IMO that it's no longer supported and therefore disregarded. Google's Help (which doesn't often say enough) says that * is treated as "a placeholder for any unknown term(s)" - which is how it works at least by me.
I'm excited about AROUND() too - and am not surprised that the minimum is AROUND(1) i.e. that (0) doesn't work.
OK, let's try my favorite subroutine for finding the names of "key opinion leaders" who have been quoted on a topic I'm researching: in Lexis it's ((said w/2 (dr or prof!)) - together with the topic of course. So I plug into Google:
heart-disease * said AROUND(2)(dr. OR prof. OR professor) and do a news search and whaddaya know, it actually works pretty well. It even interpreted the OR clause right (as AROUND(2) DR OR AROUND(2) prof... etc). EUREKA!
Posted by: Judy Koren | January 18, 2011 at 04:23 PM
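Judy's reading of the OR clause can be written out as a short Python sketch. It assumes, as her comment suggests, that the bracketed alternatives are each paired with the same AROUND(2) term; whether Google parses the expanded form identically isn't established here, and expand_or is a hypothetical helper name.

```python
def expand_or(anchor: str, distance: int, alternatives: list[str]) -> str:
    """Spell out 'anchor AROUND(n)(a OR b OR c)' as one AROUND clause per
    alternative, joined with OR.

    Assumption: this follows the interpretation described in the comment
    above, not documented Google behaviour.
    """
    clauses = [f"{anchor} AROUND({distance}) {alt}" for alt in alternatives]
    return " OR ".join(clauses)


print(expand_or("said", 2, ["dr.", "prof.", "professor"]))
# said AROUND(2) dr. OR said AROUND(2) prof. OR said AROUND(2) professor
```

Running both the compact and the expanded form and comparing the first page of results would be one way to check whether the interpretation holds.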
Very strange!
However, I've confirmed it works the way you've described - at least for the first few results.
Searching for the following:
apartment AROUND(1) galt
thru
apartment AROUND(5) galt
yields a consistent first result in which they have highlighted the searched terms, correctly bolding the appropriate number of terms between the search terms.
However, searching using the asterisk (*) yields completely different results.
It will be interesting to follow up on where this goes, and I can't imagine it's been around for 5 years with enough accuracy to make it worth using - or publishing.
Posted by: Steve Simofi | January 27, 2011 at 02:10 AM