17 January 2006

Wikipedia gets a Google blessing for biographical searches

I've been looking into definitional question answering lately, and have noticed something that intrigues me. No doubt other people have spotted it as well, but I haven't seen it mentioned anywhere, so perhaps it's not entirely obvious.

Google appears to be doing some quite clever query processing, using queries like "who is ...?" and "what is ...?" as additional indicators that you're asking a question. For some time it has offered the Google Answers service, and for more complex queries (e.g. "what symphonies did Beethoven write?") it includes a tip linking to that pay-for-answer facility.

For simpler "what" queries (e.g. "what is a symphony?"), their Web definitions facility highlights a short factual answer for common terms, drawn from one of several Web definition providers.

But what is most interesting (especially given the recent controversy over biographical entries on Wikipedia) is that for "who is/was ...?" queries (e.g. "who was Beethoven?"), Google has given its blessing to the authoritativeness of Wikipedia by highlighting the Wikipedia entry for the person, if one exists.

Note that this works for both real people, alive or dead, and for imaginary people or things, provided they have a Wikipedia entry (e.g. "who is Winnie the Pooh?").

You can tell this result is different because Google treats it specially, placing it above any news results for the person and acknowledging the source with the tag "According to http://en.wikipedia.org/wiki/...".

In a small number of cases, Google appears to prefer entries from www.who2.com, a site that lists details about famous people; however, having a who2.com entry does not by itself mean Google returns it in preference. Given that two of the queries that returned a who2 entry for me were for "Bill Gates" and "George Bush", it may be that Google uses who2 in cases where Wikipedia entries suffer significant vandalism (even if it is quickly rectified).

I wonder whether this is an example of Google applying some of the research and practices from the enterprise search arena that John Battelle refers to, or of clever natural language processing (though that tends to be computationally expensive), or just some simple and efficient query analysis.
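To make the "simple and efficient query analysis" option concrete, here is a minimal sketch of what it could look like. This is an assumption for illustration only, not Google's actual method: a couple of hypothetical regular expressions spot "who is/was ...?" and "what is ...?" questions and pull out the subject, and the pattern list and the `classify_query` function are invented names.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns illustrating simple query analysis;
# these are NOT Google's real rules.
_PATTERNS = [
    # "who is X?" / "who was X?" -> treat as a biographical question
    ("biography", re.compile(
        r"^\s*who\s+(?:is|was)\s+(?P<subject>.+?)\s*\??\s*$", re.IGNORECASE)),
    # "what is (a|an|the) X?" -> treat as a definitional question
    ("definition", re.compile(
        r"^\s*what\s+is\s+(?:an?\s+|the\s+)?(?P<subject>.+?)\s*\??\s*$", re.IGNORECASE)),
]


def classify_query(query: str) -> Optional[Tuple[str, str]]:
    """Return (question_type, subject) for a recognised question, else None."""
    for qtype, pattern in _PATTERNS:
        match = pattern.match(query)
        if match:
            return qtype, match.group("subject")
    return None  # not a recognised question: fall back to ordinary search


for q in ["who was Beethoven?", "what is a symphony?",
          "what symphonies did Beethoven write?"]:
    print(q, "->", classify_query(q))
```

A matcher like this costs almost nothing per query, which is the appeal of pattern matching over heavier natural language processing. In this sketch, a question such as "what symphonies did Beethoven write?" matches neither pattern and would fall through to ordinary results.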

Overall, my reading of this is that Google believes Wikipedia provides the best results for the vast majority of biographical subjects. I imagine it's unlikely that Wikipedia is doing any deals to get placed this way, since it hardly needs more traffic than it already gets! Thus it's a strong statement by Google in support of collaboratively authored and mediated content.

3 Comments:

Elihu Vedder said...

Interesting. Unfortunately, I don't see this when I try "Who is?" questions. I tried both logged in and not logged in. Any clue as to why you see this and I don't? For me, no wikipedia entry comes at the top of the list, not even for Beethoven, as you suggest.

9:01 AM  
Peter said...

That's very interesting! I was logged in at the time, but if you tried both then that can't be the difference. I get it completely consistently; for instance, I just tried "who is larry page?".

Perhaps Google is trialling it for users from certain countries - I'm accessing this in Australia.

9:58 AM  
Elihu Vedder said...

Fascinating! I was doing it from the US. The first result I get for [Who is Beethoven?] is

Ludwig van Beethoven
Article from The Grove Concise Dictionary of music with portrait and links.
Includes information on symphonies, concerti, piano and chamber music, ...
w3.rz-berlin.mpg.de/cmp/beethoven.html - 10k - Cached - Similar pages

All things considered, the Grove Dictionary is a much better source for the result than Wikipedia would be, since the Grove Dictionary is the standard musical reference.

6:37 AM  
