Vertical searching - published papers
Great to see today that Google (in the form of Anurag Acharya) has just launched Google Scholar [via Google Blog]. In my recent post on how big are everyone's sites, I talked about how interesting it is to see generic search and vertical industry-specific search facilities. Google Scholar is a great example of both. It searches generically over published papers, but also provides a form of vertical search - that of academe. It also shines a light onto some dark matter of the web (as I talked about in that previous post as well), by crawling subscription-only material, following agreements with publishers.
The name of the principal engineer, Anurag Acharya, seemed awfully familiar to me. So using his new facility, I tried searching for publications written by "Anurag Acharya". Sure enough, Anurag has been involved in computing from way back, and I'd read some of his work on distributed vs shared memory computing while doing my own graduate study in the mid-1990s. Of course, there may be multiple Anurag Acharyas in computer science, just like there are multiple Peter Baileys.
Of course I had to do a vanity search for my own publications, and am pleased to see my favourite paper (Engineering a multipurpose test collection for Web retrieval experiments), which I wrote with Nick Craswell and David Hawking, at number 2 with 41 citations.
I think Google have taken an interesting approach by ranking results apparently almost exclusively by number of citations when searching on author names. This certainly provides a rapid method for establishing paper popularity.
Of course, by adding other subject terms, more of Google's ranking algorithm comes into play, and the citation rank is not the only factor. From first impressions, Google Scholar appears to do a better job than Citeseer, which has long ruled in the area of searching for research papers. (Steve Lawrence, the main developer of Citeseer, now works as a senior research scientist at Google.)
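To make the difference concrete, here is a rough sketch of the two behaviours as I observe them from the outside. Everything in it is my own guesswork: the Paper fields, the sample titles and authors, the citation_weight value and the scoring formula are all hypothetical, and none of it is Google's published method.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    authors: list
    citations: int


def by_author(papers, author):
    """Keep only the papers that list the given author (case-insensitive)."""
    wanted = author.lower()
    return [p for p in papers if wanted in [a.lower() for a in p.authors]]


def rank_by_citations(papers, author):
    """Author-only query: order the matches purely by citation count."""
    return sorted(by_author(papers, author), key=lambda p: p.citations, reverse=True)


def rank_with_terms(papers, author, terms, citation_weight=0.5):
    """Author plus subject terms: blend a crude title match with citations.

    The blend is an assumption for illustration: the fraction of query terms
    found in the title, plus a weighted, log-scaled citation count.
    """
    matches = by_author(papers, author)
    top = max((p.citations for p in matches), default=0)

    def score(p):
        words = set(p.title.lower().split())
        term_score = sum(t.lower() in words for t in terms) / len(terms) if terms else 0.0
        cite_score = math.log1p(p.citations) / math.log1p(top) if top else 0.0
        return term_score + citation_weight * cite_score

    return sorted(matches, key=score, reverse=True)


# Made-up sample data, not real papers or real citation counts.
papers = [
    Paper("Web retrieval test collections", ["A. Author"], 41),
    Paper("Shared memory scheduling", ["A. Author"], 60),
    Paper("Ranking for web retrieval", ["B. Writer"], 5),
]

author_only = [p.title for p in rank_by_citations(papers, "A. Author")]
with_terms = [p.title for p in rank_with_terms(papers, "A. Author", ["web", "retrieval"])]
print(author_only)
print(with_terms)
```

With the author name alone, the more heavily cited paper comes first. Once the subject terms are added, the paper whose title matches them overtakes it, which is the kind of shift I see in the real results.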
The Scholar search is not yet perfect, however, as it appears to return some papers that you did not write but merely referenced in one of your own papers.
For example, I tried searching on "peter bailey" information retrieval, and got back the classic Cleverdon paper "The Cranfield tests on index language devices", cited in our paper mentioned earlier. My suspicion is that this arises because of the added value being provided by the ACM, which lists, for each paper, the other published papers that cite it. Hence, since our paper references the Cleverdon paper, ours appears in the ACM's list of citings for it, and Google Scholar then picks that list up as part of the indexable material for the Cleverdon paper. Personally I think this is a bug, and either the ACM should serve these paper abstracts to Google without the citings, or Google Scholar should exclude them. After all, I wasn't even born in 1967 when Cleverdon published that paper!
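Here is a minimal sketch of how I suspect this happens. The record layout and the field names (title, authors, cited_by) are assumptions of mine for illustration. I have no knowledge of how the ACM pages or the Google Scholar index are actually structured.

```python
def indexable_text(record, include_citing_list):
    """Build the text a crawler might index for one paper's page."""
    parts = [record["title"], " ".join(record["authors"])]
    if include_citing_list:
        # The suspected problem: the page's list of citing papers
        # gets indexed as if it were part of the paper itself.
        parts.extend(record["cited_by"])
    return " ".join(parts).lower()


def matches(record, phrase, include_citing_list):
    """True if the phrase occurs anywhere in the indexed text."""
    return phrase.lower() in indexable_text(record, include_citing_list)


# Hypothetical record for the Cleverdon paper's page.
cleverdon = {
    "title": "The Cranfield tests on index language devices",
    "authors": ["Cleverdon"],
    "cited_by": [
        "Peter Bailey, Nick Craswell, David Hawking. "
        "Engineering a multipurpose test collection for Web retrieval experiments"
    ],
}

with_citings = matches(cleverdon, "peter bailey", include_citing_list=True)
without_citings = matches(cleverdon, "peter bailey", include_citing_list=False)
print(with_citings, without_citings)
```

If the citing list is indexed along with the page, my name matches a paper I had nothing to do with. Leave the list out and the match goes away, which is the behaviour I would expect from either fix.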
[More analysis and discussion at Search Engine Watch.]