25 May 2005

Memex and associating a distributed information repository

In his prescient paper As We May Think, Vannevar Bush laid the foundations of modern information retrieval. The memex as described by Bush is an intriguing combination of tools to aid humans in discovering, cataloguing, commenting upon, and associating information. Bush discusses the capacity to make associative leaps between pieces of information as one to which humans are supremely adapted, whereas our ability to memorise and mechanically sift information ourselves is relatively poor. As a consequence, we’ve developed some amazing technologies for recording information and retrieving it.

Most, if not all, of the pieces of the technology puzzle which are needed to build an effective memex exist today. However, they are yet to be combined into a single seamless environment.

I have: a web browser for general information access; highly effective Internet search engines such as those from Google and Yahoo to find information from the Web at large; Microsoft’s OneNote for writing notes and cataloguing them; this weblog to record publicly my ideas and thoughts; and our own Sauce Reader for subscribing to chosen high value information feeds. But nothing exists to tie them all together and to provide highly effective search over both Internet resources and my own personal information.

(Sue Dumais’s work at Microsoft Research on Stuff I’ve Seen is a great example of where Microsoft is heading with this kind of concept, as are their much-reported intentions of integrating search thoroughly into the Longhorn release of the Windows operating system, sometime in the coming years.)

These tools are an essential part of our armoury to prevent us from drowning under the information deluge. Because they are not yet integrated, we impose an appreciable cognitive load on our brains as we switch between different software packages, leaving us less productive than we could be. I am constantly staggered by how much information is available to me, almost instantaneously. I am also convinced that if only I had less disruption between different information manipulation tasks, associating information in useful new ways would become easier.

Bush envisaged the memex as being capable of storing, recording and retrieving vast quantities of information, all within a desktop environment. While massive increases in storage capacity and processing power over the past decades have improved our capacity to carry out this vision, we have simultaneously accelerated the rate at which information is being produced and recorded. A UC Berkeley report, How Much Information, estimates that written information has been growing at the rate of 36% a year in the three years since their previous study, and in 2002 was estimated at 1.6 petabytes (a petabyte is 1000 terabytes, and a terabyte is 1000 gigabytes). Even compressed, the data is approximately 0.3 petabytes. Right now, no one can afford the money or space to store this volume of information locally on their desktop. (This balance may change over time given ongoing improvements in technology, but ongoing increases in the amount of information may continue to keep local storage out of reach.)

But then, why would you, given the highly effective distributed nature of the Web? The key to building a memex that meets Bush’s grand and almost 60-year-old vision lies not just in building an integrated suite of highly effective tools for accessing, indexing and recording information in this vast distributed information repository, but also in building tools that enable people to create or augment their own associative content architecture over it.
