Mass customisation of content
With the launch of MSN Spaces today (as widely reported in the blogosphere), I've been contemplating the mass customisation of content.
MSN Spaces (which seems to be the PowerPoint approach to blogging) is a very powerful concept. Provide a tool that thousands of people a day will use to start communicating, and what do you get? A facility for the mass customisation of networked content that all shares much the same general shape and structure. (Of course, Blogger and others do this as well.)
Weblogs in general follow the same principles, as indeed does the web at large, but with progressively looser degrees of conformance between different content items.
Why should the general shape/structure of content be interesting?
Well, the answer comes down to how much implicit information can be extracted from it: the more consistent the structure, the more you can infer.
Google's cleverness lay in extracting some of this implicit structure from the general morass of the web (namely, recurring patterns in hyperlinks) and using it to calculate relevance to our information needs more effectively.
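To make "recurring patterns in hyperlinks" concrete, here is a minimal sketch of one well-known, publicly described link-analysis idea, a simplified PageRank computed by power iteration. It is an illustration only, not Google's actual ranking system; the function name, the damping value of 0.85 and the toy link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split equally among its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank across all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

Nothing here reads what the pages say. The ranking of "c" above the others falls out purely from the shape of the links pointing at it, which is exactly the kind of implicit structure in question.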
Now, the web is a pretty loose association of content, strung together syntactically only by hyperlinks. The folks involved in the Semantic Web (Tim Berners-Lee, no less) are working towards a substrate for recording networked information that embeds meaning (semantics) into the very descriptions of the content. (For example, it might record that a piece of content is a photo, that it belongs to a collection of photos owned by me, and that I license people to use them under a Creative Commons license.)
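The photo example maps naturally onto subject-predicate-object statements, the model the Semantic Web's RDF uses. The sketch below is a toy in-memory version, not a real RDF library. Apart from the standard `rdf:type` predicate, all identifiers (`photo:42`, `album:holidays`, `contains`, `owner`, `license`) are hypothetical names made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements describing the photo example in RDF-like style.
graph: List[Triple] = [
    Triple("photo:42", "rdf:type", "Photo"),
    Triple("album:holidays", "contains", "photo:42"),
    Triple("album:holidays", "owner", "person:me"),
    Triple("album:holidays", "license", "CreativeCommons"),
]


def match(graph: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]


# "Under what license may I use photo:42?" answered by following explicit links.
albums = [t.subject for t in match(graph, predicate="contains", obj="photo:42")]
licenses = [t.obj for a in albums for t in match(graph, subject=a, predicate="license")]
```

The query works only because someone wrote down every one of those statements explicitly. That cataloguing burden is exactly what the next paragraph is sceptical about.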
My suspicion is that it's going to be a long time before we see the web in full Semantic Web glory, if ever. Why? Because I don't believe most of us are librarians, or cataloguers, and that's what you need to be good at. (Not to mention the known problems of using a taxonomy.) So the tools for creating Semantic Web content had better be good, and there's never going to be a useful general-purpose tool, because content is intrinsically mass customised.
That's why content-specific applications like MSN Spaces are gold for finding more implicit structure and messy meaning, without the formal rigour of the Semantic Web. They let you nominate content as you go: this is my blog and these are my posts; these are my lists and here is a list item; this is my photo album, here are its sections and here are my photos; and here are the names for all of these things.
Of course, to make use of the patterns in this implicit structure, you really need access to a very large computing facility, because the volume of data we're talking about is huge and growing at a staggering rate. The recent article about Google's Urs Holzle describing the technology challenges gives some indication of how much computing power and system management capability must be thrown at what is a comparatively straightforward, embarrassingly parallel computing problem. In fact, I suspect only a handful of companies in the world (Google, Microsoft, Yahoo!, eBay, Amazon) know how to run such computing systems.
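"Embarrassingly parallel" means the data can be split into pieces that are processed completely independently, with the partial results merged at the end. The sketch below shows that shape for counting inbound links. The shard layout, the function names and the use of a thread pool are assumptions for illustration. A thread pool in CPython won't actually speed up pure-Python work, so at real scale each shard would go to a separate machine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_inbound(shard):
    """Map step: count inbound links within one shard of (source, target) pairs."""
    return Counter(target for _source, target in shard)


def inbound_link_counts(shards, workers=4):
    """Process shards independently (no shard needs another's data),
    then merge the partial counts (the reduce step)."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_inbound, shards):
            total.update(partial)
    return total


# Hypothetical link data split into two shards.
shards = [
    [("a", "c"), ("b", "c")],
    [("c", "a"), ("d", "c")],
]
counts = inbound_link_counts(shards)
```

The algorithm itself is trivial. The hard part, and the point of the paragraph above, is running the same pattern over billions of pages on many thousands of machines.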
And preferably, you also need access to all the user data associated with how people interact with any service you provide over this content.
That's why if you're trying to understand what people do when they search, you need a really big search engine (Yahoo! or Google or MSN Search).
If you're trying to understand what information people need to buy a product, you need a really big shop (Amazon or eBay).
The bigger you are, the more data you have, and the better service you're going to be able to offer.
That's why MSN Spaces is a smart idea for Microsoft, and tying it in with their other technologies, like Windows Media Player and MSN Messenger, is even smarter. It's not that they couldn't integrate with other open technologies for these facilities, but by providing the communication products themselves they get much better instrumentation of how interaction flows between those products, across the hundreds of millions of people who use them. And in the long term, it's owning that knowledge that gives you power in the technology arms race.