25 May 2005

Search and connected structured data

The ever-interesting Jon Udell writes some more about WinFS and its infrastructure. I mention it mainly to draw attention to the support Jon provides, in passing, for my earlier post about whether search needs metadata (schemas).

Admittedly my "finding versus organizing" distinction was a bit of a cheat, since finding depends sensitively on prior organization. Except when it doesn't: brute-force free-text search routinely trumps navigation and structured search. But OK, we've all got to hope that better organization, someday, will level the playing field. [emphasis added]

In the very next paragraph, Jon goes on to describe some of the similarities between RDF and WinFS and the mechanisms they use to relate content items to one another.

Today's personal information systems are organized hierarchically. WinFS proposes that they be organized semantically. A number of observers have noted a family resemblance between RDF (Resource Description Framework) "triples" and WinFS relationships. An RDF triple, in geek-speak, is a subject-predicate-object relation. Sets of RDF triples can be (and Semantic Web people say must be) used to represent and organize knowledge.

Sytadel incorporates a very similar mechanism, which we call a Relation. It is just another content type, one that happens to implement a useful subset of the XLink standard. We use relations everywhere in Sytadel to connect two content items semantically. Relations are typed, and the type provides the meaning behind the connection.

For example, we have topic hierarchies in Sytadel. Topics are connected in a vertical tree by the Parent topic relation. A topic with no Parent topic is a root-level topic in the tree. A topic that no other topic names as its Parent topic is a leaf topic. (As usual in computing, trees are visualised upside down, so that the "root" is in the sky and the "leaves" are at the bottom.) Of course, Sytadel can have multiple root topics. Relations of type Parent topic only exist with a topic at either end.
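
To make the root and leaf definitions concrete, here is a minimal sketch in Python. It is not Sytadel's actual implementation: the Relation class, its field names and the sample topics are all hypothetical, and I am assuming a relation points from the child topic to its parent.

```python
from dataclasses import dataclass

PARENT_TOPIC = "Parent topic"


@dataclass(frozen=True)
class Relation:
    """A typed link between two content items (hypothetical shape)."""
    source: str  # the item the relation starts from (here: the child topic)
    type: str    # the relation type, which carries the meaning
    target: str  # the item the relation points to (here: the parent topic)


def roots_and_leaves(topics, relations):
    """Classify topics using only their Parent topic relations."""
    parent_links = [r for r in relations if r.type == PARENT_TOPIC]
    has_parent = {r.source for r in parent_links}  # topics that name a parent
    is_parent = {r.target for r in parent_links}   # topics named as a parent
    roots = sorted(t for t in topics if t not in has_parent)
    leaves = sorted(t for t in topics if t not in is_parent)
    return roots, leaves


# Made-up sample data: two root topics, one of which has two children.
topics = ["Science", "Physics", "Chemistry", "Arts"]
relations = [
    Relation("Physics", PARENT_TOPIC, "Science"),
    Relation("Chemistry", PARENT_TOPIC, "Science"),
]
roots, leaves = roots_and_leaves(topics, relations)
```

Note that a topic with no relations at all (Arts in this sample) is both a root and a leaf, which falls straight out of the two definitions.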

Another kind of relation is the one with type Related topic. Just about any kind of content item (not just topics) can have a Related topic relation. This means you can connect topics horizontally across the hierarchy, not just up and down, providing more interesting navigation and discovery opportunities. And while you can only have one Parent topic, you can have as many Related topics as you like. Thus an article or a press release might appear related to many different topics in the hierarchy. Sytadel has many other kinds of typed relations in use as well.
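
The difference between the two relation types can be sketched in a few lines of Python. This is an illustration under my own assumptions, not Sytadel code: RelationStore and its methods are hypothetical names, and the content items are made up. The store allows at most one Parent topic per item but any number of Related topic relations.

```python
from collections import defaultdict

PARENT_TOPIC = "Parent topic"
RELATED_TOPIC = "Related topic"


class RelationStore:
    """Hypothetical in-memory store of typed relations between content items."""

    def __init__(self):
        # source item -> list of (relation type, target item)
        self._by_source = defaultdict(list)

    def add(self, source, rel_type, target):
        # Only one Parent topic is allowed; other types are unrestricted.
        if rel_type == PARENT_TOPIC and self.targets(source, PARENT_TOPIC):
            raise ValueError(f"{source!r} already has a Parent topic")
        self._by_source[source].append((rel_type, target))

    def targets(self, source, rel_type):
        """Items that `source` points to with the given relation type."""
        links = self._by_source.get(source, [])
        return [t for (rt, t) in links if rt == rel_type]

    def sources(self, rel_type, target):
        """Items that point to `target` with the given relation type."""
        return sorted(
            s
            for s, links in self._by_source.items()
            for (rt, t) in links
            if rt == rel_type and t == target
        )


store = RelationStore()
store.add("Physics", PARENT_TOPIC, "Science")
store.add("Press release 42", RELATED_TOPIC, "Physics")
store.add("Press release 42", RELATED_TOPIC, "Funding")
store.add("Funding", RELATED_TOPIC, "Physics")

# Navigate sideways: everything that declares itself related to Physics.
related_to_physics = store.sources(RELATED_TOPIC, "Physics")
```

Looking relations up from the target end, as sources() does, is what gives a topic page its list of related articles and press releases.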

These typed relations provide a much richer set of data to mine for creating and discovering associative meaning, as Udell points out. However, elegant and intuitive interfaces for searching these relations remain difficult to design and implement. Perhaps Microsoft's resources will help with that.

My suspicion is that generic ad hoc search interfaces over connected structured data will remain a pipe dream. No one I know voluntarily goes and types in SQL statements to interrogate their relational database. In controlled environments (by which I mean, content whose types and relationships are understood by the supporting system, such as Sytadel), interfaces to search specific kinds of relationships within this structured data may well provide valuable new ways to find information.

In the meantime, we'll continue to resort to brutally effective free text search. It's interesting to note that the most effective Web search engines now use algorithms which extract additional information (such as anchor text, URL text and surrounding paragraph text) about the hyperlink connections between items. This is a form of untyped semantic meaning: the link says that two items are connected, but nothing declares what kind of connection it is.
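
As a rough illustration of the anchor text idea, the Python sketch below collects the text of each hyperlink and files it under the page the link points to, so that the target can later be found by words other pages use to describe it. It is a toy built on the standard library's html.parser, not a description of how any real search engine works; the class name and the sample HTML are made up.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects anchor text per link target (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.anchor_text = defaultdict(list)  # target URL -> anchor texts
        self._href = None   # href of the <a> element currently open, if any
        self._chunks = []   # text seen so far inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace before storing the text.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.anchor_text[self._href].append(text)
            self._href = None


collector = AnchorTextCollector()
collector.feed(
    '<p>See <a href="http://example.org/rdf">the RDF primer</a> and '
    '<a href="http://example.org/rdf">triples explained</a>.</p>'
)
texts = collector.anchor_text["http://example.org/rdf"]
```

The collector records that a link exists and what words surround it, but never what type of link it is, which is exactly the untyped quality described above.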
