
RDF pixie dust

Brett gets the boot in:

I'm curious how RDF honestly helps in search. Watching RSS, most people generate crap feeds. Honestly. Expecting people to magically generate good RDF descriptions of their sites is almost laughable. And the obvious gambit of writing some ai pixy dust to automatically generate RDF from someone's ramblings is enough to keep me chuckling for most of the afternoon.

Honestly yes, I am looking to build tools that will generate the RDF (indexes and metadata). I want to scrape RDF metadata from structured data, analogous to the way spiders today scrape indices from unstructured data. It's much the same issue, but I figure the signal-to-noise ratio will be better in the former - at least I don't see how it could be worse. It already looks like one of the first things I'll have to do is recast http server logs and syslog as RDF triples.
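
To make the log idea concrete, here is a minimal sketch of what recasting one access-log line as triples might look like. It assumes Common Log Format; the example.org namespace, the predicate names and the log_line_to_triples helper are all made up for illustration and are not a published vocabulary or anyone's actual tool.

```python
import re

# Hypothetical namespace for log predicates; not a published vocabulary.
LOG = "http://example.org/log#"

# Common Log Format: host ident user [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def log_line_to_triples(line, entry_id):
    """Return (subject, predicate, object) triples for one access-log line."""
    match = CLF.match(line)
    if match is None:
        return []  # noise: lines that do not parse yield no triples
    # Hypothetical subject URI scheme: one resource per log entry.
    subject = f"http://example.org/log/entry/{entry_id}"
    return [
        (subject, LOG + field, value)
        for field, value in match.groupdict().items()
        if value != "-"  # "-" means "not recorded" in this log format
    ]

line = '192.0.2.1 - - [28/Feb/2004:00:50:00 +0000] "GET /index.html HTTP/1.0" 200 2326'
for triple in log_line_to_triples(line, 1):
    print(triple)
```

Lines that don't match simply produce nothing, which is roughly the point: the structure of the source does most of the work, and the noise falls out rather than needing to be modelled.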

Part of this project is about exercising RDF in a domain I understand. After it, I expect to know whether RDF has value outside the academic and standards worlds and what that value is. I was a huge huge fan of the technology, even serving on the working group for the best part of a year, before becoming deeply disenchanted with where that process and the community at large were going (models, models, models) to the point where I felt I had little to contribute other than ranting from the sidelines. For the record, I'm still a fan, on my third reading of Shelly's book, am waiting for Danny's, and despite my opinions on the process, still have enormous respect for the work RDFCore has done. But I take a strong view that RDF metadata should layer on top of statistical and automated magma, not manual data entry; that is pixie dust. This heterogeneity is what we know works in robotics, reinforcement learning* and hybrid AI or for any technique that has to live outside a closed environment. So I see much less need for the tidy substrate and attention to good modelling the current RDF model-think presupposes. I also think the semweb cake is missing or willfully ignoring a key layer that the search engines are thriving in - the environmental noise of the web.

It's not metacrap, it's meta living on crap.

As for the AI pixie dust, I don't see computing RDF from structured data being any more pixieish than computing pagerank from a page or computing a spam filter from spam (did I say I like hybrid techniques? :). The truth is, I'm at least as skeptical as Brett, but it's like being skeptical about what a computer can do in light of the halting problem - yes there's a hard limit, but you can still do something useful before you get there.

* and will be needed for IBM's autonomic computing feedback loops, but I digress...

[roni size: heroes]

February 28, 2004 12:50 AM


(February 28, 2004 06:22 AM #)


One thing I really don't get is why you keep saying RDF solves the "many vocabularies" problem. This just doesn't make any sense to me.
1) Everybody who uses RDF still invents their own vocabulary. I've already witnessed debates about elements 'colliding'. RDF hinges on everybody choosing the same vocabularies which is kinda crazy imo.
1.5) You keep saying XML namespaces can't be mixed. Last I checked, XML was just data--really just syntax. What's preventing me from mixing them any way I want? What do you mean by this claim that XML vocabularies are islands?
2) If my application "understands" RDF (really it understands a particular vocabulary) how can it magically understand other RDF vocabularies that it has no knowledge about?
3) Do you think that RDFQL is more expressive and powerful than XQuery? I'm pretty sure it's not... so why do you think search would benefit from RDF so much? Would it be better to spend 6mos creating a 'site index description language' that could be XQueried?

I ask all these questions because, well.. you're a smart guy and here you are preparing to invest heavily in RDF and... I just don't see it. What am I missing?

Bill de hÓra
(February 28, 2004 08:50 AM #)

Hi Bo,

Many good (tough!) questions. I'll take them on in another entry...

Brett Morgan
(February 28, 2004 09:13 AM #)

Well, I must say, I will be watching your adventure with interest. I am personally playing with crossing hybrid ai with rss and plain old html scraping. I won't say any more until my proof of concept either sinks or swims... ;-)

(February 28, 2004 07:20 PM #)

Well, the "flaws" in RDF were one of the reasons that ontologies are the next stage, via OWL.

OWL provides not only ontology definition, but ontology translation, where you can provide a mapping between your ontology and someone else's. This mapping supposedly includes data conversion specifications (metric to english, etc).

Once that mapping is in place, your code should be able to translate automatically from someone else's language into your own. In fact it may be possible to write xslt stylesheets to do that automatically if you'd rather not build the translation into your own codebase.

or at least, that's the theory.
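
As a rough illustration of that theory, here is the translation step sketched in plain Python rather than in OWL itself. The two example.org vocabularies, the MAPPING table and the translate function are all hypothetical, and the unit conversion in particular is an assumption added for illustration, not something an OWL mapping is known to provide.

```python
# Hypothetical vocabularies; neither is a real published ontology.
THEIRS = "http://example.org/theirs#"
OURS = "http://example.org/ours#"

# Foreign predicate -> (our predicate, value converter).
MAPPING = {
    THEIRS + "heightCm": (OURS + "heightInches",
                          lambda v: round(float(v) / 2.54, 2)),
    THEIRS + "title": (OURS + "name", lambda v: v),
}

def translate(triples, mapping):
    """Rewrite triples from a foreign vocabulary into our own."""
    translated = []
    for subject, predicate, obj in triples:
        if predicate in mapping:
            new_predicate, convert = mapping[predicate]
            translated.append((subject, new_predicate, convert(obj)))
        else:
            # Predicates with no mapping pass through unchanged.
            translated.append((subject, predicate, obj))
    return translated

theirs = [
    ("http://example.org/thing/1", THEIRS + "heightCm", "254"),
    ("http://example.org/thing/1", THEIRS + "title", "A thing"),
]
for triple in translate(theirs, MAPPING):
    print(triple)
```

Anything without a mapping passes through untouched, so the receiving application still has to decide what to do with predicates it has never seen.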
