
RDF Name Service

Bill Kearney: Documents vs triple stores?

Following the TAG list lately, I'm starting to think that RDF needs a DNS. Bill nails the use case.

Where this becomes an issue for me is how to tell that statements made in a Foaf file are authoritative. As in, how can I state that this is MY own Foaf file and that statements I choose to make about resources I control are to be considered authoritative. This, to many people, is a very obvious need. When you start smushing data together it gives rise to possible problems where the statements I make about something under my direct influence are contradicted by others. I'm not expressing this as a control-freak issue but as a genuine concern.

If you look at how DNS works, you'll see it has the notion of authority built in:


dehora:~ 504$ nslookup www.ideaspace.net
Server: ns1.tinet.ie
Address: 159.134.237.6

Non-authoritative answer:
Name: homepage01.svc01.clickvision.com
Address: 208.171.83.4
Aliases: www.ideaspace.net


That's telling me the query did not go back to the authority for Bill's zone, so caveat lector. Answering from caches like this allows DNS itself to scale, keeps down the amount of DNS traffic that goes over the Internet (which can be substantial), and improves responsiveness. At work we recently installed a caching DNS server behind the firewall at one of our offices; the difference it makes to browsing is evident. Now consider that triple queries will be machine generated, and so will very likely happen at rates orders of magnitude greater than anything humans can drive today: a web where everyone has their own spider (Metcalfesque predictions of web meltdown are no doubt imminent).
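As a rough illustration of that authority notion (assuming the dnspython library, which is not mentioned in the post), a lookup can inspect the AA (Authoritative Answer) flag on the response. An answer served from a recursive resolver's cache typically won't have it set, which is what nslookup reports as "Non-authoritative answer" above:

# Sketch only, assuming dnspython is installed.
import dns.flags
import dns.resolver

answer = dns.resolver.resolve("www.ideaspace.net", "A")
# The AA flag says whether the responding server is authoritative for the zone.
authoritative = bool(answer.response.flags & dns.flags.AA)

for record in answer:
    print(record.address)
print("authoritative answer" if authoritative else "non-authoritative answer")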

Something like this would be desirable for triple lookups, or any web-centric fact base.

When everything's dumped into a repository there's going to be a little trouble 'being sure' that you've got correct stuff dumped into it. The web services folks are perhaps scratching their heads here, thinking 'well, duh, just go resolve the URL and parse it'.

Well no, they're wrong if they say that. Sometimes you don't get to go back to the authority. Deep down, the web is a mish-mash of caches, mirrors and geographically distributed content delivery networks, all doing their bit to make it scale. Often, the physical activity involved in routing requests and responses between machines has little bearing on the logical structure of the web, which is the clients and what web-heads call origin servers. Arguably web caching isn't sufficient for the semantic web and is an undesirable munging of layers (the output of your reasoner suddenly depends on a Pragma header).
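To make the layer-munging concrete, here is a minimal sketch (the FOAF URL is hypothetical) of what a loader has to do if it wants a fresh answer: explicitly ask intermediaries to revalidate, and look at cache-related response headers to see where the document actually came from.

# Sketch only: example.org URL is illustrative, not from the post.
from urllib.request import Request, urlopen

url = "http://example.org/foaf.rdf"  # hypothetical FOAF document
# Ask any intermediate caches to go back to the origin server.
req = Request(url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})

with urlopen(req) as resp:
    print("Age:", resp.headers.get("Age"))   # time spent in a shared cache, if any
    print("Via:", resp.headers.get("Via"))   # intermediaries the response passed through
    data = resp.read()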

For the most part, always going back to the origin doesn't scale, inferential fidelity be damned. To make that work we'd need a P2P rather than a client-server architecture, or, much more likely, we need to build semantic web loading tools with caching and staleness in mind, if not the reasoners themselves. Technologies like client-side caches and offline access need to improve, greatly. For the reasoners proper, Truth Maintenance Systems are the first port of call.
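A loading tool built with staleness in mind might look something like the following sketch (all names here are illustrative, not an existing API): each triple is recorded along with the document it came from and when that document was fetched, so stale sources can be re-fetched from the origin instead of being trusted indefinitely.

# Minimal sketch of provenance- and staleness-aware triple loading.
import time

MAX_AGE = 3600      # seconds before a source is considered stale (arbitrary)

store = []          # list of (subject, predicate, object, source_url)
fetched_at = {}     # source_url -> time of last fetch

def load(source_url, triples):
    """Record triples along with their source document and fetch time."""
    fetched_at[source_url] = time.time()
    for s, p, o in triples:
        store.append((s, p, o, source_url))

def stale_sources():
    """Sources whose triples should be re-fetched from the origin."""
    now = time.time()
    return [url for url, t in fetched_at.items() if now - t > MAX_AGE]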


August 4, 2003 07:17 PM

Comments

Bill Kearney
(August 4, 2003 10:41 PM #)

Hmmm, rather nitpicking I'd say. On one hand it is correct to say that a resolvable document MAY not result in retrieval from that actual source due to caching or other network-related issues. But this is certainly 'less worse' than retrieval of the triples from a local store. The document is, in the case I'm suggesting for FoaF, the authoritative source for the context. Now, being sure you have the correct document, not affected by caching or other transport issues, is certainly something to consider.

The only point I'm raising is the consideration of using a triplestore without context vs the document itself. The points you raised, while valid, are well outside the scope of what I'm discussing.
