
RDF datatypes, literals, quads

Datatypes

Mark Nottingham is wondering:

"WHY DOES RDF HAVE A SPECIAL CASE, THEREBY LOSING ITS SIMPLICITY?

I'm talking about RDF datatypes, of course. As far as I can see, they're a special case to the data model; although the datatype itself is identified with a URI, the property 'RDF datatype' isn't, and as a result you can't meaningfully talk about (as in, reason with CWM, or access with most RDF APIs) them using that oh-so-delicious subject, predicate, object triple."

The charter when I was on the RDF wg said, when you got down to it, that RDF had to play nice with XML Schema. That was back when you could remember that XML Schema was meant to be a simple replacement for DTDs, and just before people started seeing serious problems with that technology (i.e., it may not be sanely implementable). RDF Datatypes attempted to cover that requirement off.

Anyway, if the RDF wg didn't address that, others would, over and over. Some folks are deeply, deeply attached to data typing - Web Services proves that beyond question. It does not matter whether types are needed or even appropriate; people want data to have machine-based types. There's a lot to be said for pre-empting that desire. For example, Atom is taking much the same form of pre-emption with link types in the use of atom:link[@rel].
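To make Mark's complaint concrete, here is a minimal sketch in plain Python. It assumes nothing about Sparta or rdflib: the `Literal` type, the example.org URIs and the helper functions are all hypothetical, made up for illustration. It shows the datatype riding along inside the literal rather than being stated as a triple.

```python
from typing import NamedTuple, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class Literal(NamedTuple):
    """Hypothetical stand-in for a typed literal: lexical form plus datatype URI."""
    lexical: str
    datatype: Optional[str] = None  # a URI, but not the object of any predicate


# Hypothetical example data: one statement with a typed literal as its object.
triples = {
    ("http://example.org/doc", "http://example.org/pageCount",
     Literal("42", XSD_INTEGER)),
}


def objects_of(graph, subject, predicate):
    """Return every object matching the (subject, predicate, ?) pattern."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def statements_about(graph, node):
    """Return every triple that has `node` in the subject position."""
    return [t for t in graph if t[0] == node]


lit = objects_of(triples, "http://example.org/doc",
                 "http://example.org/pageCount")[0]

# The datatype is reachable only as an attribute of the literal itself...
print(lit.datatype)

# ...because no triple has the literal as its subject, a triple pattern
# asking "what is the datatype of this value?" has nothing to match.
print(statements_about(triples, lit))
```

The second print is the special case in miniature: the datatype has a URI, but there is no subject, predicate, object statement that connects the value to it.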

Literals

Literals are another special case that RDF datatypes try to cater for. XML literals in particular proved to be quite hairy; I seem to recall a few calls with Jeremy Carroll at the time while we were sent off to figure something out.

There's been a lot of back and forth on whether literals should be subjects of RDF statements. Some people think that anything worth talking about should have a name - so name it. Others will point out that a huge amount of legacy blob data exists out there that RDF excludes to some degree. Consider that you 'can't talk meaningfully' about HTTP representations in RDF either; that's probably a bigger problem than datatype inelegance.

However, none of this type stuff hurts a whole lot for real work, as RDF processors treat type information as inessential - it's optional metadata. What's likely to break is your application making unwarranted presumptions about what information will be available (if you haven't learned your data typing lesson from Web Services at this point, well... mU :)
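One way to avoid the unwarranted presumption is to treat the datatype as a hint and nothing more. The following is a rough sketch under that assumption; `to_python` and the `CONVERTERS` table are hypothetical names, not part of any RDF library.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical table of datatypes this application happens to understand.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "double": float,
}


def to_python(lexical: str, datatype: Optional[str] = None):
    """Convert a literal when its datatype is known and the value parses.

    A missing, unknown or lying datatype is not an error: the lexical
    form is handed back unchanged.
    """
    convert = CONVERTERS.get(datatype)
    if convert is None:
        return lexical
    try:
        return convert(lexical)
    except ValueError:
        return lexical


print(to_python("42", XSD + "integer"))         # typed and well formed
print(to_python("42"))                          # no type information at all
print(to_python("forty-two", XSD + "integer"))  # the type is there, the value disagrees
```

The application keeps working in all three cases; it only gets a nicer value when the type information turns out to be both present and true.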

Quads

Consider another, more significant problem RDF has. I'm currently integrating Sparta (Mark's RDF library) and rdflib into a desktop application, and I can see that soon I'm going to run into the situation where A says X Y Z and B says X Y Z and I will want to preserve the provenance of those two statements as coming from A and B.

The problem here is a straight-up loss of information - you can't easily ask 'who said X Y Z?' without the context of the statements. I've never worked on a real-world application of RDF that didn't come up against this issue. Solving it in pure RDF is very clumsy; APIs tend to add a fourth item to the statement, giving what are often called 'quads', but that can rope your data to the API in question, which is definitely not the point of using RDF. Plus the meaning of quads isn't necessarily shared between systems. I'm hoping not to have to switch to 4Suite to solve this problem; 4Suite is a big full-featured API and I want to keep things as light as possible. If I can.
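A small sketch of the difference, again in plain Python rather than any particular RDF API. `QuadStore`, `reify` and the `http://example.org/saidBy` property are hypothetical, made up for this illustration; only the four rdf: reification terms come from the RDF vocabulary.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# With plain triples, merging A's and B's statements loses who said what.
merged = set()
merged.add(("X", "Y", "Z"))  # from A
merged.add(("X", "Y", "Z"))  # from B
print(len(merged))  # 1: the second statement vanished into the first


class QuadStore:
    """Hypothetical minimal store that keeps a source next to each triple."""

    def __init__(self):
        self._sources = defaultdict(set)  # (s, p, o) -> set of sources

    def add(self, s, p, o, source):
        self._sources[(s, p, o)].add(source)

    def triples(self):
        """The merged graph, as a plain set of triples."""
        return set(self._sources)

    def who_said(self, s, p, o):
        """Every source that asserted this triple (empty set if none)."""
        return set(self._sources.get((s, p, o), ()))


store = QuadStore()
store.add("X", "Y", "Z", source="A")
store.add("X", "Y", "Z", source="B")
print(sorted(store.who_said("X", "Y", "Z")))  # ['A', 'B']


def reify(s, p, o, source, stmt_id):
    """The pure-RDF route: describe one attributed statement with triples."""
    return {
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
        (stmt_id, "http://example.org/saidBy", source),  # hypothetical property
    }


print(len(reify("X", "Y", "Z", "A", "_:stmt1")))  # 5
```

The quad version answers 'who said X Y Z?' directly, but the fourth slot means whatever this one class says it means. The reified version stays inside the triple model at the cost of five triples per attributed statement.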


April 2, 2005 01:10 PM

Comments

Danny
(April 2, 2005 06:54 PM #)

Thanks for the background Bill.

Re. quads, I'm just in the process of running into the same issues, except I'm expecting to be able to deal with most cases using Redland's contexts (quads in camouflage). The Python binding's been working nicely so far, btw.

Rich Boakes convinced me that reification was a realistic alternative, however not-pretty (check his paper/presentation for Ancona on http://www.rdfx.org/). I wonder if it would be possible to make reification look like quads in the code interface..?

Very interesting to see where you're going with HTTPLR, especially with the RDF aspects. If I ever get some time I wouldn't mind trying it for triplestore-triplestore sync (not sure that necessarily needs the reliability, but certainly would be nice-to-have).

Bill de hÓra
(April 2, 2005 07:12 PM #)

"If I ever get some time I wouldn't mind trying it for triplestore-triplestore sync"

If you can wait long enough, there'll be a python client and jython/java server being published as open source code.

Patrick Logan
(April 2, 2005 08:18 PM #)

Triples, quads, etc. I have to say I've not used RDF at all explicitly.

I guess quads are intended to support the A-X-Y-Z and B-X-Y-Z relationships concisely?

But won't we soon run into a desire for a fifth party in this relationship? Why stop at quads?

More questions on my blog I guess...

Bill de hÓra
(April 2, 2005 09:05 PM #)

"I guess quads are intended to support the A-X-Y-Z and B-X-Y-Z relationships concisely?"

Patrick,

They are; the problem is that you would need to a) standardize the idiom and b) 'type' the quad URI so code can dispatch sanely on it. Afaict at the moment quads are useful but not interoperable.

"But won't we soon run into a desire for a fifth party in this relationship? "

In theory, yes; languages don't get to be their own metamodels (there are no perfect languages). In practice you get to go a long way with something like the *awesome* hack in the Smalltalk object model.

Mark Nottingham
(April 3, 2005 07:37 AM #)

Thanks, Bill, that's helpful. I agree that datatyping is less useful/necessary than many think, but despair at the damage that it's done, particularly to interoperability; if someone else requires/depends on it, I have to work with it to interoperate with them. Argh.

The context problem is also a big one, and I'm frankly surprised it hasn't been dealt with yet in a standard fashion. Redland contexts seem to work.

P.S. Bug reports / suggestions on Sparta much appreciated! :)

Bill de hÓra
(April 3, 2005 05:25 PM #)

"The context problem is also a big one, and I'm frankly surprised it hasn't been dealt with yet in a standard fashion."

My take on this is that reification covered up some holes in that regard until it was too late. Dave Beckett knows RDF as well as anyone; if he supports contexts in Redland there's a good reason for it. I think the W3C will adopt a position on them in time based on implementor feedback. I honestly don't know how the data model would cope with the fourth element; maybe there would be no impact.

Dave Beckett
(April 4, 2005 03:24 PM #)

Yeah, there were good reasons for contexts that I outlined a few times: to the DAWG, when explaining the design of contexts in Redland, and for a SWADE Europe large scale demonstrator.

For these reasons, I support getting something that provides access to this into SPARQL, as the GRAPH (née SOURCE) keyword, near where triples are used.

Let me try to say it briefly if I can. If it's a semantic web of documents, you often need to consider the resulting web as both one big graph and the set of documents that form it, and to ask where the answer came from: the so-called "oh yeah?" button.
