
RDF, pedantry, and the web

Warning: of interest only to people who care about RDF graphs, are comfortable with jargon like "surface syntax", "model theory" or "entailment", web architecture minutiae, and have heard of a guy named Tarski.

I think that what the semantic web needs is two rather different things, put together in a new way. It needs a content language whose sole function is to express, transmit and store propositions in a form that permits easy use by engines of one kind and another. There is no need to place restrictions or guards on this language, and it should be compact, easy to use, expressive and syntactically simple. The W3C basic standard is RDF, which is a good start, but nowhere near expressive enough. The best starting-point for such a content language is something like a simple version of KIF, though with an XML-style syntax instead of KIF's now archaic (though still elegant) LISP-based format. Subsets of this language can be described which are equivalent to DLs, but there really is no need to place elaborate syntactic boundaries on the language itself to prevent users from saying too much. Almost none of them will, in any case.
Pat Hayes

Mark Baker was wondering:

Self-description and namespace mixing

If I produce a multi-namespace document, am I automatically importing the entailments of those namespaces? Dan Connolly says yes (at least for RDF Schema), and I disagree with him. But I lack the background in this space to be able to convince Dan (or even myself, for that matter). It's just a hunch at this point, but the issue has very important consequences, especially to REST which requires self-descriptive messages.

Let's get "entailment" straight. Entailment has to do with true sentences (or formulae) in a formal language. Roughly, for our purposes an RDF graph is much like a sentence. If a sentence A "entails" B, that's to say "when A is true, B is necessarily true too": any interpretation which holds A as being true necessarily holds B as being true. Contrariwise, there can be no interpretations where A is true and B is false. Indeed, searching for such "nonsense" interpretations is a technique to determine the internal consistency (or not) of a formal language.
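To make the definition concrete, here is a toy sketch in Python. It uses plain propositional sentences rather than RDF graphs, and it simply enumerates every interpretation looking for one where A is true and B is false. The function name entails and the sentences p and q are made up for illustration; this is not how an RDF reasoner works.

```python
from itertools import product

def entails(a, b, symbols):
    """Return True if no interpretation makes `a` true and `b` false."""
    for values in product([True, False], repeat=len(symbols)):
        interpretation = dict(zip(symbols, values))
        if a(interpretation) and not b(interpretation):
            return False  # found a counterexample interpretation
    return True

# A is "p and q"; B is "p". Both sentences are hypothetical examples.
a = lambda i: i["p"] and i["q"]
b = lambda i: i["p"]

print(entails(a, b, ["p", "q"]))  # True: whenever "p and q" holds, "p" holds
print(entails(b, a, ["p", "q"]))  # False: p can be true while q is false
```

Brute-force enumeration only works here because two symbols give just four interpretations; the point is the shape of the definition, not the method.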

If we produce a multi-namespaced document, we don't import any entailments. Namespaces don't imply entailments. There's no notion of namespaces or QNames in the RDF Model. They're specifically a hack to get URIs into XML, for some definition of hack. Or, we could reasonably say that namespaces in XML are a surface syntax macro without which we couldn't use XML to ship URIs around. In themselves they have no bearing on the RDF graphs being shipped about. And they certainly have no bearing on the RDF Model.
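A rough way to see the "surface syntax macro" point, as a Python sketch: two documents that bind different prefixes to the same namespace URIs expand to the same set of triples. The helper names expand and to_graph, and the example.org URIs, are assumptions for illustration; real RDF/XML parsing involves a good deal more than this.

```python
def expand(qname, prefixes):
    """Expand a prefix:local QName into a full URI string."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local

def to_graph(triples, prefixes):
    """Turn QName triples into a set of full-URI triples (the 'graph')."""
    return {tuple(expand(term, prefixes) for term in triple) for triple in triples}

# Document A and document B choose different prefixes for the same URIs.
doc_a_prefixes = {"dc": "http://purl.org/dc/elements/1.1/", "ex": "http://example.org/"}
doc_a = [("ex:post", "dc:creator", "ex:bill")]

doc_b_prefixes = {"d": "http://purl.org/dc/elements/1.1/", "e": "http://example.org/"}
doc_b = [("e:post", "d:creator", "e:bill")]

graph_a = to_graph(doc_a, doc_a_prefixes)
graph_b = to_graph(doc_b, doc_b_prefixes)

print(graph_a == graph_b)  # True: the prefixes vanish once expanded
```

Once the QNames are expanded there is nothing left in the graph that remembers which prefixes were used, which is the sense in which namespaces have no bearing on the model.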

Now suppose we dispense with the namespace macro for a minute and say we produce a multi-URIed document. Strictly, we still don't import any entailments, because URIs don't suggest entailments; sentences (graphs) do. Also, while abstractly URIs are terms, within a document they are simply marks and as such have no semantics.

We imply entailments, not through the use of terms, but by announcing the formal language of discourse. When I say I'm speaking OWL you may assume the semantics of the OWL language as expressed through the sentences I impart to you (because in turn you assume I wish to communicate clearly). Once we have shared semantics we can begin to agree on things like entailment. But as a practical matter we might want to use URI terms to do exactly that (importing semantics), if it turns out the mimetype mechanism is unsuitable to describe semantic web languages.

One approach is to say that each semantic web Model Theory ("MT"), a theory about a formal language, gets its own mimetype. In this approach and with respect to the web, the semantics of something like RDF/XML is defined by fiat: whoever defines a mimetype for RDF/XML gets to say that the RDF MT applies, and it's up to the rest of us to follow that convention or not. Now on the web, we can drop some OWL into an RDF graph, serialize it as RDF/XML, declare the RDF mimetype, and we're set. However, unless the RDF mimetype used has something interesting to say about using the OWL MT, we can't really apply any computations over and above the RDF MT without crapping all over any number of principles that make the Internet work. Well actually, of course we can; after all, who's going to stop me interpreting OWL URIs as OWL? But in terms of the reality of clients and servers, this is a bit like the GET-7 rathole of the consequences of your (and your user-agent's) actions: the publisher of OWL in an entity body who declares it with an RDF mimetype incurs no risk by having it interpreted as OWL. The representation is to be understood as whatever the mimetype says it is. If that happens to be RDF and only RDF, then the consequence of interpreting it as anything else is at the interpreter's cost, not the publisher's. Just as interpreting application/octet-stream as HTML is your problem, so is interpreting application/rdf+xml as OWL.

The problem with this approach is that it doesn't lend itself well to mixing and matching formal languages (as opposed to URIs). Today we only really mix subgraphs of a particular formal language, but it's not going to be long until we start to construct hybrid domain models using a variety of formal languages, each with its own MT and, you would assume, mimetype.

The other option is to drop mimetypes (except for application/rdf+*) and target the URIs themselves for import. In other words, if you use a term unique to a particular formal language you are bound to the theory of that language, even if you didn't know what you were saying.
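The difference between the two options can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the table tying application/rdf+xml to the RDF MT alone, the idea of keying a language off its namespace URI prefix, and the function names. No spec defines either lookup this way.

```python
# Namespace URIs of three W3C languages, used here as a stand-in for
# "terms unique to a particular formal language".
LANGUAGE_NAMESPACES = {
    "RDF": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "RDFS": "http://www.w3.org/2000/01/rdf-schema#",
    "OWL": "http://www.w3.org/2002/07/owl#",
}

# Hypothetical registry: which model theories a mimetype licenses.
MEDIA_TYPE_SEMANTICS = {"application/rdf+xml": {"RDF"}}

def semantics_by_media_type(media_type):
    """Option one: the declared mimetype alone decides the semantics."""
    return set(MEDIA_TYPE_SEMANTICS.get(media_type, set()))

def semantics_by_terms(triples):
    """Option two: using a language's term binds you to that language."""
    found = set()
    for triple in triples:
        for term in triple:
            for language, namespace in LANGUAGE_NAMESPACES.items():
                if term.startswith(namespace):
                    found.add(language)
    return found

# Some OWL dropped into a graph that is shipped as plain RDF/XML.
triples = [
    ("http://example.org/Dog",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://www.w3.org/2002/07/owl#Class"),
]

print(semantics_by_media_type("application/rdf+xml"))  # {'RDF'}
print(sorted(semantics_by_terms(triples)))             # ['OWL', 'RDF']
```

Under the first lookup the publisher has only committed to RDF; under the second, the mere appearance of an OWL term commits them to the OWL theory whether they meant it or not.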

There are immediate problems with either approach (or any approach using mimetypes). First is the exclusion of hackworthy processing of RDF, such as is common with RSS 1.0, Dublin Core and FOAF today. I doubt more than a fraction of the code processing these vocabularies is compliant with the RDF MT (and why should it be, if what it does is useful?). The second is further away but quite serious: individuals and organizations may not care to be held to the logical entailments of their published graphs. As an industry we don't expect to be held responsible for software defects. Will it be any different when new software is data driven in this way? Then again this may work out just like OLAP and Data Warehousing, where we pay a lot of money to figure out what the hell we've actually said across a number of domains, without much concern about where the inferences lead.

Deep down, I have the sense that this might well become as big a mess as the URI name/addressing debacle. While there are only a few ratified semweb languages it's tolerable to use mimetypes. But if the semweb is even remotely successful, and is even remotely like the KR, ontology, and AI fields it borrows heavily from, then we can expect a myriad of formal languages, all keyed off RDF, and we can expect users to mix and match terms from these languages literally without knowing what they're saying.

There are other alternatives, such as negotiation to a language. This is not pie in the sky. There have been real results, and real work done in internet protocols, AI, economics, and multi-agent computing that allows two entities to automatically agree on how to impart information, including utilizing an interpreting entity.

Mark also points to something Dan Connolly said over on rdfig as part of an argument pro people accepting the entailments of their sentences:

we need as many model theories (i.e. constraints on terms) as we need terms
neither RDFS nor OWL is special.
they're just like the C standard library.

Only the second sentence is true. We do not need an MT for every term. We need an MT for every formal language. For every term we need an interpretation (I) that maps a meaning to a term; RDFers usually call this "denotation".
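The split between the two can be sketched in Python, loosely in the spirit of the RDF semantics. The interpretation is per-term data (what each term denotes, and which pairs a property holds between), while the rule for when a triple is true is written once for the whole language. The names denotation, extension and triple_is_true, and the sample data, are invented for illustration.

```python
def triple_is_true(triple, denotation, extension):
    """One rule for the whole language: a triple (s, p, o) is true when the
    pair (I(s), I(o)) is in the extension of I(p)."""
    subject, predicate, obj = triple
    pair = (denotation[subject], denotation[obj])
    return pair in extension.get(denotation[predicate], set())

# Per-term interpretation: what each term denotes (hypothetical data).
denotation = {
    "ex:bill": "Bill",
    "ex:post": "this blog entry",
    "dc:creator": "authorship",
}

# Which pairs of things the denoted property relates.
extension = {"authorship": {("this blog entry", "Bill")}}

holds = triple_is_true(("ex:post", "dc:creator", "ex:bill"), denotation, extension)
fails = triple_is_true(("ex:bill", "dc:creator", "ex:post"), denotation, extension)

print(holds)  # True
print(fails)  # False
```

Adding a new term means adding entries to the two dictionaries; triple_is_true, the stand-in for the theory, does not change.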

And an MT is nothing much like the C standard library, as I understand the analogy (C ~= RDF). OWL is closer to java/javac than time.h, and an OWL vocabulary is more like an EJB domain model than a C program. You can't define the theory of OWL in RDF the way you can define time.h in C. OWL is a distinct, more powerful formal language than RDF and as such has both a distinct theory and set of formulae. Never mind that the semantics of C and Java are decidedly non-trivial compared to RDF and OWL, so much so that the comparison quickly breaks down. To get an idea of the sense of this breakdown, try running your EJB source through the gcc reasoner and see what happens.

October 13, 2003 10:39 PM


(October 14, 2003 11:14 AM #)

Phew, that was a bit hard this early in the day...

Thanks though, I'm beginning to get a clearer picture of the twisty world of imports. Not really there yet with how mimetypes could cause problems, need to catch up with the lists I 'spose.

Re. The not-very-good analogy C/Java RDF/OWL: I'm pretty sure you're right, though perhaps DanC's analogy of standard libraries can be useful, in the same way that OO inheritance is useful to get the idea of RDF class/property inheritance across. Only when it comes to treating it the same way in practice does it fall flat on its face.

Re. Tarski - dark-haired bloke, played opposite David Soul in 70's cop show. Reckoned that any sentence which is persistent through expansion is existential, and liked D.I.S.C.O.

You knew that was coming, didn't you ;-)

Mark Baker
(October 14, 2003 08:24 PM #)

Yep, that's almost exactly my position. Dan's is understandable from a simplicity POV, but I don't *think* it scales.

FWIW, where I disagree with you is where you say "We need an MT for every formal language.". I don't think we need quite that many, and I'm hoping much less than that (though perhaps we just disagree on what a "formal language" is). I just think we need one per language which is not entailed by an existing language (at least that's the most succinct way I can think of saying it at this point in time).
