Web ontologies: problems, benefits

Computers are insanely, hair-tearingly stupid - they have to be told everything in precise detail. Things you don't normally need to be clear about, ever, have to be written down in exacting detail for a computer. This is usually done in languages that are simultaneously not designed to express logical relations, unforgiving in their exactness, and bereft of anything you or I might call expressiveness. But if you think being precise in any language is easy, try reading the terms and conditions of your credit card.

The RELATIONSHIP list should make it obvious that explicit linguistic clarity in human relations is a pipe dream. It probably won't though - the madness of the age is to assume that people can spell out, in explicit detail, the messiest aspects of their lives, and that they will eagerly do so, in order to provide better inputs to cool new software. - Clay Shirky

There are a number of problems with declaring relations like isBossOf for use in computers; three come to mind, none of which Shirky addresses.

First off, chances are they're modal: that means first order logic can't express them, which means we can't unambiguously specify what they mean (modal logics remain... "controversial"). Anyone well-informed on the matter recognizes the need for some amount of code to support ontology definitions. In the immediate future you are most likely to see such "scripting" capabilities built into business process specs, which are, in many cases, ontologies in denial.

Second, they're unconstrained: a network filled with unconstrained relationships is perhaps no better than a network filled with unconstrained method calls. We have been forced to relearn the limitations of unconstrained interfaces at the protocol level with web services (a relearning that is thankfully drawing to a close). We already know that unconstrained interfaces do not work well across networks, yet web ontologies are supposed to be for internetworking.

Third, they're often temporal: modern web ontologies have no concept of a time varying relationship such as isBossOf; in fact RDF semantics explicitly excludes it, and RDF is the bedrock for any web ontology you're likely to come across. Web architecture also has no concept of time varying identity thanks to the way it assigns ownership of URIs. Cool URIs may not change, but cool URI owners do - frequently. This is compounded by the fact that the web architects would really like you to use the transient (in the ownership sense) URIs because you can type them into a browser to get more data. This transience is less than ideal for semantic web and ontological uses. While it might smack of a badly layered stack, the point is that the architecture doesn't support it and you have to fall back on urging some kind of best practice or an architectural diktat.

However, there are good technical reasons to avoid baking in temporal logic. The most important is that it's complex stuff. The nearest analogy that comes to mind is reliable messaging in web services - not having it as core makes some people's jobs much more difficult and expensive, but having it as core makes everyone pay a complexity tax whether they need it or not. So having no sense of time in the semantic web core makes some sense. But no matter; the key issue is that the absence of time leaves us with significant data integrity and management issues to contend with - be wary that many facts will go stale, and we have no interoperable means of expressing that yet.
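To make the staleness problem concrete, here is a minimal sketch of what an application ends up doing by hand when the core model has no notion of time: attaching a validity interval to each fact. Everything in it is an assumption for illustration (the TimedFact class, the valid_from and valid_to fields, the bosses_of function and the people named); it is a private convention, not something RDF or any web ontology language provides, which is exactly why it does not interoperate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedFact:
    """A (subject, predicate, object) statement plus a hypothetical validity interval."""
    subject: str
    predicate: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "not known to have ended"

    def holds_on(self, day: date) -> bool:
        # The interval is half-open: valid_from is included, valid_to is not.
        if day < self.valid_from:
            return False
        return self.valid_to is None or day < self.valid_to


# Made-up facts: bob changes boss at the start of 2004.
FACTS = [
    TimedFact("alice", "isBossOf", "bob", date(2002, 1, 1), date(2004, 1, 1)),
    TimedFact("carol", "isBossOf", "bob", date(2004, 1, 1)),
]


def bosses_of(employee: str, day: date) -> list:
    """Return everyone recorded as the employee's boss on the given day."""
    return sorted(
        f.subject
        for f in FACTS
        if f.predicate == "isBossOf" and f.obj == employee and f.holds_on(day)
    )
```

Asking bosses_of("bob", ...) for a day in 2003 and a day in 2004 gives different answers from the same fact base. Drop the two date fields and both statements are simply asserted at once, with nothing to say which one has gone stale.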

So that's some of the downside. The upside is that using ontology with computers can't possibly be worse than the in-code truth functions we use today. Legions of programmers are writing down things like isBossOf all the time (myself included). It's their job. Except they don't call them relations, or predicates; they call them methods, and those methods capture what most of us call business logic. [Nor do they call themselves ontologists.] So it's pretty far from logic but good enough for business - until the time comes to change the logic, when the cost of using all that code becomes apparent. It's a long held truism that we'd all be much better off if we could get such logic out of code. But it seems only so much of it can come out of code. We end up with the same design tension that afflicts web development; except this time it is separating code and relation instead of code and presentation. Today we place logic in code as we used to place markup inside code; tomorrow we might place code in logic as we place code in markup today. Either way it requires real discipline to keep the two apart and the system clean. Relational databases were supposed to do this for us and I suppose to some degree they do, but the modern SQL powered RDBMS is some way away from its relational heritage - certainly, there's not much talk about databases as logical theorem provers anymore.
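The contrast between a relation buried in a method and the same relation declared as data can be sketched roughly as follows. This is an illustration only: the function names, the FACTS set and the reporting lines are all made up, and the triple layout merely imitates the subject/predicate/object shape of RDF rather than using any real RDF library.

```python
# Style 1: the relation lives in code. Changing who reports to whom
# means editing, testing and redeploying this function.
def is_boss_of_hardcoded(boss: str, employee: str) -> bool:
    if boss == "alice" and employee in ("bob", "carol"):
        return True
    if boss == "bob" and employee == "dave":
        return True
    return False


# Style 2: the relation is declared as (subject, predicate, object) facts.
# The hypothetical data below can change without touching the query code.
FACTS = {
    ("alice", "isBossOf", "bob"),
    ("alice", "isBossOf", "carol"),
    ("bob", "isBossOf", "dave"),
}


def holds(facts, subject: str, predicate: str, obj: str) -> bool:
    """True if the exact statement has been asserted."""
    return (subject, predicate, obj) in facts


def objects_of(facts, subject: str, predicate: str) -> list:
    """Everything the subject is related to by the given predicate."""
    return sorted(o for (s, p, o) in facts if s == subject and p == predicate)
```

Both styles answer "is alice the boss of bob?" the same way, but only the second can also answer "who does alice manage?" without new code, and only the second can be updated by editing data. The cost is the discipline mentioned above: the generic query functions have to stay generic.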

Ian Davis responds to Clay Shirky's critique:

Despite all the obvious thought Clay put into his piece, he still managed to overlook the raison d'etre for the relationship vocabulary. Indeed it's the raison d'etre for all vocabularies. Without these vocabularies, incomplete and imperfect as they are, we would be mute in the machine readable web, unable to express ourselves in any meaningful way. You only have to look at the etymology to realise that vocabularies give you a voice. - Ian Davis

Certainly Shirky's argument has less bite when you consider that the authors of the work he's critiquing are simply not thinking in the naive way he imputes to them. Is Shirky right that this is ultimately a pipe dream? Yes. But almost everyone writing these ontologies down knows this too. Such criticism is similar to criticizing a hacker for hacking because the Halting Problem represents a hard limit on the capability of a computer. But the hacker already knows this and is getting something useful done anyway.

This is because Shirky is wrong on one point - it is not the madness of our age. In the history of writing down facts and relations about the world, our age is perhaps the most sane. We have 20th Century mathematics and philosophy to thank. Arguably, more logical shibboleths were killed in the last 100 years than in the entire history of thought beforehand. Today's ontologists are quite sane and are usually painfully aware of the limitations they work with.

For my part, the raison d'etre of declaring such relations is reducing code complexity while increasing flexibility. In other words it's all about the Benjamins. This happens in two ways - we write less logic in the wrong languages and write more code in the right languages. One side effect of working more with declarative logic and ontology and less with systems languages is that it leans you towards alternative programming styles. Unfamiliar syntax aside, when you express business logic in largely unused languages like Prolog, Haskell (or functional Python), then implement the same logic in the hugely popular VB, Java or C#, and hold the two styles side by side, you have to wonder why we stay loyal to the latter languages. We can question the wisdom of an industry which flagrantly uses suboptimal tools for the job. Writing business logic in a systems programming language makes questionable economic sense and is something like driving into the future by shoving an exhaust pipe up a horse's arse - all filth, no benefits. Ontology, especially when targeted at middleware business logic, makes a lot of sense in comparison.

Nonetheless the criticism is valuable because it keeps the non-geeks among us skeptical. We're an industry driven by hype. And while I'm not aware of anyone looking for a perfect language in web ontology, that's not to say other people who decide whether this stuff gets used won't succumb to delusional reports on their expressive and economic power.

If you are interested or invested in the ontology, service-oriented, data integration and social networking spaces and have not read Data and Reality by Bill Kent, you really should. Seriously. It is the best book ever written on data modelling with computers - at the time of writing this it is a quarter of a century old. Pertinent to this entry, it has the clearest explanations I've ever read of why the philosophical discussions on ambiguity and meaning that we so often disregard as pedantic nitpicking cannot be so disregarded when it comes to computers.

Reading list:

Data and Reality: Bill Kent
Knowledge Representation: John Sowa
Programming in Prolog: Clocksin and Mellish
Philosophy of Artificial Intelligence: ed Margaret Boden

[rem: the lifting]

March 17, 2004 03:09 PM

