
RDF - schema versioning and data typing

One of the advantages of storing an RDF representation in an RDBMS is that you'll never (hardly ever?) need to make a schema change in the RDBMS, because the domain is not represented using tables; the tables are used solely for storing RDF triples.

Your Mileage May Vary

Using RDF storage provides flexibility at the domain level. Altering tables isn't needed because RDF, being graph based, is naturally additive. Instead you keep adding new rows, where every row represents a link between two nodes in the graph. The downside is that the number of rows you'll have to manage will explode; depending on the size of the datasets you're working with this might not matter. My (somewhat anecdotal) experience with RDF is that datasets on the order of 10^6 and greater aren't uncommon, and that you should budget for an order of magnitude increase in the number of rows required for domain storage compared to an entity relational approach.
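To make the "keep adding rows" point concrete, here is a minimal sketch of a triples table using Python's built-in sqlite3 module. The table layout, the helper names (add, objects) and the ex: identifiers are assumptions made for illustration, not something taken from a particular RDF store.

```python
import sqlite3

# Hypothetical single-table layout: one row per RDF triple.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    " subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL,"
    " object TEXT NOT NULL,"
    " PRIMARY KEY (subject, predicate, object))"
)

def add(subject, predicate, obj):
    """Store one link in the graph; duplicate triples are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )

def objects(subject, predicate):
    """Return every object linked from subject via predicate, sorted."""
    rows = conn.execute(
        "SELECT object FROM triples"
        " WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]

# One person with three properties costs three rows here,
# where an entity table would use a single row.
add("ex:bill", "foaf:name", "Bill")
add("ex:bill", "foaf:mbox", "mailto:bill@example.org")
add("ex:bill", "foaf:knows", "ex:dan")

# A new kind of fact arrives later: no ALTER TABLE, just another row.
add("ex:project1", "doap:maintainer", "ex:bill")

row_count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

The schema never changes as the domain grows, but every property of every item becomes its own row, which is where the growth in row count comes from.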

Dissonance

It's an interesting question whether using RDBMSes to store RDF counts as some form of abuse, or bad engineering. RDBMSes were after all designed to support relational algebra, not RDF's model theoretic semantics (when you do the math, you find the math is different). That said, a number of relational experts point out that RDBMSes don't implement relational theory properly anyway. The mismatch between RDBMS and RDF is similar to the mismatch between RDBMSes and OO (collections of objects being graphs as well). This doesn't bode well - ORM, yow. However, most of this mismatch occurs when the graph data is shredded across the domain's tables and roundtripped in and out of the database server. If the RDF store is using an RDBMS primarily as a storage and indexing mechanism for graph structures rather than mapping onto domain specific entity tables (Users, Cars, that sort of thing), the dissonance is lessened, and you're left with a straight-up engineering matter (getting the RDBMS to perform CRUD efficiently) rather than a domain modelling/mapping one.

Complexity Conservation

One last thing to consider is that where you gain in structural flexibility you might lose in developer convenience. Consider Ruby On Rails and Django. One reason cited for the immense productivity of these stacks is the dynamic and flexible nature of the underlying languages (Ruby and Python). Part of the productivity boost is also coming from leveraging the 'static' types of database tables (or put another way, when you take away the backing databases these frameworks have less to offer). When RDF is stored abstractly on an RDBMS, the type information that could be derived from entity tables is lost. There's an argument to be had that not having this table metadata around will make the automation of things like forms generation/capture and validation trickier (and perhaps intractable). With the exception of some RDF/XForms related work by the folks at Copia and maybe Danny Ayers, I don't know if the RDF community has looked at this; much of the focus lately has been on query support through SPARQL.
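A small sketch of what "lost type information" means in practice, again using Python's sqlite3 module. The car table, its columns and the column_types helper are hypothetical; the point is only to compare what the database catalogue can tell a framework in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical entity table: one typed column per domain property.
    CREATE TABLE car (
        id    INTEGER PRIMARY KEY,
        model TEXT    NOT NULL,
        year  INTEGER NOT NULL,
        price REAL
    );
    -- Generic triples table: the same three columns for every domain.
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    );
""")

def column_types(table):
    """Map column name to declared type, read from the SQLite catalogue.

    PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    The table name is interpolated directly, so only pass trusted names.
    """
    rows = conn.execute(f"PRAGMA table_info({table})")
    return {row[1]: row[2] for row in rows}

car_types = column_types("car")
triple_types = column_types("triples")
```

For the entity table, the catalogue says that year is an integer and price is a real number, which is enough to pick a form widget or a validator. For the triples table, it says only that there are three text columns, whatever the domain is; any typing has to come from somewhere else, such as an RDF schema or datatyped literals.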


December 14, 2005 08:05 PM

Comments

Dan Sickles
(December 15, 2005 03:58 AM #)

Lynn
(December 15, 2005 10:59 PM #)

Certainly it's easy to store RDF triples in a relational database. But isn't there also some use (some application) for the data? And if so, isn't there some relational structure other than RDF triples that would be preferable to RDF triples?

In other words, you can't have your cake and eat it too: if you're only interested in RDF (and the application is only incidental and maintained in the code) then by all means use an RDBMS to store RDF triples.

If, on the other hand, you have an actual application to write, you would do best to use the standard techniques for defining your application's (non-triple) relations [tables] and then use a separate step to map the application's relations to/from RDF.

Bill de hOra
(December 16, 2005 12:14 AM #)

Lynn, my thinking on this is that where RDF data is available, mapping triples onto RDBMS 'domain' tables is something of an optimisation. If you had a triples table filled with FOAF people and DOAP projects, you could map those onto the tables person and project with maybe a user_person association table. The point is, with the exception of the tables' type metadata (which strictly speaking isn't about the domain), there isn't any more information to be had about the domain items by using entity tables, but you will probably be able to apply improved indexing strategies when you use an RDBMS in the standard way. You might also lose flexibility compared to the table of triples approach.
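As a rough sketch of that mapping, assuming a triples table like the one discussed in the post: the SQL below pivots the FOAF person triples into a person table using Python's sqlite3 module. The column choices (uri, name, mbox) and the sample data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
    -- Hypothetical domain table derived from the triples.
    CREATE TABLE person (uri TEXT PRIMARY KEY, name TEXT, mbox TEXT);
""")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:bill", "rdf:type", "foaf:Person"),
        ("ex:bill", "foaf:name", "Bill"),
        ("ex:bill", "foaf:mbox", "mailto:bill@example.org"),
        ("ex:proj", "rdf:type", "doap:Project"),
        ("ex:proj", "doap:name", "Example"),
    ],
)

# Pivot: one output row per subject typed foaf:Person, with selected
# predicates turned into columns. Predicates not listed are dropped.
conn.execute("""
    INSERT INTO person (uri, name, mbox)
    SELECT t.subject,
           MAX(CASE WHEN p.predicate = 'foaf:name' THEN p.object END),
           MAX(CASE WHEN p.predicate = 'foaf:mbox' THEN p.object END)
    FROM triples t
    LEFT JOIN triples p ON p.subject = t.subject
    WHERE t.predicate = 'rdf:type' AND t.object = 'foaf:Person'
    GROUP BY t.subject
""")

people = conn.execute("SELECT uri, name, mbox FROM person").fetchall()
```

The person table holds nothing the triples did not already say, but it can be indexed per column. The cost is that any predicate not named in the pivot is silently left behind, which is the loss of flexibility mentioned above.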

Matthew Gertner
(December 22, 2005 07:38 PM #)

My experience with fully generic SQL schemas is that they universally provide miserable performance in real-world applications. You basically have to map into a more domain-specific database schema. We use RELAX NG schemas to describe our domain objects and generate the SQL schema automatically. To my mind this is an optimal approach since I feel that domain-specific schemas of some flavor are a necessity anyway (for enforcing data types and much more). The mapping code is non-trivial but it isn't rocket science either.

Laurent Szyster
(December 24, 2005 03:04 PM #)

RDBMSes are indeed at odds with RDF.

By relaxing the constraint on the RDBMS schema to a triple, the burden of optimization is transferred to the RDF intermediary between a relational database server and an HTTP network client, where it does not belong.

Such tiers exhibit terrible SPARQL performance and will always be orders of magnitude below an RDBMS at doing what SQL servers have already been optimized for since the 70's: access a database using a simple relational algebra and enforce a declared application model.

If you are searching for something completely different from RDF and SQL servers to store metadata, then have a look at Allegra's metabase peer:

http://laurentszyster.be/blog/allegra/

Regards,