« Jive Messenger 2.2 ships | Main | More database type switching »

Database type switching

Suppose you had an event object that had the following properties:

  class event:
    def __init__(self, what, where, when)

Now, in the above imagine that 'when' is a date and 'where' is a URI. But suppose in your domain the 'what' could be one of three types:

  • a URI: this means something well known enough that we have a pre-canned name for it.
  • a String: this means it's some kind of opaque literal, like a java stacktrace or hexed coredump, or maybe some chunk of XHTML.
  • some XML format: this means the event has a 'well-known' XML format that we know how to process, ie some SOAP thing or an Atom entry.

The interesting thing here is that these 3 types are disjoint; they're not specializations of some common form, modulo the crashingly trivial observation they could all be represented as text. In code or markup you'd probably have a some extra metadata about the 'what' that would allow you to switch-on-type For example, in XML that could result in different element names altogether for each type maybe or some @type attribute, or in RDF, it would an rdf:type annotation.

So given that the range of value types can be disjoint I was wondering what people think the idiomatic way to represent this event structure in an RDBMS would be. Have a foreign-key for the 'what' column pointing to 1 of 3 possible tables? Make the 'what' column a blob? Change the design not to have use disjoint types? Have the three what tables hold foreign keys back the event table? Or some other approach?

August 10, 2005 07:52 PM


Leslie Hensley
(August 10, 2005 11:46 PM #)

I'm liking this approach lately: single table inheritance.

Jimmy Cerra
(August 10, 2005 11:55 PM #)

I am not a database expert. I wouldn't use a blob, since that seems like overkill. How about option 1: "foreign-key for the 'what' column pointing to 1 of 3 possible tables"? Of course, that won't tell you what the type of 'what' is. In RDF, I'd use a NODE type and have the object be either a URIRef or a typed literal (definitely not rdf:type though).

Bill Seitz
(August 11, 2005 12:11 AM #)

I might have a whatType field to hold one of the 3 types, and a what blob field to hold the contents.

Or, maybe I'd split the latter into whatBlob and whatNonBlob (varchar 128?), which probably has some practical advantages if I need to query or sort without resorting to freetext plugins.

Or, maybe I'd make whatType, whatString, whatUri, and whatBlob fields in a single (sparse) table.

I hope that sufficiently restates the question... :)

Adam Vandenberg
(August 11, 2005 01:18 AM #)

Are you going to query against "what", or just process them when they come up in a query?

I'd either make it a string (with whatever long text field is in the DB you're using, since I assume you could have more than 255 chars), possibly a blob. Maybe a string and then a "kind_of_what" field that's a enum, if you didn't want to determine the field type by introspecting the contents.

Aristotle Pagaltzis
(August 11, 2005 04:36 AM #)

The clean, minimally redundant approach would be to use four tables, of which one is the “event” table which holds only when/where pairs and a primary key, and of which the other three are “what” tables whose the primary keys are simultaenously foreign keys to the event table. This way each datum can be stored in a properly typed column, without storing boatloads of NULLs as you’d have to if you did this with a single table having one column per type of value.

Unfortunately, this is stupidly costly to query – you need three left joins in every single statement. It is also stupidly difficult and costly to update – you need multiple interdependent statements grouped by a transaction. Worse, you need the primary key from the event table before you can update any of the what tables, so you have to chatter back and forth with the database instead of dumping bulk statements on it.

Given that the cleanly modelled approach is really useless in practice, there is no right answer to your question. Which compromises you need to make depends entirely on the kinds of queries you will want to run against your database, and you haven’t told us anything about that.

(January 28, 2007 08:47 AM #)

Thanks for your comment Aristotle Pagaltzis.

It confirms my original thoughts about my database. I have a design that has the same structure as you described and as you say, SQL is quite complicated and the tables are hard to update. But I don't have a choice, I need that structure.

At first I thought it was a bad design from my part. So your comment helped me.