
MetaGraph: Domain knowledge v RDF

Here's a PowerPoint presentation on MetaGraph, a means of working explicitly with the relationships and hypotheses that come up when investigating biological experiments.

These folks went through the standard Object Relational two-step and found it wanting, because what they wanted to do was reason and hypothesise about relationships between data items, and not so much the data items themselves. They ended up with what AI folks might think of as semantic networks or conceptual graphs, and what Web folks might think of as RDF/OWL. After that, they threw in a query language, and an ORM persistence framework. I'm seriously impressed.

What interests me most about this right now is how they seem to have managed the dissonance between a general labelled graph model and the domain specificity of biology. I've hit exactly this problem with RDF, in an entirely different domain. The domain is events: arbitrary system and application events from multiple sources supporting a Service Oriented system. The events can be almost anything that happens below or in support of business-level document exchanges: seeing a document pass by, touching a document, intrusion alerts, downed servers, log file analysis, any number of different warnings or failures, even something as mundane as the result of an automated disk space reclamation or a DB backup. While the exchanges are cleanly defined, the ancillary data they produce usually is not. Much of it is low grade, has a poor signal-to-noise ratio, or is incomplete. So far, RDF appears to be a good choice for packaging this kind of telemetry data into events. But aside from the usual hurt that comes up around RDF v XML, there is a tricky balancing act between the 'ilities' provided by RDF statements and the immense value that comes from articulating your domain. I hope to have more to say soon about this tension and what we're doing about it.
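To make that tension concrete, here is a minimal Java sketch. It is not our actual code, and it is not MetaGraph's design. It sets a typed domain event (a hypothetical `DiskReclaimedEvent`) beside the generic subject/predicate/object statements it flattens into.

The names used here are all made up for illustration: the `http://example.org/events#` namespace, the `host`/`bytesFreed`/`source`/`correlatedWith` properties, and the `objects` helper. Only the `rdf:type` URI is the real RDF one. Plain strings stand in for what a proper RDF library would provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class EventTriples {

    // A generic RDF-style statement. URIs and literals are plain strings here;
    // a real system would use an RDF library's node types instead.
    record Triple(String subject, String predicate, String object) {
        Triple {
            Objects.requireNonNull(subject);
            Objects.requireNonNull(predicate);
            Objects.requireNonNull(object);
        }
    }

    // Hypothetical event vocabulary namespace (illustrative, not a published schema).
    static final String EV = "http://example.org/events#";
    // The standard rdf:type property URI.
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Domain-specific event: explicit, checked fields, but every new kind of
    // event needs a new class. (Hypothetical type, for illustration only.)
    record DiskReclaimedEvent(String id, String host, long bytesFreed, String source) {

        // Flatten the typed event into generic statements. Unknown (null)
        // fields are simply omitted, which is how a triple model tolerates
        // incomplete, low-grade data without schema changes.
        List<Triple> toTriples() {
            String s = EV + "event/" + id;
            List<Triple> out = new ArrayList<>();
            out.add(new Triple(s, RDF_TYPE, EV + "DiskReclaimed"));
            if (host != null) out.add(new Triple(s, EV + "host", host));
            out.add(new Triple(s, EV + "bytesFreed", Long.toString(bytesFreed)));
            if (source != null) out.add(new Triple(s, EV + "source", source));
            return out;
        }
    }

    // A generic query that works for any event type: collect the objects of
    // all statements matching a subject/predicate pair.
    static List<String> objects(List<Triple> graph, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject().equals(subject) && t.predicate().equals(predicate)) {
                result.add(t.object());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The source is unknown here, so no "source" statement is produced.
        DiskReclaimedEvent e = new DiskReclaimedEvent("42", "db01", 1_048_576L, null);
        List<Triple> graph = new ArrayList<>(e.toTriples());
        // An ad hoc relationship no class anticipated, e.g. linking this event
        // to an intrusion alert, added without touching any domain type.
        graph.add(new Triple(EV + "event/42", EV + "correlatedWith", EV + "event/99"));
        System.out.println(objects(graph, EV + "event/42", EV + "host")); // prints [db01]
        System.out.println(graph.size()); // prints 4
    }
}
```

The triples make ad hoc and partial data cheap to add and query. The typed record is where the domain gets articulated. In this sketch the mapping between the two is hand-written, and that hand-written mapping is where the balancing act shows up.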

The other thing that always fascinates me about bioinformatics whenever I take a peek is the sheer volume of data these folks contend with, in a decentralised, semi-standardised and, by the looks of it, competitive environment. The data sets are jaw-dropping: this is the world where Jim Gray ships the computers containing the data instead of using the Internet. The ability of the bioinformatics community to find a mix of hardware and software solutions as they go, without a whole lot of prior art, is formidable (bioinformatics also seems to be a driver for innovation in Grid technology). And yet, it's not so hard to imagine a future where we have to manage terabytes of data for a medium enterprise, and one day perhaps for ourselves. We (that is, Web, WS, SOA and REST types) spend a lot of time looking at what the likes of Google, eBay and Amazon are doing to help us peer into the future; innovation in bioinformatics may also be useful in thinking about architectures to meet future demands.

August 21, 2004 12:24 AM

