
Automated mapping between RDF and forms, part I

Let's start with some FOAF that shows a few of the issues around round-tripping RDF data through a web form.

FOAF has only one property for phones, namely foaf:phone. It has several properties for IM accounts, but only one for phones:

  <rdf:Description 
    rdf:about="http://example.com/person/elvis"> 
    <foaf:phone   
      rdf:resource="tel:987654321"/> 
    <foaf:phone   
      rdf:resource="tel:123456789"/>     
  </rdf:Description>

So how do you distinguish between your home phone, your work phone and your two mobiles? No worries, let's type-annotate these little jokers:

  <rdf:Description 
    rdf:about="http://example.com/person/elvis"> 
    <foaf:phone   
      rdf:resource="tel:987654321"/> 
    <foaf:phone   
      rdf:resource="tel:123456789"/>     
  </rdf:Description>
  <rdf:Description rdf:about="tel:987654321"> 
    <rdf:type 
      rdf:resource="urn:mobile"/> 
  </rdf:Description> 
  <rdf:Description rdf:about="tel:123456789"> 
    <rdf:type 
      rdf:resource="urn:work"/> 
  </rdf:Description> 

In pseudo-Excel (with foaf:person standing in for Elvis's URI), that looks like this:

  foaf:person, foaf:phone, tel:987654321
  foaf:person, foaf:phone, tel:123456789
  tel:987654321, rdf:type, urn:mobile
  tel:123456789, rdf:type, urn:work


Sorted. But not for forms.

Suppose you want to send those two phone values down to a form. You can divvy them up based on the RDF type annotation (this one's a mobile, that one's for work) without too much trouble. This is what the form fragment might look like:

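  <p>
    <label for="foaf_phone_work">work:</label>
    <input size="25" value="tel:123456789" type="text"
      id="foaf_phone_work" name="foaf_phone_work">
    <input type="checkbox" id="delete_foaf_phone_work"
      name="delete_foaf_phone_work">
    <label for="delete_foaf_phone_work">delete</label>
  </p>
  <p>
    <label for="foaf_phone_mobile">mobile:</label>
    <input size="25" value="tel:987654321" type="text"
      id="foaf_phone_mobile" name="foaf_phone_mobile">
    <input type="checkbox" id="delete_foaf_phone_mobile"
      name="delete_foaf_phone_mobile">
    <label for="delete_foaf_phone_mobile">delete</label>
  </p>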

There's some name munging ("foaf_phone_mobile", "foaf_phone_work") as you can see, but that's ok.
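
The munging itself is easy to sketch. The helper names below are hypothetical (the post doesn't show its code), and the scheme assumes that prefixes, local names and type labels contain no underscores:

```python
# Hypothetical helpers: the post shows the munged names, not the code behind them.
def munge(prefixed_property: str, type_label: str) -> str:
    """Turn ('foaf:phone', 'mobile') into the field name 'foaf_phone_mobile'."""
    prefix, local = prefixed_property.split(":", 1)
    return f"{prefix}_{local}_{type_label}"


def unmunge(field_name: str) -> tuple[str, str]:
    """Reverse munge(): 'foaf_phone_mobile' -> ('foaf:phone', 'mobile').

    Assumes the prefix and local name contain no underscores; otherwise
    the split is ambiguous.
    """
    prefix, local, type_label = field_name.split("_", 2)
    return f"{prefix}:{local}", type_label


print(munge("foaf:phone", "mobile"))   # foaf_phone_mobile
print(unmunge("foaf_phone_work"))      # ('foaf:phone', 'work')
```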

But on the way back, if you have more than one foaf:phone value, things get tricky. After reversing the name munging and mapping "foaf_phone_mobile" back to foaf:phone, the typical way to update a value is to find it first by matching on the subject and property and wildcarding the value, thus:

  foaf:person, foaf:phone, ???

That would work for the typical case where you only have one possible property/value, but won't distinguish between the two phone property/values we have here. If you were using a delete/insert approach, the chances are you'd end up overwriting the wrong one, or worse, blowing one of the phone values out of the datastore.
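
To see why, here's a minimal sketch that holds the four triples from the pseudo-Excel view in a Python set and runs the wildcard pattern against them. The `match` helper is hypothetical, with None playing the role of ???:

```python
# The four triples from the pseudo-Excel view; "foaf:person" stands in for Elvis.
TRIPLES = {
    ("foaf:person", "foaf:phone", "tel:987654321"),
    ("foaf:person", "foaf:phone", "tel:123456789"),
    ("tel:987654321", "rdf:type", "urn:mobile"),
    ("tel:123456789", "rdf:type", "urn:work"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as the ??? wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


hits = match(TRIPLES, "foaf:person", "foaf:phone")
print(len(hits))  # 2: the pattern can't tell the mobile from the work phone
```

Both phone triples come back, so a delete/insert update keyed on this pattern has no way to tell which one the form field meant.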

So once the form data comes back we need a richer pattern. Something like this, which leverages the type annotations we declared, would do it:

  foaf:person, foaf:phone, ???
  ???, rdf:type, urn:mobile
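
The same kind of toy store can evaluate the two-line pattern, treating both ??? slots as one shared variable (a join). Again the helper is hypothetical, and the third phone number added at the end is made up to show the two-mobile case:

```python
TRIPLES = {
    ("foaf:person", "foaf:phone", "tel:987654321"),
    ("foaf:person", "foaf:phone", "tel:123456789"),
    ("tel:987654321", "rdf:type", "urn:mobile"),
    ("tel:123456789", "rdf:type", "urn:work"),
}


def phones_of_type(triples, person, type_uri):
    """Evaluate the two-line pattern, with ??? bound to the same node in both:

        person, foaf:phone, ???
        ???,    rdf:type,   type_uri
    """
    phones = {o for s, p, o in triples if s == person and p == "foaf:phone"}
    return sorted(v for v in phones if (v, "rdf:type", type_uri) in triples)


print(phones_of_type(TRIPLES, "foaf:person", "urn:mobile"))  # ['tel:987654321']

# A second (made-up) mobile makes the pattern ambiguous again.
more = TRIPLES | {
    ("foaf:person", "foaf:phone", "tel:555000111"),
    ("tel:555000111", "rdf:type", "urn:mobile"),
}
print(phones_of_type(more, "foaf:person", "urn:mobile"))
# ['tel:555000111', 'tel:987654321']
```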

But. When you have two mobile phones you'll need even further name munging, because the above pattern can no longer pick out the right phone.

Generally you'll end up doing something like this:

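  <p>
    <label for="foaf_phone_work_14387975">work:</label>
    <input size="25" value="tel:123456789" type="text"
      id="foaf_phone_work_14387975" name="foaf_phone_work_14387975">
    <input type="checkbox" id="delete_foaf_phone_work_14387975"
      name="delete_foaf_phone_work_14387975">
    <label for="delete_foaf_phone_work_14387975">delete</label>
  </p>
  <p>
    <label for="foaf_phone_mobile_3434535">mobile:</label>
    <input size="25" value="tel:987654321" type="text"
      id="foaf_phone_mobile_3434535" name="foaf_phone_mobile_3434535">
    <input type="checkbox" id="delete_foaf_phone_mobile_3434535"
      name="delete_foaf_phone_mobile_3434535">
    <label for="delete_foaf_phone_mobile_3434535">delete</label>
  </p>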

If you don't, one day the form will blow up the data. Normally you'd manage the round trip through the web tier, which means there's a hashmap somewhere tying the RDF to the name "foaf_phone_work_14387975". The other way to do this, if you don't mind hitting the storage in the interim, is to write the name value to the datastore first and relate it to the phone number:

  foaf:person, foaf:phone, tel:987654321
  foaf:person, foaf:phone, tel:123456789
  tel:987654321, rdf:type, urn:mobile
  tel:123456789, rdf:type, urn:work
  tel:987654321, form:bind, foaf_phone_mobile_3434535
  tel:123456789, form:bind, foaf_phone_work_14387975

Now we're looking for the pattern:

  foaf:person, foaf:phone, ???
  ???, rdf:type, urn:mobile
  ???, form:bind, foaf_phone_mobile_3434535


And that will be enough to get you back to the target data. It's verbose, but on the face of it there's good potential for automation without expanding your web toolchain too much, or the number of data structures you have to manage.
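
Here is a minimal sketch of the form:bind round trip, with triples held in a plain Python set and "foaf:person" again standing in for Elvis's URI. The helper names and the random-suffix scheme are assumptions for illustration; a real store would run the same three-line pattern as a query:

```python
import secrets

# Hypothetical helpers; only the form:bind property name comes from the post.
TRIPLES = {
    ("foaf:person", "foaf:phone", "tel:987654321"),
    ("foaf:person", "foaf:phone", "tel:123456789"),
    ("tel:987654321", "rdf:type", "urn:mobile"),
    ("tel:123456789", "rdf:type", "urn:work"),
}


def bind_field(triples, value, type_label):
    """Mint a unique field name for a phone value and store it as a form:bind triple."""
    name = f"foaf_phone_{type_label}_{secrets.token_hex(4)}"
    triples.add((value, "form:bind", name))
    return name


def resolve_field(triples, person, type_uri, field_name):
    """Evaluate the three-line pattern and insist on exactly one match."""
    phones = {o for s, p, o in triples if s == person and p == "foaf:phone"}
    hits = [v for v in phones
            if (v, "rdf:type", type_uri) in triples
            and (v, "form:bind", field_name) in triples]
    if len(hits) != 1:
        raise LookupError(f"{field_name!r} matched {len(hits)} values")
    return hits[0]


def update_phone(triples, person, type_uri, field_name, new_value):
    """Replace the bound phone value; its type and binding move with it."""
    old = resolve_field(triples, person, type_uri, field_name)
    # The phone number is itself the node URI, so re-point every triple
    # whose subject is the old node, then swap the foaf:phone link.
    for s, p, o in list(triples):
        if s == old:
            triples.discard((s, p, o))
            triples.add((new_value, p, o))
    triples.discard((person, "foaf:phone", old))
    triples.add((person, "foaf:phone", new_value))


mobile = bind_field(TRIPLES, "tel:987654321", "mobile")
work = bind_field(TRIPLES, "tel:123456789", "work")
update_phone(TRIPLES, "foaf:person", "urn:mobile", mobile, "tel:555000111")
print(resolve_field(TRIPLES, "foaf:person", "urn:mobile", mobile))  # tel:555000111
```

Because the bindings are just more triples, the same code works unchanged when there are two mobiles: each gets its own field name.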

Wrap-up

So, here are the takeaways:

  1. Round-tripping RDF to forms and back is tricky, and not as simple as with RDBMS-backed data. When the RDF data looks like a hashmap whose keys are non-unique, you will have some work to do. With an RDBMS your pseudo-Excel for a phone would just be one long row (as opposed to lots of little rows), so you'd be round-tripping based on a row key.
  2. Any RDF vocab that allows multiple values for the same property, without support for further qualifying the data, isn't going to give you enough information to round-trip with a form. You'll want to do some kind of extra type annotation in that case.
  3. You can only really do the RDF type annotation trick sensibly if the value is itself a URI, as with the phone numbers shown here. If you are working with literals things will be more complicated.
  4. Anywhere you end up using a hashmap to manage bindings in the web tier, you can instead manage those bindings as more RDF, and thus keep your form engine code as generic as possible. RDF graphs are pretty much hashmaps on steroids.
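
For contrast with takeaway 1, here's a minimal sketch of the RDBMS side using Python's built-in sqlite3 module. The table and the field-naming scheme are made up for illustration; the primary key plays the role that the unique name suffix plays in the RDF version:

```python
import sqlite3

# Hypothetical schema: one row per phone, keyed by an integer primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone (id INTEGER PRIMARY KEY, person TEXT, kind TEXT, number TEXT)"
)
conn.executemany(
    "INSERT INTO phone (person, kind, number) VALUES (?, ?, ?)",
    [
        ("elvis", "mobile", "tel:987654321"),
        ("elvis", "work", "tel:123456789"),
        ("elvis", "mobile", "tel:555000111"),  # a made-up second mobile
    ],
)

# Rendering: the row key goes straight into the field name...
fields = {
    f"phone_{row_id}": number
    for row_id, number in conn.execute("SELECT id, number FROM phone ORDER BY id")
}


def apply_form(conn, form):
    """...and on submission it comes straight back out, so two mobiles never collide."""
    for name, value in form.items():
        row_id = int(name.removeprefix("phone_"))
        conn.execute("UPDATE phone SET number = ? WHERE id = ?", (value, row_id))


apply_form(conn, {"phone_3": "tel:555999000"})
```
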
[update] I see Laughingmeme captioned this post as follows: "The problem with RDF? Even something as fundamental to webdev as round tripping to a form is hard." Even with databases or objects, there and back again still requires plenty of manual mapping between form controls, form handlers and persistence mechanisms. Frameworks like RoR and Django show how this can be further automated, but it's not a done deal. The question with RDF/XML is whether it's too flexible to be automated cleanly for forms building.

August 17, 2005 07:33 PM

Comments

kellan
(August 18, 2005 02:15 AM #)

I did also say I'm looking forward to part II, :)

Bill de hOra
(August 18, 2005 01:27 PM #)

Indeed you did, so I toned it down a bit :)

Ian Bicking
(August 19, 2005 05:17 AM #)

Maybe formencode.variabledecode would be of use to you.
It packs and unpacks nested ordered lists and dictionaries into the flat set of keys you put in a form.

I think for your example that means you need a stable ordering of some of the tuples. Or you'd have to have a previous_value hidden field or something.


I don't think an RDBMS is that much easier, except insofar as you have primary keys for every row/tuple. Primary keys are awesome ;)
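
Ian's previous_value idea can be sketched in a few lines. The field names are hypothetical, and this only swaps the foaf:phone triples; it ignores any rdf:type triples hanging off the old values:

```python
# Hypothetical field names: each input "foaf_phone_N" is paired with a hidden
# "foaf_phone_N_previous" carrying the value the form was rendered with.
def handle_submission(triples, person, form):
    """Apply edited phone values, locating each old triple by its previous value."""
    edits = [(form[name + "_previous"], value)
             for name, value in form.items() if not name.endswith("_previous")]
    # Check everything first, so a stale form changes nothing.
    for previous, _ in edits:
        if (person, "foaf:phone", previous) not in triples:
            raise LookupError(f"{previous!r} is no longer a phone of {person!r}")
    # Remove every old triple before adding new ones, so swapped values survive.
    for previous, _ in edits:
        triples.discard((person, "foaf:phone", previous))
    for _, value in edits:
        triples.add((person, "foaf:phone", value))


triples = {
    ("foaf:person", "foaf:phone", "tel:987654321"),
    ("foaf:person", "foaf:phone", "tel:123456789"),
}
handle_submission(triples, "foaf:person", {
    "foaf_phone_0": "tel:555999000",
    "foaf_phone_0_previous": "tel:987654321",
    "foaf_phone_1": "tel:123456789",
    "foaf_phone_1_previous": "tel:123456789",
})
```

A side effect of the hidden field is a cheap staleness check: if the data changed after the form was rendered, the submission is refused rather than guessed at.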

chimezie
(August 20, 2005 08:53 PM #)

I honestly don't think there ever will be an elegant solution for round tripping raw RDF directly to and from a form (unsuccessful attempts have been made: http://www.markbaker.ca/2003/05/RDF-Forms/).
Vanilla HTML Forms are built from the ground up to be associated with name/value data models; RDF is a knowledge representation format.

Even if XForms are employed (quite an improvement as a UI framework) there is still an impedance mismatch between the underlying data model (XML) and RDF 'in the wild', especially when the data is persisted natively as RDF.

I faced this problem at work while trying to develop a dynamic user interface that adapts to the underlying data model. The solution that worked for us was a little unexpected. Basically, if you associate a static transformation (XSLT) between an XML representation of RDF content (which could be as isolated as foaf:Person chunks or as complete as the whole FOAF graph) and the RDF/XML serialization, you can predictably fit an XForm on top of the XML data and serialize to RDF (and into the RDF store) when you are finished manipulating the data. Essentially, XML acts as an intermediary communication medium between the forms (XForms) and the datastore (RDF). I plan to write more about how effective this can be (in closed systems) beyond simply extracting RDF content from microformats embedded in XHTML (via GRDDL).

Danny
(August 22, 2005 11:14 AM #)

What I'd find helpful here would be more compare & contrast of how this specific problem would be solved using a traditional RDBMS.

I suspect there may be several intertwingled issues at play: for a start there's the general data structure, n-ary relations vs. triples/graph. (If you need to do a lot of joins then the two get closer in terms of difficulty.) Then there's the handling of the get/set operations, which can happen directly against an SQL store or through an OO language's get/set, compared to the accumulator open-worldiness of RDF. (The question of transactions is probably best left until later ;-)

The underlying difficulty here isn't necessarily in mapping the RDF model to HTML forms, but mapping the domain model to HTML forms. Kind-of the difference between lots of individual cases and the general case. Round-tripping RDBMS tables to table-like HTML (name/value pairs or whatever) is easy in part because you know the schema ahead of time and can structure the HTML accordingly.

I'm not sure, maybe in something like your example, an intermediary tier could map between the triples and n-tuples of a known shape. Inserting a get/set method facade (including some kind of null/delete) could make the triplestore look like an OO-fronted RDBMS. Heh, this could lead to the slightly perverse situation of a form going to a SQL-like interface mapping to an RDF model stored in a SQL DB...

Approaching from a more declarative angle (in a similar fashion to Chimezie), I've had some success using an intermediary XML format - specifically the (already pretty tabular) result sets from SPARQL followed by XSLT.

From several thousand feet the problem looks the same whether the backend is a regular relational DB or RDF. Perhaps one reason the former is easier in practice is the SQL sugar and all the laxness Date & Pascal gripe about.

I would guess for a system like Rails/Django (about which I know virtually nil...) what's needed is just a different bunch of code generators, a simple matter of programming. That a lot of form hard-coding can be pulled out into declarative data is demonstrated quite nicely in Longwell:

http://simile.mit.edu/longwell/guide.html
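
Danny's intermediary-tier idea can be sketched as a small get/set facade that presents a person's phones as fixed-shape records over raw triples. Everything here (the class names, the Phone shape) is hypothetical and illustrates the idea rather than any existing library:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phone:
    """A fixed-shape record: one 'row' per phone, like an RDBMS tuple."""
    number: str
    kind: Optional[str]  # an rdf:type URI such as "urn:mobile", if any


class PersonPhones:
    """Hypothetical get/set facade over raw triples for one person's phones."""

    def __init__(self, triples, person):
        self.triples = triples
        self.person = person

    def get_phones(self):
        numbers = sorted(o for s, p, o in self.triples
                         if s == self.person and p == "foaf:phone")
        return [Phone(n, self._kind(n)) for n in numbers]

    def _kind(self, number):
        kinds = sorted(o for s, p, o in self.triples
                       if s == number and p == "rdf:type")
        return kinds[0] if kinds else None

    def delete_phone(self, phone):
        self.triples.discard((self.person, "foaf:phone", phone.number))
        if phone.kind is not None:
            self.triples.discard((phone.number, "rdf:type", phone.kind))

    def set_phone(self, old, new):
        # The old record carries its own key (the number), so nothing is guessed.
        self.delete_phone(old)
        self.triples.add((self.person, "foaf:phone", new.number))
        if new.kind is not None:
            self.triples.add((new.number, "rdf:type", new.kind))


triples = {
    ("foaf:person", "foaf:phone", "tel:987654321"),
    ("foaf:person", "foaf:phone", "tel:123456789"),
    ("tel:987654321", "rdf:type", "urn:mobile"),
    ("tel:123456789", "rdf:type", "urn:work"),
}
view = PersonPhones(triples, "foaf:person")
work, mobile = view.get_phones()  # sorted by number
view.set_phone(mobile, Phone("tel:555000111", "urn:mobile"))
```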

Bill de hOra
(August 22, 2005 08:03 PM #)

"The underlying difficulty here isn't necessarily in mapping the RDF model to HTML forms, but mapping the domain model to HTML forms."

That's just pushing things about. I can retort that the underlying difficulty here isn't necessarily in mapping the RDF model to HTML forms, but mapping the RDF to the domain model. I've talked about this before, and if RDF is going to see broad adoption behind the firewall it needs to be addressed. There's no obvious upgrade path from OO and RDBMS backed systems to RDF backed ones without it.

The general case that RDFers work with is smushing, but smushing algorithms aren't useful in places where RDBMSes and OO are dominant. They lack fidelity.

"From several thousand feet the problem looks the same whether the backend is a regular relational DB or RDF."

With enough feet all problems are shallow. The question isn't whether the problem is different; the question is how the technologies support solutions.

"Perhaps one reason the former is easier in practice is the SQL sugar and all the laxness Date & Pascal gripe about."

The former is easier because you can establish the type and range of any RDBMS column at runtime, and provided mappings exist from column types to form controls, the row data can be projected onto a form. That's essentially how RoR and Django do their magic, and how general-case OR mappers work when the projection target is an object.
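
A minimal sketch of that projection, reading declared column types at runtime via SQLite's PRAGMA table_info through Python's sqlite3 module. The type-to-control table and the sample schema are assumptions for illustration, not how Rails or Django actually do it:

```python
import sqlite3

# Hypothetical mapping from declared column types to HTML input types.
CONTROL_FOR_TYPE = {"TEXT": "text", "INTEGER": "number", "BOOLEAN": "checkbox", "DATE": "date"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone (id INTEGER PRIMARY KEY, number TEXT, is_mobile BOOLEAN)")


def form_controls(conn, table):
    """Project a table's columns onto (column, input type) pairs at runtime.

    The table name is interpolated directly, so it must come from trusted code.
    """
    controls = []
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, col_type, _notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        if pk:
            controls.append((name, "hidden"))  # the row key rides along unseen
        else:
            controls.append((name, CONTROL_FOR_TYPE.get(col_type.upper(), "text")))
    return controls


print(form_controls(conn, "phone"))
# [('id', 'hidden'), ('number', 'text'), ('is_mobile', 'checkbox')]
```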

You also have useful default domain chunking in terms of the rows; with RDF graphs you have to explicitly acquire the subgraph. RDBMSes don't get off scot-free here, though. When rows have foreign keys, determining what to pull back can be tricky. For example, a lot of the nitty-gritty in Hibernate comes down to managing lazy loading of the object graph. You really do need to profile system usage to figure out what to do.

Incidentally, I notice that Chimezie thinks this issue is intractable and that it has something to do with RDF/XML. But I'm seeing this issue at the graph level, not in the serialization. I suspect we're talking past each other.

That doesn't address the issue of how to cleanly handle something that has multiple instances of the same property hanging off it (cf. the phone numbers).

"I would guess for a system like Rails/Django (about which I know virtually nil...) what's needed is just a different bunch of code generators, a simple matter of programming."

It's not a SMOP, it's a data modelling issue. Do the generators have enough information to work? That's the question. At the moment my answer is: without further type annotation, no, they don't. RDF without the extra constraints you get for free in a domain model puts you in the position of trying to build an Any-to-Any machine.

Btw, I see nothing in Longwell that makes me optimistic. Longwell's a browser; this problem is to do with writing and updating existing data.

Chimezie
(August 22, 2005 09:28 PM #)

I'm not so sure we were talking past each other, but I probably should have elaborated on how XForms->XML Instance->RDF Graph would solve your problem with dealing with multiple properties:

Consider your RDF/XML serialization of the FOAF graph with multiple phones:


  <rdf:RDF>
    <rdf:Description rdf:about="http://example.com/person/elvis">
      <foaf:phone rdf:resource="tel:987654321"/>
      <foaf:phone rdf:resource="tel:123456789"/>
    </rdf:Description>
    <rdf:Description rdf:about="tel:987654321">
      <rdf:type rdf:resource="urn:mobile"/>
    </rdf:Description>
    <rdf:Description rdf:about="tel:123456789">
      <rdf:type rdf:resource="urn:work"/>
    </rdf:Description>
  </rdf:RDF>

Consider the following more compact XML representation of the graph:

  <rdf:RDF>
    <rdf:Description rdf:about="http://example.com/person/elvis">
      <foaf:phone>
        <vcard:mobile rdf:about="tel:987654321"/>
      </foaf:phone>
      <foaf:phone>
        <vcard:work rdf:about="tel:123456789"/>
      </foaf:phone>
    </rdf:Description>
  </rdf:RDF>

Assuming vcard is bound to the appropriate nsUri (http://www.w3.org/2001/vcard-rdf/3.0#), this is actually an RDF/XML serialization of the same graph, but it takes advantage of using the node qnames to represent rdf:type relationships (for the phone types). You can then (in XForms) bind an xforms:input to all children of foaf:phone nodes (XPath: /rdf:RDF/rdf:Description/foaf:phone/*) to pick them both up distinctly:

  <xforms:input ref="/rdf:RDF/rdf:Description/foaf:phone/*"/>

This works because 1) it takes advantage of a more compact (hierarchical) serialization of the same graph and 2) XForms' mechanism for binding controls to nodes in an XML document is very precise (it's driven by XPath).


Of course, the assumption is that the XML instance the XForm works on top of has a predictable form (hence the disadvantage of working with a raw RDF/XML serialization of a graph, which is completely amorphous, as was your original serialization).

This only works in a 'closed' system where the underlying FOAF graph is driven *solely* by the XForm and the rigid XML representation (this is where the XSLT transformation to RDF/XML comes into play, although it isn't necessary here, as the more compact serialization should still parse to the same graph).
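
The selection Chimezie describes can be approximated outside XForms. A minimal sketch using Python's xml.etree.ElementTree, which supports only a limited subset of XPath; the RDF and FOAF namespace URIs are the standard ones, and the vcard URI is the one given in the comment:

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

COMPACT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://example.com/person/elvis">
    <foaf:phone><vcard:mobile rdf:about="tel:987654321"/></foaf:phone>
    <foaf:phone><vcard:work rdf:about="tel:123456789"/></foaf:phone>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(COMPACT)  # root is the rdf:RDF element itself
# Equivalent of /rdf:RDF/rdf:Description/foaf:phone/*, relative to the root.
phones = [
    (node.tag.split("}", 1)[1], node.get(RDF_ABOUT))  # (local name, phone URI)
    for node in root.findall("rdf:Description/foaf:phone/*", NS)
]
print(phones)  # [('mobile', 'tel:987654321'), ('work', 'tel:123456789')]
```

Run the same expression over the original flat serialization and it matches nothing, because there each foaf:phone element is empty; the binding only works while the XML keeps this shape.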

Danny
(August 23, 2005 02:40 PM #)

I still get the impression this is comparing simple cases implemented with RDBMS with harder cases implemented with RDF, but it's not worth arguing over. I don't deny that round-tripping RDF through HTML forms isn't straightforward.

"There's no obvious upgrade path from OO and RDBMS backed systems to RDF backed ones without it."

That is a clear issue. Coincidentally I've got some day-job tasks to do that would be helped considerably by being able to do this kind of thing, so I'll call back when I've something to show...(first I was thinking of trying something not far from Chimezie's XForms approach, but tied directly to the XHTML a la microformats)

Bill de hOra
(August 27, 2005 11:02 PM #)

Chimezie: "This works because 1) it takes advantage of a more compact (hierarchical) serialization of the same graph and 2) XForms' mechanism for binding controls to nodes in an XML document is very precise (it's driven by XPath)."

OK, I think I understand now, but it seems compactness is less important than uniformity. To work with XForms you need a single serialization (i.e. compactness is incidental); it's precise as a function of the target's structural stability. The XPath args only work when the target XML is uniform.

So, all said, I think I'd rather use Versa than XPath as the mapping technology in this case.

Danny: "I still get the impression this is comparing simple cases implemented with RDBMS with harder cases implemented with RDF"

Having coded it both ways, I have to disagree with that, sorry.


All this firms up my dislike of RDF/XML. I really do think it's holding back RDF adoption. A properly constrained XML syntax would greatly help RDF adoption by making it easier to integrate with. Which I think in a roundabout way makes my point about excess flexibility.

Mark Birbeck
(September 21, 2005 09:03 AM #)

Bill,

The whole reason I got involved in XForms around 4 years ago, was because I was actually trying to develop UIs based on RDF Schema. As you have, I found that the two main solutions available at the time were inappropriate; using HTML on a server is difficult to manage, and of course requires a server! And using C++ or Java requires you to have a lot of logic in your application.

When I saw XForms I jumped in with both feet, since although it doesn't directly solve the problem, it has the potential to. For example, in XForms an input control that is 'bound' to an XML node of type date, renders as a calendar widget. If it's bound to a boolean node then it renders as a check-box. (Actually XForms is device-independent, but if I use 'GUI-centric' terms it makes it slightly easier to explain.)

Since this happens at run-time, it doesn't take a great leap to imagine having all sorts of clever widgets bound to specific data types--and that's what we have done in formsPlayer 2, a version of our XForms plug-in. We take the 'principle' from XForms to its logical conclusion--you can bind a colour-picker to a colour data type, a map to a data type of city...even a phone control to a data type of phone!

The main point is that the binding is defined much like you define a CSS rule, and it happens at run-time. (Actually it's even more flexible than it sounds, since there are two levels of indirection...the schema data type is connected to a form control by a rule, but an XML node in the data can be connected to a schema data type by a rule, too.)

In combination with this, the 'repeat' construct allows you to render as many controls as there are nodes in some XPath expression--so you don't need to know in advance how much data you have.

And one last feature that XForms provides to take this even further, is that you can even pick up labels and mouseover hints from an RDF Schema.

In short, although there are many other features that we could add to XForms to make the process of dynamic forms even easier--including of course, an RDF/XML parser--I know that even what we have now is a major step forward on what I was trying to do 4 years ago with HTML or C++.

An example XForm that manages a FoaF file and illustrates many of the above points is available in the topic TechniqueSchemaDrivenForms on the XForms Wiki.

Regards,

Mark


Mark Birbeck
http://www.formsPlayer.com/