
Where to go from here

Danny Ayers left me a challenge: "I don't see an answer to 'so where do we go from here?'."

Well ok then - let's look at areas where the semweb community could induce adoption by eliminating barriers. Please mind that I'm using "RDF" loosely here; to my mind, what I'm saying applies equally well to extended models, like OWL, or to vocabularies like FOAF and SKOS.

Databases, and efficiency. As this current kerfuffle began with relational databases, some work on the semweb side showing how to store RDF data relationally, in a time and space efficient manner wouldn't hurt. Or how RDF extension triples can be integrated into domain models. Storing RDF in an RDBMS is very inefficient relative to using domain/entity models. If it can't be done without making trade-offs, that is also useful information - nobody pretends storing trees in DBs is a slam dunk. In particular it's important to understand if there are upper limits on how many triples can reasonably be handled by an RDBMS (my sense is 5-8M records). It's one thing to speak of a web of data; it's another to say a web of data is going to be an engineering necessity.
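To make the trade-off concrete, here is a minimal sketch of one common way to lay triples out relationally: a dictionary table of terms plus a table of three integer columns. The table names, column names and helper functions are assumptions for illustration, not anything a particular triple store prescribes, and the sketch treats URIs and literals alike as plain text.

```python
import sqlite3

def open_store():
    """Create an in-memory store: one table of terms, one table of triples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
        CREATE TABLE triple (
            s INTEGER NOT NULL REFERENCES term(id),
            p INTEGER NOT NULL REFERENCES term(id),
            o INTEGER NOT NULL REFERENCES term(id),
            PRIMARY KEY (s, p, o)
        );
        -- Second index so lookups by predicate/object do not scan the table.
        CREATE INDEX triple_pos ON triple (p, o, s);
    """)
    return conn

def term_id(conn, value):
    """Return the integer id for a term, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO term (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM term WHERE value = ?", (value,)).fetchone()
    return row[0]

def add_triple(conn, s, p, o):
    """Store one statement; duplicates are silently ignored."""
    ids = tuple(term_id(conn, t) for t in (s, p, o))
    conn.execute("INSERT OR IGNORE INTO triple (s, p, o) VALUES (?, ?, ?)", ids)

def objects(conn, s, p):
    """Read the values of one property of one subject."""
    rows = conn.execute(
        """SELECT ot.value FROM triple AS t
           JOIN term AS st ON st.id = t.s
           JOIN term AS pt ON pt.id = t.p
           JOIN term AS ot ON ot.id = t.o
           WHERE st.value = ? AND pt.value = ?
           ORDER BY ot.value""",
        (s, p),
    )
    return [r[0] for r in rows]

conn = open_store()
add_triple(conn, "http://example.org/post/1",
           "http://purl.org/dc/elements/1.1/title", "Where to go from here")
print(objects(conn, "http://example.org/post/1",
              "http://purl.org/dc/elements/1.1/title"))
```

Reading a single property costs three joins here, where a domain model would read one column of one row. That is the kind of overhead the paragraph above is asking to see measured.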

ORM and widget mappers. How can RDF serialized data be mapped in and out of HTML forms and databases? All modern db backed web frameworks do this via their ORM and widget mapping systems - it's one of the reasons they're insanely productive. Indeed you can make an argument that improvements in ORMs and automated form mapping are the most important advances in web technology in the last half-decade - not AJAX, not REST awareness, not compliant CSS engines, not even syndication. But you can't do any of that with RDF as, unlike HTML forms and SQL tables, it doesn't come with a useful type system. Try to map arbitrary RDF in and out of HTML forms if you don't believe me. Even constrained formats like DOAP can be impractical to work with, and tend to result in point solutions. By comparison, stacks like RoR and Django are ridiculously easy to work with for arbitrary entities. Plone's model of content, called Archetypes, is even more sophisticated - every content object in a Plone system gets view and edit handling for free due to its Schema and Widget designs. None of these are using RDF, and as far as I can tell they couldn't without a massive loss of functionality.
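As a rough illustration of what "widget mapping" buys you, the sketch below derives form fields from a declared, typed entity. It is not how Rails, Django or Archetypes actually implement it; the `Project` class, the `WIDGETS` table and `form_fields` are all hypothetical names made up for the example.

```python
import datetime
from dataclasses import dataclass, fields

# Hypothetical mapping from a field's declared type to an HTML input type.
WIDGETS = {
    str: "text",
    int: "number",
    bool: "checkbox",
    datetime.date: "date",
}

@dataclass
class Project:
    """A made-up entity; the declared field types are what drive the form."""
    name: str
    stars: int
    created: datetime.date
    archived: bool

def form_fields(model):
    """Return (field name, input type) pairs in declaration order."""
    return [(f.name, WIDGETS[f.type]) for f in fields(model)]

for name, widget in form_fields(Project):
    print(f'<input name="{name}" type="{widget}">')
```

The mapper works because the entity declares a closed list of fields, each with a type. A bag of arbitrary triples offers no such list to iterate over unless a schema is supplied and agreed separately, which is the gap the paragraph above is pointing at.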

Tagging. How RDF can add value over and above Web2.0 tagging schemes. Indeed, can it? If only I had a cent for every time I heard a semwebber point out how RDF is much better than blog categorization, atom:category, and tag clouds. Yet no-one, statistically speaking, is using RDF for these things. Danny said recently: "A handful of metadata fields attached to a blob? We can do a *lot* better than that." Blobs with property values, in the form of Atom and ID3 and EXIF, are creating more value on a daily basis than RDF has done in an entire decade. A huge amount of value in social software and mashups is driven by blobs with metadata - also known as tagging. Blobs with property values are *insanely* useful. Just explain how RDF makes blobs better (clue: relating blobs to each other).

Integration: Above all, there needs to be a story on how RDF/semweb can integrate with existing commercial technology. Phased deployment with RDF is too difficult. A key reason for this is that the official format, RDF/XML, does not round trip due to the allowed variation in its syntax, which makes it inaccessible to other tool-chains, unless they become RDF/XML parsers. Most commercial work with data is fundamentally based on processing tool-chains. People shunt data from system to system, back and forth, and change the formatting as they go along. This is absolutely fundamental in the industry sectors I work in, and on the web itself. The cost of making all the various stages RDF/XML aware is highly unlikely to make economic sense, and neither is deploying RDF toolsets end to end. Hence RDF/XML remains largely undeployed where it could in theory be valuable. If you're going to have to deploy RDF/XML in toto, well, a lot of people won't see the investment value no matter how well the ROI case is made - just use a homegrown XML vocab that is syntactically static and can be transformed, or a standard something that is relatively consistent and can have the semantics layered on via scripting, like Atom.

Anything else is boiling the ocean. And while you can, and the semantic web community frequently does, make the argument that there are overall cost benefits to be had, anyone with an iota of experience delivering production systems will know that expanding the scope of a project in that manner makes the project more likely to fail. If the technology can't be deployed organically, that's the technology's problem, not the ecosystem's.
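The syntactic variation in RDF/XML described above can be shown with a small sketch. Both documents below are legal RDF/XML for the same single statement; the example URL and the `title_via_xml_path` helper are assumptions made up for illustration. The helper does what a plain XML stage in a tool-chain would do: look for a child element at a fixed path.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# The property written as a child element.
AS_ELEMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/post/1">
    <dc:title>Where to go from here</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# The same statement, with the property written as an attribute.
AS_ATTRIBUTE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/post/1"
                   dc:title="Where to go from here"/>
</rdf:RDF>"""

def title_via_xml_path(document):
    """Fetch dc:title the way a non-RDF XML stage would: by element path."""
    root = ET.fromstring(document)
    node = root.find(f"./{{{RDF}}}Description/{{{DC}}}title")
    return None if node is None else node.text

print(title_via_xml_path(AS_ELEMENT))    # finds the title
print(title_via_xml_path(AS_ATTRIBUTE))  # finds nothing: same triple, different XML
```

An RDF parser sees one triple in both cases. An XPath or XSLT stage sees two unrelated documents, and these are only two of the spellings the syntax allows.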

Syntactic stability: in the last 18 months, I've become convinced that RDF is almost ideal as a backup format for semi-structured content. Well, not the content itself but the content metadata, and specifically relationships between content. Once your software system's internals are instrumented to identify each content item using a URL, associations like parent-child, translations, labelling, permissions statements, almost any kind of index, are prime candidates for RDF serialisation. The path based notations of the JCR or Zope aren't adequate as identifiers, nor are the database primary key identifiers used in blogging apps and Web/CMS frameworks. XML, while good for raw content, intrinsically doesn't support relations, and the kinds of guarantees you get from RSS/Atom or a custom/private format are weak sauce at best. Now, assuming you can instrument the data with URIs there's a big big opportunity to use RDF in an operationally critical part of a system. You can also in principle 'publish out the back' - by giving people your backups for syndication or warehousing.
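A minimal sketch of the idea, assuming every content item already has a URL: each association becomes one statement, written here one per line in the style of N-Triples. The `http://example.org/vocab/` predicates and the function names are hypothetical, and the sketch only handles URI-to-URI relations, so no literal escaping is attempted.

```python
def ntriple(subject, predicate, obj):
    """Format one URI-to-URI relation as a single N-Triples style line."""
    return f"<{subject}> <{predicate}> <{obj}> ."

def export_relations(relations):
    """Serialise (subject, predicate, object) tuples, de-duplicated and sorted."""
    return sorted({ntriple(s, p, o) for s, p, o in relations})

def merge_backups(*backups):
    """Merge backups from several systems: with one line per statement,
    merging is a set union."""
    merged = set()
    for backup in backups:
        merged.update(backup)
    return sorted(merged)

VOCAB = "http://example.org/vocab/"  # hypothetical vocabulary
cms = export_relations([
    ("http://example.org/doc/2", VOCAB + "parent", "http://example.org/doc/1"),
])
translations = export_relations([
    ("http://example.org/doc/2", VOCAB + "parent", "http://example.org/doc/1"),
    ("http://example.org/doc/2", VOCAB + "translationOf", "http://example.org/doc/3"),
])
for line in merge_backups(cms, translations):
    print(line)
```

The merge is trivial only because each statement has exactly one spelling in this notation, so identical relations from different systems collapse to identical lines.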

Again, the problem here is that the XML syntax doesn't roundtrip in and out of other RDF systems; it also means you can't safely merge backups from multiple systems without being fully committed to an RDF/XML toolchain, or there is a good chance either your marshallers or your incoming parsing layer will break. This, along with toolchain integration, is a prime argument to revisit the XML syntax.


November 12, 2006 12:18 AM

Comments

Chimezie
(November 12, 2006 03:53 AM #)

Good points..

some work on the semweb side showing how to store RDF data relationally, in a time and space efficient manner wouldn't hurt.

I've actually managed both (time and space) with a rather lean relational model of RDF (and Notation 3).

Storing RDF in an RDBMS is very inefficient relative to using domain/entity models.

RDF persistence is no different from persistence of DAGs in general and there are many models which work from classic graph theory. The problem is that most current relational models for this purpose aren't tailored for RDF's abstract syntax specifically.

my sense is 5-8M records

This is way lower than what current RDFMS can handle. I've been able to push 20M with the above model with very reasonable cross database response times.

This number is also misleading as most RDF modeling doesn't take advantage of DL shortcuts to reduce the number of redundant triples. The reality is that with proper modeling that upper limit should never be reached.

How can RDF serialized data be mapped in and out of HTML forms and databases?

Haven't we had this conversation on this blog before =). GRDDL and rich web backplane architecture is well suited for this. Single-purpose, concrete XML syntaxes, document processing (removed from the well known disaster of RDF) serialization coupled with a mechanism to get the RDF out.

Now, assuming you can instrument the data with URIs there's a big big opportunity to use RDF in an operationally critical part of a system.

Bingo, Bill. You hit the nail on the head. This is where I think there is great opportunity to leverage both XML and RDF (seamlessly) for CMS. Especially when the level of granularity is synchronized at the document and (named) graph level - think ACL instead of policy aware-based security (which makes more sense with the semantic web than with CMS).

Again, the problem here is that the XML syntax doesn't roundtrip in and out of other RDF systems

Full roundtripping (read: leave out RDF->XML) isn't necessary if XML is the primary interlingua and you have a well designed mechanism for extracting RDF from the interlingua.

Though, to be frank, any progress depends on being honest about how much damage RDF/XML has done to sem web adoption. Freeing the abstract syntax from a concrete syntax solves *most* of the problem.

ix
(November 12, 2006 09:26 AM #)

Databases, and efficiency: relational DBs need to account for a variety of combinations of database/table/column topologies. triple-stores only need to account for one. triples, and optionally graph contexts. it certainly narrows down the problem space quite a bit so im not sure why you consider it a big deal. i can think of at least 3 products claiming > 1 Billion triples, and im sure google would scale that to 65 billion if there was that quantity of RDF for them to chew up to enhance their search.

ORM and widget mappers: having written and used RDF ORMs i can say that theyre indeed simpler than those built on relational DBs, since RDF has the notion of classes, inheritance, attributes already. its building an ORM on top of meta-oo, instead of on top of CSV files. the normalized schema also presents more opportunities to reuse components without coding the 'last mile' as one would in a Rails app, for example.

Integration: it is true that W3 has presented an entire stack which is only one of many stacks out there. i think its good that theyve presented an entire layer cake; I myself am blissfully unaware of XML, SKOS, OWL, JSON, JAVA and XSLT which make up many of their member organization's fledgling tools, but find many parts useful. there are tons of papers on SQL/XML mapping and i believe the open source nature of the schemas saves time of reinvention for a single-organization's needs.

Syntactic stability: havent noticed this one. there was a RDF file out of hundreds that raptor couldn't import; it was from 1999. this issue is affecting me much more on ruby 1.8/1.9/2.0

Bruce
(November 12, 2006 01:46 PM #)

Just so I understand your position on the model and syntax: you'd argue for dumping reification, and tightening up the syntax by, say, not allowing properties to be encoded as attributes? That sort of thing?

Am struggling a bit on how to balance this with the OpenDocument metadata work. Constraining the model and syntax per above certainly has benefits for more XML-oriented workflows, but with other costs.

Bill de hOra
(November 12, 2006 04:38 PM #)

ix:

"triple-stores only need to account for one. triples, and optionally graph contexts."

Most triple stores are running on top of RDBMSes, afaict. I'm not sure how this is relevant.

"i can think of at least 3 products claiming > 1 Billion triples"

Great. What's the seek time under concurrent usage?

"im sure google would scale that to 65 billion if there was that quantity of RDF for them to chew up to enhance their search."

Seriously, who cares? Google does text indexing. If that's all there is to RDF, all semweb related research can safely stop now.

Bruce:

"dumping reification"

Absolutely. I don't think anyone knows what reification means in RDF.

"tightening up the syntax by, say, not allowing properties to be encoded as attributes? That sort of thing?"

I mean eliminating the optionality; the RDF/XML syntax is about 3.5 syntaxes munged together. It's horrible to deal with outside a closed RDF environment - I argue with Danny about this twice a year. He keeps saying it's not been a problem for him; that makes me fairly sure he's working on green fields with closed stacks; certainly he's not in my world. I think if I could assume "just RDF", things would be ok; but I can't. I would liken the RDF/XML issues I encountered to dealing with Unicode - great idea, but the implementations suck, especially if you try and use them in concert. It's the kind of death by a thousand cuts syndrome that doesn't show up on the whiteboard.

The WG had an opportunity to clean this up years ago, and wouldn't do it under the constraint of a "bug-fixing" charter - under the same charter the RDF model had a ground up rewrite. Go figure.

James Governor
(November 17, 2006 11:20 AM #)

show me something cool, or provide code that web developers want to use. thats the challenge for semweb. no revolution succeeds from the top down.

Bill de hOra
(November 17, 2006 03:20 PM #)

"no revolution succeeds from the top down. "

James - yep. That's the "if the technology can't be deployed organically, that's the technology's problem, not the ecosystem's." bit.

The semweb I think will be Cool-As-In-Swarfega, not Cool-As-In-Wow - by keeping the grime off you as you work. But the toolchains won't open up until semwebbers treat syntax as a first class problem, like the microformats crowd do.

Then again, these guys are all over RDF:

theveniceproject

That looks cool. Finding TV is Teh Suck.

Btw, I do honestly think RDF metadata, syntactically appropriate, and designed to work with current systems, not outside them, can drive out mucho waste in the enterprise.
