Snowflake APIs

Speculation: for Data APIs in 2009 there will be two developments and one debate. All are centered around an important technical principle in web design - uniform interfaces (an idea that goes back quite a bit in distributed systems).

You are not a beautiful or unique snowflake.

First idea: putting links into API data. The REST community calls this 'Hypermedia as the engine of application state', or HATEOAS. Yes. Worst. Abbreviation. Ever. I tend to call it "links in content". Nonetheless the idea is simple - put links in your format data. Heavy Atom and HTML users do this already, almost subconsciously, but a lot (most) of proprietary data APIs fall down here, and in doing so miss out on a number of things.

First are the network effects of being able to pass along URLs. The very essence of a "web" is linking, almost to the point where, operationally, a "good" webapp is one that uses plenty of links. Second is the decoupling of clients from your servers - if your format describes both where links can be found and what their "type" is via a metadata qualifier, you are free to refactor on the server side, relocate the server, introduce a CDN, whatever. For example, Atom qualifies links using the "rel" and "type" attributes in a way that will work for every web site on the planet. Clients that extract links out of the data and construct an absolute minimum of URLs are loosely coupled to server structures - URL parsing and generation being an important coupling point in API design. Third is the simplification of client code - just pull the links out, render them in the UI, or call them using HTTP methods. You don't even need to design the link elements - steal Atom's link element, or lace your current XML with "src" attributes that contain URLs.
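
To make this concrete, here is a minimal sketch in Python of a client working off typed links rather than building URLs by hand. The feed snippet, URLs and rel values are invented for illustration:

    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    feed = ET.fromstring("""
    <feed xmlns="http://www.w3.org/2005/Atom">
      <entry>
        <link rel="self" type="application/atom+xml"
              href="http://example.org/photos/1"/>
        <link rel="edit" type="application/atom+xml"
              href="http://example.org/photos/1/edit"/>
        <link rel="enclosure" type="image/jpeg"
              href="http://example.org/photos/1.jpg"/>
      </entry>
    </feed>""")

    def links(entry, rel):
        # Pull out links by their "rel" qualifier; never construct URLs.
        return [l.get("href") for l in entry.findall(ATOM + "link")
                if l.get("rel") == rel]

    entry = feed.find(ATOM + "entry")
    print(links(entry, "edit"))  # ['http://example.org/photos/1/edit']

Because the client only follows what the server hands out, the server can move those resources around tomorrow without breaking anything.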

For more detail on how links in content can work, I recommend reading Mark Baker's "Hypermedia in RESTful applications" and Subbu Allamaraju's "Describing RESTful Applications", both on InfoQ.

Second idea: "standardisation" of feed metadata. Recently Dare Obasanjo and Dave Winer have blogged about inconsistencies in MediaRSS that complicate data consumption. Dave Winer, on adding a photo site to FriendFeed:

"I always assumed you should just add the feed under "Blog" but then your readers will start asking why your pictures don't do all the neat things that happen automatically with Flickr, Picasa, SmugMug or Zooomr sites. I have such a site, and I don't want them to do anything special for it, I just want to tell FF that it's a photo site and have all the cool special goodies they have for Flickr kick in automatically."

Dare Obasanjo:

"We have a similar problem when importing arbitrary RSS/Atom feeds onto a user's profile in Windows Live. For now, we treat each imported RSS feed as a blog entry and assume it has a title and a body that can be used as a summary. This breaks down if you are someone like Kevin Radcliffe who would like to import his Picasa Web albums. At this point we run smack-dab into the fact that there aren't actually consistent standards around how to represent photo albums from photo sharing sites in Atom/RSS feeds."

I went a few rounds last year with MediaRSS and have to agree - I'd much rather have something as well specced as Atom is for syndication, or RFC 5005 is for pagination. And it's not limited to media. The same goes for geo data (the main criterion being: does it support WGS84?), contacts, activity/events, representing arbitrary site metadata, even Exif. Making things up, or having to choose between competing formats, is a real pain. There are two problems - where to place the data, because all the popular formats have sufficiently arbitrary structure that something like MediaRSS can appear in multiple places (the difference between Picasa and Zooomr, as Dare outlined in his post), and how to notate it (the difference between MediaRSS and SmugMug, again as Dare outlined).
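
As a toy illustration of the placement problem, here is a consumer sketch in Python. The element positions are invented rather than a claim about any particular site, but the shape of the fallback chain will be familiar to anyone who has consumed these feeds:

    import xml.etree.ElementTree as ET

    MEDIA = "{http://search.yahoo.com/mrss/}"

    def find_media_url(item):
        # media:content directly on the item...
        content = item.find(MEDIA + "content")
        # ...or nested inside a media:group...
        if content is None:
            content = item.find(MEDIA + "group/" + MEDIA + "content")
        # ...or a plain RSS enclosure, for feeds that predate MediaRSS.
        if content is None:
            content = item.find("enclosure")
        return None if content is None else content.get("url")

    item = ET.fromstring("""
    <item xmlns:media="http://search.yahoo.com/mrss/">
      <media:group>
        <media:content url="http://example.org/p/1.jpg" type="image/jpeg"/>
      </media:group>
    </item>""")
    print(find_media_url(item))  # http://example.org/p/1.jpg

Every consumer grows a chain like this, and each new producer variation adds another branch.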

If you are a semwebber who has been around long enough to remember the syndication wars, you will be having a good old chortle, as this problem is arguably solved better by RDF/XML than by any syndication or markup format. It's an interesting turnaround, since one of the arguments against RDF adoption for syndication back then was that clients and servers had common internal object models for syndication data, and thus a formal model on the wire didn't matter that much - the parse/lex layers could switch. Extension metadata, it seems, is a bit different - variability has a cost. Whether you agree or not re RDF, if this impacts you, having a look at how RDF or even RSS 1.0 modules get described, and how they are supposed to be parsed into a data structure, is no harm at all.
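
To see what a formal model buys you, here is a small sketch, assuming the Python rdflib library is available: two different RDF/XML spellings of the same statement parse to identical triples, so where the data sits in the markup stops mattering:

    from rdflib import Graph

    # The same statement spelled two ways: as a child element...
    doc_a = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                        xmlns:ex="http://example.org/ns#">
      <rdf:Description rdf:about="http://example.org/photo/1">
        <ex:title>Sunset</ex:title>
      </rdf:Description>
    </rdf:RDF>"""

    # ...and as a property attribute.
    doc_b = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                        xmlns:ex="http://example.org/ns#">
      <rdf:Description rdf:about="http://example.org/photo/1"
                       ex:title="Sunset"/>
    </rdf:RDF>"""

    g1, g2 = Graph(), Graph()
    g1.parse(data=doc_a, format="xml")
    g2.parse(data=doc_b, format="xml")
    print(set(g1) == set(g2))  # True: same triples, different markup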

RDF is worth learning for a different reason — the profound enlightenment experience you will have when you finally get it. That experience will make you a better format and data API designer for the rest of your days, even if you never actually use RDF itself a lot. (You can get some beginning experience with RDF fairly easily by writing and modifying simple files like FOAF and DOAP for social networks and software projects, or RDFa extensions for XHTML.)
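
A starter exercise along those lines, again assuming rdflib: build a couple of FOAF triples and serialize them (the name and URIs here are made up):

    from rdflib import Graph, Literal, URIRef, RDF
    from rdflib.namespace import FOAF

    g = Graph()
    me = URIRef("http://example.org/people/alice#me")
    g.add((me, RDF.type, FOAF.Person))
    g.add((me, FOAF.name, Literal("Alice Example")))
    g.add((me, FOAF.homepage, URIRef("http://example.org/alice")))
    print(g.serialize(format="xml"))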

The debate: should there be this many custom formats? Via Kevin Marks and Aristotle Pagaltzis, I came across the "precious snowflake" analogy for APIs, which to me describes the situation perfectly, both across hundreds of websites and within content domains such as geo/contacts/media. Here's Aristotle:

" There are a lot of good existing choices once you get over the idea that your domain is a unique and precious snowflake."

There are probably hundreds of publicly available APIs today, all different, each its own "SiteML", and you have to be able to mash them all. The big but smart companies, such as GOOG and MSFT, that have application suites and not just individual web silos, have adopted common syntax, posting and extension models that allow for consistency and evolvability over time - individual API offerings might seem suboptimal and indirect, even obtuse, but the overall product portfolio makes a ton of sense - as well as lowering consumer costs, it allows them to ship client APIs with less hassle. This is basic platform and product architecture - reduced variability at one layer allows for increased offerings with lower costs at higher layers.

Standalone web properties just don't do this today; each individual API is like a precious snowflake, but being in the snowball business is expensive, and so is keeping that snowflake preserved (when you designed that API, did you think about encoding, escaping, empty v not-present, namespaces, timestamps, bidi, versioning, extensions, content negotiation, cacheability, required v optional, new formats, input sanitation? Didn't think so ;). This creates a new market for web integration providers such as FriendFeed and Gnip ("making data portability suck less"), or silo publishing providers such as Mashery. We call them aggregators in the web consumer space, but when you get to scores of providers it effectively requires the "EAIfication" of mashups, or if you prefer, the introduction of Value Added Networks (VANs) for consumer data. Others, like eBay and Salesforce.com, seem to have become subject to X.Y.Z versioning issues, which are a maintenance nightmare* (I find these tend to be associated with SOAP-style processing models - YMMV).

So how API families like DiSo and OpenSocial, or specific formats like Portable Contacts, Activity Streams and Atom Media Extensions, develop will be important this year. That, or we start taking microformats and RSS/Atom/JSON extensibility a lot more seriously than we do today; otherwise the number of APIs will soon be in the thousands.

* X.Y.Z for software binary compatibility, sure, but X.Y.Z in data formats arguably misses the entire concept of web data APIs - when clients are out of your administrative control, lockstepped upgrades are, practically speaking, impossible.


6 Comments


    "...reduced variability at one layer allows for increased offerings with lower costs at higher layers."

    It's a very good point (alluded to in your "Magnificent Seven" post as well). While much of the activity revolves around things like activity streams, social network apps, etc., there is so much to be gained from a properly RESTful view of the web (guided, I'd say, by the example of Atom/AtomPub) at more mundane levels: simply considering how any web page might be "syndicated" forces a real separation of content from presentation, and at least opens the possibility of a write-back story where there previously had been none. It makes perfect sense for the digital library/open access repository world in higher ed. (the area in which I work), yet we are still not quite getting it. And while we think of Facebook/MySpace as the "use cases" for social platforms, the opportunities they open up in the areas of education (which is, or should be, all about creating new means of access for and repurposing of content) could be tremendous.

    As far as extensibility goes, my own preference would be to see how far we can take simple Atom itself, allowing it to work as it does already in browsers and feed readers (of course we *do* need a solid media extension), and start looking to atom:category as a built-in extension point for all sorts of RDF-ish possibilities. I prefer a controlled explosion of diversity there (based on @scheme "ontologies") to an uncontrolled explosion of extensions/namespaces that may or may not be commonly adopted.
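
    To sketch what I mean (the scheme URI here is invented):

        import xml.etree.ElementTree as ET

        entry = ET.fromstring("""
        <entry xmlns="http://www.w3.org/2005/Atom">
          <category scheme="http://example.org/ontology/media"
                    term="photo-album" label="Photo album"/>
        </entry>""")

        cat = entry.find("{http://www.w3.org/2005/Atom}category")
        print(cat.get("scheme"), cat.get("term"))

    A user agent that understands the scheme can light up extra behaviour; one that doesn't still sees a plain, well-formed category.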


    Great post. re: "RDF is worth learning for a different reason..." The same goes for REST.

    Of all the topics I've explored and experimented with over my time working on web applications, those two have had the greatest pay-off in terms of my development as an architect and an engineer.


    ok first of all that is the best abbreviation ever and the only thing that would improve it is if we had
    Hypermedia As The Engine of Uniform State
    instead.


    Links In Content - best acronym ever! You can tell people to LIC it.


    Bill,

    Very nice post that brings up a few thoughts -

    * Atom is a powerful format for four reasons: it recognizes that linking exists for reasons that have nothing to do with web page display, it provides a consistent mechanism for providing categorization, it can serve as a transport protocol for content as well as being reflective of content summaries, and it is fundamentally tied into a RESTful view of the world.

    * I ultimately expect AtomPub to dominate as the de facto publishing mechanism by 2014, following a Cambrian extinction event of nearly all other publishing APIs that is already underway. AtomPub is still rough around the edges, but I also see it being quietly adopted at all the right places. The emergence of XML Databases and XQuery will have a lot to do with that as well, and that will unfold over the next two years.

    * Prediction for 2010 - RDFa + Atom will be established as a W3C working group. The central problem of mixed-namespace content (such as GeoRSS) is precisely as Dare described - there is no clean or effective mechanism for either discovering an ontology or processing it. One possible solution is to establish an ontology link to an RDF/OWL description at either the feed or the entry level of an Atom document, and then treat domain-specific extensions on Atom as being discoverable rather than fixed. If functionality exists within the user agent for a given extension element, then that functionality is invoked. If not, then the discovery process at least provides a consistent mechanism for establishing the meaning and documentation of the elements in question, and possibly would include a transformation, service or similar resource to generate the appropriate response. This becomes preferable to ignoring them outright.
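
    To sketch the discovery idea (the rel value and URIs are invented):

        import xml.etree.ElementTree as ET

        ATOM = "{http://www.w3.org/2005/Atom}"

        entry = ET.fromstring("""
        <entry xmlns="http://www.w3.org/2005/Atom">
          <link rel="http://example.org/rel/ontology"
                type="application/rdf+xml"
                href="http://example.org/ontologies/geo.owl"/>
        </entry>""")

        ontologies = [l.get("href") for l in entry.findall(ATOM + "link")
                      if l.get("rel") == "http://example.org/rel/ontology"]
        print(ontologies)  # ['http://example.org/ontologies/geo.owl']

    An agent that recognises an extension invokes its own functionality; one that doesn't can at least dereference the link for meaning.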

    * Until SemWeb recognizes that syndication feeds will end up carrying the bulk of the services within the next decade, it will continue to be a voice in the wilderness. The two HAVE to be linked in some fashion, because the bulk of all semantics on the web is increasingly being tied up in syndication, not web pages.

    Anyway, once again, you have a very thought-provoking post.


    @Rob "You can tell people to LIC it."

    The original was Links in Content are King :)