Extensions v Envelopes

Here's a sample activity from the Open Social REST protocol (v0_9):

<entry xmlns="http://www.w3.org/2005/Atom">
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title>some activity</title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
        href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <content type="application/xml">
       <activity xmlns="http://ns.opensocial.org/2008/opensocial">
           <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
           <title type="html"><a href=\"foo\">some activity</a></title>
           <updated>2008-02-20T23:35:37.266Z</updated>
           <body>Some details for some activity</body>
           <bodyId>383777272</bodyId>
           <url>http://api.example.org/activity/feeds/.../af3778</url>
           <userId>example.org:34KJDCSKJN2HHF0DW20394</userId>
       </activity>

    </content>
</entry>


It's 1.1 kilobytes. I'll call that style "enveloping". Here's an alternative that doesn't embed the activity in the content and instead uses the Atom Entry directly, which I'll call "extending":

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:os="http://ns.opensocial.org/2008/opensocial">
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title type="html"><a href=\"foo\">some activity</a></title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
       href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <os:bodyId>383777272</os:bodyId>
   <content>Some details for some activity</content>
</entry>


It's 686 bytes (the activity XML by itself is 460 bytes). As far as I can tell there's no loss of meaning between the two. 545 bytes might not seem worth worrying about, but all that data adds up (very roughly 5.5KB for every 10 activities, or half a megabyte for every 1000), especially for mobile systems, and especially for activity data. I have a long-standing belief that social activity traffic will dwarf what we've seen with blogging and, eventually, email. If you're a real performance nut, the extended form should be faster to parse as well, since the tree is flatter. The extension approach is akin to the way microformats or RDF inline into HTML, whereas the enveloping approach is akin to how people use SOAP.
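The rough totals above can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are made up for illustration, and the 545-byte figure is the one quoted in the text, not re-measured.

```java
public class EnvelopeOverhead {
    // Extra bytes per activity for the enveloping style, as quoted in the post.
    static final long OVERHEAD_BYTES = 545;

    // Total extra bytes sent for the given number of activities.
    static long extraBytes(long activities) {
        return OVERHEAD_BYTES * activities;
    }

    public static void main(String[] args) {
        // 5450 bytes, i.e. roughly 5.5KB
        System.out.println(extraBytes(10) + " extra bytes per 10 activities");
        // 545000 bytes, i.e. roughly half a megabyte
        System.out.println(extraBytes(1000) + " extra bytes per 1000 activities");
    }
}
```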

Ok, so that's bytes, and you might not care about the overhead. The bigger problem with using Atom as an envelope is that information gets repeated. Atom has its own required elements and is not a pure envelope format like SOAP. OpenSocial's "os:title", "os:updated", "os:id", "os:url", "os:body", "os:userId" all have corresponding Atom elements (atom:title, atom:updated, atom:id, atom:link, atom:content, atom:uri). Actually what's really interesting is that only one new element was needed using the extension style, the "os:bodyId" (we can have an argument about os:userId; I mapped it to atom:uri because the example does as well by making it a urn). This repetition is an easy source of bugs and dissonance. The cognitive dissonance comes from having to know which "id" or "updated" to look at, but duplicated data also means fragility. What if the updated timestamps are different? Which id/updated pair should I use for sync? Which title? I'm not picking on Open Social here by the way, it's a general problem with leveraging Atom.
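To make the fragility concrete, here is a minimal sketch of the check a consumer of the enveloped style ends up needing: does the outer Atom field agree with the copy inside os:activity? The class name, method names and the trimmed sample entry (with deliberately mismatched timestamps) are all assumptions for illustration; only the JDK's built-in DOM parser is used, and values are compared as plain strings rather than as parsed dates.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EnvelopeCheck {
    static final String ATOM = "http://www.w3.org/2005/Atom";
    static final String OS = "http://ns.opensocial.org/2008/opensocial";

    // Hypothetical, trimmed-down enveloped entry whose two "updated" values disagree.
    static final String SAMPLE =
        "<entry xmlns=\"http://www.w3.org/2005/Atom\">"
        + "<id>urn:example:af3778</id>"
        + "<updated>2008-02-20T23:35:37.266Z</updated>"
        + "<content type=\"application/xml\">"
        + "<activity xmlns=\"http://ns.opensocial.org/2008/opensocial\">"
        + "<id>urn:example:af3778</id>"
        + "<updated>2008-02-21T00:00:00.000Z</updated>"
        + "</activity></content></entry>";

    // Text of the first direct child of parent with this namespace and local name, or null.
    static String childText(Element parent, String ns, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ns.equals(n.getNamespaceURI())
                    && name.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // True when the Atom field and its os:activity copy are both present and identical,
    // or when there is no enveloped activity at all (nothing to conflict with).
    static boolean agrees(String entryXml, String field) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Element entry = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(entryXml)))
                    .getDocumentElement();
            NodeList activities = entry.getElementsByTagNameNS(OS, "activity");
            if (activities.getLength() == 0) {
                return true;
            }
            String outer = childText(entry, ATOM, field);
            String inner = childText((Element) activities.item(0), OS, field);
            return outer != null && outer.equals(inner);
        } catch (Exception e) {
            throw new IllegalArgumentException("entry is not parseable XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("id agrees: " + agrees(SAMPLE, "id"));           // true
        System.out.println("updated agrees: " + agrees(SAMPLE, "updated")); // false
    }
}
```

Even when the check fails, the code cannot say which copy is right; that decision has to come from somewhere outside the document.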

I suspect one reason extensions get designed like this is that the format designers have their own XML (or JSON) vocabs, and their own models, and want to preserve them. Designs are more cohesive that way. As far as I can tell, you can pluck the os:activity element right out of atom:content and discard the Atom entry with no information loss, but this raises the question - why bother using Atom at all? There are a couple of reasons. One is that Atom has in the last 4 years become a platform technology as well as a format. Syndication markup now has massive global deployment, probably second only to HTML. Trying to get your pet XML format distributed today without piggybacking on syndication is nigh on impossible. OpenSocial, OpenSearch, Activity Streams, PSHB, Atom Threading, Feed History, Salmon Protocol, OCCI, OData, GData, all use Atom as a platform as much as a format. So Atom provides reach. Another is that Atom syndicates and aggregates data. "Well, duh it's a syndication format!", you say. But if you take all the custom XML formats and mash them up, all you get is syntactic meltdown. By giving up on domain specificity, aggregation gives a better approach to data distribution. This I think is why Activity Streams, OpenSearch and Open Social beat custom social networking formats, none of which have become a de facto standard the way, say, S3 has for storage - neither Twitter's nor Facebook's API is de facto (although StatusNet does emulate Twitter). RDF by being syntax neutral is even better for data aggregation, but that's another topic and a bit further out into the future.

So. Would it be better to extend the Atom Entry directly? We've had a few years to watch and learn from social platforms and formats being built out on Atom, and I think that direct extension, not enveloping, is the way to go. Which is to say, I'll take a DRY specification over a cohesive domain model and syntax. It does mean having to explain the mapping rules and buying into Atom's (loose) domain model, but this only has to be done once in the extension specification, and it avoids all these "hosting" rules and armies of developers pulling the same data from different fields, which is begging for interop and semantic problems down the line.
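As a rough illustration of "explain the mapping rules once", here is a sketch that reads an extension-style entry into a flat activity record. Each activity field has exactly one home: an Atom element, or os:bodyId for the one thing Atom lacks. The mapping follows the one suggested earlier in this post; the class name, the map keys and the trimmed sample entry (with a plain-text title) are assumptions for illustration, not part of any OpenSocial specification.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ExtendedEntryReader {
    static final String ATOM = "http://www.w3.org/2005/Atom";
    static final String OS = "http://ns.opensocial.org/2008/opensocial";

    // Hypothetical, trimmed-down extension-style entry.
    static final String SAMPLE =
        "<entry xmlns=\"http://www.w3.org/2005/Atom\""
        + " xmlns:os=\"http://ns.opensocial.org/2008/opensocial\">"
        + "<id>urn:example:af3778</id>"
        + "<title>some activity</title>"
        + "<updated>2008-02-20T23:35:37.266Z</updated>"
        + "<author><uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>"
        + "<name>John Smith</name></author>"
        + "<os:bodyId>383777272</os:bodyId>"
        + "<content>Some details for some activity</content>"
        + "</entry>";

    // Text of the first descendant with this namespace and local name, or null.
    static String text(Element root, String ns, String name) {
        NodeList hits = root.getElementsByTagNameNS(ns, name);
        return hits.getLength() == 0 ? null : hits.item(0).getTextContent().trim();
    }

    // Applies the mapping rules in one place: activity field -> single source element.
    static Map<String, String> read(String entryXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Element entry = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(entryXml)))
                    .getDocumentElement();
            Map<String, String> activity = new LinkedHashMap<>();
            activity.put("id", text(entry, ATOM, "id"));
            activity.put("title", text(entry, ATOM, "title"));
            activity.put("updated", text(entry, ATOM, "updated"));
            activity.put("body", text(entry, ATOM, "content"));
            activity.put("userId", text(entry, ATOM, "uri"));   // atom:author/atom:uri
            activity.put("bodyId", text(entry, OS, "bodyId"));  // the only extension element
            return activity;
        } catch (Exception e) {
            throw new IllegalArgumentException("entry is not parseable XML", e);
        }
    }

    public static void main(String[] args) {
        read(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

There is no second "id" or "updated" to reconcile here, so nothing can disagree.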

I think in hindsight, some of Atom's required elements act against people mapping into Atom, namely atom:author and atom:title. Those two really show the blogging heritage of Atom rather than the design goal of a well-formed log entry. Even though author is a "Person" construct in Atom, author is a fairly specific role that might not work semantically for people (what does it mean to "author" an activity?). As for atom:title, increasingly important data like tweets, SMS, events, notifications and activities just don't have titles, which means padding the atom:title with some text. The other required elements, atom:id and atom:updated, are generic constructs that I see as unqualified goodness being adopted in custom formats (which is great). The atom:link too is generically useful, with one snag: it can only carry one value in the rel attribute (unlike HTML). So these are problems, but not enough to make me want to use an enveloping pattern.

Just a little work

 

Tim Bray : "I'm pretty sure anybody who's been to the mat with the Android APIs shares my unconcern. First of all, a high proportion of most apps is just lists of things to read and poke at; another high proportion of Android apps are decorated Google maps and camera views. I bet most of those will Just Work on pretty well any device out there. If you’re using elaborately graphical screens you could do that in such a way as to be broken by a different screen shape, but it seems to me that with just a little work you can keep that from happening."

Tim might want to live through a few real handset projects to understand portability costs. All that little work adds up and is sufficient to hurt the bottom line of a company or an individual, perhaps enough to keep them with Apple. Even if you could develop a portable .apk through disciplined coding, the verification testing alone will hurt, especially as the Android ecosystem of hardware and versions grows.

"Oh, and the executable file format is Dalvik bytecodes; independent of the underlying hardware."

I've heard the same said about J2ME bytecode. 

Activity Streams extension for Abdera

The next time I see someone saying XML is inevitably hard to program to, I'll have a link to some code to show them:

public static void main(String[] args) {
  // Register the Activity Streams and Atom Media extensions with Abdera
  Abdera abdera = new Abdera();
  abdera.getFactory().registerExtension(new ActivityExtensionFactory());
  abdera.getFactory().registerExtension(new AtomMediaExtensionFactory());

  // Wrap a plain Atom entry so it can carry activity data
  Feed feed = abdera.newFeed();
  ActivityEntry entry = new ActivityEntry(feed.addEntry());
  entry.setId("tag:site.org,2009-01-01:/some/unique/id");
  entry.setTitle("pt took a Picture!");
  entry.setVerb(Verb.POST, false);
  entry.setPublished(new Date());

  // Attach a photo object to the activity, with thumbnail and larger renditions
  Photo photo = entry.addTypedObject(ObjectType.PHOTO);
  photo.addThumbnail(
    "https://example.org/pt/1/thumbnail",
    "image/jpeg", 16, 32);
  photo.addLargerImage(
    "http://example.org/pt/1/larger",
    "image/jpeg", 1024, 768);
  photo.setTitle("My backyard!");
  photo.setDescription("this is an excellent shot.");
  photo.setPageLink("http://example.org/pt/1");
}

That generates Activity Streams (AS), an extension to Atom - you can read about it here - http://activitystrea.ms. I think Activity Streams are going to be an important data platform for social networking.

The scalability of programming languages

Ned Batchelder: "Tabblo is written on the Django framework, and therefore, in Python. Ever since we were acquired by Hewlett-Packard two and a half years ago, there's been a debate about whether we should start working in Java, a far more common implementation language within HP. These debates come and go, with varying degrees of seriousness."

For anyone coming from Python and looking at the type system side of things, and not socio-technical factors such as what particular language a programming shop prefers to work in, I would recommend Scala over Java. It has a good type system, allows for brevity, and some constructs will feel very natural (Sequence Comprehensions, Map/Filter, Nested Functions, Tuples, Unified Types, Higher-Order Functions). Yes, I know you can run Django in the JVM via Jython, I know there's Clojure, and Groovy too. This is just about the theme of Ned's post, which is the type system. And Scala has a better one than Java.

James Bennett: "The other is that more power in the type system ultimately runs into a diminishing-returns problem, where each advance in the type system catches a smaller group of errors at the cost of a larger amount of programmer effort"

Sure, maybe at the higher order end of the language scale. But in the industry middle, there's less programmer effort around Scala than Java, modulo the IDE support but that changes year by year.

The Boy Hercules strangling a snake

Anyway, the real problem with Python isn't the type system - it's the GIL ;)

 

Bug 8220

"The TAG requests that the microdata feature be removed from the specification."

RDFa is preferred by the W3C TAG over the Microdata spec made up in HTML5.

How this one plays out will be interesting.  Pass the popcorn!

 

Java Software Foundation

Joe Gregorio: "Does the ASF realize that subversion isn't written in Java?"

Better not tell them about Buildr, Thrift, CouchDB, Etch, TrafficServer et al.

 

Copier Heads

Robert Scoble: "Every month longer that this deal takes is tens of millions in Google’s pockets. Why? Well, the real race today isn’t for search. Isn’t for email. Isn’t for IM. It’s for ownership of your mobile phone."

That was back at the beginning of 2008, at the height of the Microhoo excitement. It's interesting to revisit these things.

Scoble said this because he "met the guy who runs China’s telecom last week in Davos. He’s seeing six million new people get a cell phone in China every month."

That was 138 per minute, about twice the growth rate of internet/web adoption. In terms of world wide adoption of phones, Scoble was probably off by an order of magnitude. It's not the "next big game" as one commenter put it (Tim O'Reilly). It is the big game.

Steve Jobs: "Basically they were copier heads that just had no clue about a computer or what it could do. And so they just grabbed, eh, grabbed defeat from the greatest victory in the computer industry. Xerox could have owned the entire computer industry today. Could have been you know a company ten times its size. Could have been IBM - could have been the IBM of the nineties. Could have been the Microsoft of the nineties."

I read 99 comments back then. About half a dozen picked up on the mobile point. Everyone was talking about property rights on social graphs and inferred information, or web2.0 ad models, or search, or the importance of email. Google still seems to be the webco that best understands the importance of mobile.

Concision

David Pollack: "We can argue about whether Scala is syntactically simpler than Java. I find that my code - I did two years of Ruby On Rails work before I did Scala and I've done Java work since 1996 - I find that Scala is as concise as Ruby. I also find that Scala's type system doesn't often get in my way. Where I have to declare types in Scala is where I should document types in Ruby, and I can also use the type system as a way of defining tests. Scala's design lead to Lift's design. Lift's design lead to abstracting away HTTP in a way that we can do real time, or what appears to be real time, push, in a way that's syntactically and semantically pleasing for the end developer."

My observation here is that the last language I found as interesting as Scala was Python.  They reel you in.

 

Process and individuality are not exclusive

update 2009/08/22: Jason Yip also recommended "Lean Product and Process Development" by Allen C. Ward.

Gadi Amit: "Against that "unreliable" branded-personality design management, multidisciplinary agencies push the notion of large teams and a rigid process. The message of the process crowd is simplistic, "have a few more disciplines in place and we can create the winning product with the right design." Here comes the ethnographer and the strategist and the focus-group studies and the 500-page dissertations, and so on. I have yet to see any hard proof that these large processes yield higher rates of success in design. I have met more than a few large organizations that will not take this any longer. The process method managed to stifle creativity and nourish argumentative myopics while exhausting corporate budgets and personnel. The case of Doug Bowman, Google's just-resigned lead designer and the 41 shades of Blue sounds painfully familiar. As you churn out more creative work, more data-points and more "scientific" validation, your design never gets better. The process method justified large design budgets yet never reliably delivered. It catered to the corporate ladder that is now gone. It required time and the ability to commit resources that we've probably lost for the next decade. "

Steven Keith responds: " However, I cannot see how what you advocate can work, in reality. 

I believe one of those incessant problems with design management is that everyone who cares or seeks methods to address their own challenges is too unique or idiosyncratic to borrow workable insights, processes or anecdotes from all the great thinkers and practitioners out there. I wish I had a dollar for every time I had a client proclaim they're going to do the Sapper or Ives thing. Whatever that is. Is this the consiglieri you speak of?

In the final analysis, I see a gorgeously articulated avalanche of design mgmt ideas, methodologies and articles anchored to edge cases like Apple, Google, IDEO and the like. They're easy to swan over and even fun. But, these edge cases are so disconnected from what will work for most. In fact, I see them doing potentially more harm than good. How many presentations at conferences do you see about really average companies that believe in the promises of design thinking but are struggling because they cannot bridge their "today reality" with the mythological fully integrated design business of tomorrow? There are good reasons why."


I see an ugly fact getting in the way of both these positions and it's called the Toyota Product Development System (TPDS), which marries a strong process and measurement culture with Amit's concept of a consiglieri role. In Toyota that position is called the Chief Engineer. To argue either end of the process v creativity spectrum in product design, you have to be able to explain away the Chief Engineer role in Toyota and Toyota's phenomenal market success, which has made them one of the largest companies in the world (10th in the Global Fortune 500). In my sector, when people talk about design, they tend to obsess on Apple. They should also look closely at Toyota, who have a design system that transcends both individual brilliance and process.

I think most software people know about Toyota's methods through Lean Software Development and the Toyota Production System (TPS). TPDS is just as important for understanding the product development angles. To get a grasp on the TPDS, the book "The Toyota Product Development System: Integrating People, Process And Technology" is highly recommended, as is "Product Development for the Lean Enterprise".

 

 

35 internet years

Zeldman

2001: "Thousands of new sites premiere every day. Most of them are built to support bad browsers instead of standards. It’s an epidemic. Enough already. We finally have good browsers. Let’s use them."

 

Zeldman

2008: "One accommodates Microsoft as one’s ancestors accommodated Imperial Rome. As a wiser man than me said, 'Render unto Caesar.'"

Installing buildr trunk (1.3.4 pre) on Ubuntu 8.10

Update 2009/04/11: Assaf has a better way:

"There's a snapshot of 1.3.4 you can gem install from apache.org without all the excessive dev dependencies.

sudo gem source --add http://people.apache.org/~assaf/build...
sudo gem install buildr"

WFM.

 

Buildr documentation:

To install Buildr from the source directory:

$ cd buildr

$ rake setup install

I got some errors doing that. This worked for me on Ubuntu 8.10

# cd /tmp
# wget http://rubyforge.org/frs/download.php/45905/rubygems-1.3.1.tgz
# tar xzf rubygems-1.3.1.tgz
# cd rubygems-1.3.1
# sudo ruby setup.rb
# sudo apt-get install python-setuptools
# sudo gem install echoe
# sudo gem install cucumber

# git clone git://github.com/buildr/buildr.git
# cd buildr
# rake setup install
# buildr --version
Buildr 1.3.4

This was to get to a post-1.3.3 Buildr to set up a Scala/Java project structure, as Buildr supports Scala compilation, plus I gather there's lots of good stuff on trunk. I still had to add require "buildr/scala" to the buildfile. As much as I prefer Buildr/Ivy for bootstrapping a project over Maven2, I wonder about needing a cross-language dependency chain (or gems) like this for doing Java/JVM stuff (such as having to install easy_install to get a gem set). Having never used it in a production/industrial setting, it's hard to say. Otherwise, I do like Buildr.

Naked CSS Day

It's Naked CSS Day, at least for the web pages here that are not HTML on the filesystem. Part of me thinks this matters less and less each year, for me at least, since I consume most weblog information through feed readers.

A reasoned response to Scala/Ruby at Twitter...

Alex Payne: "Make things, measure them, have reasonable and respectful conversations about them, improve them, and teach others how to do the same." - Mending The Bitter Absence of Reasoned Technical Discussion.

As far as the current Ruby/Scala "debate" goes, I would say always bet on protocols and formats, the web being the prime example. As someone who likes Twitter immensely, I like that I don't have to care too much what Twitter is written in or what it runs on. I like that behind the server, the entire stack can be swapped out or rewritten from the ground up as the service owner sees fit, as seems to happen with many popular Web services as they grow. That the Twitter API can persist across such internal upgrades is a wonderful thing. This is possible because on the Web, programming languages are an implementation detail. That includes JavaScript/ActionScript code on demand.

The Format Of The Long Now

Mark: "HTML is not an output format. HTML is The Format. Not The Eternal Format, but damn if it isn’t The Format Of The Now."

If that doesn't jibe with you, follow the link and view source on the markup around those statements.

Related. Now, view source on that link. Savor the irony.

Feature Creep

Joe: "The ultimate destination of programming language evolution is lisp-without-parentheses"

...with optionally typed function arguments.