
Information ownership: it's not yours until you can move it

I completely agree with Tim O'Reilly's assessment of the privacy concerns around Google GMail. Mark Nottingham has made similar observations. The privacy issue is a red herring that can be dispelled by explaining how email works. What I wanted to pick up on was an almost casual observation Tim made on data migration:

The big question to me isn't privacy, or control over software APIs, it's who will own the data. What's critical is that gmail makes a commitment to data migration capabilities, so the service isn't a one way door to the future. - Tim O'Reilly

The free flow of data across applications just isn't happening today. It is essential, and I think inevitable, that the way we manage our information changes, given the way we work and live.

Outside the consumer space of GMail, I work on integrating systems using web technology, XML and SOA. "Systems integration" is not really systems integration; it's information integration. When you move, reroute and repurpose information, a business can benefit from either a new service or an existing service at lower cost and higher quality. That's what it's all about. It's expensive, but becoming less so, in part due to the commoditization of software through open source and standards. Most business systems are not designed with data accessibility in mind. Yes, we've had XML for years, but XML's real impact so far has been down in the protocol/plumbing space. Its true value for data has not been realized; all that XML lying around is, as yet, untapped potential.

I'm telling you this because I think what's happened in business systems over the last five years can inform what's going to happen in the consumer space. To appreciate the consumer application space, replace "systems" with "applications" and "business" with "customer" in the above paragraph and we might see that things are going to change. The same commoditization of data formats will have consequences for current consumer software business models, just as the commoditization of plumbing and networking has had for business systems.

Tim again:

The ability to search through my email with the effectiveness that has made Google the benchmark for search. How many times have people asked, "When can I have Google to search my hard disk?" That's a hard problem, as long as it's just your disk, on your isolated machine. But it's solvable once Google has lots and lots of structured data to work with, and can build algorithms to determine patterns in that data. Gmail is Google's brilliant solution to that problem: don't search the desktop, move the desktop application to a larger, searchable space where the metadata can be collected and made explicit, as it is on the web.

I don't know - this is what stops GMail looking like innovation to me. I agree with Jeremy Zawodny that this is incremental improvement. Google are ultimately loading your email into a database, as they've done with the Web. It's a centralized model, not an edge model. To me that's highly dissonant with the network OS vision. And it's just email - personally, email is a fraction of the information I need coherency for. We need search that ranges across protocol and application information space.

I believe anyone who can distribute search away from the centre to the edges will win big.

Where Google is showing huge innovation is in technology management - by the sounds of things the ratio of admins to servers over there is impressive. You really, really want these guys running your data centres. This is much like pointing out that where Amazon innovated was not technology but logistics management combined with a new sales channel. And there's only so much excitement to be had in optimizing the data centre ;)

Tim mentions Chandler with regard to the desktop and sounds almost disappointed. But open source could be a driver in making data application-independent, because open source is where the momentum for change can be maintained. What needs to happen here as much as anything is for the open source community (especially the Java community, given the portable nature of Java) to get off its focus on server-side networking and start looking at users and their needs. We don't need any more web frameworks. Projects like Chandler, and less directly Eclipse, are the start of that. Like using open source for middleware, this will be advantageous only for some prime movers - if everyone does it then the economics change drastically, and companies whose model is traditional product and not services will start to feel the pain of lower and lower margins until they rethink what it is they do.

Further innovation, I believe, will come from open source. There's no incentive to do this in the commercial world, except to gain a temporary advantage over the competition or make some soothing marketing noises to consumers. To really do it, to really make the strategic decision that the application franchise is secondary to the user's information needs and execute on that vision, will require a ground-up rethink, new technology, new business models, new partners, the lot. The industry is not going to do this of its own accord - data lock-in is a cash cow. If Microsoft ever moves against Google it may be in part because moving your data to a Google cluster from the desktop has implications for the Windows and Office franchise. But all that's really happened is a switch from one centralized model to another.

As for all that space: a GB of disk space is roughly a dollar (and who knows how much cheaper it gets at the bulk Google buys in). Offering 1GB makes even more sense given Google's management innovations - they could probably have gone to 10GB. 1GB, once you have it, isn't enough.

And then there's the structure of the data itself. If you want your data to travel into the future with you, look for RDF or Topic Map compatibility. Those formats are independent of the highly transient application formats and the less transient protocol formats. Yes, they're associated with all that Semantic Web handwringing about ontologies and the like, but when it comes to it, a transformation or a script will do, and is better than agonizing about the perfect information model.
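
To give a feel for how small "a script" can be, here is a minimal sketch that turns the headers of one email message into RDF statements in N-Triples form. It is an illustration only, not a description of how GMail or any other mail system exports data. The `http://example.org/mail#` vocabulary and the function names are made up for the example, and it assumes the message carries a Message-ID whose characters are safe to drop straight into a `mid:` URI.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical vocabulary namespace, invented for this sketch.
VOCAB = "http://example.org/mail#"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def message_to_ntriples(raw):
    """Return N-Triples lines describing the headers of one raw message."""
    msg = message_from_string(raw)
    # Assumption: a Message-ID header is present and needs no extra encoding.
    msg_id = msg["Message-ID"].strip().strip("<>")
    subject = "<mid:%s>" % msg_id
    triples = []
    # Address headers become mailto: resources.
    for header, prop in (("From", "from"), ("To", "to")):
        _, addr = parseaddr(msg[header] or "")
        if addr:
            triples.append("%s <%s%s> <mailto:%s> ."
                           % (subject, VOCAB, prop, addr))
    # Free-text headers become plain literals.
    for header, prop in (("Subject", "subject"), ("Date", "date")):
        value = msg[header]
        if value:
            triples.append('%s <%s%s> "%s" .'
                           % (subject, VOCAB, prop, escape_literal(value)))
    return triples
```

Calling `message_to_ntriples(raw_text)` on a raw message gives one line per statement. That output no longer depends on the mail client's storage format or on the mail protocol, which is the point: the vocabulary can be crude and still be remapped later with another small script.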

April 18, 2004 10:20 AM

