Bill de hÓra: August 2006 Archives


August 31, 2006

Disambiguation and Homonyms

hakia search for "Jaguar"

It didn't ask me what I meant by "jaguar", or provide a "did you mean" option. Maybe someday.

August 29, 2006

links for 2006-08-29

August 28, 2006

Eight Fallacies of Distributed Information Systems

Essentially everyone, when they first build a distributed information system, makes the following eight assumptions about the data. All prove to be false in the long run and all cause big trouble and painful learning experiences.

  1. People tell the truth.
  2. Content is independent of presentation.
  3. Syntax doesn't matter.
  4. Identifiers are reliable.
  5. Metadata and data are consistent.
  6. Schema ensure interoperation.
  7. All the data must be available.
  8. Canonical models can be determined.
  9. Index latency is zero.

links for 2006-08-28

August 26, 2006

Zope, Java Content Repository, and Web Architecture

Roy Fielding: "With the JSR170, and what's currently JSR283, we're trying to take the same engineering principles and architectural principles that we apply to the web, and try to see what's applicable to inside the application server, or inside the server based implementation of server based applications"

[The entire podcast is highly recommended for people who want some background history on the Web and REST architecture.]

Zope/ZODB is the most mature implementation of JSR170 I know of. The JSR170 content model is hierarchical, just like ZODB (and any interesting content model built on top, like CMF and Archetypes).

Here's the diagram from the JSR170 spec:


Anyone who ever had to explain Zope will be familiar with that kind of picture. The desirable characteristic of a tree based model is that you can store any data format in it, without hardcoding domain models and rulesets. By extension you can do this also with labelled graphs, which is how RDF gets to claim its high flexibility, being a graph based model. In an RDBMS you need to know the domain model to define tables and then you need to know the rulesets to decide what are integrity constraints and what are application specific concerns. To get a sense of the engineering work involved in trying to make an RDBMS flexible, read Scott Ambler's Agile Database Techniques. It's possible, maybe, but not cheap to do. To be clear - preferring to manage content *as content*, instead of lossy extraction of domain entities, is about preferring flexibility and independent evolution of systems. There's plenty of room for an argument that says RDBMS is suboptimal for managing content. And good luck with versioning or translation on top of relational databases while trying to manage the form and flexibility of the content over time. Yes, you can do it, but it's highly specialised and difficult work.
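To make the tree model a little more concrete, here is a minimal Python sketch. The `Node` class, its methods and the sample paths are all made up for illustration; this is not the JCR or ZODB API. It only shows that arbitrary properties can hang off any node without a schema being declared first.

```python
class Node:
    """A hypothetical content node: a name, free-form properties, named children."""

    def __init__(self, name):
        self.name = name
        self.properties = {}   # any key/value pairs, no schema required
        self.children = {}     # child name -> Node

    def add(self, name, **properties):
        """Create (or fetch) a child node and attach properties to it."""
        child = self.children.setdefault(name, Node(name))
        child.properties.update(properties)
        return child

    def get(self, path):
        """Resolve a slash-separated path relative to this node."""
        node = self
        for part in path.strip("/").split("/"):
            if part:  # skip empty segments, so "/" resolves to this node
                node = node.children[part]
        return node


root = Node("")
articles = root.add("articles")
articles.add("jcr-and-zope", title="Zope, JCR, and Web Architecture", lang="en")
# A sibling can carry completely different properties; nothing was declared up front.
articles.add("photo-01", mime_type="image/png", width=640)

print(root.get("/articles/photo-01").properties)
```

The relational equivalent would need a table definition (or a generic name/value table) before the first row went in, which is the up-front domain modelling the paragraph above is talking about.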

I doubt the backend storage is easily commoditised for tree based content. Like JCR, ZODB is ultimately an API but physical storage is difficult to abstract away for hierarchical content models. It's not clear, at all, that RDBMS can sanely support a hierarchical content model for the long haul. RDBMS is the gold standard for enterprise data management and JSR170 as a Java API plays squarely in the enterprise space. "Everyone" wants an RDBMS backend - not for the data itself, but for operational simplicity, enterprise IT rationalisation, and the predictable performance characteristics. It's tough to know what the right tradeoff is. Ignoring RDBMS tech suggests not taking enterprise concerns seriously and risking IT ops and predictability in systems design and management. Ignoring the content and data longevity concerns suggests you end up preferring expensive non-repurposable silos over allowing the business to function well. I think these are not mutually exclusive positions, but getting the balance right is tough.

August 24, 2006

Getting to the Point

Amazon Elastic Compute Cloud (Amazon EC2) - Limited Beta:

"Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. Just as Amazon Simple Storage Service (Amazon S3) enables storage in the cloud, Amazon EC2 enables "compute" in the cloud. Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use."


"Dad, will you turn on Google. I want to play miniclips"

August 22, 2006

Take The Hint

Flickr image roll:
take the hint.png

Town planning


Screen real estate


August 20, 2006

Software Dependency Antipattern

"There's a Google Summer of Code project for that"

I hope all those projects work out ;)

August 15, 2006


Ian Hickson:

I did a short study recently checking only for _syntax_ errors in HTML documents, and the results were that of the 667416 files tested, 626575 had syntax errors. Over 93%. That's only syntax errors in the HTML, not checking the CSS, the content types, the semantic errors (e.g. duplicate IDs -- 86461 of those files had duplicated IDs), or any other errors.
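For a feel of what one of those checks involves, here is a small Python sketch that finds duplicated IDs in a single document using the standard library's `html.parser`. It is not Hickson's methodology or tooling, which the quote doesn't describe; `IdCollector`, `duplicate_ids` and the sample markup are invented for illustration. The two ratios at the end are computed from the figures in the quote.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts every id attribute value seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return the sorted id values that occur more than once in the markup."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(value for value, count in collector.ids.items() if count > 1)


sample = '<div id="nav"><p id="intro">Hi</p><span id="nav">again</span></div>'
print(duplicate_ids(sample))

# Ratios from the figures quoted above.
syntax_error_ratio = 626575 / 667416
duplicate_id_ratio = 86461 / 667416
print(f"{syntax_error_ratio:.2%} with syntax errors, {duplicate_id_ratio:.2%} with duplicated IDs")
```

Note that `html.parser` is lenient and will not report syntax errors itself, so this only covers the duplicate ID case.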

August 09, 2006

links for 2006-08-09

August 04, 2006

links for 2006-08-04

August 03, 2006

Where are all the REST toolkits?*

Aaron Johnson: "Version 2.2 of WebWork introduced the ActionMapper interface and a class called RestfulActionMapper, which gives you the ability to create URLs that might look something like this: http://bookstore.com/books/category/java/keyword/webwork"

* I remember Pat Lightbody asking this a few years back; Pat works on WebWork :) RestfulActionMapper is very cool if you don't get hung up on parameter ordering. For example, JIRA 4 might be able to stop using .jspa extensions and have cacheable ticket queries and Confluence's feedbuilder could use this to produce clean URLs.
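Here's a rough Python sketch of the kind of mapping that URL implies, and of why parameter ordering matters. It assumes the first path segment names the action and the remaining segments alternate between parameter names and values. `parse_restful_path` is a made-up function for illustration, not WebWork's code or API.

```python
from urllib.parse import unquote, urlsplit


def parse_restful_path(url):
    """Split a URL path into (action, params), reading the segments after
    the first as alternating parameter names and values."""
    segments = [unquote(s) for s in urlsplit(url).path.split("/") if s]
    if not segments:
        raise ValueError("no action in path")
    action, rest = segments[0], segments[1:]
    if len(rest) % 2:
        raise ValueError("parameter name without a value")
    params = dict(zip(rest[0::2], rest[1::2]))
    return action, params


a = parse_restful_path("http://bookstore.com/books/category/java/keyword/webwork")
b = parse_restful_path("http://bookstore.com/books/keyword/webwork/category/java")
print(a)
print(a == b)
```

Both URLs resolve to the same action and parameters, yet they are two different URLs as far as a cache is concerned. If cacheable queries are the goal, the code generating links would want to pick one ordering and stick to it.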

one trillion words from public Web pages

"Watch for an announcement at the LDC, who will be distributing it soon, and then order your set of 6 DVDs"

Quick, somebody parallelize ispell. Suffice to say, Google aren't using dictionaries.

links for 2006-08-03