
Jon Udell meets metacrap

In the assumptions and bias of a single form, Jon discovers two key problems with structured symbolic metadata and ontologies:

  1. cost of data entry
  2. classification bias

Jon focused on our inability to derive his interests from existing data on the network:

Now clearly I lead a much more public life than most, and I create a much more complete document trail for Google to follow. But is that a difference in degree, or a difference in kind? I suspect the former. And if that's true, then I'm skeptical as to the benefit of a parochial reputation system such as LinkedIn, which requires extra effort to join, to feed with metadata, and to use. If we have (or are rapidly evolving) a global reputation system that can absorb and contextualize our routine communication, then parochial systems will need to deliver huge amounts of extra value.

But in this case I find the issue of bias more telling. It's pretty clear where the classification bias comes from:

Jon makes it sound as if he is stuffed, but really it's the end consumer of the collected data that is stuffed. All those relationships are fiduciary or work-based, probably hacked out of some sales/marketing breakdown that makes sense for those contexts alone, not for Jon's. The bias is evident, and so should be the end result: the collated data is virtually useless as a basis for making inferences. And if you're not familiar with machine learning or search technology, it might interest you to know that bias is a well-understood, mathematically characterized phenomenon in those fields. The immediate problem is that bias and absence of context always result in junk data unless everyone does what Jon did (take a raincheck) rather than just pick an arbitrary option. The overarching problem is that you cannot eliminate such bias, any more than you could eliminate latency from the Internet; it's something you manage explicitly.

No matter how good we get at this or how popular classification systems become, we'll always need to add some statistical and probabilistic data to the mix to keep things slack. Any classification of you should only ever approach 1 or 0, never be 1 or 0; these things are not certain. Hand-crafted logical ontologies are not sufficient precisely because they want to be certain. They don't drift with your interests over time; they're rigid, they're deterministic, they can only see around so many corners. In short, they age badly and they evolve badly.
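To make "approach 1 or 0, never be 1 or 0" and "drift with your interests" concrete, here is a minimal sketch of one possible approach. It is not anything Jon or LinkedIn uses. It assumes interests are inferred from a stream of observed items (posts, messages) tagged with topics, and it combines two standard tricks: additive (Laplace) smoothing, which keeps every estimate strictly between 0 and 1, and exponential decay, which lets older evidence fade. The class name `InterestProfile` and the parameters `alpha` and `decay` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InterestProfile:
    """Hypothetical soft classifier: each topic gets a probability, never a hard 0 or 1."""
    alpha: float = 1.0  # Laplace smoothing prior; keeps every estimate strictly inside (0, 1)
    decay: float = 0.9  # per-observation decay; older evidence counts for less, so interests drift
    hits: dict = field(default_factory=dict)  # decayed count of observations tagged with each topic
    total: float = 0.0  # decayed count of all observations

    def observe(self, topics):
        """Record one observed item (a post, a message) tagged with a set of topics."""
        # Age all existing evidence before counting the new observation.
        for topic in self.hits:
            self.hits[topic] *= self.decay
        self.total = self.total * self.decay + 1.0
        for topic in topics:
            self.hits[topic] = self.hits.get(topic, 0.0) + 1.0

    def probability(self, topic):
        """Smoothed estimate of interest in a topic; approaches but never reaches 0 or 1."""
        h = self.hits.get(topic, 0.0)
        return (h + self.alpha) / (self.total + 2 * self.alpha)


if __name__ == "__main__":
    profile = InterestProfile()
    for _ in range(20):
        profile.observe({"rest"})
    print("rest after 20 REST posts:", round(profile.probability("rest"), 3))
    for _ in range(20):
        profile.observe({"semantic-web"})
    print("rest after interests shift:", round(profile.probability("rest"), 3))
    print("semantic-web:", round(profile.probability("semantic-web"), 3))
```

Run as a script, the estimate for "rest" climbs toward 1 without reaching it and then falls away once the observations shift, while a topic never seen still gets a small non-zero score. A hand-built ontology has no equivalent of `decay`; an assignment stays put until someone edits it.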

January 4, 2004 02:28 PM

