
Customize Me

Dare Obasanjo on attention.xml and collaborative filtering:

"Once one knows how to calculate the relative importance of various information sources to a reader, it does make sense that the next step would be to leverage this information collaboratively. The only cloud I see on the horizon is that if anyone figures out how to do this right, it is unlikely that it will be made available as an open pool of data. The 'attention.xml' for each user would be demographic data that would be worth its weight in gold to advertisers."

Collaborative filtering allegedly only works if you have a critical mass of items of interest and users to cross-reference. I heard once that this needed to get to the low thousands to ensure reasonable precision. That was back in 2000, by which time people had figured out how to process large in-memory datacubes in close to real time (i.e. updates occurring between user sessions).
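To make the critical mass point concrete, here is a minimal sketch of user-based collaborative filtering in Python. It is an illustration only, not how any particular service does it: the `ratings` data, the feed names and the `predict` helper are all made up for the example. The thing to notice is that a prediction is only possible when someone who overlaps with you has also rated the item in question.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (missing items count as 0)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no overlap: nothing to cross-reference
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item.
    """
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None

# Hypothetical data: three readers rating four feeds on a 1-5 scale.
ratings = {
    "ann": {"feed_a": 5, "feed_b": 4},
    "bob": {"feed_a": 5, "feed_b": 4, "feed_c": 5},
    "cat": {"feed_c": 1, "feed_d": 2},
}

print(predict(ratings, "ann", "feed_c"))  # bob overlaps with ann, so we get a number
print(predict(ratings, "ann", "feed_d"))  # only cat rated it, and cat shares nothing with ann
```

With three users the second lookup comes back empty. Scale the same sparsity up and you can see why the numbers have to get large before the cross-referencing pays off.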

That's on the server.

What we're not doing is considering how filtering might work on the client. When more specific information about the user is available, it's possible to optimize these algorithms to work with much smaller data sets, and in general to think about different algorithms or hybrid approaches. And it's probable the results can have higher relevance for the user. Commercially, collaboration has worked best for targeting mass goods for individuals, which is why it works well for Amazon.

But the choice of algorithm varies based on the nature of the data (a lot of this stuff tends to be fantastically sensitive to the data and how the data is represented). Think about how useless a Bayesian spam filter would be if it were aggregated across a 100,000-user data set on Bloglines. It could be much better to work against a couple of users you trust and some candidate data of your own to seed the algorithms.
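As a rough sketch of what seeding a filter locally might look like, here is a tiny naive Bayes classifier in Python that trains on your own examples at full weight and on a trusted user's examples at a reduced weight. Everything in it is an assumption made for illustration: the `LocalFilter` class, the "keep"/"skip" labels, the `weight` parameter and the sample text are not from any real product.

```python
from collections import Counter
from math import log

class LocalFilter:
    """Tiny naive Bayes text filter meant to run on the user's own device."""

    def __init__(self):
        self.word_counts = {"keep": Counter(), "skip": Counter()}
        self.doc_counts = {"keep": 0.0, "skip": 0.0}

    def train(self, text, label, weight=1.0):
        # weight lets a trusted user's examples count for less than your own
        self.doc_counts[label] += weight
        for word in text.lower().split():
            self.word_counts[label][word] += weight

    def score(self, text):
        """Log-odds that `text` is 'keep' rather than 'skip' (add-one smoothing)."""
        vocab = set(self.word_counts["keep"]) | set(self.word_counts["skip"])
        slots = len(vocab) + 1  # one extra slot for words never seen in training
        totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        log_odds = log((self.doc_counts["keep"] + 1) / (self.doc_counts["skip"] + 1))
        for word in text.lower().split():
            p_keep = (self.word_counts["keep"][word] + 1) / (totals["keep"] + slots)
            p_skip = (self.word_counts["skip"][word] + 1) / (totals["skip"] + slots)
            log_odds += log(p_keep / p_skip)
        return log_odds

f = LocalFilter()
f.train("rdf sparql triplestore query", "keep")      # your own seed data
f.train("celebrity gossip photos", "skip")           # your own seed data
f.train("sparql rdf endpoint", "keep", weight=0.5)   # from a user you trust

print(f.score("new sparql endpoint") > 0)   # leans towards keep
print(f.score("celebrity photos") < 0)      # leans towards skip
```

Three training examples are enough to get a usable signal here because the data is about one person's interests. Pool the same counts across 100,000 unrelated readers and the word statistics wash out.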

"By the way, why does every interesting wide spanning web service idea eventually end up sounding like Hailstorm?"

Probably the reason they all start to sound like Hailstorm is that they all work on the basis that the computation has to be done on the server against large aggregate datasets. One place, one owner. Cue the consequent privacy concerns. A few years ago, when asked how the trust problem could be solved, a senior executive from Egg bank had an immediate answer - "Branding". The extent to which people will trust your organisation with their information is largely based on their current perception of your organisation. That's not quite the same thing as branding, but you get the idea.

What do you do with all that information you're generating 24x7? How do you convert it to value? Today's answer is to sell it to the people who have something to sell or messages to tell. The money's not in whatever it is you're offering to users to gather up the data in the first place (like search) - the money's in the side effects. And while converting the data into value for you or for those who want to sell something, the users must not think they're being sold out. Or they're gone. Something of a highwire act - and you only get to fall once.

One way out is to let highly specific user information inform the filters on the user's device, not on someone's VC-backed server farm. Really, that's a social solution.

It could be much more interesting to sell this technology directly to users for 5 dollars and let them run it on their phones against the data of their choice. To do that requires a certain amount of letting go of ways of doing things, right through from client-server technology to business models based on TV and print media. The current situation is hopelessly dependent on those systems of buying and selling.

The social networking phenomenon is interesting insofar as it attempts to join users to users rather than users to services to advertisers. The next step is to get those lumbering servers out of the way and let people interact directly. That will require more imaginative and disruptive business models.

April 5, 2005 02:40 AM


