
Deprecating Metacrap

Dare Obasanjo is coming around to structured metadata:

One thing that is clear to me is that personal publishing via RSS and the various forms of blogging have found a way to trample all the arguments against metadata in Cory Doctorow's Metacrap article from so many years ago. Once there is incentive for the metadata to be accurate and it is cheap to create there is no reason why some of the scenarios that were decried as utopian by Cory Doctorow in his article can't come to pass. So far only personal publishing has provided the value to end users to make both requirements (accurate & cheap to create) come true.

The fact that those who are working with RSS are getting a sense of the value of metadata, especially aggregated metadata, is all upside. Despite what some people might still believe, there is a growing set of metadata out there where the burden of creating it is close to zero. Creation is a side effect of using a computer - the only interesting cost is bothering to deploy tools to collect it. There is other metadata again that is cheap to create, such as Movable Type categories, Wiki backlinks, or recently Technorati Tags. Start to mix and match this stuff with statistical techniques and you have the basis for powerful ways to organize information.
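The "mix and match with statistical techniques" point can be made concrete with the simplest trick in the book: counting how often tags co-occur across posts. A minimal sketch, using made-up posts and tags purely for illustration:

```python
# Sketch: tag co-occurrence counting, the simplest statistical technique
# for turning cheap metadata (categories, tags) into related-topic signals.
# The posts and tag names below are invented for illustration.
from collections import Counter
from itertools import combinations

posts = [
    {"rss", "metadata", "atom"},
    {"rss", "metadata"},
    {"wiki", "metadata"},
    {"rss", "atom"},
]

pairs = Counter()
for tags in posts:
    # Count each unordered tag pair once per post
    for a, b in combinations(sorted(tags), 2):
        pairs[(a, b)] += 1

# The most frequently co-occurring pairs suggest related topics
print(pairs.most_common(2))
```

Nothing clever is happening here, which is the point: once the metadata exists as a side effect of publishing, aggregation like this is nearly free.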

When Google asks for more metadata, all bets are off.

The Metacrap article has been given too much credence over the years. When you've been working with Wikis, semantic CSS hacks, or RSS, it's hard to ignore the benefits of metadata, so if that meme is fading, perhaps it's not a bad thing. Clearly not everyone requires a rigorous approach. If RDF or Topic Maps can be pared down to integrate with loose and fast approaches, things could get interesting.

January 25, 2005 07:22 PM


Mike Champion
(January 26, 2005 12:26 AM #)

I think you're using that Metacrap article as a strawman. Consider the conclusion:

" Metadata can be quite useful, if taken with a sufficiently large pinch of salt. The meta-utopia will never come into being, but metadata is often a good means of making rough assumptions about the information that floats through the Internet.

Certain kinds of implicit metadata is awfully useful, in fact [discussion of Google].

Taken more broadly, this kind of metadata can be thought of as a pedigree: who thinks that this document is valuable? How closely correlated have this person's value judgments been with mine in times gone by?"

So how is that inconsistent with the success of RSS? It sounds exactly like "Creation [of accurate metadata] is a side effect of using a computer - the only interesting cost is bothering to deploy tools to collect it" to me. The strawmen that Doctorow was flaming -- carefully crafted metadata using standard taxonomies added by authors and taken seriously by search engines -- are still cinders, are they not? An exception might be the various categories in blog syndications, but those seem to be used as much for humor as anything else. (Mark Pilgrim's "Those that tremble as if they were mad" category comes to mind ...) Technorati et al are for the most part making more use of the implicit metadata available in the somewhat more structured blog/RSS idiom, not Deprecating Metacrap.

Bill de hOra
(January 26, 2005 01:31 AM #)

Hi Mike,

"I think you're using that Metacrap article as a strawman."

My issue with Metacrap is the credence it's been given and the way said 'meme' propagated and became a way to obliterate discussion - "nah, that's metacrap". It notably wasn't called SomeMetaCrap or TaxonCrap - strawman building on strawmen methinks!

"Carefully crafted metadata using standard taxonomies" - sure that's of limited vaue between administrations or cultures. Aggregate cases can leverage statistical methods to great effect to smooth out garbage or gaming (Technorati tags should be a good example). But give it a few years of loose and fast tagging - someone is eventually going to subset RDF or Topic Maps to close the gap, if only to provide a query language for non-aggregate cases. I can't look at RDF for too long anymore without thinking about SGML.

John McCormac
(February 23, 2005 09:16 AM #)

The whole meta data argument is an interesting one, but it would seem that it is not taken seriously by web developers here in Ireland. The spiders for my website and search engine index the Irish web on a regular basis, and if anything, the number of Irish sites with proper meta data is terrifyingly low. The figures for the .ie ccTLD below illustrate the problem:

Estimate Of Live .ie Websites: 29913

Websites Without Page Title: 5349

Websites Without Page Keywords: 18295

Websites Without Page Description: 18848

Websites With Page Title: 23561

Websites With Page Keywords: 10615

Websites With Title and Keywords: 10532

Websites With Title, Keywords and Description: 9592
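(The kind of audit behind figures like these can be sketched with nothing but the Python standard library: parse each page and record which of the title, keywords, and description tags is actually present. The example page below is invented for illustration; a real spider would fetch pages over HTTP.)

```python
# Minimal sketch of a per-page meta data audit, standard library only.
# Records whether a page carries a non-empty <title>, a keywords meta
# tag, and a description meta tag - the three counts in the table above.
from html.parser import HTMLParser

class MetaAudit(HTMLParser):
    """Track presence of title, keywords and description on one page."""
    def __init__(self):
        super().__init__()
        self.has_title = False
        self.has_keywords = False
        self.has_description = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            name = (a.get("name") or "").lower()
            if name == "keywords" and a.get("content"):
                self.has_keywords = True
            elif name == "description" and a.get("content"):
                self.has_description = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # A title only counts if it contains actual text
        if self._in_title and data.strip():
            self.has_title = True

# Hypothetical page: title and description present, keywords missing
page = """<html><head>
<title>Example Irish site</title>
<meta name="description" content="A sample page">
</head><body>no keywords here</body></html>"""

audit = MetaAudit()
audit.feed(page)
print(audit.has_title, audit.has_keywords, audit.has_description)
```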

Meta data was great in the 1990s when technology was expensive. It was easier for search engines to strip the meta data from a page and use it instead of the body text of the page. Google and its link-based algorithm changed the emphasis. The academic slant is now towards the Semantic Web. However, I am not sure that this will work either, because it depends on web developers actually integrating the meta data into their pages.

I took a look at the Metacrap article referred to above and, as a search engine operator, I find most of it accurate. People will try to game any system - Technorati tags are one that will make an interesting study of this. Links were not that important before Google implemented its link-based algorithm. Meta data is a nice idea, and it will be great when people implement it.