
There's no such thing as neutral content

Rick Jelliffe on "The Self-Defeatingness of XML's Media Independence":

"On a project today one of XML's paradoxes struck me: we adopt XML for publishing often because we want to re-target our documents to different publications and media; however we then find it useful if the information is organized or formatted similarly on different media and applications, in order to reduce gratuitous differences, ease processing and to increase branding. So our books end up getting PDA-isms such as small sections, or HTML-isms such as page focus, or RSS/Atom-isms such as chapter and section summaries (It was worth writing this blog for the pleasure of saying 'RSS/atom-ism'.) And our HTML pages get book-isms, such as the familiar TOC and next/previous/up buttons. The initial movement for media and publication independence is met by a counter movement for cross-media and cross-publication homogeneity."

Rick nails it. There is no such thing as channel neutral content, or device independent authoring.

I should qualify that. You can and should go some way towards device and channel neutrality. XML is good for that, decoupling channel output from content storage is good for that, and neutrality is an admirable goal. But after a point you realise that you're aiming for a goal that isn't totally achievable. Compromises are necessary. As a goal it's in the "world peace" category. Push things too far and the risk is that by trying to be all things to all people and all media you end up with a cacophony of styling issues, unwieldy content models, and jumbled content. Worst case, you end up trying to compute context from excerpts and syntactic markup, essentially trying to solve an artificial intelligence problem.

As writers and publishers, we've internalised "know your audience" as an unquestioned maxim. As writers and publishers, we're much more reluctant to take the other thing on board - "know your medium" isn't even a maxim, never mind an unquestioned one.

For example, let's look at how I quoted Rick. I started with his name and post title, followed by the quote. That style of quoting - "someone at this link said this:" - is quite deliberate. I call it "Ruby Quoting", since I picked it up from reading Sam Ruby's weblog. The reason I use it is that this post will be pumped through any number of aggregators, including ones that will strip out the structure and indentation that make a quote visually distinctive in traditional media like print, or on HTML web pages. At a quick glance or topline scan (typical reading modes for feeds), the quote can be mistaken for your own words. The lead-in helps prevent that.

Here's the more traditional quoting style, as might be advocated by academia or print publishing guidelines:

On a project today one of XML's paradoxes struck me: we adopt XML for publishing often because we want to re-target our documents to different publications and media; however we then find it useful if the information is organized or formatted similarly on different media and applications, in order to reduce gratuitous differences, ease processing and to increase branding. So our books end up getting PDA-isms such as small sections, or HTML-isms such as page focus, or RSS/Atom-isms such as chapter and section summaries (It was worth writing this blog for the pleasure of saying 'RSS/atom-ism'.) And our HTML pages get book-isms, such as the familiar TOC and next/previous/up buttons. The initial movement for media and publication independence is met by a counter movement for cross-media and cross-publication homogeneity. - Rick Jelliffe

That style doesn't work as well for RSS syndication, especially when the quote is a lead-in to the rest of the post. First, people tend to think it's you writing and might misattribute the statement, even when you surround it with quotation marks and mark it up with <blockquote>. Second, use of quotes for lead-ins is an idiomatic form for weblogs - posts are often reacting to what someone else said (just like this post is). The same problem affects traditional essay and chapter techniques, like the use of pithy opening quotes and aphorisms - again, those techniques don't work out that well over RSS - I think someone complimented me once for something Dr. Johnson said. After being misquoted (or even flamed) a few times you learn to adapt the content to the medium by using a different authoring technique. That's what the medium does to content. It's unavoidable.
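To make the stripping problem concrete, here is a minimal sketch in Python of what a crude aggregator might do to each quoting style. The `TagStripper` class and `strip_tags` helper are hypothetical names invented for this illustration, and the sample markup is an assumption about how such a post might be marked up, not a description of any real aggregator.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text and discards all markup, as a crude aggregator might."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with whitespace collapsed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(" ".join(stripper.chunks).split())


QUOTE = (
    "The initial movement for media and publication independence is met by "
    "a counter movement for cross-media and cross-publication homogeneity."
)

# Traditional style: attribution trails the quote and leans on <blockquote>.
traditional = (
    f"<blockquote><p>{QUOTE}</p></blockquote>"
    "<p>- Rick Jelliffe</p><p>Rick nails it.</p>"
)

# Lead-in style: the attribution is ordinary text that comes first.
lead_in = (
    "<p>Rick Jelliffe on media independence:</p>"
    f'<blockquote><p>"{QUOTE}"</p></blockquote>'
    "<p>Rick nails it.</p>"
)

traditional_text = strip_tags(traditional)
lead_in_text = strip_tags(lead_in)
print(traditional_text)
print(lead_in_text)
```

Once the tags are gone, the traditional version opens with the quoted words and nothing marks them as someone else's until the trailing name, while the lead-in version still opens with the attribution.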

I remember being at WWW9 in Amsterdam in 2000 and hearing someone from the BBC explain that massive volumes of content had to be repurposed more or less by hand for the web. And then it had to be repurposed again for mobile devices. I think they were going as far as keeping multiple copies of the content for broad classes of devices and channels (someone from the Beeb is welcome to confirm/deny that). As it turns out, mobile devices have characteristics not a million miles away from link or excerpt based feeds - which may have something to do with why the BBC seem to understand web syndication more than any other broadcaster. That kind of medium/device might well represent a lowest common denominator for organising digital content.

The autumn after WWW9, I got to review a position paper written by Miles Sabin, which stated bluntly that device independent authoring was a pipe-dream. Here's an excerpt outlining his position:

"The proliferation of new delivery media for web content has brought an old problem to the fore in a new context: how to produce content which is suited to diverse media consistently with constrained cost and time budgets. The range of new devices is very broad, from traditional PC browsers, through mobile, embedded and consumer devices, to speech only devices and devices specialized for accessibility needs. This breadth, combined with comparative novelty, might make it appear that we have a radically new problem to solve, and that a search for radically new solutions might be worthwhile. We believe that this appearance of novelty is deceptive and that, regrettably, the search for novel solutions will be fruitless."

The key idea that paper tried to debunk was that of "primordial content" - it's a necessary assumption for single source* authoring solutions to be achievable. I think the position holds up well, and would go as far as saying the last five years have borne it out. Today I would extend that idea to any notion of "primordial markup" - a single metadata set which you can syntactically transform to arbitrary devices. I didn't go to the workshop then, but what we heard back from Miles, who went to Bristol to present it, was that people agreed with his take (a surprise: we thought it might be a controversial view), but were pretty glum about the situation.
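The limits of purely syntactic transformation can be shown with a small sketch in Python. Everything here is an assumption made for illustration: the `Post` record, its field names, and the two render functions are hypothetical, not part of any real system. The point is only that the full-page channel can be derived mechanically from the source, while the excerpt channel works better once a human writes a `summary`, a field that exists solely because of that channel.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Post:
    """Hypothetical single-source record; field names are illustrative."""
    title: str
    body: str
    summary: Optional[str] = None  # present only because the feed needs it


def render_html(post: Post) -> str:
    """Full-page channel: a mechanical transform of the source is enough."""
    return f"<h1>{escape(post.title)}</h1>\n<p>{escape(post.body)}</p>"


def render_feed_excerpt(post: Post, limit: int = 60) -> str:
    """Excerpt channel: use the authored summary, else cut the body blindly."""
    if post.summary is not None:
        return f"{post.title}: {post.summary}"
    # Truncate at the last space within the limit; no understanding involved.
    cut = post.body[:limit].rsplit(" ", 1)[0]
    return f"{post.title}: {cut}..."


body = (
    "The opening quotation runs long and carries no attribution. "
    "Only the second sentence says who wrote it."
)

auto = Post(title="No neutral content", body=body)
authored = Post(
    title="No neutral content",
    body=body,
    summary="Why quoting style has to change for feeds.",
)

auto_excerpt = render_feed_excerpt(auto)
authored_excerpt = render_feed_excerpt(authored)
print(auto_excerpt)
print(authored_excerpt)
```

The mechanical excerpt drops the sentence that carries the attribution, which is the kind of context a syntactic transform cannot know to keep.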

Update: Miles sent me a link to some slides - http://www.w3.org/2000/10/DIAWorkshop/sabin/slide1.html. He also gently refreshed my faulty memory as to who did the bulk of the work - certainly not I! Miles is now properly attributed.

It's not a new issue. A "news story delivered via radio is not the same as the story delivered by television but without pictures", or via a news site with a graphic and excerpt. Consider PowerPoint - it has fundamentally altered how business people communicate, in particular how they buy and sell services. No-one expects to be able to autogenerate PowerPoint decks from written documents - they're always crafted by hand from existing content - best case, some old slides get reused. Email has had a similar impact, although it gets less flak than PowerPoint. IM and SMS will go as far as changing how we speak.

The medium really is the message.


Colophon: I should say that none of this is meant to be an engineer throwing up his hands and saying "Impossible!". Some amount of channel independence is both possible and desirable, but there's a limit to what can be realistically achieved without having a human author re-appropriate the content. I think this slide, http://www.w3.org/2000/10/DIAWorkshop/sabin/slide13.html, indicates some of the limitations.

* Thanks to Scott Mark for reminding me of the term "single source".



May 3, 2006 12:34 AM

Comments

Scott Mark
(May 2, 2006 10:30 PM #)

Amen - great post (and nice McLuhan ref). This is the line for me:

adapt the content to the medium

I think single-sourcing content really only exists in narrow lanes, and with more constraints than are advertised - nowhere near the broad solution many people think, for the reasons you cover.

Not sure where you come down on this, but I have the same problem with JSF in the Java presentation tier - what's so evil about producing HTML? Not everything needs to be so abstracted - after all most of us are just delivering web content... I wholeheartedly agree that it's more about device and medium targeting rather than device independence.

Miles Sabin
(May 2, 2006 10:33 PM #)

Thanks for reminding me of this, and giving it a little (somewhat belated) attention. You missed the slides, which make a reasonable stab at illustrating the general thesis,

http://www.w3.org/2000/10/DIAWorkshop/sabin/slide1.html

Bill de hOra
(May 2, 2006 10:54 PM #)

Hi Scott,

Thanks for reminding me of the "single source" term.

"what's so evil about producing HTML?"

Nothing at all, so long as you leave some flex in the code for alternate outputs. Howard Lewis Ship notably has taken the HTML-first view in Tapestry. That said, I do find that it's not always as simple as allowing an RSS/an.other view. IME the presentation requirement tends to bleed back into the content model, and then back to the authors (support for excerpts, summaries and taglines are prime examples with RSS).

Bill de hOra
(May 2, 2006 10:56 PM #)

Miles!

Yes, it's aged quite well (you did a good job there ;). Thanks for the link to slides, I'd completely forgotten about them.

Scott Mark
(May 3, 2006 01:25 PM #)

"flex in the code"

I think we're agreeing - I think the flex you're referring to is farther upstream than a single set of templates or even the controllers right behind them.

Something like a service layer (overloaded term, but works) needs to be more agnostic, but to me once you implement a controller you have already picked your target.

The idea that the distribution medium affects the content model and content itself is an idea that needs dissemination.
