
Not a whit

Norm Walsh, Validating microformats: "I'm still not satisfied. And I remain convinced that this problem has to be solved before microformats can be considered a reliable way to encode data."

I wonder if anyone cares*. Straight-up, take-it-or-leave-it validations are often only really useful at administrative boundaries and at the point of publication. Most validations are in fact partial and stateful ("in the brief state, verify that these 3 fields are populated before proceeding"). If anything's going to "validate" uFs it'll be technologies that can deal with partial information, perhaps ones that can assume 3 or 4 fields will always be there. You don't need a full validation language for that. You need a dictionary of xpaths, or RDF, or dictionary-driven code acting as a content firewall. Maybe full validation as the default stance is enterprisey? And if you really need it, write a pipeline to convert from the uF to some XML format that is more responsive to validation tools.
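A minimal sketch of the dictionary-driven content firewall, in Python - assuming lxml, with a made-up required-field table and a made-up hCard:

    from lxml import etree

    # Made-up example: the "brief state" only cares about these 3 fields.
    REQUIRED = {
        "fn":  '//*[contains(concat(" ", @class, " "), " fn ")]',
        "url": '//*[contains(concat(" ", @class, " "), " url ")]',
        "org": '//*[contains(concat(" ", @class, " "), " org ")]',
    }

    def firewall(html, required=REQUIRED):
        """Let the document through if the fields we care about are
        present; ignore everything else rather than rejecting outright."""
        doc = etree.fromstring(html, etree.HTMLParser())
        return [f for f, xp in required.items() if not doc.xpath(xp)]

    sample = b'''<div class="vcard">
      <a class="fn url" href="http://example.org/">Anne Example</a>
      <span class="org">Example Ltd</span>
    </div>'''

    print(firewall(sample))  # [] means good enough to proceed

Anything the dictionary doesn't mention just passes through untouched, which is the point.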


* on re-reading, this sounds like a snipe at Norm. I should say, then, that it wasn't meant to be. I find it dissonant that basic stuff like validation and well-formedness can't be assumed, yet people and applications seem to cope and be fine with that.


April 13, 2006 07:46 PM

Comments

Norman Walsh
(April 13, 2006 08:48 PM #)

I don't get it then. What's the point of marking up an appointment in hCal if, in fact, you've marked it up wrong and applications are simply going to misinterpret it?

I don't expect consumers to do full validation, but I don't see how I can hope to get my data right if I can't validate it when I'm publishing.

Brian Donovan
(April 13, 2006 11:59 PM #)

Yeah, I'm going to have to side with N. Walsh on this one. I think the necessity of validators for formats is pretty much self-evident. I wouldn't try to share information in FormatX if I had no way of being sure that my chunk of FormatX is formatted properly and can be used by consumers of FormatX.

Bill de hOra
(April 14, 2006 12:05 AM #)

"What's the point of marking up an appointment in hCal if, in fact, you've marked it up wrong and applications are simply going to missinterpret it?"

People are marking up this kind of data as HTML every day. Heck, we don't even have wF XML half the time and we're getting by. The thing with uF is that it's human readable first, and if your app can make sense of it, that's gravy.

Publishing can be dealt with via a pipeline. Convert the uF to XML markup, validate the markup; if the markup's good, let the uF out the door. Do some model checking.
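Roughly, as a sketch - lxml again, and the transform and schema file names here are hypothetical:

    from lxml import etree

    def publish_gate(page_path, xslt_path="hcal-to-xcal.xsl",
                     rng_path="xcal.rng"):
        """Publish-side pipeline: uF page -> plain XML -> validate."""
        page = etree.parse(page_path, etree.HTMLParser())
        # Convert the microformat into a real XML vocabulary...
        to_xml = etree.XSLT(etree.parse(xslt_path))
        xml = to_xml(page)
        # ...and only let the page out the door if that XML validates.
        schema = etree.RelaxNG(etree.parse(rng_path))
        if not schema.validate(xml):
            raise ValueError(str(schema.error_log))
        return page

The consumer never sees any of this; validation happens at the administrative boundary, which is where it earns its keep.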

I think the question is - what should we optimize for, people or machines? uFs seem to make the application bit harder, but the human bit easier. The tradeoff is very like that of entities v codepoints in XML. On the web the human readable stuff seems to win out, and the engineers get to suck up the slop. If something as basic as wF XML isn't a realistic assumption, then uF slop surely beats HTML slop and counts as significant forward progress.

The price to pay to let meaningful markup on the web get a *foothold* might be validation.

Bill de hOra
(April 14, 2006 12:12 AM #)

"I think the necessity of validators for formats is pretty much self-evident."

I don't. RDF people have been saying this sort of thing about meaningful web data for nearly a decade. But we don't even get to assume wF XML or HTML. So there's this requirements overshoot that dooms ideas like "semantic web" and "XML validation" and "well-formedness" to failure because they're much too ambitious for the web as we find it. Applications that need such things to function won't see broad adoption. It's just not happening at the level we'd like. uF then seems like a good local maximum to aim for in this decade.

Brian Donovan
(April 14, 2006 12:44 AM #)

>RDF people have been saying this sort of thing for meaningful web data for nearly a decade. But we don't even get to assume wF XML or HTML.

And yet ... the W3C HTML and CSS validators see a lot of traffic. Browser extensions that give information about validity (like the Tidy and Firebug extensions for Firefox) are also quite popular. In a lot of instances, people may elect to put out non-valid HTML ... after they've run their HTML through a validator and assured themselves that the only stuff causing it to fail validation is the bit that they feel they've got a good reason to put in there. Validation is useful because it allows us to take stock of where we are. We may want to break the rules, but we need to know what the rules are first.

I think that not personally seeing the necessity of providing a means of validation is fine, but if you stick with that attitude, it's not going to help the adoption/spread of the formats you're promoting. Other people who will be writing software that generates and consumes those formats, and even a lot of people who will be hand-coding in those formats, are going to want validators available to make sure that they're handling things properly.

It would seem easier to roll up your sleeves and whip up a validator or suite of micro-validators for your micro-formats than to keep boldly opining that validators aren't necessary/useful. What's the obstacle?

[] Lack of time and manpower, but we'll get to it eventually.
[] Too hard. We don't think that we can agree on the semantics of the formats / nail them down well enough to write a validator.
[] Ideologically, we're opposed to validation.

If it's the first, fine, but why the attitude? If the second, then I would urge you to reconsider and try harder. If the third, which is what it sounds like so far, then I really think you're making a mistake.

Norman Walsh
(April 14, 2006 11:10 AM #)

"Heck we don't even have wF XML half the time and we're getting by."

Well, sure, I suppose. If "getting by" is your target... but I like to aim just a little higher than that. I recognize that what I actually want, and what I'd be willing to put up with to get there, puts me more than a couple of standard deviations off the mean, but still, the argument that anything that gets deployed is fine as long as we'll be able to "get by" doesn't appeal to me.

"The thing with uF is that it's human readable first and if you app
can make sense of it, that's gravy."

Yes, I get the human readable first argument, but if the data isn't
machine readable second, what bother with any extra markup? Surely,
easiest of all is just to make it human readable only. And if your
microformat markup is broken, all you've really done is made it human
readable first and machine misunderstood second. Actively
promoting misunderstanding seems unwise to me.

Note that I'm not saying consumers must validate. I'm not even saying they should. I'm not saying producers must validate. I'm not even saying they should (even though I do think so). All I'm saying is that validation ought to be possible. So that if I care about making sure that my microformat markup actually encodes the information I intended, I can tell whether I'm right or not.

Bill de hOra
(April 14, 2006 12:07 PM #)

Brian,

"It would seem easier to roll up your sleeves and whip up a validator or suite of micro-validators for your micro-formats than to keep boldly opining that validators aren't necessary/useful. What's the obstacle?"

"Boldy opining"? Nonsense. Don't you have a a better argument?

I'm not saying this is a good state of affairs. But it is the state of affairs. With respect to microformats, I see the mV (must validate) position as akin to saying there's no point in anything less than a perfect and complex eye, because we *really* need to see. Seeing is a *requirement*. But partial eyes turn out to be useful all the same. I'm very curious to see how this plays out, and whether validation is like the backlink, something that gets pushed to one side.

"Ideologically, we're opposed to validation. If it's the first, fine, but why the attitude? If the 2nd, then I would urge you to reconsider and try harder. If the third, which is what it sounds like so far, then I really think you're making a mistake."

Whoa. Please don't misconstrue what I'm saying. I think you need to go back and read what I wrote again, because your perspective above on what I said doesn't make *any* sense to me. On self-examination, I'm fairly sure I have no ideological opposition to validation. I use validation tools a lot. What I am wondering is whether we are going to see widespread adoption of uFs, with all the associated validation issues, irrespective of what we think the value of doing things "properly" is. I think it's very possible. So: I'm not saying Norm's evaluation is wrong - I agree with it. I am wondering whether it will matter.

Anyway, the core problem with your position, some accusations aside, is that I'm pretty sure I had this kind of argument about software agents, and then about RDF, and then about REST, and then about feed formats, and it just didn't matter. I haven't been around long enough to remember the TCP versus OSI stack wars or the CORBA wars, but there's a pattern to technology adoption - the "problematic" approaches steamroller everything. If we can retroactively apply this "properness" argument to formats that tend to get adopted on the web, like RSS and HTML, I think that's an awkward position to be in.

Bill de hOra
(April 14, 2006 12:25 PM #)

"Yes, I get the human readable first argument, but if the data isn't machine readable second, what bother with any extra markup? Surely, easiest of all is just to make it human readable only. "

Back in my RDF days, I would have agreed with you. Actually, hang on, I don't need to go back to my RDF days. I think I was saying exactly what you just said there about Greasemonkey scripts less than 12 months ago. Steve O'Grady and Sam Ruby convinced me that they still had merit even tho' they were sloppy.

Now, I think "people", as in "not programmers" are happy enough to reverse the tradeoff by pushing all the toolchain complexity under the water and letting the programmers deal with it, instead of trying to work with and consume XML formats in custom editing environments. That you and I are cool with something like nxml mode I think make us outliers. I'm sorry if the original post came off as hard with the "I wonder if people will care" bit, but I'm not optimistic they will.

Which means I think we could end up running transformations backwards - from presentation formats to information formats. It's not much better than screenscraping, but it is better than screenscraping.

"All I'm saying is that validation ought to be possible."

Maybe toolchains that refuse to work with invalid or partial data will get left on the shelf. I wonder if Mark P's universal feed parser is the future of software development.
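For context, the feedparser approach in one screenful - Python, with a made-up feed URL:

    import feedparser

    d = feedparser.parse("http://example.org/feed.xml")
    # It parses ill-formed feeds anyway; the "bozo" bit just records
    # that the input wasn't well-formed XML.
    if d.bozo:
        print("not well-formed:", d.bozo_exception)
    print(d.feed.get("title"), len(d.entries))

It never refuses input; it flags the breakage and carries on, which is exactly the stance I'm describing.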

Aristotle Pagaltzis
(April 14, 2006 03:35 PM #)

Every developer of consuming applications will have to make sense of all the crap floating about anyway, so to put data out there, just view-source on one of the pieces of floating crap, make some conjectures, and copy-paste. The consumers will muddle through it somehow, if your data is important enough. If not, well then no one but you will care anyway; no big loss.

Why even have standards at all, really? Reverse engineering popular implementations is good enough.

Robert Sayre
(April 15, 2006 03:17 PM #)

Standards should be all about reverse engineering popular implementations.

Danny
(April 15, 2006 05:35 PM #)

I reckon a validator would be extremely nice to have, though I don't think the lack of one is a showstopper. I'm pretty sure the Feed Validator has had a strong beneficial effect on the quality of feeds - it's easy for a tool developer to check their output. What's more, I believe it helped inform choices (and spot bugs) in the design of Atom.

A possibly relevant point is that by publishing a piece of data you are in effect asserting the information it contains. The GRDDLing of microformats into RDF is a pretty direct machine-oriented approach. But if the markup cannot be said to be valid, how is anyone to tell if that's actually what you meant to say? I guess this is the old XML-Draconian thing, but whatever, avoidable mechanical errors in legal documents (e.g. licenses) or bank statements aren't desirable.

Aristotle Pagaltzis
(April 16, 2006 02:21 AM #)

Robert: the way to write a standard is not the way to write an implementation. Standards should absolutely be about reverse-engineering popular implementations, but implementation should not be about reverse-engineering other popular implementations.

Bill de hOra
(April 16, 2006 08:40 PM #)

"I reckon a validator would be extremely nice to have"

Should it stop the ship?

"But if the markup cannot be said to valid, how is anyone to tell if that's actually what you meant to say?"

This is the least of your worries - authoritative protocol metadata trumps this. I publish some OWL as application/xml, or some hCard as application/xhtml+xml. Some client reads the data as OWL or as hCard. There's a Terrible Inferencing Mistake on the client. Did the client screw up by reading more into my data than I declared?

Reverse the situation for form upload.

So I think that's a hole in webarch that people working with semweb or uF are eventually going to fall into. Today no one is going to care about the consequences of unjustified inferences. We'll need another level of automation before media types really start to look antiquated.
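The shape of it, as a sketch - Python, where extract_hcard is a hypothetical stand-in for a real extractor:

    from urllib.request import urlopen

    def extract_hcard(markup):
        """Hypothetical stand-in for a real hCard extractor."""
        return markup

    resp = urlopen("http://example.org/card")
    ctype = resp.headers.get_content_type()
    # The declared media type is the authoritative protocol metadata:
    # text/html licenses HTML processing and nothing more. Reading
    # hCard (or OWL) out of application/xml is inference the publisher
    # never sanctioned.
    if ctype in ("text/html", "application/xhtml+xml"):
        data = extract_hcard(resp.read())
    else:
        data = None  # treat the entity as opaque; don't over-infer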
