

Eric Meyer is flummoxed by frameworks:

"But I just don't get all these new-fangled programming frameworks. Is something wrong with me? Seriously. I have this grumpy, churlish feeling that I suspect is rather similar to the way SGML experts felt when they saw HTML becoming so popular, and that scares me."

I just don't know. I'm trying to remember a time when there weren't frameworks - back when CGI supported "rm -rf", maybe. Still, there are many sites made of flat files, hand edited.

There is a point when the framework will either get in the way or show its true value. I think it's the point when you need to produce some new behaviour with code, as opposed to rearranging things on the screen. Sometimes you need to build a porch, not paint the wall or add some new flatpack thingie to the middle room.

[I'm finding that with Movable Type these days - I wish it were a framework instead of a product. There are things I want to do, but altering it seems tricky; it doesn't strike me as built "for alteration". It'd also mean I'd have to learn passable Perl - not MT's fault, but as Perl has always gone whoosh right past me, I'm not hopeful about staying on MT.]

Eric's problem is that if you want total control, you will have to do a lot of work. In the web/bloggy space alone - i18n (maybe), tagging, commenting, archiving, slugging, login, preview, managing, spam filtering, templates, rpc, feed generation, scheduling jobs, who knows what else. You'll also have to make sure that things can be added later in a sane way. True, there is nothing worse than a bad framework, except maybe a bad programming language, but you'll be left in the dust otherwise, either reinventing what others can assume will "just" be there, or writing out all your content again in a way that can be processed by software. Total control implies total effort.

Speaking of content rewriting: recently I've been porting a website of flat files (spread over half a decade or so). I've been doing this to test a web framework - frameworks that only function in greenfield or closed situations are not especially interesting. The really interesting tools are often ones that introduce simplicity and order into already complex and highly entropic systems, as opposed to avoiding them and declaring victory. Also (more importantly) the goal is to make the content flexible and available for future needs. Now, the flat file thing itself is great - 100% scalable, precomputed, self-archiving, and so on. But that's not the issue - it's that these files have been written out by hand, each cut and pasted from an older one.

The most interesting thing about this site is how the flat files have changed over time. The older pages are different from the recent ones; you can see the copy errors, mutations and evolution take place, yet any two side by side in time are almost indistinguishable, exhibiting only very subtle alterations.

From a certain viewpoint the site more or less looks like a grand canyon, each internet year producing a new seam.

It's remarkable that something like a website can have a geology, layer after layer of frozen accidents. Integrators will be used to seeing that in non-web systems. But the difficulty is you can't terraform this kind of site - reskinning means editing every single page. Despite the fact that each html file can be read in, they're different enough as a collection that they can't easily be uniformly processed. To understand what it would take to process them uniformly means analysing all of them - which is to say, if we could spec the code in advance, they'd be a priori uniform.

Part of what I've been doing is normalising the files and reverse engineering the latent templates. Now, this kind of work would drive many people insane, but because I deal a good bit in legacy systems (as in, I'll admit they exist and need to be linked up), it feels like a workout, a free chance to learn something about how systems old and new need to work. There are no magic shortcuts here, but it's not all eyeballing either - there are tricks and techniques you can use to determine how variant the collection is - a sort of data mining for common structure.
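To give a flavour of that kind of structure mining, here's a minimal sketch (Python, standard library only; the names are invented for illustration): reduce each page to its tag skeleton, then cluster pages on that. The number of clusters approximates the number of latent templates in the collection.

```python
from html.parser import HTMLParser

class Skeleton(HTMLParser):
    """Collects just the sequence of opening tags - a page's structural signature."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def signature(markup):
    """Reduce a page to its tag skeleton, ignoring text and attributes."""
    parser = Skeleton()
    parser.feed(markup)
    return tuple(parser.tags)

def cluster(pages):
    """Group page names by skeleton; few groups means few latent templates."""
    groups = {}
    for name, markup in pages.items():
        groups.setdefault(signature(markup), []).append(name)
    return groups
```

Two pages cut and pasted from the same ancestor land in the same cluster even though their text differs; a page that mutated structurally falls out into its own.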

This kind of work can also drive the framework insane, which is exactly why it's valuable. A good framework will deal gracefully with stress resulting from variation, which means dealing with the structural, syntactic and semantic exceptions, and not insisting the data be just so. The green field is wafer thin. In the trenches you need to know if and when the tool will fail you, or whether it will be a force multiplier.

Ok, so we were talking about frameworks. Here's the thing: if you want to be able to terraform, to alter the presentation of content after the point of creation, you need a web framework. Now that framework can be as simple as a few scripts to inject some text into a one up, one down, three across html layout, or as complex as a high-end CMS, but it's still a framework. What the web framework does is provide a rendezvous point between some code and some content. As soon as you want to do something like reskin, output html and rss side by side, or associate a comment with a post, you're in framework territory. Which is to say, to capture behaviour over data in repeatable form, you need code and a place to run it*. Frameworking is programming.
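The "few scripts" end of that spectrum can be sketched in a handful of lines (Python; the layout and slot names here are made up for illustration) - a template with slots, and a function that pours content into them. However small, that's already the rendezvous point:

```python
from string import Template

# The rendezvous point: a layout with slots, and content to pour into them.
# (A hypothetical one up, one down, one across layout, for illustration.)
LAYOUT = Template("""\
<html>
<head><title>$title</title></head>
<body>
<div id="up">$header</div>
<div id="across">$body</div>
<div id="down">$footer</div>
</body>
</html>""")

def render(title, header, body, footer):
    """Inject content into the layout - the whole 'framework'."""
    return LAYOUT.substitute(title=title, header=header,
                             body=body, footer=footer)
```

Reskinning now means editing LAYOUT once, not every page - which is the entire argument for the rendezvous point.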

Some things can be made simpler by declaring the relation between some code and some data instead of writing out the code directly (CSS is a good example of this), but not all things. Contrariwise, for uniform processing you need to write out the content in as uniform a manner as possible, which places pressure on content writers and designers to conform to the code. Historically, that pressure has been deemed excessive, which is perhaps why Eric Meyer sees frameworks as straitjackets.

On the bright side for the content focused - authors, designers, folks who are more like Eric Meyer or Jeff Croft, and less like me - "modern" web frameworks seem much more interested in catering for people who aren't programmers, and have broader use cases than automating drudge coding work or making it easy to bind to a database. That's clearly the case with frameworks like Django and Plone, and seems to be where the various web developer communities are headed. And that's a good thing.

* I had something in here originally about code then being an existential quantifier over a data structure (or an operator over a type), but I suspect it would have ruined the post :)

May 13, 2006 01:47 PM


Aristotle Pagaltzis
(May 13, 2006 10:54 PM #)

Do note that in his follow-up post, Eric clears up the confusion as a very specific hang-up that I’d never seen before: he thought frameworks are some sort of preprocessor or code generator stage that hides the programming language under some kind of layer.

Koranteng Ofosu-Amaah
(May 14, 2006 02:52 PM #)

It strikes me that a large part of your current exercise is about inferring structure where there is none, or little, shall we say. That you are trying to discern patterns, like all those glue-layer people.

It is interesting to ponder a related question, namely: what are the design characteristics of systems that would be most adept at these things? And at what stage do these characteristics apply? For example, an arbitrary breakdown:

  1. up front during the creation of content
  2. during initial tweaks of content (mostly styling)
  3. after the content has been created and style mostly settled

All the big framework people will talk your ear off about "programming models", separation of concerns, user roles etc. In practice they worry mainly about phase 1 which they try and mostly fail to keep as lightweight as possible. The problem is that the software is very unforgiving about structure upfront.

The most interesting bit that you have highlighted is the need for flexibility in evolving schema which relates mostly to phases 2 and 3. That's the bit about structured and semi-structured data and trying to impose order.

Alex Russell in a passing comment about Jotspot says

"Instead of forcing you to think about some kind of MVC fuss-and-bother, you build what you were after in the first place, usually a form, and then start iterating on the implied structure of that data. You don’t change a model and upgrade a schema, you just add the property you wanted to add."

That's that "long tail of software" concern, that's the Holy Grail that everyone is pursuing. From the blue screens on, from spreadsheets and word processors, from Lotus Notes on, from blogging platforms on, from Atom store dreamers on etc.

The web by bringing this ungodly number of "content creators" to the table is putting lots of pressure on those writing these frameworks and on the emphasis they place. There is a sweet spot for someone who gets the balance right.

I'm not sure who will 'win', from the older legacy applications to the new web-native apps, to jotspot, to wikicalc, to dabbledb, to notes, to infopath, to a finally refocused MS Office for the web, to the blogging platform things (WordPress, Movable Type etc), to the google data thingimijig, to whatever comes out of "dehora's roll your own wrangler". There's a law of large numbers that will surely focus the mind.

I have one of those long, winding pieces incubating, this data stuff is very interesting...

Bill de hOra
(May 17, 2006 11:25 PM #)


"The most interesting bit that you have highlighted is the need for flexibility in evolving schema which relates mostly to phases 2 and 3. That's the bit about structured and semi-structured data and trying to impose order."

Yep. I think people who have adopted XML for interchange and integration, but who have not come from a publishing/document background, are starting to realise that there is more to data contracts and interop than a shared schema and some generic type pixie dust - indeed it's naive to hope you can capture all the nuances and processing cases. Tim Ewald's recent "make it all optional" is a typical reaction to getting burned by one-schema-fits-all systems (the wrong reaction imo, which I'll get to).

The semistructured way has always been interesting because it lowers the costs of producing content and shifts the burden of costs onto developers to infer structure. The uptake of microformats and RSS indicates a new economic model is coming into play. It seems yet another part of software industry is being disrupted.

Even though it's more effort to tease out the data from a uF, most code taking in uF data won't function so well if all the fields are not there - in that sense the application layer code hasn't changed much under the hood from other formats, despite all the on-wire action. Exceptions will still get thrown and software will still choke on its assumptions.

The ultimate expression of this style then is RDF, not microformats, because RDF not only has a built-in mustIgnore model, it has a mustNotAssume model - RDF systems allow you to query over partial and incomplete data sets - yet despite the semi-structured bit, it's all highly formalised under the hood.
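To make the mustNotAssume stance concrete, here's a toy sketch (Python; a deliberately simplified stand-in, not a real RDF store - the names are invented for illustration). Records are open-ended sets of triples, and lookups bind a value if one is present rather than assuming every subject carries every property:

```python
# A toy triple store: nothing forces every subject to carry every
# property, and queries must not assume it does.
triples = {
    ("post1", "title", "Frameworks"),
    ("post1", "author", "bill"),
    ("post2", "title", "Dead Planet"),
    # post2 carries no author triple: a partial record, not an error.
}

def values(subject, prop):
    """Every value of prop for subject; an empty list, never an exception."""
    return [v for s, p, v in triples if s == subject and p == prop]

def optional(subject, prop, default=None):
    """SPARQL-OPTIONAL style lookup: bind the value if present, else stay unbound."""
    vs = values(subject, prop)
    return vs[0] if vs else default
```

Here optional("post2", "author") quietly yields None where record-oriented field access would throw - app code written in this style copes with data it was never promised, which is the point.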

That's extremely attractive, but it turns out building to that model means throwing away a *lot* of art based on imperative/OO programming techniques, what we might consider basic development skills. That's a huge, huge difference in the bowels of systems - almost all app code has to adapt to the notion that only some of the data might arrive. I see zero chance of that happening any time soon (eventually yes, but not soon). That's one reason why there are no killer apps dependent on RDF - the toolchain needs ripping out to support it.

When people ask what's the point of RDF, or look to where the meat is, the answer is not in the RDF data. It's in throwing out all that fragile imperative code for something more robust and more flexible. RDF is only a forcing function. In fact, without adapting the app programming techniques, RDF use is going to make things appear even more fragile, rigid, and broken than they already are - ultimately, what data contracts do is protect you from your code's inability to cope. It's not exactly the silver bullet any programmer is hoping for. RSS and uF have much less drastic impact and should (do) have better adoption characteristics as a result.

So, here's a hypothesis. If you can cope with RDF partial data sets, you can cope to a large degree with schema evolution issues and variant quality in the data arriving. It's not the same approach as make everything optional, which is why I think Tim Ewald is off the mark.