Here's a quote from an interesting paper/manifesto from Gregory V. Wilson entitled Extensible Programming:

Language extensibility has been around for years, but is still largely an academic curiosity. Three things stand in the way of its adoption: programmers' ignorance, the absence of support in mainstream languages, and the cognitive gap between what programmers write, and what they have to debug.

Wilson is talking about code generation, and the quote above very much nails the concern I have with code generators, even though I like the idea of them quite a bit.

The paper has some good analysis of existing mixed-in languages such as JSP and XSLT. It concludes that we'll end up storing our programs as XML but looking at transformed representations, which is something of a "tools will save us" argument. Wilson addresses a list of possible objections, though one important one is missed - how is that going to work with version control, specifically version diffs? XML is tree based and diffing algorithms tend to be line based, which makes for dissonance. More tools? Better XML diffing algorithms?
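To make the diff dissonance concrete, here is a minimal Python sketch. The XML fragment is a made-up program representation (the `call` and `arg` element names are assumptions for illustration, not anything from Wilson's paper), and `same_tree` is a naive structural comparison rather than a real XML diffing algorithm. It shows two serialisations of the same tree that a line-based diff reports as entirely different.

```python
import difflib
import xml.etree.ElementTree as ET

# Two serialisations of the same hypothetical program fragment.
OLD = '<call name="print"><arg>x</arg><arg>y</arg></call>'
NEW = """<call name="print">
  <arg>x</arg>
  <arg>y</arg>
</call>"""


def line_diff(a, b):
    """Return only the added/removed lines of a line-based unified diff."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="", n=0)
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def same_tree(a, b):
    """Naive structural comparison that ignores whitespace-only text."""
    def norm(s):
        return (s or "").strip()

    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and norm(a.text) == norm(b.text)
        and norm(a.tail) == norm(b.tail)
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


if __name__ == "__main__":
    for line in line_diff(OLD, NEW):
        print(line)  # every line shows up as a change
    print(same_tree(ET.fromstring(OLD), ET.fromstring(NEW)))
```

The line diff flags every line even though nothing changed in the tree, which is the kind of noise a version control system would have to learn to see past.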

Jon Udell reckons Wilson skewered the objection, 'I Want to See My Programs As They Really Are', but I found the answer somewhat specious. I don't know anyone who wants to see disinterred programs in the manner suggested - they use debuggers and profilers for that kind of thing. What they do want to see is source code - because the code is the technical specification of the program. And the fact that the two (program and specification) are being conflated here makes me wonder - is that what happens when you spend too much time looking at abstract models of syntax?

As for XML representations - why not go further? Why use XML when you could use Lisp expressions - you'd have the added benefit of being able to manipulate the parse tree directly if you wanted - plus diffs would be sane. The argument for XML seems to be an ad populum one:

And yes, there are better (i.e. more succinct, and hence easier to process) ways to represent the semantics of programs than XML, but we believe that will turn out in practice to be irrelevant. XML can do the job, and is becoming universal; it is therefore difficult to imagine that anything else will be so compelling as to displace it.

There's nothing inherently wrong with a worse-is-better approach, but as an XML advocate, I'm a tad wary of XML world domination arguments :) If it came down to it, I think I'd rather have source in pyxie syntax than XML.
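As a rough illustration of what "manipulate the parse tree directly" means with Lisp expressions, here is a small Python sketch. The reader, the `rename` helper and the sample `area` definition are all invented for this example; it handles only atoms and parentheses, with no strings, quoting or error handling.

```python
def parse_sexpr(text):
    """Read one s-expression into nested Python lists of string atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing paren
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree


def rename(tree, old, new):
    """Rewrite every atom equal to `old`; the text is never touched directly."""
    if isinstance(tree, list):
        return [rename(t, old, new) for t in tree]
    return new if tree == old else tree


def to_sexpr(tree):
    """Write the nested lists back out as source text."""
    if isinstance(tree, list):
        return "(" + " ".join(to_sexpr(t) for t in tree) + ")"
    return tree


if __name__ == "__main__":
    tree = parse_sexpr("(define (area r) (* pi (* r r)))")
    print(to_sexpr(rename(tree, "r", "radius")))
```

The point is only that the written form and the tree are nearly the same thing, so the round trip from source to tree and back is a few lines of code.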

As for advances in tools, the most significant mainstream advance I have seen is in IntelliJ IDEA, which treats the codebase as a tree of syntax trees rather than as a collection of flat files. That makes semantically sound transformations of code possible - allowing the immediate automation of refactorings and restructurings that once could have taken hours, or days - or simply not have been done at all for fear of breaking something. Subversion does something similar for version control when compared to CVS.
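The difference between working on syntax trees and working on flat text can be sketched with Python's standard `ast` module. This is not how IDEA is implemented; the `RenameVariable` class and the `total` function are invented for illustration, the rename is not scope-aware, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast

SOURCE = """
def total(items):
    result = 0
    for item in items:
        result += item
    return result
"""


class RenameVariable(ast.NodeTransformer):
    """Rename identifier nodes that match exactly (no scope analysis)."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node


def rename_in_tree(source, old, new):
    """Parse, rename on the tree, and write the source back out."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


if __name__ == "__main__":
    # Text replacement also rewrites part of the unrelated name "items".
    print(SOURCE.replace("item", "price"))
    # The tree-based rename only touches the identifier "item".
    print(rename_in_tree(SOURCE, "item", "price"))
```

The textual replace turns `items` into `prices` and changes the function's signature, while the tree-based version leaves it alone. A real refactoring engine also has to resolve scopes, imports and references across files, which is where the hours or days used to go.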

I would love to see tools developers eventually shift focus from code organisation to code runtime and offer semantically sound refactorings based on profiling and analysis of hotspots - debuggers tend to get all the attention, but I think profilers have more to offer. Anyway, in practice I tend to use multiple editors when working with source code, and my sense is that many others do too, which is one reason why I don't think standardizing on a single IDE is necessarily a productivity win for a team.

The real issue with extensibility of the kind discussed in this paper is not so much the suggested ignorance of tools and techniques, but a lack of appreciation of how difficult it is to define extensible rules of evaluation and a supporting syntax. In most programming languages semantic extensions can only be achieved through new syntax, usually new operators - 'new' insofar as they are not defined in terms of existing language primitives. Eventually the language gets bogged down in its tokens or in the semantic inconsistencies introduced by new evaluation rules for those tokens - saving the language is exactly what kills it... until another language is created to replace it and we start over. This leads to cycles of reinvention. We're not so much building on the programming state of the art as continually having each generation of programmers rediscover it.

January 21, 2005 01:06 AM


Adam Rosien
(January 21, 2005 02:21 AM #)

See Subtext Demo for some ideas about one future of IDEs and the sorts of issues you mention in your post.

Greg Wilson
(January 21, 2005 01:20 PM #)

Hi Bill; thanks for your comments. I agree that everything I describe could be done with s-expressions, but after forty years, I think it's time we all acknowledged that most programmers aren't ever going to adopt them. XML is far from perfect, but it does support mixed content in a sane way, and would allow us to recycle existing representations (e.g. MathML). W.r.t. the comment in your third paragraph about "disinterred programs", I'm not sure I understand what you mean: as far as I'm concerned, drawing the MathML embedded in a program as math is no different than drawing the \t embedded in the program as indentation...


(December 7, 2005 10:25 PM #)

At first blush Subtext looks like Knuth's 'literate programming' CWEB program from the early Nineties.