Refactoring: Extract Method to Function

You have a method whose purpose is not cohesive with the containing class. 

Turn the method into a function that can later be moved to a collaborating object or standalone fragment.

For example, suppose you had a method that created the rendering output for an API request on a user class -

def to_api_hash
  {
    id: id,
    email: email,
    name: name,
    avatar: get_avatar
  }
end

(this could equally be a method that computed a value, such as a bill payment, or performed a validation). The first step is to route the method to another "function method" on the class, one which takes the instance as an argument and has no references or state other than its inputs.

def to_api_hash
  to_api_hash_for(self)
end
  
def to_api_hash_for(user)
  {
    id: user.id,
    email: user.email,
    name: user.name,
    avatar: user.get_avatar
  }
end

This protects all the class's callers and allows you to test the new function in isolation. The next steps are to move the function to its intended destination and update callers away from the domain class, as sketched below.
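By way of illustration, here's a sketch of the final step, assuming the destination is a standalone serializer module (the UserSerializer name is hypothetical, not from the original) -

module UserSerializer
  # A pure function: depends only on its argument, no class state.
  def self.to_api_hash_for(user)
    {
      id: user.id,
      email: user.email,
      name: user.name,
      avatar: user.get_avatar
    }
  end
end

class User
  # Temporary delegate so existing callers keep working while
  # callsites migrate to UserSerializer.
  def to_api_hash
    UserSerializer.to_api_hash_for(self)
  end
end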

See also: Extract Method, Move Method and Replace Method with Method Object.

Discussion

It's common for younger codebases, especially web-based systems, to refactor code out of controllers and views into model or domain classes in order to centralise and organise functionality such as computed business logic, output preparation or validations. As the codebase evolves, the model can accumulate a large number of methods, resulting in what's sometimes called a "fat model", which then justifies moving code into collaborating objects or service objects. The latter is often the case when the domain class has functionality that manipulates multiple domain objects, such that there's no natural home in the domain model because the code represents an application use case.

The domain class may have many callers for these methods. Finding them can be tricky in larger codebases: in dynamic languages a generic method name might be in common use across the codebase, and statically typed languages that use runtime reflection have the same problem. Where callers are hard to detect, the option remains to leave the old class method in place as a delegate, allowing time to find the callsites still using it.

Another reason to extract to functions is to establish a 'clean core', where application logic can be more easily maintained and tested independently of framework or database storage concerns, e.g. as described by Bob Martin in his post 'Clean Architecture'.

Favourite Software Books: Part 1, Programming, Development and Computer Science

This is the first of three posts about my favourite software books with the emphasis on favourite - I'm not claiming these are the best books in their respective areas, and certainly not the best books for you, only that I've found them useful, and most importantly enjoyable to read. Every book here has had a lasting beneficial influence on me.

The post is broken into three loose sections - Programming, Development and Computer Science - and if you want to skip wading through the descriptions, there's a Goodreads list here.

 

Programming

Agile Software Development

There's a tension between larger scale software development and coding practice. Bob Martin's book does a fine job, probably the best job in print, of mixing high level software design with implementation execution. It's worth it alone for a good utilization of inheritance, one of the most abused techniques in object oriented programming. You'll also get tests, implementation detail, even non-asinine UML that might just get you to pick up a pen and paper sometime. Most of what are now called the SOLID principles are detailed here; as such it's the best book written about those principles. For me ASD hits a sweetspot in Martin's work, midway between the design precepts put down in The C++ Report and his later thinking on testing and professional conduct.

Programming Pearls

The opening parts on problem characterization and getting to a solution are pure gold. Later on, the book has a chapter on binary search showing how easy it is to get wrong, as an exercise in understanding both detail and correctness against input. It's no small irony that years later, the program in the book was discovered to have a bug for large integer values. This book as much as any other told me I'd probably like programming, and the followup is pretty good too. Aside from that, Bentley is a wonderful writer who makes the details fun.

Framework Design Guidelines

For such an important aspect of the industry there's very little material on developing software APIs. This one doesn't seem to get much traction outside the .NET world, I suspect in part due to antipathy towards MSFT - that's a great pity, because this is a wonderful book on software API design. Aside from very clear technical thinking, what makes this book exceptional is the commentary from the framework team around the tradeoffs they made. Highly recommended, even if you're not on the .NET platform, and I think very valuable if you're designing software APIs for other people to use.

Structure and Interpretation of Computer Programs

SICP is arguably the best programming book ever written. It's 30 years since the first edition, but it remains relevant, remains essential and will continue to do so long into the future. The chapter on state as the introduction of time into programming is a masterpiece. I like to say that I'll never finish it, because I keep coming back to it over the years. If there's a 100 year book on this list, it's this one.

 

Development

By 'development' rather than 'programming' I mean books that have a stronger emphasis on organisation and approach towards commercial software.

 

Extreme Programming Explained (2d ed)

The first edition kickstarted a revolution in how software teams organise themselves and approach their work. The second edition dials it all back down to 10, truly reflects on what Beck learned between editions, and is much the better book for it. Essential reading for delivering software.

Refactoring

Despite its age and focus on a specific technical practice, it remains hugely relevant to this day. Not only has it influenced how developers write code - refactoring is an expected part of your job - but it's also influenced tooling - almost all IDEs now support some form of automated refactoring. The contributions by Kent Beck, especially 'putting it all together', are a welcome addition. It's extremely well written and unusually engaging for what's essentially an encyclopedia of techniques, although I worry sometimes that like a classic novel, it's not as widely read as it is referenced.

The Practice of Programming

A practical, intelligent book that teaches you how to think about software. The examples are C/Unix but the knowledge you'll gain from this book is highly applicable. For a period of time, very early in my career, The Practice of Programming went everywhere with me - I can remember reading it on Saturdays meandering around the supermarket doing the shopping. The chapter on debugging is probably the best you'll read anywhere. Kernighan's a brilliant writer, and this generation will know Pike primarily from the Go language, which is to say, if you like Go, I think you'll like the philosophy and thinking in The Practice of Programming.

The Pragmatic Programmer

1999 was an extraordinarily good year - Extreme Programming Explained, Refactoring, The Practice of Programming, and The Pragmatic Programmer were all published. The Pragmatic Programmer is good to read when starting out, because it focuses on overall tradecraft and has plenty of immediately applicable advice - if nothing else you'll want to read it because a lot of people you will work with have, and value its lessons greatly - basically it's on everyone's list.

Clean Code

Despite the title I've found this more of a development process book than a programming one, and one which is better known than Martin's Agile Software Development. Where Clean Code becomes invaluable is in how to orient yourself and get into the right frame of mind for professional software development. Bob Martin's got strong opinions, many of which are on the money and brilliantly articulated in this book. Before this was written, McConnell's Code Complete was the go-to book; it influenced me greatly and I'd still warmly recommend it, but today Clean Code is the one you should read first. It's probably the best book of its kind written in the last decade.

Working Effectively with Legacy Code

This is the best book I've read on getting to grips with and wrangling existing large codebases. Nothing coming out of education really prepares you for production software. What's not always appreciated, perhaps because of the focus on new development or the never-ending software delivery crisis, is that most code in production is de facto code in maintenance, and it takes a duty of care to keep it healthy. If you missed this book, or were put off by words like 'maintenance' or 'legacy', do please pick it up - it's really about developing, sustaining and improving mid to large size production codebases.

 

Computer Science

Artificial Intelligence, a Modern Approach (3rd ed)

I'm chancing my arm here by putting an AI text into the CS bucket, but this was a gateway book for me - any interest I had (or have) in general areas like algorithmics, grammars and computation came from AIMA, whether it was wanting to understand more about natural language parsing, the pseudo-code and algorithms, or a bit more of the math behind things like neural networks. I adore this book; it's still probably the most accessible introduction to AI and Machine Learning in print.

The Algorithm Design Manual (2nd ed)

I should probably recommend a more comprehensive text like Introduction to Algorithms or Algorithms, and a long time ago I spent many hours poring over Sedgewick's books. But Skiena is the one I tend to pull down even for a casual read. There are a lot of excellent algorithm texts, but I wish more books in this area were goal driven like Skiena's. It has that special property that very few CS-oriented books manage to achieve, which is to make the reader feel at ease and somewhat smarter for reading it.

Concepts, Techniques, and Models of Computer Programming

A personal favourite due to the way it provides an overview of programming models and paradigms without killing you with formalisms. There are chapters on object orientation, message passing, the relational model and Prolog, constraint based programming - it even has a chapter on UI programming. The authors use a technique they call the kernel language approach to build up programming styles from a base, and focus on underlying models rather than any particular language (making it different from a number of books on this list).

Programming Language Pragmatics

Prior to this book I would have had the dragon book as a favourite, but PLP is, for me at least, better written and more accessible. It's a good companion to CTM above due to its focus on the mechanics of language implementation, its overview of compilers and the way it dips into specific languages. It can be heavy going in places, but it's not a math textbook about automata. There's a new edition coming out at the end of 2015.

Types and Programming Languages

I wasn't sure about putting this on the list. I'll admit to finding parts of it hard going; I've bounced off it more than once. But it's here because typed functional programming is poised to become a dominant software paradigm, much the way object orientation and dynamic programming are and have been. I can't see a future where typed FP doesn't go mainstream, and this is as good a foundational book on the paradigm as any. My older self regrets that my younger self didn't work harder, much harder, to read and understand this aspect of computing :)

Next Up

Part 2: Systems, Networks and Architecture.

Part 3: Languages, Text, and Artificial Intelligence. Although before those, I want to throw in a few books now that I've found great despite being oriented around a single language (plus I'm covered a bit if I don't get round to posting the next posts soon enough!).

The C Programming Language

Crisp, well written classic. There probably isn't a superfluous word in this book. C is less widely used in my domain (web and server systems) than it used to be, and generally I've had to read C rather than write it (that's probably a good thing, I wish I was better). But this book has remained valuable over the years, because it provides a clear explanation of lower level programming. Even if you never have to write a line of C in your life, it's still a great book to have read.

The Little Schemer

This (along with another in the series, A Little Java, A Few Patterns) is one of the first books I read that made me appreciate programming and code for its own sake and as a medium, rather than as a means to an end. There really isn't anything else like it: the book is organised completely around a question and answer dialog and takes a succession of tiny steps, gradually boiling you in functional and recursive programming techniques. If it sounds good, get it and grab a peanut butter sandwich, you're in for a treat.

Functional Programming in Scala

Many of the books here are old, a decade or more. I had thought for some time that perhaps good books were dying out - there seemed to be fewer and fewer memorable books each year, as a lot of the high quality material moved online, often in short form as posts and papers. Then this one turned up. Functional Programming in Scala is the best programming book I've read in years. Even if Scala is not your thing, it's worth a read - it uses Scala to teach FP much the same way Refactoring uses Java or SICP uses Scheme.

On Scala

This is one of a series of posts on languages, you can read more about that here.

I've been messing around with Scala for a while. For no particular reason I let it slide, but have come back to it with increased interest in the last few years. Scala's blending of programming paradigms in a single language is impressive. It has the most powerful type system of the 'mainstream' JVM languages [1]. It's fun to write code in and is a good language for leaning into the functional plus static typing paradigm. As much as I like the language, I usually qualify recommending it outright, due to some noticeable externalities and the need to adopt new paradigms. And so, Scala is the language I find myself most conflicted over - the part of me that likes programming loves it, the part of me that goes oncall wants to see more work done on the engineering.

What's to like? 

Herein a drive-by of some parts of the language that I like and have found useful. For a broader overview a good place to start is scala-lang.org's Learning Scala page, which also has an excellent overview of features.

She's gone rogue, captain! Have to take her out! 

Scala has a repl. Every language should have a repl -

dehora:~[macbook]$ scala
Welcome to Scala version 2.9.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_09).
Type in expressions to have them evaluated.
Type :help for more information.
scala> println("hello, world!")
hello, world!

The repl enables method discovery (hit tab after the dot operator).

Like Go (and Groovy), Scala makes semi-colons optional. Also like Go, declaration order is reversed compared to C/C++/Java: the type comes after the variable/method name, allowing you to say "a of type Int" or "foo returning Int". You can define something as a val (cannot be modified), a var (can be modified) or a def (a method/function) - val/var declarations in classes result in getters and setters being implicitly created (the private[this] variant can be used to force direct field access). Using vals is preferred for thread safety and comprehension reasons. Scala supports type inference, which removes a lot of boilerplate relative to Java. Strings in Scala are immutable, can span multiple lines using triple quotes, and support interpolation using a $ notation (as of 2.10). Scala strings can be enhanced with StringOps, which provides a raft of extra methods compared to java.lang.String.
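Those declaration forms in brief (a made-up fragment) -

val greeting = "hello"        // immutable; the String type is inferred
var count = 0                 // mutable
def incr(i: Int): Int = i + 1 // a method: name, typed parameter, result type

val verse = """roses are red
violets are blue"""           // triple quotes allow multi-line strings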

In Scala each value is an object. The root class is Any. Scala doesn't have predefined primitives as per C++ and Java; value types are modelled as a sub-class of Any called AnyVal, and everything else is a sub-class called AnyRef, which maps to Java's Object class. Because numbers are objects their arithmetic operations are methods - 1 + 2 is the same as 1+(2) (but not 1.+(2), which will give you a Double instead of an Int ;). Likewise the keywords 'true' and 'false' are objects - true.equals(false) is a legal statement. Technically, because we're dealing with methods, this is not operator overloading, but the term is often used. Since Scala allows symbolic methods for binary/unary operations, you are free to define something like def +(that:Thing):Thing for your class. You are also free (with some restrictions) to define methods with unicode and ascii symbols -

def ||(dowop:ThatThing):ThatThing = ...
def ψ(dowop:ThatThing):ThatThing = ...

And so it goes - for example, the list operators '::' (prepend) and ':::' (list concatenation) are just methods. Scala can be understood as strongly object oriented when compared to Python or Java, and notably gets two things right in its approach - don't have primitives, and preserve precedence rules.

Classes and construction in Scala are noticeably different to Java, but also better in more or less every respect.  Classes in Scala have a different syntax to Java, thanks to the constructor signature being part of the class definition -

class Song(songName: String, songLength:Double, songArtist:String) {
  private val name = songName 
  var length = songLength 
  val artist = songArtist 
}

Constructor arguments can be assigned, and are also visible throughout the class. A var field can be re-assigned, vals cannot, and protected fields are available to sub-classes. Private members are not accessible outside the class, with the exception of a companion object (more on those soon). Default and named arguments are available on constructors and methods.

Scala also has objects, which are singleton instances rather than a template declaration; they are also where main() methods are declared. An object with the same name as a class, placed in the same file, is called a companion object. It's common enough to see such object and class pairs in Scala. Companions can access each other's private methods and fields. Companion objects enable the use of the apply() method, which pretty much eliminates the factory and constructor chaining gymnastics seen in Java. An apply call is dispatched to via construction syntax - roughly, Song(name) goes to Song.apply(name), no 'new' required. There's also unapply(), which enables extractor objects that work with pattern matching - 'case Song(f)' will dispatch to Song.unapply(f), which determines whether the argument is matched. Companion objects are a clean way to package up declaration and creation concerns.
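A minimal companion pairing, as a sketch (the Track class is mine, not from the post) -

class Track(val title: String)

object Track {
  // Track("Align") dispatches here - no 'new' required
  def apply(title: String): Track = new Track(title)
  // enables 'case Track(t) => ...' in pattern matches
  def unapply(t: Track): Option[String] = Some(t.title)
}

val t = Track("Align")
t match { case Track(title) => println(title) } // prints Align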

Scala's case classes remove the need to declare fields, accessors, copy constructors, and equals/hashCode pairs. Case classes are about as concise as Groovy POGOs, with the advantage of maintaining type safety - they're a huge win compared to JavaBeans or POJOs.

case class Song(name: String, length:Double, artist:String)
val s1 = Song("Align", 3.56, "Mowgli") 
s1.name
res72: String = Align

val s2 = s1.copy(length=3.57)
s2.length
res73: Double = 3.57

s2.equals(s1)
res75: Boolean = false

val s3 = s1.copy()
s3: Song = Song(Align,3.56,Mowgli)
s3 == s1
res77: Boolean = true

s3.hashCode
res78: Int = -882114443

Case classes integrate well with pattern matching (more below) and can be treated as functions (note that s1 above didn't require a 'new').

Scala traits allow multiple inheritance of implementations using a well defined ordering. Traits also support default implementations, unlike C++ pure virtuals or Java interfaces (strictly speaking Scala doesn't have statics; singleton objects play that role). In conjunction with the sealed modifier, you can define a 'sealed trait' with case class subtypes as a neat way to declare abstract data types such as trees and grammars, which can then be dispatched over using pattern matching, as sketched below. While I would be happy to see inheritance removed from the language, Scala traits are a practical way to avoid the diamond problem.
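A small sketch of that sealed trait plus case class pattern (the Shape types are illustrative) -

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

def area(s: Shape): Double = s match {
  case Circle(r)  => math.Pi * r * r
  case Rect(w, h) => w * h
} // Shape is sealed, so the compiler can warn if a subtype goes unmatched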

At least it's an ethos

Scala's Option allows you to wrap a value as being either Some or None. An Option[T] is a container for a value of type T. If the value is not null it's a Some, otherwise it's a None. This pushes null handling into the type system, making it a compile time concern -

val a = Some("a")
a: Option[String] = Some(a)
 
val nope: Option[String] = None
nope: Option[String] = None

Constructing via Option(...) delegates the None/Some selection, giving you None for a null or a Some for a non-null -

 val whoKnows: Option[String] = Option(null)
 whoKnows: Option[String] = None

 val mightCouldBe = Option("aliens looking fer gold")
 mightCouldBe: Option[String] = Some(aliens looking fer gold)

 println(whoKnows.getOrElse("Nope"))
 Nope

Options are one of the things I like most in Scala. For many of us, null is simply reality, and it can be hard to imagine life without it, but if you can appreciate why returning an empty list is better than returning a null, or why null shouldn't ever be mapped to the domain, I think you'll come to like Options. Options are handy for expressing the three-valued logic (there, not there, empty) that is common when dealing with maps and caches. Code working with data stores may gain robustness benefits using Some/None in conjunction with pattern matching or higher-order functions. Options can be treated as collections, so foreach is available on them and collection functions like filter work with them. Subjectively I prefer Scala's Option to Groovy's elvis operator and Clojure's nil, largely because you can drive Option into API signatures. Option is also a gateway concept for monads.
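For instance, a cache lookup reads naturally with Option (a toy sketch) -

val cache = Map("user:1" -> "Donny")

cache.get("user:1") match {
  case Some(name) => println("hit: " + name)
  case None       => println("miss")
}

println(cache.getOrElse("user:2", "miss")) // or skip the match entirely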

Scala pattern matching allows you to dispatch against the structure or type of objects instead of reasoning about an instance. Pattern matching in Scala is powerful and shouldn't be compared to case/switch blocks in Java/C. You can match a populated list using a 'head :: tail' pattern; match against regexes; match against an XPath-like option if you are working with XML, and so on. Pattern matching also helps with expressing business logic, such as doing something with an Account when it has a certain age or is from a certain location. Case classes are designed with pattern matching in mind (if you are familiar with and like Linda tuplespace style matching, chances are you'll like pattern matching over case classes). You can get compiler warnings on non-exhaustive matches in conjunction with the sealed modifier, helping to address problems with switch block fall-through and visitor pattern code (incidentally, Yaron Minsky has a great talk that touches on this subject in the context of OCaml). I'm not entirely sure, but as far as I can tell pattern matching eliminates the visitor pattern and so the need for replace-conditional-with-visitor/polymorphism workarounds. That's nice because it helps decouple structure from processing, avoiding some of the gnarlier maintenance issues with orthodox OO.
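By way of quick illustration, structural dispatch over a list (a toy example) -

def describe(xs: List[Int]): String = xs match {
  case Nil          => "empty"
  case head :: Nil  => "just " + head
  case head :: tail => head + " then " + tail.length + " more"
}

describe(List())        // "empty"
describe(List(1, 2, 3)) // "1 then 2 more"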

If pattern matching is a bit low level for handling input, you can try parser combinators. You define an object (or a set of classes) that is the 'parser combinator' (Scala provides some helpers you can sub-class), and each method in the object becomes a parser that returns Success, Failure or Error. You then call parseAll with the root parser and the input. Here's the stupidest example I could think of -

import scala.util.parsing.combinator.RegexParsers

object EpicNameParser extends RegexParsers {
  val name: Parser[String] = """[a-zA-Z]*""".r
}
 
// bind object to a handy val
val enp = EpicNameParser
enp.parseAll(enp.name, "Donny")
res26: enp.ParseResult[String] = [1.6] parsed: Donny

enp.parseAll(enp.name, "Walter")
res27: enp.ParseResult[String] = [1.7] parsed: Walter

enp.parseAll(enp.name, "The Dude")
res28: enp.ParseResult[String] = 
[1.5] failure: string matching regex `\z' expected but `D' found
The Dude
    ^

This probably won't be the fastest way to do things performance-wise, but it's concise. Pattern matching opens up possibilities for dealing with conditionals and clean code; it's also the basis for actor receive loops.

Strange enlightenments are vouchsafed to those who seek the higher places

Scala has a rich set of collections. As well as Lists, Vectors, Sets, Maps and the generic Sequence (Seq), Scala has Tuples that can take mixed types. The classes Pair and Triple are aliases for arity-2 and arity-3 tuples. Scala has both mutable and immutable collections, the collection analogue of var and val fields. The community idiom seems to be to prefer immutable collections until you need to drop into mutability or j.u.c for performance reasons. Arrays in Scala map to Java arrays, but can be generic, and can also be treated as a Sequence.
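A few of these in one place (an illustrative fragment) -

val xs = List(1, 2, 3)                          // immutable by default
val buf = scala.collection.mutable.Buffer(1, 2) // the mutable analogue
val t = (1, "two", 3.0)                         // a Tuple3 with mixed types
t._2                                            // "two"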

As you might expect from a functional language, working with lists, maps and collections is nice. Scala's for loops (for comprehensions) work like Python's list comprehensions but have a syntax that allows more expressiveness. I like forcomps, but it's possible you'll move towards the map/flatMap/foreach/filter methods available on collections as idiomatic for simpler loops (for is implemented using these methods). The foreach method accepts closures (similar to each/do in Ruby), and also functions. You can apply functions over members using map(), eg 'Seq(1, 2).map({ n => "num:"+n })'. The flatMap method is similar, but instead accepts a function that returns a sequence for each element it finds, and then flattens all those sequences' members into a single list. This interacts nicely with Options, since an Option can be treated as a sequence -

def stoi(s: String): Option[Int] = {
  try {
    Some(Integer.parseInt(s.trim))
  } catch {
    case _: NumberFormatException => None
  }
}

Once you have a function you can apply it to a collection using map(), with the syntax '{ element => function(element) }' -

Seq("1", "2", "foo").map({n => stoi(n)})
res34: Seq[Option[Int]] = List(Some(1), Some(2), None) 

As we can see, map() produces 'List(Some(1), Some(2), None)' - in effect a list of lists, since each Option acts as a sequence of length 0 or 1. The flatMap() method on the other hand collapses the inner lists, removing the None members (since None's length is 0) and leaving just the values inside the Some members (since Some's length is 1) -

Seq("1", "2", "foo").flatMap({n => stoi(n)})
res37: Seq[Int] = List(1, 2)

Collections also provide a filter() method that chains nicely with map() - easy composition of functions is a feature of Scala. For gathering up results you get fold and reduce (foldLeft and foldRight, reduceLeft, reduceRight). For example you can use foldLeft to trap the largest int in a list or sum a list -

List(5, 0, 9, 6).foldLeft(-1)({ (i, j) => i.max(j) })
res24: Int = 9
List(5, 0, 9, 6).foldLeft(0)({ (i, j) => i + j })
res48: Int = 20

Although reduceLeft might be a more direct way to go about both -

List(5, 0, 9, 6).reduceLeft({ (i, j) => i.max(j) })
res38: Int = 9
List(5, 0, 9, 6).reduceLeft({ (i, j) => i + j })
res49: Int = 20

Reduces are something of a special case of folds. Folds can gather up elements and may return a different type whereas reduces return a single value of the list type. For example, you can't say this - 

List(5, 0, 9, 6).reduceLeft({ (i, j) => "" +  (i + j) })

but you can say this - 

List(5, 0, 9, 6).foldLeft("")({ (i, j) => "" +  (i + j) })
res51: java.lang.String = 5096

Methods like fold/reduce generally handle nulls without blowing up - 

val x:String = null;
List("a", x).reduceLeft({ (i, j) => "" +  (i + j) })
res60: String = anull

Monads are coming

Functional programming is unavoidable in Scala. Scala is a functional language in the sense that you can program with functions and not just class methods, and functions are treated as values, so they can be assigned and passed around as arguments. This is where Scala stops being a better Java, in the sense Groovy is, and becomes its own language. Functions in Scala can be declared using val -

scala> val timestwo = (i:Int) => i * 2

and can take variables from scope -

scala> val ten = 10
ten: Int = 10
scala> val mul = (i:Int) => i * ten
scala> mul(3)
res2: Int = 30

You can also create functions that accept other functions and/or return functions (aka higher order functions) -

def theSquareMullet(i:Int) = i * i
def theHoffSquareMullet(mullet: => Int) = mullet * mullet

The expression 'mullet: => Int' is a by-name parameter: it accepts any expression (function call or value) that yields an Int. All of these are valid -

scala> theSquareMullet(10)
res7: Int = 100

scala> theSquareMullet(theHoffSquareMullet(10))
res8: Int = 10000

scala> theHoffSquareMullet(theHoffSquareMullet(10))
res9: Int = 10000

Functions can be anonymous; you can see some examples in the section on collections above. Functions can be nested, and multiple values can be returned from a function/method -

def theSquareAndTheMullet(i:Int) = (i * i, i)
val (f, f1) = theSquareAndTheMullet(2)
f: Int = 4
f1: Int = 2

val bolth = theSquareAndTheMullet(2)
bolth: (Int, Int) = (4,2)

scala> bolth._1
res9: Int = 4

bolth._2
res10: Int = 2

The slight downside being that the syntax for unpacking a discrete val isn't that pleasing (and tuples are limited to size 22, if you're going to return the world). On the upside, accessing a tuple out of bounds is a compile time error, and a tuple preserves each member's type.

Endofunctor's Game

The type system is powerful. Like the functional programming support, the type system marks Scala as its own language instead of a better Java. Scala has generics, like Java, and supports type variance: +T (covariant), -T (contravariant), <: (upper bound, Java's 'extends'), >: (lower bound, Java's 'super'). Classes can be parameterised like Java, but unlike Java so can type members, via what's called a type alias. These allow you to genericise members independently of the class signature -

abstract class Holder {
  type T // abstract type member
  val held: T
}

def IntHolder(i: Int): Holder = {
  new Holder {
    type T = Int
    val held = i
  }
}
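Variance, mentioned above, gets its smallest useful illustration with a covariant container (an assumed example, not from the post) -

class Box[+T](val value: T)             // covariant in T
val b: Box[Any] = new Box[String]("hi") // ok: a Box[String] is a Box[Any]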

One nice feature is the compound type, which allows you to declare, for example, that a method argument is composed of multiple types -

def allTheThings(thing:ThatThing with ThatOtherThing):ThatThing = {...}

I can't do justice to Scala's type system in a single post. I doubt I know enough to appreciate what it can do - diving into it is very much a red pill thing. What I will say is that the type system in Scala is deeply impressive in what can be achieved. The type system allows things that are either impractical or impossible in Java, and gives you good ways to scrap your boilerplate.

What's not to like

Onto some things that bother me about Scala.

Here are eighteen ways to say, roughly, add one to each member of a list creating a new list, remove any members in the new list that aren't greater than one creating another new list, and show that final list as a string -

 List(0, 1) map { _ + 1 } filter { _ > 1 } toString
 List(0, 1) map { _ + 1 } filter { _ > 1 } toString()   
 List(0, 1) map ( _ + 1) filter (_ > 1) toString()
 List(0, 1) map ( _ + 1) filter (_ > 1) toString 
 List(0, 1) map ({ _ + 1 }) filter ({ _ > 1 }) toString 
 List(0, 1) map ({ _ + 1 }) filter ({ _ > 1 }) toString()  
 List(0, 1) map ({ n => n + 1 }) filter ({n => n > 1 }) toString() 
 List(0, 1) map ({ n => n + 1 }) filter ({n => n > 1 }) toString 
 List(0, 1) map (n => n + 1) filter (n => n > 1) toString()
 List(0, 1) map (n => n + 1) filter (n => n > 1) toString 
 List(0, 1).map( _ + 1 ).filter( _ > 1 ).toString() 
 List(0, 1).map( _ + 1 ).filter( _ > 1 ).toString 
 List(0, 1).map({ _ + 1 }).filter({ _ > 1 }).toString()   
 List(0, 1).map({ _ + 1 }).filter({ _ > 1 }).toString 
 List(0, 1).map({ n => n + 1 }).filter({n => n > 1 }).toString()
 List(0, 1).map({ n => n + 1 }).filter({n => n > 1 }).toString  
 List(0, 1).map(n => n + 1).filter(n => n > 1).toString()
 List(0, 1).map(n => n + 1).filter(n => n > 1).toString  

Scala provides syntactic flexibility. This flexibility is not limited to methods. Two ways to express for-comprehensions; three collections libraries (four if you count Java's); methods foo and foo() are not the same; XML assignment demands whitespace around '='; careful placement of suffix invocations so that they are not treated as infix. Closure captures can be tricky. In Scala '_' means everything and anything - there's no real way to avoid it in practice (imports, existential types, exceptions and closures). In particular, implicit conversions strike me as tricky to reason about. Scala generics have other bounds, 'view' and 'context', that allow you to work with types beyond traditional hierarchies, where implicit conversions are made available.

Symbolic methods on top of the features above provide the base capability for DSLs. DSL support in a language is not an obvious win to my mind.  Even a relatively restrained library like Gatling offers enough variation to catch you out and a symbol heavy project like SBT can be hard work. Scala's not unique in this - I have a similar beef with Ruby and Groovy.

Maintainable, comprehensible Scala requires restraint, in the form of disciplined practice and style guides. That you don't have to use all the features isn't sufficient to wave the problem away. If you are maintaining code, you don't get to ignore all the variation teams have added over time (I do not see the concept of Scala levels helping much). Dependency on open source amplifies the matter. Scala has a simple core with orthogonal concepts (the language spec is pretty readable as these things go). However, that says nothing about the interaction of features, which is where complexity and comprehension reside. I'm not trying to conflate understanding of functional and type-centric constructs with complexity - simple things can be hard, but that doesn't make them complex.

As a systems grunt, I spend a lot of time with running code, sifting through stacktraces, heapdumps, bytecode, metrics, logs and other runtime detritus. You learn from bitter experience that the correctness capabilities provided by a type system do not always trump having comprehensible runtime behaviour (the same can be said for the outcomes enabled by dynamic languages). Scala generates plenty of classes; as a result Scala stacktraces can be a bit of a bear, although better than Groovy's or Clojure's. An artifact of Scala's integration with Java is that you can't catch checked exceptions; you catch all and extract the type with a case statement. Compilation is slow, and becomes dead slow with the optimise flag. The expressivity of the language results in an abstraction gap when it comes to reasoning about runtime cost. Option, a great feature, incurs the cost of wrapper classes. Scala introduces a fair bit of boxing overhead. For comprehensions have overhead, such that you may need to swap one for a while/do loop. Closures naturally have overhead (at least until the JVM can optimise them). To ensure unboxed class fields, you need to qualify them with private[this]. Immutable collections may need to be replaced with mutable ones or j.u.c to deal with memory overhead or performance (sometimes there are wins, like Scala's HashSet). The foldRight method will happily blow up when foldLeft wouldn't. Some serious runtime bugs get kicked down the road. That said, you can build high performance systems in Scala.

My least favourite aspect of Scala is probably backwards compatibility. Scala has broken compatibility three, maybe four times, in five years. This does not seem to be considered a problem by the community. I understand how this frees things up for the future, but the effort needed to bring all the dependencies forward yourself, or wait until the maintainers rebuild, is an area where the language does not strike me as remotely scalable. Cross-compilation support is available in SBT, but is something of a workaround.

Conclusion

If a grand unified theory of programming language existed, its implementation would be called Scala. But enough of the modh coinníollach. Scala is neither a better Java/C# nor a worse Haskell/OCaml; it is its own language, and the most advanced on the JVM today. How much you are willing to invest in and work with functional and type-centric constructs will have a bearing on how much value you can get out of the language. And I find I need to weigh up three externalities - runtime comprehension, maintaining a clear reader-comes-first codebase, and dealing with language/binary incompatibility.

I feel it's inevitable that functional programming plus Hindley/Milner/Damas flavoured static typing will become a dominant paradigm. If that's right, what are today considered exotic constructs like closures, function passing, higher order functions, pattern matching and sophisticated type systems are going to be mainstream. It also means the more mathematical abstractions, such as monads and functors, will be common understanding, certainly by the 2020s (they are closer than you think!). As such, spending time with Scala (and Rust/Haskell/OCaml) is a worthwhile investment. There will still be places for dynamically typed and close to the metal languages, but orthodox OO as practised today is a dead end. Scala is this paradigm's breakout language and the one most likely to drag us half-way to Haskell.

Complexity is a criticism that is frequently levelled at Scala, one which gets pushback from the community. It's a sore spot, as there is no shortage of FUD. My take is that Scala's complexity is typically justified for the broad set of applications and paradigms it can encompass.  

Reading 

Scala has a relatively small number of books. Unfortunately, since the language is moving quickly, a number of them are out of date, as good as they may be.

  • Scala for the Impatient, Cay Horstmann. This is probably the closest thing to an up to date text; as of mid-2013 it's the go-to introductory book on the language. Written by a professional writer who's covered many languages previously, and it shows.
  • Scala in Depth, Josh Suereth. Targeted at people familiar with Scala but looking for idiom. It covers Scala 2.9 and is by definition out of date; however, a lot of available open source still targets 2.9, so it remains relevant at least up to 2014.
  • Programming in Scala 2nd Edition, Martin Odersky, Lex Spoon, Bill Venners. This has been the standard reference for the last number of years and is a very good book. It's out of date by virtue of covering Scala 2.8, but worth picking up to understand the language's core, especially if you see it discounted.
  • Programming Scala, Alex Payne, Dean Wampler. Part textbook, part evangelism, I credit this book with getting me interested in Scala again. Sadly, it's too out of date to recommend, targeting 2.7 with mentions of 2.8 features that hadn't been released at the time.
  • Akka Concurrency, Derek Wyatt. Not strictly a Scala book, but a fun read on programming with Akka, actors, and for concurrency in general with Scala. It targets Akka/Scala 2.10.
  • Functional Programming in Scala, Paul Chiusano, Rúnar Bjarnason. Not out until Sept 2013. Teaches functional programming using Scala, as opposed to being a book about Scala. I have high hopes for this one; there's a gap in the market for a pragmatic book on functional programming.

Scala's flexibility is such that idiomatic Scala isn't always obvious. I've found these two guides helpful -


[1] Java, Groovy, Jython, JRuby, Clojure. 

On Go

This is one of a series of posts on languages, you can read more about that here.

Herein a grab bag of observations on the Go programming language, positive and negative. First, a large caveat - I have no production experience with Go. That said, I'm impressed. It's in extremely good shape for a version 1. To my mind it bucks some orthodoxy on what a good, effective language is supposed to look like; for that alone, it's interesting. It makes sensible engineering decisions and is squarely in the category of languages I consider viable for server-side systems that you have to live with over time.

You can find more detail and proper introductory material at http://golang.org/doc. And this is my first observation - the language has great documentation.

What's to like?  

I’m gonna go get hammered with Papyrus

Syntax matters. In all languages I've used and with almost all teams, there's been a need to agree syntax and formatting norms. And so somebody, invariably, prudently, fruitfully, gets tasked with writing up the, to be specific, company style guide. Go eschews this by supplying gofmt, which defines mechanically how Go code should be formatted. I've found little to no value in formatting variability with other languages - even Python, which constrains layout more than most, doesn't go far enough to lock down formatting. It's tiring moving from codebase to codebase adjusting to each project's idiom - and open source now means much of the code you're working with isn't written to your guidelines. Another benefit of gofmt is that it makes syntax transformations easier - for example the gofix tool is predicated on gofmt. This isn't as powerful as type driven refactoring but is nonetheless useful. So while I gather it's somewhat controversial, I like the decision to put constraints on formatting.

yak

As is the way with recent idiom in languages, semi-colons are gone, no surprise there. Interestingly, there's no while or do, just for, something I like so far. If statements don't have parens, which took me a while to get used to, but they have come to make visual sense. If statements, on the other hand, don't have optional bracing, which I really like. A piece of me dies every time someone removes the carefully crafted if braces I put around one-liners in Java. Go doesn't have assertions - having seen too much sloppy code around assertions, specifically handling the failure itself, I think this is a good decision. The import syntax uses strings (like Ruby's require statement), something I'm not crazy about; but what's more interesting is that these are essentially paths, not logical namespaces.

Declaration order reverses the usual C/C++/Java way of saying things. This is similar to Scala and is something I like because it's easier when speaking the code out - saying 'a of type int' is less clumsy than 'the int named a' - the latter sounds like a recording artist wriggling out of a contract. Go has simple type derivation when using the ':=' declaration and assignment operator, albeit somewhat less powerful than, say, Scala's inference.

Syntactically the language reminds me of JavaScript and Groovy, but feels somewhat different to either. The end result is a comprehensible language. Go has minimal variation and no direct support in the language for DSL-like expression. The hardest part I've found to be reasoning with pointers, but even those get manageable. What you see is more or less what you get - if you value readability over self-expression (and not everyone does, but I do), this is a big win.

You had just one job

There are no exceptions in Go and no control structures at all for error handling. I find this a good decision. I've believed for a long time that checked exceptions are flawed. Thankfully this isn't controversial anymore and all's right with the world. Runtime exceptions are not much better. Maybe try/finally is ok, but as we'll see, Go has a more concise way to express code that should execute regardless. 

So you handle errors yourself. Go has an inbuilt interface called error - 

type error interface {
    Error() string
}

and allows a function to return it along with the intended return type, which has a surface similarity to multiple return values in Groovy/Python. Now, I can see this kind of in-place error handling driving people insane -

// the file handle is discarded here for brevity
if _, err := os.Open(filename); err != nil {
    return err
}

but I prefer it to try/catch/finally, believing that a) failures and errors are commonplace, b) what to do about them is often contextual such that distributing occurrence and action is rarely the right default. Pending some improved alternate approach or a real re-think on exceptions, it's better not to have the feature. 

Because there are no exceptions there is no finally construct - consequently you'd be subject to bugs around resource handling and cleanup. Instead there is the 'defer' keyword, which ensures an expression is called after the function exits, providing an option to release resources with some clear evaluation rules.
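Defer pairs naturally with the error-return style; a small illustrative fragment (the filename is made up) -

f, err := os.Open("data.txt")
if err != nil {
    return err
}
defer f.Close() // runs when the enclosing function returns, however it exits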

Go has both functions and methods, and they can be assigned to variables to provide closures. Functions are just like top level def methods in Python or Scala and can be used directly without an enclosing class structure. Methods are defined in terms of a method receiver. A receiver is an instance of a named type or a struct (among others). For example the sum function below can be used against integers wherever it is imported -

package main
import "fmt"
func sum(i int, j int) (int, error) {
  return i + j, nil
}
 
func main() {
    s, err := sum(5,10)
    if err != nil {
     fmt.Println("fail: " + err.Error())
    } 
    fmt.Println(s)
}

Alternatively, to make a method called 'plus', the receiving type is stated directly after the 'func' keyword -

package main
import "fmt"
type EmojiInt int
func (i EmojiInt) plus(j EmojiInt) (EmojiInt, error) {
  return i + j, nil
}
 
func main() {
    var k EmojiInt = 5
    s, err := k.plus(10)
    if err != nil {
     fmt.Println("fail: " + err.Error())
    } 
    fmt.Println(s)
}

This composition driven approach leads us into one of the most interesting areas of the language, one that's worth its own paragraph.

Go does not have inheritance or classes. 

If it helps at all, here's a surprised cat picture -

While OO isn't clearly defined, languages like Java and C# are to a large extent predicated on classes and inheritance. Go simply disposes of them. I'd need more time working with the language to be sure, but right now, this looks like a great decision. Controversial, but great. You can still define an interface -

type ActorWithoutACause interface {
    receive(msg Message) error
}

and via a structural typing construct, anything that implements the signatures is considered to implement the interface, as sketched below. The primary value of this isn't so much pleasing the ducktyping crowd (Python/Ruby developers should be reasonably ok with this) and supporting composition, but avoiding the premature object hierarchies typical in C++ and Java. In my experience changing an object hierarchy is heavyweight, and it requires effort to avoid creating one early. These days I'm reluctant to even define Abstract/Base (implementation inheritance) types - I'll use a DI library to pass in something that provides the behaviour. I'd go as far as saying I'd prefer duplicated code in early phase development to establishing a hierarchy (like I said, it requires effort). Go lets me dodge this problem by providing functions that can be imported, but no way to build up a class hierarchy.
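To make the structural part concrete, a minimal sketch - the Echo and Message types are mine, and Echo satisfies the interface above without declaring that it does -

type Message string

type Echo struct{}

// Having a matching receive method is all it takes; no 'implements' declaration.
func (e Echo) receive(msg Message) error {
    fmt.Println("received:", msg)
    return nil
}

var actor ActorWithoutACause = Echo{}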

This ain't chemistry, this is art.

 

Dependency management seems to be a strong focus of the language. Dependencies are compiled in, there's no dynamic linking, and Go does not allow cyclic dependencies. Consequently build times in Go are fast. You can't compile against an unused dependency - this won't compile -

package main
import "fmt"
import "time"
 
func main() {
    fmt.Println("imma let u finish.")
}
prog.go:3: imported and not used: "time"

which may seem pedantic but scales up well when reading existing code - all imports have purpose. You can use the dependency model to fetch remote repositories, eg via git. I have more to say on that when it comes to externalities.
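Fetching looks like this (the repository path here is made up) -

$ go get github.com/example/widget

after which the package is imported by that same path - import "github.com/example/widget".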

I mentioned that Go builds are fast. That's an understatement. They're lightning fast, fast enough to let Go be used for scripting. 

Package visibility is controlled via capitalization of a name: Foo is public, foo is private. I'll take it over private/public/protected boilerplate. I would have gone for Python's _foo idiom myself, but that's ok - it's obvious what's what when reading code.

Go doesn't have an implicit 'self/this' concept, which is great for avoiding scoping headaches a la Python, as well as silly interview questions. Imported names are always qualified by their package prefix, and all unbound names are in the package scope -

package main
import "fmt"
import "time"
 
func main() {
    time.Sleep(30)
    fmt.Println("imma let u finish.")
}

Note how I still have to qualify Sleep and Println with their time and fmt packages. I love this - it's one of my favourite hygienic properties in the language. If you dislike static imports in Java as much as I do, and the consequent clicking through in the IDE to see where the hell a name came from, you may also like what Go does here.

Go allows pointers. For the variable 'v', '&v' holds the address of v rather than its value.

package main
import "fmt"
func main() {
 v := 1; 
 vptr := &v
 fmt.Println(v)
 fmt.Println(vptr)
 fmt.Println(*vptr)
 fmt.Println(*&v)
}
1
0xc010000000
1
1

which enables by-reference and by-value approaches - for some use cases it's useful to avoid a data copy (technically, passing a pointer creates a copy of the pointer, so ultimately it's all pass by value). Thankfully there's no pointer math, so JVM types like myself don't need to freak out on seeing the '*' and '&' symbols (and in case you're wondering, arrays are bounds checked).

Go has two construction keywords for types - new and make. The new keyword allocates zeroed storage and returns a pointer; it performs no further initialization. The make keyword performs allocation and initialization and returns the type directly, and applies only to slices, maps and channels.
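Roughly, the difference (an illustrative fragment) -

p := new(int)             // p is a *int pointing at zeroed storage
m := make(map[string]int) // m is an initialized, ready-to-use map
c := make(chan int, 8)    // slices, maps and channels are created with make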

There aren't any numeric coercions, so you can't do dodgy math over different scalar types. This isn't really a feature, because languages that allow easy coercions like this are broken. Still, given Go's design roots in C and C++, I'm happy to see that particular brokenness wasn't brought forward. It still won't stop anyone using int32 to describe money, however.

You have just saved yourself from a fate worse than the frying pan

The Go concurrency model prefers CSP to shared memory. There are synchronization controls available in the sync and atomic packages (eg CAS, Mutex), but they don't seem to be the focus of the language. In terms of mechanism, Go uses channels and goroutines rather than Erlang-style actors. With actor models the process receiver gets a name (in Erlang for example, via a pid), whereas with channels the channel is instead the thing named. Channels are a sort of typed queue, buffered or unbuffered, assigned to a variable. You can create a channel and use goroutines to produce/consume over it. To get a sense of how that looks, here's a noddy example -

package main
import "fmt"
import "time"
 
func emit(c chan int) {
    for i := 0; i<100; i++ {
        c <- i
        fmt.Println("emit: ", i)
    }
}
 
func rcv(c chan int) {
   for {
        s := <-c 
        fmt.Println("rcvd: ", s)
    }
}
 
func main() {
    c := make(chan int, 100)
    go emit(c)
    go rcv(c)
    time.Sleep(30 * time.Millisecond) // give the goroutines a chance to run
    fmt.Println("imma out")
}

Channels are pretty cool. They provide a foundation for building things like server dispatch code and event loops. I could even imagine building a UI with them.  

As you can see above, it's easy enough to use goroutines - call a function with 'go' and you're done. Once created, goroutines communicate via channels, which is how the CSP style is achieved. That said, you can use shared state such as mutexes, but the flavour of the language is to work via channels. Goroutines are 'lightweight' in the Erlang sense of lightweight rather than Java's native threads. Being 'kinda green' they are multiplexed over OS threads. Go doesn't parallelize over multiple cores by default as far as I can tell; like Erlang it has to be configured to do so. It is possible to allot more native threads to leverage the cores via the runtime.GOMAXPROCS global, whose documentation says 'this call will go away when the scheduler improves'; it will be interesting to see what happens in future releases. Go's default is closer to Node.js, with the caveat that it can be made multicore whereas Node can only run single threaded. Otherwise, the approach to scaling out seems to be to use rpc to dispatch across multiple Go processes for now. As best as I can tell a blocking goroutine won't block others, as the other goroutines can be shunted onto another native thread, and it seems that blocking syscalls are performed by spawning an additional thread so the same number of threads are left to run goroutines, but my knowledge of the implementation is insubstantial, so I might have the details wrong.
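Configuring that looks like the following (illustrative; the thread count depends on your hardware) -

import "runtime"

func init() {
    // allow up to 4 OS threads to execute goroutines simultaneously
    runtime.GOMAXPROCS(4)
}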

Externalities

Onto some things that bother me about Go. 

Channels are not going to be as powerful as actors used in conjunction with pattern matching and/or a stronger type system. Mixing channel access with select blocks seems possible if you want to emulate an actor style receive model, but it'll be lacking in comparison to Erlang and Scala/Akka. That said, channels seem more than competitive in terms of the concurrency-sanity-pick-one tradeoff when compared to thread based synchronization. I can't imagine wanting to drop into threads after using channels.

The type system in Go is antiquated. If you're bought into modern, statically typed languages such as Haskell, Scala, OCaml and Rust, and value what they give you in terms of program correctness, expressiveness and boilerplate elimination, Go is going to seem like a step backwards. It is probably not for you. I'm sympathetic to this viewpoint, especially where efforts are made to match static typing with coherent runtime behaviour and diagnostics, not just correctness qualities. On the other hand, if you live in the world of oncall, distributed partial failures, traces, gc pauses, machine detail, and large codebases that come with their very own Game of Code social dynamics, modern static typing and its correctness assurances don't help much. Perhaps the worst aspect of Go here is that you are still subject to null pointers via nil; trapping errors helps, but not as much as a system that did its best to design null out, such as Rust.

Non-declaration of interface implementation feels, in a kind of chewy yet insubstantial way, like a feature that isn't going to scale up. I imagine this will be worked around with IDE tooling providing implements up/down arrows or hierarchy browsers. Easier composability via structural typing is possibly a counter-argument, if the types in the system interact with less complexity than in sub-type heavy codebases - something that's achievable to a point with Python/Scala but impractical in Java. So I'm ready to be wrong about this one.

Go concurrency isn't safe - for example, you can deadlock. Go is garbage collected, using a mark/sweep collector. While it's probably impossible for most of us to program concurrent software and hand-manage memory, my experience with Java has been a lot of time dealing with GC, especially trying to manage latency at higher percentiles and overflowing the young generation. Go structs might allow better memory allocation, but I don't have the flight time to say GC hell will or won't exist in Go. It would be very interesting to see how Go holds up under workloads witnessed by datastores such as Hadoop/HBase, Cassandra, Riak and Redis, or modern middlewares like Zookeeper, Storm, Kafka and RabbitMQ.

The 'go get' importing mechanism is broken, at least until you can specify a revision number. I'd hazard a guess and say this comes from Google's large open codebase, but I've no idea what the thinking is. Having worked in a codebase like that I can see how it makes sense along with a stable master policy. But I can also see stability suffering from an effect similar to SLA inversion, in that the probability of instability is the product of your externally sourced dependencies being unstable. It's important to think hard about your dependencies, but in practice if you have to make an emergency patch and you can't, because you can't build your upstream, you are SOL. A blameless post-mortem that identified inability to build leading to a sustained outage is going to result in a look of disapproval, at best. I don't see how to protect against this except by copying all dependencies into the local tree and sticking with path based imports, or using a library based workaround. I don't see how bug fix propagation and security patching are not, at best, made problematic by this model. Put another way, using sneaky pincer reasoning - if Go fundamentally believed this was sane, the language and the standard libraries wouldn't be versioned. Thankfully it's a mechanism rather than a core language design element, and should be something that can get fixed in the future.

Conclusions

Go seems to hit a sweetspot between C/C++, Python, JavaScript, and Java, possibly reflecting its Google heritage, where those languages are, I gather, sanctioned. It seems to be trying to be a more effective language rather than a better language, especially for in-production use. 

Should you learn it and use it? Yes, with two caveats: how much you like static type systems, and how much you value surrounding tooling.

If you really value modern, powerful type systems as seen in Haskell, Scala and Rust, I worry you'll find Go pointless. It offers practically nothing in that area and is arguably backwards looking. Yes, there is structural typing, and (thankfully) closures, but no sum types, no generics/existentials, no monads, no higher kinds, etc - I don't think anyone's going to be doing lenses and HLists in Go while staying sane. 
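
For what it's worth, the stand-in you end up with is an empty interface plus a type switch - a sketch, and note the compiler can't tell you the cases are exhaustive:

package main

import "fmt"

// Without sum types, the usual workaround is an interface and a type switch.
type Shape interface{}

type Circle struct{ Radius float64 }
type Square struct{ Side float64 }

func area(s Shape) float64 {
    switch v := s.(type) {
    case Circle:
        return 3.14159 * v.Radius * v.Radius
    case Square:
        return v.Side * v.Side
    default:
        return 0 // a newly added Shape silently falls through here
    }
}

func main() {
    fmt.Println(area(Circle{Radius: 2}), area(Square{Side: 3}))
}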

An issue is whether significant investment into the Go runtime and diagnostic tooling will happen outside Google. Tools like MAT, Yourkit, Valgrind, gcviz etc are indescribably useful when it comes to running server workloads. The ecosystem on the JVM, for example - the runtimes, libraries, diagnostic tools, and frameworks - is like gravity: if anything was going to kill Java, it was the rise of Rails/Python/PHP in the last decade, and that didn't happen. I know plenty of shops are staying on the JVM, or have even moved back to Java, mostly because of its engineering ecosystem, and this regardless of the fact the language has ossified. JVMs have been worked on for nearly two decades; by comparison Go's garbage collector is immature, and so on.  

A final thought. Much code today is written in a language that can be considered to have a similar surface to Go - C, C++, C#, Python, JavaScript, and Java. If you buy into the hypothesis that programming language adoption at the industry level is slow and highly incremental, then Go's design center is easy to justify and makes broad adoption possible. A killer application (a la Ruby on Rails for Ruby, browsers for JavaScript, UNIX for C) or a set of strong corporate backers who bet on the language (a la Java and C#) are often what drives a language, but in Go's case I think it will be a general migration of services infrastructure. Aside from generics and possibly annotations, there's a reasonable argument to be made that Go is sufficiently more advanced than Java and JavaScript without being conceptually alien, and good enough compared to C#, Python and C++ for 'headless' services. Plenty of shops don't make decisions based on market adoption, but for larger engineering groups it's inevitably a concern.  

2013/06/02: updated the Node/Erlang concurrency and clarified pointer observations with feedback from @jessemcnelis. Somehow in this wall of text I forgot to mention readability; thankfully I was reminded by @davecheney.

On Languages

In my first degree I had to learn a language called Pascal, with a big old blue Walter Savitch book and a lecturer who said you can make a computer do anything. He was wrong, or simply didn't understand how little he, or I, knew about computers at the time. This was an industrial design degree and programming was part of the technology/science block given in first year.

Fast forward 8 years and I'm doing an AI degree, with the data structures/101 course being taught in Modula-2, a fixed-up Pascal, and I'm thinking not much has changed. After that we had to learn C and Prolog in different courses at the same time. The course lecturers, I gathered, did this deliberately to keep us flexible. This clearly could work - C and Prolog might as well be the Zerg and Protoss of languages. On reflection I think this is one of the better things that happened to me, education-wise. Since then I have tried to learn the basics of new languages in pairs or more, and if possible digest languages that are reasonably different in approach. 

I have been playing around with Erlang, Scala and Go more and more, to the point where I'm able to draw conclusions about them as production languages. I don't consider myself an expert in any of them, but like most people in my line of work I have to be able to make judgments in advance of deep understanding. I've also spent time, to a lesser extent, with Rust - not enough to decide anything much. I've been foostering with Erlang and Scala on and off for years; Go and Rust are more recent.

I wanted to write down what I've found to be good about the languages so far. For context, the production languages I work with most these days are Java, Python and Groovy, and indirectly Clojure and Scala (via Storm and Kafka, platforms I operate with my current employer). 

I also want to talk about what's 'wrong' with these languages, and by what's wrong I mean two things. First, absence of features; an example might be the limited static typing features in Go. Second, what I'm going to call 'negative externalities' - aspects that distract from or even outweigh the direct benefits of the languages. Externality concerns are usually more important to me than missing features because of the way they impact engineering quality and cost. In Java an example might be GC pauses or boilerplate. In Groovy or Clojure it might be comprehending stacktraces, or runtime type failures that other languages would catch earlier. In Scala it might be compatibility or readability. The longer I work in this industry the more I find myself concerned with ergonomic/lifecycle factors, and I believe understanding the more indirect aspects of a language helps make better decisions when it comes to selection.

I'll add links here as I write each one up -

Switching from TextDrive to Rackspace and Linode

Some notes and observations on moving hosting providers.

Why?

Well, Joyent said in August last year that people on lifetime deals from TextDrive hosting, of whom I'm one, would no longer be supported. Then they changed their minds somewhat, and their deadlines. That had me looking around. Not because Joyent don't want to support years-old BSD servers rusting in some cage somewhere - I have a realistic definition of lifetime when it comes to commercial offerings. Over the six years I've been using TextDrive it has been a great deal, all things considered, so no complaints there.  

Then TextDrive was restarted and said they would honor the accounts. That's beyond admirable, but communication has been less than ideal - simply figuring out what to do to get migrated, and what I'd be migrating to, has been too difficult (for me). The new TextDrive situation should work itself out in time - people are working hard and trying to do the right thing. But I wanted something with more certainty and clarity. 

Aside from incidentals like old mercurial/subversion repositories and files, the two main services to move were this weblog and email. 

Email

Email was hosted on a server called chilco, and had been creaking for some time. IMAP access had slowed down noticeably in the last 12 months, with increasing instances of not being able to connect to the server for a period of time. That the server certs hadn't been updated was a constant irritation. I decided I'm past the point of running my own mail server and it was time to pay for a service, although Citadel was tempting. After whittling it down to Google and Rackspace, I went with Rackspace Email, for four reasons - pricing, solid IMAP capability, just email and not an office bundle, and their support. It probably helped that I don't use GMail much.

Setting up mail with Rackspace was quick. DNS Made Easy lost me when I couldn't see my settings after my account expired, so I moved DNS management over to Rackspace as well and routed the MX records to their domain. Rackspace have an option to auto-migrate email from existing servers. This didn't work; I strongly suspect the out of date certificates on the old server were the cause. The answer was to copy over mail from one account to another via a mail client. That wasn't much fun, but it worked. 

Rackspace webmail and settings are proving very usable. IMAP access is fast. The sync option via an Exchange server is good. There's plenty of quota. Rackspace support has been excellent so far, they clearly do this for a living. I'm much happier with this setup.

Cost: $2 per email account per month, $1 per sync account per month.

Linode

I wanted to avoid a shared host/jailed setup. There's a lot to be said for running on bare metal, but virtualization for hosting at personal scale makes more sense to me. I considered AWS and Rackspace but the pricing didn't suit, mainly due to the same elastic pay-as-you-go model that works for commercial setups. Linode offered a predictable model for personal use. Other VPS/Cloud providers were either more expensive and/or lacked surrounding documentation. Plus I got a nod from someone I trust about Linode, which settled it. 

In hindsight I should have done this years ago. Getting an Ubuntu 12.04 LTS set up was easy. Payment was easy, as was understanding the utilization model. Using the online admin is easy - the green and gray look really ties the site together. Linode instances have been responsive from the shell, more so than the previous BSD hosted option I was using. Having full access to the server instance is great as it avoids workarounds that come with shared hosting (I had an 'etc' folder in my home directory, for example).

Linode has good documentation. I mean really good. All the instructions to get started just worked. 

Cost: $20.99 per 512M instance per month.

Weblog

Moving to Linode meant moving this weblog, a custom engine I wrote in 2006/7 based on Django and MySQL. The last major work here was a pre 1.0.3 Django update in late 2008 that had trivial patches to get TinyMCE to work in the admin. It did occur to me to move to a non-custom engine like Wordpress, but that would involve a data migration. And I figured, having originally written it to learn something, fixing it up four years on might also teach me something. 

I upgraded Django to 1.4.3, breaking one of my cardinal rules of migrations - migrate, don't modify. In my defense I had started the upgrade process on the old server, but due to the uncertainty with Joyent/TextDrive and my discovering there was a December 31st 'deadline' for shutting off the old TextDrive servers, I decided to get migration done quick (that the deadline was pushed back to January 31st with no explanation after I raised it on the TxD forum is an example of the lack of clarity I mentioned). 

It took a while but was straightforward. Django 1.4 has a more modular layout for its apps - the main app can live side by side with other apps instead of being nested. You can see the difference between the two layouts below - 
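
Roughly, and from memory - 'mysite' and 'myapp' here are hypothetical names standing in for the real project. The old 1.3-style layout nested everything under the project package:

mysite/
    __init__.py
    manage.py
    settings.py
    urls.py
    myapp/
        __init__.py
        models.py
        views.py

The 1.4 layout puts manage.py at the top, with apps sitting alongside the main package:

mysite/
    manage.py
    mysite/
        __init__.py
        settings.py
        urls.py
        wsgi.py
    myapp/
        __init__.py
        models.py
        views.py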

This meant changes to import paths that were worthwhile, as the new way of laying out projects is much better. Some settings data structures have changed, notably for the database and for logging (which now actually has settings). The Feed API left the impression of being the most changed from earlier versions, and while it's significantly improved, I couldn't see a simple way to emit blog content. I like full text feeds as it avoids people having to click through to read a post. I had to use the Feed API's internal framework for extending data -

from django.contrib.syndication.views import Feed
from django.utils.feedgenerator import Atom1Feed

class AtomFeedGenerator(Atom1Feed):
    # Emit the full post body as an atom:content element on each entry.
    def add_item_elements(self, handler, item):
        super(AtomFeedGenerator, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content', item['content'], {u'type': u'html'})

class LatestEntries(Feed):
    feed_type = AtomFeedGenerator

    # Pass each entry's content through to the generator above.
    def item_extra_kwargs(self, item):
        return {'content': item.content}

 

Comments are still disabled until I figure out how to deal with spam, which is a severe problem here, although the comments framework looks much improved. That's probably ok because the comments form UX on the blog sucked. Also, comments seem to be going away in general, in favour of links getting routed to social networks where the discussion happens. I might experiment with Disqus or routing discussions to Google Plus. 

Now that django-tinymce exists, I was able to remove my own integration hack with the admin app. I used Django's cache API, which is much improved from what I remember, in conjunction with memcached. 

The old blog deployment used fast CGI with lighty. This was never a very stable deployment for me and I had wanted to get off it for some time. Years ago mod_python was the production default for Django, and mod_python in the past has made me want to stab my eyes out with a fork, hence fcgi. I switched over to mod_wsgi, which is the 1.4.3 default option for deployment, and we'll see how that goes. Certainly it requires less init boilerplate.

The only Linode documentation that didn't work first time was getting Django and mod_wsgi set up, but it was from 2010 and easy to fix up. I ended up with this -

WSGIPythonPath /home/dehora/public/dehora.net/application/journal/journal2
<VirtualHost dehora.net:80>
	ServerName dehora.net
	ServerAlias www.dehora.net
	ServerAdmin bill@dehora.net
	DirectoryIndex index.html
	DocumentRoot /home/dehora/public/dehora.net/public
	WSGIScriptAlias / /home/dehora/public/dehora.net/application/journal/journal2/journal2/wsgi.py
	<Directory /home/dehora/public/dehora.net/application/journal/journal2/journal2>
	  <Files wsgi.py>
	  Order allow,deny
	  Allow from all
	  </Files>
	</Directory>
        Alias /grnacres.mid 	/home/dehora/public/dehora.net/public/grnacres.mid
	Alias /pony.html 		/home/dehora/public/dehora.net/public/pony.html
	Alias /robots.txt 		/home/dehora/public/dehora.net/public/robots.txt
	Alias /favicon.ico 		/home/dehora/public/dehora.net/public/favicon.ico
	Alias /images 		/home/dehora/public/dehora.net/public/images
	#
	# todo: https://docs.djangoproject.com/en/1.4/howto/deployment/wsgi/modwsgi/#serving-the-admin-files 
	# for the recommended way to serve the admin files  
	Alias /static/admin 		/usr/local/lib/python2.7/dist-packages/Django-1.4.3-py2.7.egg/django/contrib/admin/static/admin
	Alias /static 		/home/dehora/public/dehora.net/public/static
	Alias /doc 			/home/dehora/public/dehora.net/public/doc
	Alias /journal1 		/home/dehora/public/dehora.net/public/journal
    <Directory /home/dehora/public/dehora.net/public/static>
    Order deny,allow
    Allow from all
    </Directory>
    <Directory /home/dehora/public/dehora.net/public/media>
    Order deny,allow
    Allow from all
    </Directory>
    <Directory /home/dehora/public/dehora.net/public/doc>
    Order deny,allow
    Allow from all
    </Directory>
	LogLevel info
	ErrorLog /home/dehora/public/dehora.net/log/error.log
	CustomLog /home/dehora/public/dehora.net/log/access.log combined
</VirtualHost>

I was very happy a data migration wasn't needed. Not a single thing broke with respect to the database in this upgrade. I'm with Linus when it comes to compatibility/regressions, and in fact think breaking data is even worse. If you do it, provide migrations, or go home. Too many people seem to think that stored data is something other or secondary to the code, which is wrong-headed - data over time is often more important than code. 

It's been four years give or take since I used Django, and it was fun to use it again. Django has grown in that time - APIs like feed, auth and comments look much better, modularization is improved - while retaining almost everything I like about the framework. Django has a large ecosystem with scores of extensions. Django was and still is the gold standard by which I measure other web site frameworks. Grails and Play are the closest contenders in the JVM ecosystem, with Dropwizard being what I recommend for REST based network services there. One downside to Django (and by extension, Python) worth mentioning is that the stack traces are less than helpful - more than once I had to crack open the shell and import some modules to find my mistakes.

Observations

The main lessons learned in the move were as much reminders as lessons. 

First, 'if it ain't broke don't fix it' is a bad idea that doesn't work for web services. You should as much as possible be tipping around with code and keeping your hand in so that you don't forget it. Most of the time I spent migrating this weblog was due to unfamiliarity with the code and Django/Python rustiness (but see the fifth point as well). Never mind the implications of running on ancient versions, I simply wish I had checked in on this project a couple of times a year. Not doing so was stupid.

Second, when a framework comes to support a feature you added by hand, get rid of your code/workaround - if it was any good you'd have contributed it back anyway. 

Third, divest of undifferentiated heavy lifting; in my case this was email. If you don't enjoy doing it, learn nothing from it, gain no advantage by it, let it go.

Fourth, excess abstraction can be a future tax. In my case the weblog code was intended to support multiple weblogs off a single runtime. Turns out after five years I don't need that, and it resulted in excess complexity in the database model, view injections, url routing, and path layouts for files/templates. All these got in my way - for example dealing with complex url routing cost me hours with zero benefit. I'll have to rip all this nonsense out to ungum the code. I mention this also because I've been critical recently of handwaving around excess abstraction - claims of YAGNI and other simplicity dogma are ignorable without examples.

Fifth, and this is somewhat related to the first point: online service infrastructure has come a long, long way in the last number of years. I know this because it's part of my career to know this, and many people reading this will know it too, but when you experience moving a more or less seven year old setup to a modern infrastructure, it's truly an eye opener. My 2006 hosting setup was so much closer to 1999 than 2013.

Finally - hobby blog projects are for life. I'm looking forward to expanding Linode usage and experimenting with the code more. When you have complete control of instances you also get more leverage in your infrastructure choices - for example I now have the option to use Cassandra and/or Redis to back the site. I don't see myself going back to a shared hosting model.

 

Level Set

Joe Stump: "It's a brave new world where N+1 scaling with 7,000 writes per second per node is considered suboptimal performance"

Spotted on the cassandra users list. This was on EC2 large instances, which should put the wind up anyone who thinks you need specialised machinery for anything but extreme performance/scale tradeoff requirements. For many cases now, it seems you don't need to just pick one.

And check out Joe's startup, SimpleGeo: a scale out geospatial/location platform running on (I think still) AWS and using Cassandra as the backing database, with a bunch of geospatial software built on top. It is a new world - a startup couldn't have done this even a few years ago, not without a boatload of VC funding for the hardware.

Ubuntu Lucid 10.04 on Thinkpad W500


After over 3 years at 10 hours a day it was time to retire the Thinkpad T60. I replaced it with a Thinkpad W500, not the most recent model but well thought of and, importantly, one that works with Ubuntu Linux. The fan was nearly worn out, the disk slow (5400rpm), and the right arrow key was dead (using emacs or programming without the right arrow key is no fun at all).


The W500 is a nice machine. The reasons I use Thinkpads over other machines are build quality, the keyboard, and that they generally just work with Linux. Not many laptops will take the consistent (ab)use my T60 has seen. The W500 keyboard is very good and the build seems fine. The single major criticism I had of the T60 was its dull screen - the W500 is much better here. Its sound quality is much better than the T60's. It has a 7200rpm HDD, which is noticeably better than the T60's 5400rpm, and 4Gb RAM will do for now.

This was my first use of Ubuntu 10.04 Lucid. I had held off, knowing a new machine was coming. So far I'm very impressed. Installation went very well; after clearing 270Gb of the 320Gb drive, it installed in a few minutes. This is the most straightforward installation of any Linux I've used (I go back to Redhat 4), and it matches a Windows install for simplicity. The days of deciding how big to make /home and stepping through X11 and networking stuff are long gone. Also, the steps to update the operating system and packages with Synaptic are very simple - my Windows 7 update on the other disk partition hung a few times, so it seems Microsoft haven't quite sorted this out (I've had a few severe problems with Vista updates).

All went well until I clicked on the proprietary drivers option for the ATI Mobility Radeon HD3650. I had read in the past that there were problems with fglrx and the ATI card, but not that on startup the screen would go dead with no possibility to switch run levels or move into recovery. After trying to manually fix up Xorg via a LiveCD, I reinstalled. Because I had already copied over a lot of data, I shrunk the original partition and reinstalled to reduce copying time. Again installation was a breeze. However the steps to remove the old partition and grow the new one via gparted resulted in the /boot partition getting messed up somehow. I wasn't able to get grub to work, so I had to blitz the partition and install a third time. It was unfortunate to mess things up twice, but that's not a criticism of the distribution (or of gparted - this is the first time I've had anything untoward happen with it).

Ubuntu 10.04 Lucid is an excellent distribution and it is very close to the goal of a Linux for human beings. I have two criticisms of the UI. First, the window controls. I don't care that they're now on the left; I do care that it breaks symmetry with the previous window layout by placing the minimize button in the center instead of the right. This is bad design - minimize is the most common operation, and the new placement makes you stop and think. Second, I believe the mouse grab for window sizing is too fine grained - I find myself having to place the mouse very carefully to grab the box.

It's always interesting to look at the extra software you need to install to use the computer. Here's the list -

ant
build-essential
cisco vpnclient
django
emacs
eclipse+pydev
gimp
git
gparted
ipython
idea9
maven
mercurial
meld
mysql server
mypasswordsafe
pidgin (can't use empathy at work)
protocol buffers
p4/p4v
scala
skype
ssh
subversion
sun-java6-jre
thrift
thunderbird
vmware (visio, word, powerpoint)
wireshark

As far as I can tell this is less software than I used to depend on, which is a good thing. The main installed software is Firefox, Bash and Open Office. I still need to use MS Office via VMWare for Powerpoint and Word, as Presentation and Word Processor aren't quite good enough, whereas Spreadsheet is excellent.

The biggest changes in the last couple of years have been Git, Scala, and Emacs Orgmode. I still find Mercurial more usable than Git, but so much OSS is in Git now it's necessary to have a working knowledge. Or more accurately, so much OSS is in github - github seems to be becoming the Facebook of DVCS. Scala has become my most liked JVM language in the last few years, although most day to day work is Java. Scala makes excellent tradeoffs between type safety, performance and expressiveness, but the repl is not fast enough nor the language syntactically simple enough to replace Python for off JVM work. Orgmode is the biggest change and has become vital to me. Orgmode is the only notetaking/organisation tool I use now and the only GTD style app that has not failed me when I really needed it - it's indescribably excellent and a true productivity enhancing tool - I can't recommend it enough.

Extensions v Envelopes

Here's a sample activity from the Open Social REST protocol (v0_9):

<entry xmlns="http://www.w3.org/2005/Atom">
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title>some activity</title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
        href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <content type="application/xml">
       <activity xmlns="http://ns.opensocial.org/2008/opensocial">
           <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
           <title type="html"><a href=\"foo\">some activity</a></title>
           <updated>2008-02-20T23:35:37.266Z</updated>
           <body>Some details for some activity</body>
           <bodyId>383777272</bodyId>
           <url>http://api.example.org/activity/feeds/.../af3778</url>
           <userId>example.org:34KJDCSKJN2HHF0DW20394</userId>
       </activity>

    </content>
</entry>


It's 1.1 kilobytes. I'll call that style "enveloping". Here's an alternative that doesn't embed the activity in the content and instead uses the Atom Entry directly, which I'll call "extending":

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:os="http://ns.opensocial.org/2008/opensocial">
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title type="html"><a href=\"foo\">some activity</a></title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
       href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <os:bodyId>383777272</os:bodyId>
   <content>Some details for some activity</content>
</entry>


It's 686 bytes (the activity XML by itself is 460 bytes). As far as I can tell there's no loss of meaning between the two. 545 bytes might not seem worth worrying about, but all that data adds up (very roughly 5.5Kb for every 10 activities, or 1/2 a Meg for every 1000), especially for mobile systems, and especially for activity data. I have a long standing belief that social activity traffic will dwarf what we've seen with blogging and eventually, email. If you're a real performance nut the latter should be faster to parse as well since the tree is flatter. The latter approach is akin to the way microformats or RDF inline into HTML, whereas the former is akin to how people use SOAP.

Ok, so that's bytes, and you might not care about the overhead. The bigger problem with using Atom as an envelope is that information gets repeated. Atom has its own required elements and is not a pure envelope format like SOAP. OpenSocial's "os:title", "os:updated", "os:id", "os:url", "os:body", "os:userId" all have corresponding Atom elements (atom:title, atom:updated, atom:id, atom:link, atom:content, atom:author/uri). Actually what's really interesting is that only one new element was needed using the extension style, the "os:bodyId" (we can have an argument about os:userId; I mapped it to the author's atom:uri because the example does as well, by making it a urn). This repetition is an easy source of bugs and dissonance. The cognitive dissonance comes from having to know which "id" or "updated" to look at, but duplicated data also means fragility. What if the updated timestamps are different? Which id/updated pair should I use for sync? Which title? I'm not picking on Open Social here by the way, it's a general problem with leveraging Atom.

I suspect one reason extensions get designed like this is because the format designers have their own XML (or JSON) vocabs, and their own models, and want to preserve them. Designs are more cohesive that way. As far as I can tell, you can pluck the os:activity element right out of atom:content and discard the Atom entry with no information loss, but this begs the question - why bother using Atom at all? There are a couple of reasons. One is that Atom has in the last 4 years become a platform technology as well as a format. Syndication markup now has massive global deployment, probably second only to HTML. Trying to get your pet XML format distributed today without piggybacking on syndication is nigh on impossible. OpenSocial, OpenSearch, Activity Streams, PSHB, Atom Threading, Feed History, Salmon Protocol, OCCI, OData, GData, all use Atom as a platform as much as a format. So Atom provides reach. Another is that Atom syndicates and aggregates data. "Well, duh, it's a syndication format!", you say. But if you take all the custom XML formats and mash them up, all you get is syntactic meltdown. By giving up on domain specificity, aggregation gives a better approach to data distribution. This I think is why Activity Streams, OpenSearch and Open Social beat custom social networking formats, none of which have become a de-facto standard the way, say, S3 has for storage - neither Twitter's nor Facebook's API is de-facto (although StatusNet does emulate Twitter). RDF by being syntax neutral is even better for data aggregation, but that's another topic and a bit further out into the future.

So. Would it be better to extend the Atom Entry directly? We've had a few years to watch and learn from social platforms and formats being built out on Atom, and I think that direct extension, not enveloping, is the way to go. Which is to say, I'll take a DRY specification over a cohesive domain model and syntax. It does mean having to explain the mapping rules and buying into Atom's (loose) domain model, but this only has to be done once in the extension specification, and it avoids all these "hosting" rules and armies of developers pulling the same data from different fields, which is begging for interop and semantic problems down the line.

I think in hindsight, some of Atom's required elements act against people mapping into Atom, namely atom:author and atom:title. Those two really show the blogging heritage of Atom rather than the design goal of a well-formed log entry. Even though author is a "Person" construct in Atom, author is a fairly specific role that might not work semantically for people (what does it mean to "author" an activity?). As for atom:title, increasingly important data like tweets, sms, events, notifications and activities just don't have titles, which means padding the atom:title with some text. The other required elements - atom:id, atom:updated - are generic constructs that I see as unqualified goodness being adopted in custom formats (which is great). The atom:link too is generically useful, with one snag: it can only carry one value in the rel attribute (unlike HTML). So these are problems, but not enough to make me want to use an enveloping pattern.

Just a little work

 

Tim Bray : "I'm pretty sure anybody who's been to the mat with the Android APIs shares my unconcern. First of all, a high proportion of most apps is just lists of things to read and poke at; another high proportion of Android apps are decorated Google maps and camera views. I bet most of those will Just Work on pretty well any device out there. If you’re using elaborately graphical screens you could do that in such a way as to be broken by a different screen shape, but it seems to me that with just a little work you can keep that from happening."

Tim might want to live through a few real handset projects to understand portability costs. All that little work adds up and is sufficient to hurt the bottom line of a company or an individual, perhaps enough to keep them with Apple. Even if you could develop a portable .apk through disciplined coding, the verification testing alone will hurt, especially as the Android ecosystem of hardware and versions grows.

"Oh, and the executable file format is Dalvik bytecodes; independent of the underlying hardware."

I've heard the same said about J2ME bytecode. 

Activity Streams extension for Abdera

The next time I see someone saying XML is inevitably hard to program to, I'll have a link to some code to show them:

public static void main(String[] args) {
  // Set up Abdera and register the Activity Streams extension factories.
  Abdera abdera = new Abdera();
  abdera.getFactory().registerExtension(new ActivityExtensionFactory());
  abdera.getFactory().registerExtension(new AtomMediaExtensionFactory());

  // Create a feed and add an activity entry to it.
  Feed feed = abdera.newFeed();
  ActivityEntry entry = new ActivityEntry(feed.addEntry());
  entry.setId("tag:site.org,2009-01-01:/some/unique/id");
  entry.setTitle("pt took a Picture!");
  entry.setVerb(Verb.POST, false);
  entry.setPublished(new Date());

  // Attach a photo object, with thumbnail and full-size image links.
  Photo photo = entry.addTypedObject(ObjectType.PHOTO);
  photo.addThumbnail(
    "https://example.org/pt/1/thumbnail",
    "image/jpeg", 16, 32);
  photo.addLargerImage(
    "http://example.org/ot/1/larger",
    "image/jpeg", 1024, 768);
  photo.setTitle("My backyard!");
  photo.setDescription("this is an excellent shot.");
  photo.setPageLink("http://example.org/pt/1");
}

That generates Activity Streams (AS), an extension to Atom - you can read about it here - http://activitystrea.ms. I think Activity Streams are going to be an important data platform for social networking.

The scalability of programming languages

Ned Batchelder: "Tabblo is written on the Django framework, and therefore, in Python. Ever since we were acquired by Hewlett-Packard two and a half years ago, there's been a debate about whether we should start working in Java, a far more common implementation language within HP. These debates come and go, with varying degrees of seriousness."

For anyone coming from Python and looking at the type system side of things, and not socio-technical factors such as what particular language a programming shop prefers to work in, I would recommend Scala over Java. It has a good type system, allows for brevity, and some constructs will feel very natural (Sequence Comprehensions, Map/Filter, Nested Functions, Tuples, Unified Types, Higher-Order Functions). Yes, I know you can run Django in the JVM via Jython, I know there's Clojure, and Groovy too. This is just about the theme of Ned's post, which is the type system. And Scala has a better one than Java.

James Bennett: "The other is that more power in the type system ultimately runs into a diminishing-returns problem, where each advance in the type system catches a smaller group of errors at the cost of a larger amount of programmer effort"

Sure, maybe at the higher order end of the language scale. But in the industry middle, there's less programmer effort around Scala than Java, modulo IDE support, and that changes year by year.

The Boy Hercules strangling a snake

Anyway, the real problem with Python isn't the type system - it's the GIL ;)

 

Bug 8220

"The TAG requests that the microdata feature be removed from the specification."

RDFa is preferred by the W3C TAG over the Microdata spec made up in HTML5.

How this one plays out will be interesting. Pass the popcorn!

 

Java Software Foundation

Joe Gregorio: "Does the ASF realize that subversion isn't written in Java?"

Better not tell them about Buildr, Thrift, CouchDB, Etch, TrafficServer et al.

 

Copier Heads

Robert Scoble: "Every month longer that this deal takes is tens of millions in Google’s pockets. Why? Well, the real race today isn’t for search. Isn’t for email. Isn’t for IM. It’s for ownership of your mobile phone."

That was back at the beginning of 2008, at the height of the Microhoo excitement. It's interesting to revisit these things.

Scoble said this because he "met the guy who runs China’s telecom last week in Davos. He’s seeing six million new people get a cell phone in China every month."

That was 138 per minute, about twice the growth rate of internet/web adoption. In terms of worldwide adoption of phones, Scoble was probably off by an order of magnitude. It's not the "next big game" as one commenter (Tim O'Reilly) put it. It is the big game.

Steve Jobs: "Basically they were copier heads that just had no clue about a computer or what it could do. And so they just grabbed, eh, grabbed defeat from the greatest victory in the computer industry. Xerox could have owned the entire computer industry today. Could have been you know a company ten times its size. Could have been IBM - could have been the IBM of the nineties. Could have been the Microsoft of the nineties."

I read 99 comments back then. About half a dozen picked up on the mobile point. Everyone was talking about property rights on social graphs and inferred information, or web2.0 ad models, or search, or the importance of email. Google still seems to be the webco that understands best the importance of mobile.