On Scala — Bill de hÓra

This is one of a series of posts on languages; you can read more about that here.

I've been messing around with Scala for a while. For no particular reason I let it slide but have come back to it with increased interest in the last few years. Scala's blending of programming paradigms in a single language is impressive. It has the most powerful type system of the 'mainstream' JVM languages [1]. It's fun to write code in and is a good language for leaning into the functional plus static typing paradigm. As much as I like the language, I usually qualify recommending it outright, due to some noticeable externalities and the very need to adopt new paradigms. And so, Scala is the language I find myself most conflicted over - the part of me that likes programming loves it, the part of me that goes oncall wants to see more work done on the engineering.

What's to like?

Herein a drive by of some parts of the language that I like and have found useful. For a broader overview a good place to start is scala-lang.org's Learning Scala page material, which also has an excellent overview of features.

Scala has a repl. Every language should have a repl -

dehora:~[macbook]$ scala

Welcome to Scala version 2.9.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_09).

Type in expressions to have them evaluated.

Type :help for more information.

scala> println("hello, world!")

hello, world!

The repl enables method discovery (hit tab after the dot operator).

Like Go (and Groovy) Scala makes semi-colons optional. Also like Go, declaration order is reversed compared to C/C++/Java. The type comes after the variable/method name allowing you to say "a of type int", "foo returning int". You can define something as a val (cannot be modified), a var (can be modified) or a def (a method/function) - val/var declarations in classes result in getters and setters being implicitly created (the private[this] variant can be used to force direct field access). Using vals is preferred for thread safety and comprehension reasons. Scala supports type inference, which removes a lot of boilerplate relative to Java. Strings in Scala are immutable and can be multilined using triple quotes and support interpolation using a $ notation (as of 2.10). Scala strings can be enhanced with StringOps, which provides a raft of extra methods compared to java.lang.String.
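A minimal sketch of those declarations, written script/repl style. All the names here (fixed, counter, twice, band and so on) are made up for illustration, and the interpolated string assumes Scala 2.10 or later -

```scala
// Hypothetical names throughout; this only illustrates the declarations described above.
val fixed: Int = 1                 // a val cannot be reassigned
var counter = 0                    // type inferred as Int; a var can be reassigned
counter += 1

def twice(i: Int): Int = i * 2     // "twice, taking an Int, returning Int"

val band = "Mowgli"
val greeting = s"hello, $band"     // $ interpolation (2.10 and later)

// Triple quotes allow a string to span lines; stripMargin trims up to each '|'
val lyrics = """first line
               |second line""".stripMargin

val shout = greeting.toUpperCase   // a plain java.lang.String method
val rev = greeting.reverse         // reverse is one of the extras provided by StringOps
```

Trying to reassign fixed is rejected by the compiler, which is a good part of why vals are the preferred default.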

In Scala each value is an object. The root class is Any. Scala doesn't have predefined primitives as per C++ and Java; these are modelled as a sub-class of Any called AnyVal, everything else is a sub-class called AnyRef, which maps to Java's Object class. Because numbers are objects their arithmetic operations are methods - 1 + 2 is the same as (1).+(2) (but not 1.+(2), which on the 2.9 repl is read as the double 1. plus 2 and will give you a double instead of an int ;). As a result, the keywords 'true' and 'false' are objects - true.equals(false) is a legal statement. Technically because we're dealing with methods this is not operator overloading, but the term is often used. Since Scala allows symbolic methods for binary/unary operations, you are free to define something like def +(that:Thing):Thing for your class. You are also free (with some restrictions) to define methods with unicode and ascii symbols -

def ||(dowop:ThatThing):ThatThing = ...

def ψ(dowop:ThatThing):ThatThing = ...

And so it goes - for example, list operators such as '::' (prepend an element) and ':::' (concatenate two lists) are just methods. Scala can be understood as strongly object oriented when compared to Python or Java and notably gets two things right in its approach - don't have primitives, and preserve precedence rules.
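As a rough sketch of symbolic methods, here is a made-up Money class (not from any library) that defines + and * as ordinary methods. The usual precedence still applies because Scala derives it from the operator's first character -

```scala
// Money is a hypothetical class, used only to show symbolic method names.
case class Money(cents: Long) {
  def +(that: Money): Money = Money(cents + that.cents)   // an ordinary method named +
  def *(n: Int): Money = Money(cents * n)                 // an ordinary method named *
}

val a = Money(150)
val b = Money(250)

val infix = a + b * 2        // * binds tighter than +, as with numbers
val dotted = a.+(b.*(2))     // the same calls written as plain method invocations
```

Both infix and dotted come out as Money(650).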

Classes and construction in Scala are noticeably different to Java, but also better in more or less every respect. Classes in Scala have a different syntax to Java, thanks to the constructor signature being part of the class definition -

class Song(songName: String, songLength:Double, songArtist:String) {
  private val name = songName 
  var length = songLength 
  val artist = songArtist 
}

Constructor arguments can be assigned, and are also visible through the class. A var field can be re-assigned, vals cannot, protected fields are available to sub classes. Private members are not accessible outside the class with the exception of a companion object (more on those soon). Default and named arguments are available on constructors and methods.

Scala also has objects, which are singleton instances rather than a template declaration; they are also where main() methods are declared. An object with the same name as a class and placed in the same file is called a companion object. It's common enough to see such object and class pairs in Scala. Companions can access each other's private methods and fields. Companion objects enable the use of the apply() method, which pretty much eliminates the factory and constructor chaining gymnastics seen in Java. An apply call is dispatched to when the object is invoked like a function - roughly, Song(name), without the 'new', goes to Song.apply(name). There's also unapply(), which enables extractor objects that work with pattern matching - 'case Song(f)' will dispatch to Song.unapply with the value being matched, which returns an Option saying whether it matched and, if so, what to bind to f. Companion objects are a clean way to package up declaration and creation concerns.
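A small sketch of a class and companion pair. The two-field Song, the trimming in apply and the describe helper are assumptions made for the example; the enclosing SongDemo object is only there so the class and its companion are compiled together -

```scala
object SongDemo {
  // The constructor is private, so only the companion can call `new`.
  class Song private (val name: String, val artist: String)

  object Song {
    // Song("Align", "Mowgli") is sugar for Song.apply("Align", "Mowgli")
    def apply(name: String, artist: String): Song = new Song(name.trim, artist.trim)

    // Called by patterns such as `case Song(n, a)`; returning Some(...) means "matched"
    def unapply(s: Song): Option[(String, String)] = Some((s.name, s.artist))
  }

  // Hypothetical helper showing the extractor at work.
  def describe(x: Any): String = x match {
    case Song(n, a) => n + " by " + a
    case _          => "not a song"
  }
}
```

SongDemo.describe(SongDemo.Song(" Align ", "Mowgli")) gives "Align by Mowgli", and anything that isn't a Song falls through to the default case.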

Scala's case classes remove the need to declare fields, accessors, copy constructors, equals/hashCode pairs. Case classes are about as concise as Groovy POGOs, with the advantages of maintaining type safety and providing private scope for fields - they're a huge win compared to JavaBeans or POJOs.

case class Song(name: String, length:Double, artist:String)

val s1 = Song("Align", 3.56, "Mowgli")

s1.name
res72: String = Align

val s2 = s1.copy(length=3.57)

s2.length
res73: Double = 3.57

s2.equals(s1)
res75: Boolean = false

val s3 = s1.copy()
s3: Song = Song(Align,3.56,Mowgli)

s3 == s1
res77: Boolean = true

s3.hashCode
res78: Int = -882114443

Case classes integrate well with pattern matching (more below) and can be treated as functions (note that s1 above didn't require a 'new').

Scala traits allow multiple inheritance of implementations using a well defined ordering. Unlike C++ pure virtual functions or Java interfaces, traits also support default implementations, which are similar in style to static methods in Java (strictly speaking Scala doesn't have statics). In conjunction with the sealed modifier you can define a 'sealed trait' and case class subtypes as a neat way to declare algebraic data types such as trees and grammars, which can then be dispatched over using pattern matching. While I would be happy to see inheritance removed from the language, Scala traits are a practical way to avoid the diamond problem.
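To make that a bit more concrete, here's a sketch with two invented examples: a tiny expression tree declared as a sealed trait with case class subtypes, and a pair of traits that both override the same method to show the ordering at work -

```scala
// A made-up expression tree: a sealed trait with case class subtypes.
sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Mul(left: Expr, right: Expr) extends Expr

// Dispatch over the structure; with `sealed` the compiler can warn if a case is missing.
def eval(e: Expr): Int = e match {
  case Num(v)    => v
  case Add(l, r) => eval(l) + eval(r)
  case Mul(l, r) => eval(l) * eval(r)
}

// Traits can carry implementations, and a class can mix in several of them.
trait Greets { def greet: String = "hello" }
trait Shouts extends Greets { override def greet: String = super.greet.toUpperCase }
trait Polite extends Greets { override def greet: String = super.greet + ", please" }

// Linearised as Both, Polite, Shouts, Greets: each `super` call moves one step along.
class Both extends Shouts with Polite

val total = eval(Add(Num(1), Mul(Num(2), Num(3))))   // 7
val said = (new Both).greet                          // "HELLO, please"
```

Swapping the mixin order to 'Polite with Shouts' changes the result, which is the well defined ordering doing its job rather than an ambiguity.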

At least it's an ethos

Scala's Option allows you to wrap a value as being either Some or None. An Option[T] is a container for a value of type T. If the value is not null it's a Some, otherwise it's a None. This pushes null handling into the type system making it a compile time concern -

val a: Option[String] = Some("a")
a: Option[String] = Some(a)

val nope: Option[String] = None
nope: Option[String] = None

Using Option construction delegates None/Some selection, giving you None for a null or a Some for a non-null -

 val whoKnows: Option[String] = Option(null)
 whoKnows: Option[String] = None

 val mightCouldBe = Option("aliens looking fer gold")
 mightCouldBe: Option[String] = Some(aliens looking fer gold)

 println(whoKnows.getOrElse("Nope"))
 Nope

Options are one of the things I like most in Scala. For many of us, null is simply reality, and it can be hard to imagine life without it, but if you can appreciate why returning an empty list is better than returning a null, or why null shouldn't ever be mapped to the domain, I think you'll come to like Options. Options are handy for expressing the three valued logic (there, not there, empty) that is common when dealing with maps and caches. Code working with data stores may gain robustness benefits using Some/None in conjunction with pattern matching or higher order functions. Options can be treated as collections, so foreach is available against them and collection functions like filter deal with them. Subjectively I prefer Scala's Option to Groovy's elvis and Clojure's nil largely because you can drive Option into API signatures. Option is also a gateway concept for monads.
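A sketch of what driving Option into an API signature can look like. The lengths map, songLength and describe are all invented for the example -

```scala
// A toy lookup; the names and data are invented for illustration.
val lengths = Map("Align" -> 3.56, "Intro" -> 0.0)

// The return type tells callers up front that a result may be absent.
def songLength(name: String): Option[Double] = lengths.get(name)

// Pattern matching over the three cases: not there, there but empty, there.
def describe(name: String): String = songLength(name) match {
  case None      => "unknown song"
  case Some(0.0) => "known, but no length recorded"
  case Some(len) => "runs for " + len
}

// Options behave like small collections.
val minutes = songLength("Align").map(_.toInt)         // Some(3)
val longOnes = songLength("Intro").filter(_ > 1.0)     // None
val fallback = songLength("Missing").getOrElse(0.0)    // 0.0
```

The caller can't get at the Double without saying what happens when it's missing, which is the point.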

Scala pattern matching allows you to dispatch against the structure or the type of objects instead of reasoning about an instance. Pattern matching in Scala is powerful and shouldn't be compared to case/switch blocks in Java/C. You can match a populated list using a 'head :: tail' pattern; match against regexes, or use an XPath-like option if you are working with XML; and so on. Pattern matching also helps with expressing business logic, such as doing something with an Account when it has a certain age or is from a certain location. Case classes are designed with pattern matching in mind (if you are familiar with and like Linda tuplespace style matching, chances are you'll like pattern matching over case classes). You can get compiler warnings on non-exhaustive matches in conjunction with the sealed modifier, helping to address problems with switch block fall-through and visitor pattern code (incidentally, Yaron Minsky has a great talk that touches on this subject in the context of OCaml). I'm not entirely sure, but as far as I can tell pattern matching eliminates the visitor pattern and so the need for replace conditional with visitor/polymorphism workarounds. That's nice because it helps decouple structure from processing, avoiding some of the gnarlier maintenance issues with orthodox OO.
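A few of those patterns sketched out. The Account class, its fields and the offers are hypothetical, as is the date format used for the regex -

```scala
// A hypothetical Account type, only for this example.
case class Account(owner: String, ageInDays: Int, country: String)

// Matching a populated list with head :: tail
def sum(xs: List[Int]): Int = xs match {
  case Nil          => 0
  case head :: tail => head + sum(tail)
}

// A Regex can be used as an extractor; each group binds to a variable
val IsoDate = """(\d{4})-(\d{2})-(\d{2})""".r

def year(s: String): Option[Int] = s match {
  case IsoDate(y, _, _) => Some(y.toInt)
  case _                => None
}

// Business rules as cases: a guard on age, a literal on location
def offer(a: Account): String = a match {
  case Account(owner, age, _) if age > 365 => owner + ": loyalty discount"
  case Account(owner, _, "IE")             => owner + ": free delivery in Ireland"
  case Account(owner, _, _)                => owner + ": no offer"
}
```

Cases are tried top to bottom, so an old Irish account gets the loyalty discount rather than the delivery offer.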

If pattern matching is a bit low level for handling input, you can try parser combinators. You define an object (or a set of classes) that is the 'parser combinator' (Scala provides some helpers you can sub-class), and each method in the object becomes a parser that returns Success, Failure or Error. You then call parseAll with the root parser and the input. Here's the stupidest example I could think of -

import scala.util.parsing.combinator.RegexParsers

object EpicNameParser extends RegexParsers {
  val name: Parser[String] = """[a-zA-Z]*""".r
}

// bind object to a handy val

val enp = EpicNameParser

enp.parseAll(enp.name, "Donny")
res26: enp.ParseResult[String] = [1.6] parsed: Donny

enp.parseAll(enp.name, "Walter")
res27: enp.ParseResult[String] = [1.7] parsed: Walter

enp.parseAll(enp.name, "The Dude")
res28: enp.ParseResult[String] =

[1.5] failure: string matching regex `\z' expected but `D' found

The Dude
    ^

This probably won't be the fastest way to do things performance-wise, but it's concise. Pattern matching opens up possibilities for dealing with conditionals and clean code; it's also the basis for actor receive loops.

Strange enlightenments are vouchsafed to those who seek the higher places

Scala has a rich set of collections. As well as Lists, Vectors, Sets, Maps and the generic Sequence (Seq), Scala has a Tuple that can take mixed types. The classes Pair and Triple are aliases for arity 2 and arity 3 tuples. Scala has both mutable and immutable collections, which are the collection analogue of var and val fields. The community idiom seems to be to prefer immutable collections until you need to drop into mutability or j.u.c for performance reasons. Arrays in Scala map to Java arrays, but can be generic, and can also be treated as a Sequence.
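A quick sketch of the difference, using made-up values. ListBuffer is just one of the mutable collections, picked here as an example -

```scala
import scala.collection.mutable

val pair = ("Align", 3.56)              // a Tuple2[String, Double], mixed types
val (title, minutes) = pair             // unpacked with a pattern

val fixed = List(1, 2, 3)               // immutable: "adding" returns a new list
val longer = 0 :: fixed                 // fixed itself is unchanged

val buf = mutable.ListBuffer(1, 2, 3)   // mutable: updated in place
buf += 4

val arr = Array(1, 2, 3)                // a Java int[] underneath
val doubled = arr.map(_ * 2).toSeq      // but the usual collection methods apply
```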

As you might expect from a functional language, working with lists, maps and collections is nice. Scala's for loops (for comprehensions) work like Python's list comprehensions but have a syntax that allows more expressiveness. I like forcomps, but it's possible you'll move towards the map/flatMap/foreach/filter methods that are available on collections as idiomatic for simpler loops (for is implemented using these methods). The foreach method accepts closures (similar to each/do in Ruby), and also functions. You can apply functions over members using map(), eg 'Seq(1, 2).map({ n=> "num:"+n })'. The flatMap method is similar but instead accepts a function that returns a sequence for each element it finds, and then flattens all those sequences' members into a single list. This interacts nicely with Options, since an Option can be treated as a sequence of zero or one elements -

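def stoi(s: String): Option[Int] = {
  try {
    Some(Integer.parseInt(s.trim))
  } catch {
    // parseInt signals bad input with NumberFormatException; anything else propagates
    case _: NumberFormatException => None
  }
}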

Once you have a function you can apply it to a collection using map(), using the syntax '{ element => function(element) }' -

Seq("1", "2", "foo").map({n => stoi(n)})
res34: Seq[Option[Int]] = List(Some(1), Some(2), None)

As we can see, map() produces 'List(Some(1), Some(2), None)', ie, a list of Options, which behaves like a list of lists. The flatMap() method on the other hand collapses the inner Options and removes the members that are None (since None's length is 0) leaving just the contents of the Some members (since Some's length is 1) -

Seq("1", "2", "foo").flatMap({n => stoi(n)})
res37: Seq[Int] = List(1, 2)
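The same kind of pipeline can be written as a for comprehension or with the collection methods directly. In this sketch toInt is a stand-in for stoi written with scala.util.Try (an assumption made to keep the block self-contained), and the inputs are made up -

```scala
import scala.util.Try

// A stand-in for stoi; Try(...).toOption turns any failure into None.
def toInt(s: String): Option[Int] = Try(s.trim.toInt).toOption

val inputs = Seq("1", "2", "foo", "40")

// A for comprehension with two generators and a guard...
val viaFor = for {
  s <- inputs
  n <- toInt(s)
  if n > 1
} yield n * 10

// ...and roughly the same pipeline written with the collection methods.
val viaMethods = inputs.flatMap(s => toInt(s).filter(_ > 1).map(_ * 10))
```

Both produce a sequence holding 20 and 400; "1" is dropped by the guard and "foo" by the failed parse.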

Collections also provide a filter() method that chains nicely with map() - easy composition of functions is a feature of Scala. For gathering up results you get fold and reduce (foldLeft and foldRight, reduceLeft, reduceRight). For example you can use foldLeft to trap the largest int in a list or sum a list -

List(5, 0, 9, 6).foldLeft(-1)({ (i, j) => i.max(j) })
res24: Int = 9

List(5, 0, 9, 6).foldLeft(0)({ (i, j) => i + j })
res48: Int = 20

Although reduceLeft might be a more direct way to go about both -

List(5, 0, 9, 6).reduceLeft({ (i, j) => i.max(j) })
res38: Int = 9

List(5, 0, 9, 6).reduceLeft({ (i, j) => i + j })
res49: Int = 20

Reduces are something of a special case of folds. Folds can gather up elements and may return a different type whereas reduces return a single value of the list's element type (or a supertype of it). For example, you can't say this -

List(5, 0, 9, 6).reduceLeft({ (i, j) => "" +  (i + j) })

but you can say this -

List(5, 0, 9, 6).foldLeft("")({ (i, j) => "" +  (i + j) })

// longList isn't defined in this post; the result below assumes it is the same List(5, 0, 9, 6)
longList.foldLeft("")({ (i, j) => "" +  (i + j) })

longList.foldRight("")({ (i, j) => "" +  (i + j) })
res51: java.lang.String = 5096

Methods like fold/reduce generally handle nulls without blowing up -

val x:String = null;

List("a", x).reduceLeft({ (i, j) => "" +  (i + j) })
res60: String = anull

Monads are coming

Functional programming is unavoidable in Scala. Scala is a functional language in the sense that you can program with functions and not just class methods, and functions are treated like values and so can be assigned and passed around as arguments. This is where Scala stops being a better Java, in the sense Groovy is, and becomes its own language. Functions in Scala can be declared using val -

scala> val timestwo = (i:Int) => i * 2

and can take variables from scope -

scala> val ten = 10
ten: Int = 10

scala> val mul = (i:Int) => i * ten

scala> mul(3)
res2: Int = 30

You can also create functions that accept other functions and/or return functions (aka higher order functions) -

def theSquareMullet(i:Int) = i * i

def theHoffSquareMullet(mullet: => Int) = mullet * mullet

The expression 'mullet: => Int' is a by-name parameter - it accepts any expression (or value) that evaluates to an Int, and evaluates it each time mullet is referenced. All of these are valid -

scala> theSquareMullet(10)
res7: Int = 100

scala> theSquareMullet(theHoffSquareMullet(10))
res8: Int = 10000

scala> theHoffSquareMullet(theHoffSquareMullet(10))
res9: Int = 10000
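For comparison, here's a sketch of functions that take and return other functions, next to a by-name parameter. The names applyTwice, adder, square and twiceByName are invented for the example -

```scala
// applyTwice accepts a function; adder returns one. Both names are invented here.
def applyTwice(f: Int => Int, i: Int): Int = f(f(i))

def adder(n: Int): Int => Int = (i: Int) => i + n

val square = (i: Int) => i * i

val squaredTwice = applyTwice(square, 3)      // square(square(3))
val addedTwice = applyTwice(adder(10), 1)     // (1 + 10) + 10

// A by-name parameter is evaluated each time it is referenced.
var evaluations = 0
def twiceByName(x: => Int): Int = x + x
val byName = twiceByName { evaluations += 1; 21 }
```

Here squaredTwice is 81 and addedTwice is 21. The block passed to twiceByName runs twice, so byName is 42 and evaluations ends up at 2, something to keep in mind if the expression has side effects or is expensive.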

Functions can be anonymous; you can see some examples in the section on collections above. Functions can be nested and multiple values can be returned from a function/method -

def theSquareAndTheMullet(i:Int) = (i * i, i)

val (f, f1) = theSquareAndTheMullet(2)
f: Int = 4
f1: Int = 2

val bolth = theSquareAndTheMullet(2)
bolth: (Int, Int) = (4,2)

scala> bolth._1
res9: Int = 4

bolth._2
res10: Int = 2

The slight downside is that the syntax for unpacking a discrete val isn't that pleasing (and tuples are limited to size 22 by default if you're going to return the world). On the upside, accessing a tuple out of bounds is a compile time error and a tuple preserves each member's type.

Endofunctor's Game

The type system is powerful. Like functional programming support, the type system marks Scala as its own language instead of a better Java. Scala has generics, like Java, and supports type variance and bounds: +T (covariant), -T (contravariant), <: (upper bound, a sub type, or Java 'extends'), >: (lower bound, a super type, or Java 'super'). Classes can be parameterised like Java, but unlike Java so can types, via what's called a type member (declared abstract, or bound as a type alias). These allow you to genericise members independently of the class signature -

abstract class Holder {
  type T // abstract type member, bound by each implementation
  val held: T
}

def IntHolder(i: Int): Holder = {
  new Holder {
    type T = Int
    val held = i
  }
}
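Variance and bounds are easier to see in use. A sketch with invented types (Animal, Dog, Box, Printer), where <: is the upper bound and >: the lower bound -

```scala
// Invented types for illustration.
class Animal { def name: String = "animal" }
class Dog extends Animal { override def name: String = "dog" }

class Box[+A](val item: A)                       // covariant: a Box[Dog] is a Box[Animal]
trait Printer[-A] { def print(a: A): String }    // contravariant: a Printer[Animal] is a Printer[Dog]

// Upper bound, like Java's `extends`: A must be Animal or a subtype.
def nameOf[A <: Animal](box: Box[A]): String = box.item.name

// Lower bound, like Java's `super`: B may be A or any supertype of A.
def orElse[A, B >: A](box: Box[A], other: B): List[B] = List(box.item, other)

val animals: Box[Animal] = new Box[Dog](new Dog)
val anyPrinter: Printer[Animal] = new Printer[Animal] { def print(a: Animal) = "a " + a.name }
val dogPrinter: Printer[Dog] = anyPrinter
```

Drop the + or the - and the last assignments stop compiling, which is a reasonable way to get a feel for what the annotations buy you.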

One nice feature is the compound type, which allows you to declare for example, that a method argument is composed of multiple types -

    def allTheThings(thing:ThatThing with ThatOtherThing):ThatThing = {...}
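A runnable variation on the same idea, with made-up traits standing in for ThatThing and ThatOtherThing -

```scala
// Named, Timed and Track are hypothetical, for illustration only.
trait Named { def name: String }
trait Timed { def minutes: Double }

// The argument must be both Named and Timed.
def summary(t: Named with Timed): String = t.name + " (" + t.minutes + " min)"

class Track(val name: String, val minutes: Double) extends Named with Timed

val line = summary(new Track("Align", 3.5))   // "Align (3.5 min)"
```

Passing something that is only Named is a compile error, so the requirement lives in the signature rather than in a cast inside the method.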

I can't do justice to Scala's type system in a single post. I doubt I know enough to appreciate what it can do - diving into it is very much a red pill thing. What I will say is the type system in Scala is deeply impressive in what can be achieved. The type system allows things that are either impractical or impossible in Java and gives good ways to scrap your boilerplate.

What's not to like

Onto some things that bother me about Scala.

Here are eighteen ways to say, roughly, add one to each member of a list creating a new list, remove any members in the new list that aren't greater than one creating another new list, and show that final list as a string -

 List(0, 1) map { _ + 1 } filter { _ > 1 } toString
 List(0, 1) map { _ + 1 } filter { _ > 1 } toString()   
 List(0, 1) map ( _ + 1) filter (_ > 1) toString()
 List(0, 1) map ( _ + 1) filter (_ > 1) toString 
 List(0, 1) map ({ _ + 1 }) filter ({ _ > 1 }) toString 
 List(0, 1) map ({ _ + 1 }) filter ({ _ > 1 }) toString()  
 List(0, 1) map ({ n => n + 1 }) filter ({n => n > 1 }) toString() 
 List(0, 1) map ({ n => n + 1 }) filter ({n => n > 1 }) toString 
 List(0, 1) map (n => n + 1) filter (n => n > 1) toString()
 List(0, 1) map (n => n + 1) filter (n => n > 1) toString 
 List(0, 1).map( _ + 1 ).filter( _ > 1 ).toString() 
 List(0, 1).map( _ + 1 ).filter( _ > 1 ).toString 
 List(0, 1).map({ _ + 1 }).filter({ _ > 1 }).toString()   
 List(0, 1).map({ _ + 1 }).filter({ _ > 1 }).toString 
 List(0, 1).map({ n => n + 1 }).filter({n => n > 1 }).toString()
 List(0, 1).map({ n => n + 1 }).filter({n => n > 1 }).toString  
 List(0, 1).map(n => n + 1).filter(n => n > 1).toString()
 List(0, 1).map(n => n + 1).filter(n => n > 1).toString

Scala provides syntactic flexibility. This flexibility is not limited to methods: there are two ways to express for-comprehensions; three collections libraries (four if you count Java's); methods foo and foo() are not the same; XML assignment demands whitespace around '='; suffix invocations need careful placement so that they are not treated as infix. Closure captures can be tricky. In Scala '_' means everything and anything - there's no real way to avoid it in practice (imports, existential types, exceptions and closures). In particular, implicit conversions strike me as tricky to reason about. Scala generics have other bounds, 'view' and 'context', that allow you to work with types beyond traditional hierarchies, where implicit conversions are made available.
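A small sketch of why implicits can be hard to follow at the call site, plus a context bound. Seconds, intToSeconds, pause and largest are all invented for the example -

```scala
import scala.language.implicitConversions

// Hypothetical type and conversion, to show why implicits can surprise a reader.
case class Seconds(value: Int)
implicit def intToSeconds(i: Int): Seconds = Seconds(i)

def pause(s: Seconds): String = "pausing for " + s.value + "s"

// Nothing at the call site says a conversion happens; it compiles only
// because intToSeconds is in implicit scope.
val viaConversion = pause(5)

// A context bound: T needs an implicit Ordering[T] in scope, not a common supertype.
def largest[T: Ordering](xs: List[T]): T = xs.max

val biggestInt = largest(List(5, 0, 9, 6))    // 9
val biggestStr = largest(List("b", "a"))      // "b"
```

Reading pause(5) in isolation, there's no hint where the Seconds came from; you have to know what's been imported.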

Symbolic methods on top of the features above provide the base capability for DSLs. DSL support in a language is not an obvious win to my mind. Even a relatively restrained library like Gatling offers enough variation to catch you out and a symbol heavy project like SBT can be hard work. Scala's not unique in this - I have a similar beef with Ruby and Groovy.

Maintainable, comprehensible Scala requires restraint, in the form of disciplined practice and style guides. That you don't have to use all the features isn't sufficient to wave the problem away. If you are maintaining code, you don't get to ignore all the variation teams have added over time (I do not see the concept of Scala levels helping much). Dependency on open source amplifies the matter. Scala has a simple core with orthogonal concepts (the language spec is pretty readable as these things go). However that says nothing about the interaction of features, which is where complexity and comprehension reside. I'm not trying to conflate understanding of functional and type-centric constructs with complexity - simple things can be hard, but that doesn't make them complex.

As a systems grunt, I spend a lot of time with running code, sifting through stacktraces, heapdumps, bytecode, metrics, logs and other runtime detritus. You learn from bitter experience that correctness capabilities provided by a type system do not always trump having comprehensible runtime behaviour (the same can be said for the outcomes enabled by dynamic languages). Scala generates plenty of classes; as a result Scala stacktraces can be a bit of a bear, although better than Groovy or Clojure's. An artifact of Scala's integration with Java is that you can't catch checked exceptions as such; you catch all and extract the type with a case statement. Compilation is slow and becomes dead slow with the optimise flag. The expressivity of the language results in an abstraction gap when it comes to reasoning about runtime cost. Option, a great feature, incurs the cost of wrapper classes. Scala introduces a fair bit of boxing overhead. For comprehensions have overhead such that you may need to swap with a while/do loop. Closures naturally have overhead (at least until the JVM can optimise them). To ensure unboxed class fields, you need to qualify them with private[this]. Immutable collections may need to be replaced with mutable or j.u.c, to deal with memory overhead, or performance (sometimes there are wins, like Scala's HashSet). The foldRight method will happily blow up when foldLeft wouldn't. Some serious runtime bugs get kicked down the road. That said, you can build high performance systems in Scala.
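On the exception handling point, a sketch of what a catch block looks like: a pattern match over the thrown type. The risky method and its messages are made up -

```scala
import java.io.IOException

// A hypothetical method that can fail in two different ways.
def risky(n: Int): String =
  try {
    if (n == 0) throw new IOException("disk on fire")          // checked in Java
    if (n == 1) throw new IllegalStateException("bad state")   // unchecked in Java
    "ok"
  } catch {
    // One catch block; the cases pick the exception apart by type.
    case e: IOException      => "io: " + e.getMessage
    case e: RuntimeException => "runtime: " + e.getMessage
  }
```

Nothing forces risky to declare or handle the IOException; remove the first case and it still compiles, and the exception simply propagates at runtime.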

My least favourite aspect of Scala is probably backwards compatibility. Scala has broken compatibility three, maybe four times, in five years. This does not seem to be considered a problem by the community. I understand how this frees things up for the future, but the effort needed to bring all the dependencies forward yourself or wait until the maintainers rebuild is an area where the language does not strike me as remotely scalable. Cross-compilation support is available in SBT, but is something of a workaround.

Conclusion

If a grand unified theory of programming language existed, its implementation would be called Scala. But enough of the modh coinníollach. Scala is neither a better Java/C# nor a worse Haskell/OCaml; it is its own language, and the most advanced on the JVM today. How much you are willing to invest in and work with functional and type-centric constructs will have a bearing on how much value you can get out of the language. And I find I need to weigh up three externalities - runtime comprehension, maintaining a clear reader-comes-first codebase, and dealing with language/binary incompatibility.

I feel it's inevitable that functional programming plus Hindley/Milner/Damas flavor static typing will become a dominant paradigm. If that's right, what are today considered exotic constructs like closures, function passing, higher order functions, pattern matching and sophisticated type systems are going to be mainstream. It also means the more mathematical abstractions, such as monads and functors, will be common understanding, certainly by the 2020s (they are closer than you think!). As such spending time with Scala (and Rust/Haskell/OCaml) is a worthwhile investment. There will still be places for dynamically typed and close to the metal languages, but orthodox OO as practised today is a dead end. Scala is this paradigm's breakout language and the one most likely to drag us half-way to Haskell.

Complexity is a criticism that is frequently levelled at Scala, one which gets pushback from the community. It's a sore spot, as there is no shortage of FUD. My take is that Scala's complexity is typically justified for the broad set of applications and paradigms it can encompass.

Reading

Scala has a relatively small number of books. Unfortunately since the language is moving quickly a number of them are out of date, as good as they may be.

Scala for the Impatient, Cay Horstmann. This is probably the closest thing to an up to date text; as of mid-2013 it's the go to introductory book on the language. Written by a professional writer who's covered many languages previously, and it shows.
Scala in Depth, Josh Suereth. Targeted at people familiar with Scala but looking for idiom. It covers Scala 2.9 and by definition is out of date, however a lot of available open source is still targeting 2.9, so it remains relevant at least up to 2014.
Programming In Scala 2nd Edition, Martin Odersky, Lex Spoon, Bill Venners. This has been the standard reference for the last number of years and a very good book. It's out of date by virtue of covering Scala 2.8, but worth picking up to understand the language's core, especially if you see it discounted.
Programming Scala, Alex Payne, Dean Wampler. Part textbook, part evangelism, I credit this book with getting me interested in Scala again. Sadly, it's too out of date to recommend by way of targeting 2.7 and giving mention of 2.8 features, which hadn't been released at the time.
Akka Concurrency, Derek Wyatt. Not strictly a Scala book, but a fun read on programming with Akka, actors, and for concurrency in general with Scala. It targets Akka/Scala 2.10.
Functional Programming in Scala, Paul Chiusano, Rúnar Bjarnason. Not out until Sept 2013. Teaches functional programming using Scala, as opposed to being a book about Scala. I have high hopes for this one, there's a gap in the market for a pragmatic book on functional programming.

Scala's flexibility is such that idiomatic Scala isn't always obvious. I've found these two guides helpful -