Megadata. Tom White has a Hadoop book; you can get it via O'Reilly's shortcuts now, and the dead tree version will ship next summer. I knew The Facebook ran Hadoop, but maybe less well known is that they do so against a 1.3PB data warehouse, which confirms my opinion that logged data grows at a faster rate than the relational stuff you usually run ETLs against, something we already knew was true for UGC. Hive now makes sense to me. Kimball book, anyone?
Turing complete. This quarter's new cool-by-association language is Clojure - it is nice, though, if you like Lisps - macros, yummy. Ted Dziuba, El Reg's post-modern Bileblog, provides a non-devastating critique of Python. Elharo in turn criticises Java's stagnation in comparison with Python 3.0. But I suspect PJ Eby nailed this discussion many years ago - see Python is not Java, and Java is not Python either. Python 3.0 shipped without getting into the weeds the way Perl6 did or Ruby might (full credit to Reg for promoting the "open classes don't scale" meme with a wonderful presentation - ask any Zope veteran about monkey patching). On Twitter, I wondered what Java can learn from the Py3K process - Mark Reinhold wondered back, which fills me with hope. Speaking of Python, this weblog has been running on Django for just over a year; aside from the commenting system, I'm very happy with how it's working out (I had some pause), although I never got as far as multiblog support. Speaking of Java - IDEA8 is very, very nice, and yes indeed, generics and Builders are useful if you're writing an API - lesson learned. Speaking of hope - JSR277 is de facto defunct, which should mean a clear path for OSGi - looks like SpringSource chose wisely. Speaking of SpringSource, Spring 3.0.M1 has arrived - I'm looking forward to checking out their alternative API to JSR311. It needs to be good, because JSR311 is awesome for building Data APIs and is a web framework game changer imvho - witness Jersey, Restlet (REST! OSGi! NASA!) and JBoss RESTEasy. I'll be amazed if features like @Consumes/@Produces, CacheControl, Response Builders, Mappers and Exception Support don't end up being adopted, in non-Java frameworks too.
Formatting. I like text formats, enough indeed to give up a chunk of my life to them, but the mustIgnore principle is probably more important than the binary vs. text argument. Transitively, I conclude that if you must use a binary format, then Protocol Buffers have an all-important quality that reduces the coupling and system evolution issues I'd normally associate with binary formats: you can version clients and servers without lockstepped upgrades. Cisco Etch has landed in the ASF incubator; however, Thrift is already in the incubator, and while I understand that IT rationalisation is not an objective of the ASF, it's hard not to conclude these projects have significantly overlapping goals. A few projects, such as Hadoop, ZooKeeper and Solr2, seem to be lining up to use Thrift, so that'll keep things interesting.
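To make the mustIgnore point concrete, here's a minimal sketch of a reader that skips fields it doesn't recognise, so a newer writer can add fields without breaking an older reader. It's a hand-rolled decoder for a small subset of the Protocol Buffers wire encoding (varint and length-delimited fields only), not the protobuf library's API; the helpers `encode_field` and `decode`, and the field numbers and names (`name`, `id`, the "v2" field 3), are hypothetical.

```python
# Sketch of the "must ignore" rule over a tag-length-value binary encoding.
# Hypothetical helpers: a hand-rolled subset of the Protocol Buffers wire format
# (wire type 0 = varint, wire type 2 = length-delimited). Not the protobuf library.


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Encode an int as wire type 0, or a str as UTF-8 with wire type 2."""
    if isinstance(value, int):
        return encode_varint(number << 3 | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint(number << 3 | 2) + encode_varint(len(data)) + data


def decode(buf: bytes, known: dict[int, str]) -> dict:
    """Decode the fields this reader knows about and skip all others."""
    result, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]  # raw bytes; decoded only if the field is known
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            if isinstance(value, bytes):
                value = value.decode("utf-8")
            result[known[number]] = value
        # Unknown field numbers fall through: skipped, not treated as an error.
    return result


# A "v2" writer adds field 3, which a "v1" reader has never heard of.
v2_message = encode_field(1, "alice") + encode_field(2, 42) + encode_field(3, "new-in-v2")
v1_reader_fields = {1: "name", 2: "id"}
print(decode(v2_message, v1_reader_fields))  # {'name': 'alice', 'id': 42}
```

The design choice that matters is that an unknown field is skipped rather than rejected; that's what lets clients and servers be upgraded independently. Real implementations also handle the remaining wire types (and some retain unknown fields for re-serialisation), which this sketch leaves out.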
5 Comments
Link bait indeed, thanks for weaving all of that together!
I'd be particularly interested in further thoughts on Thrift and Protocol Buffers, especially if you put either to real use.
Apache is happy to have inconsistent/competing projects as long as they don't get too abusive to each other and collaborate where appropriate. There's pressure from Facebook for the Hadoop world to adopt Thrift as a successor to Hadoop's home-rolled wire format/IPC layer. And some people are looking at Protocol Buffers as a format for MapReduce data.
Not only "macros yummy" - but "macros yummy and thank goodness they're not of the hygienic variety"
Sorry. I know your post is about "link bait" and not "flame bait".
"Restlet (REST! OSGi! NASA!)"
I've now helped put a few RWS into production using Restlet with Java and with Groovy, with and without a wee bit of Spring, persistence, mini-map-reduce-like distributed computations, etc. I have to say I don't understand why Spring and/or OSGi should figure into the solution.
Hopefully that's not more flame bait. It's more that I've found Groovy to be a good enough language for implementation, modularization, configuration, etc.
I suppose if I knew something about OSGi I'd use it. On the surface, it just seems like too much for an RWS.
There's a Groovy-for-Restlet on the web. Looking at it, there is a fundamental dependency on Spring. Gack. We just used Groovy accessing Restlet without a hitch.