640kB
October 17, 2007
Tim Bray passed this quote on from a colleague at Sun:
"you sell them storage and they’ll be back in six months, guaranteed. Everybody’s data is out of control."
That says better than I ever could how I feel about data. Here's what I see - data volumes impacting server-side software more than bandwidth or multicore. Bob Warfield is less concerned about data growth: "the tools are set up to deal with this problem. Oracle scales beautifully to very large numbers of cores." I believe this up to a point - the problem is that terabytes of data are becoming normal, whereas RDBMS storage that can support those volumes is still at the high end.
Bob Warfield also mentioned that most of the scaling war stories he's seen come down to "database developers doing something silly that was not that hard to fix". While I suspect it's us app developer types who do this rather than DB specialists, here's the other problem - in the online space, a lot of interesting, valuable data is essentially semi-structured, and not always well suited to, or thought out well enough for, relational storage. But we stuff it in there anyway, as the technical properties of RDBMSes are well understood. Finally, focusing on cores still treats the problem as one with a scale-up solution; how should I reconcile that with the design and operational brains behind the big web properties saying that scale-out on commodity hardware is the way to go?
Dare Obasanjo took an insightful look at Amazon's Dynamo paper and concluded that "Luckily, there are only a handful of companies and Web services in the world that need to operate at that scale."
I hear this often, but I think it's contingent and will become less and less true in the future. The big web properties are simply the first wave of companies that need the technology. Here's Dare again:
"I think it is still fair to say that there is a certain level of scale where practically every feature traditionally associated with an RDBMS works against you."
No argument there. But as far as I can tell, companies like Amazon can't exist without this kind of scale-out storage model. Technology like Dynamo goes beyond the usual platitudes of "cost-effective IT" or "business service alignment". This kind of technology allows the business to be. It's not just about needing systems and techniques to manage that amount of data. It's that being able to operate over huge amounts of data is itself transformational.
October 17, 2007 11:46 AM