
Eventually consistent

Joe Gregorio:

"The problem with current data storage systems, with rare exception, is that they are all "one box native" applications, i.e. from a world where N = 1. From Berkeley DB to MySQL, they were all designed initially to sit on one box. Even after several years of dealing with MegaData you still see painful stories like what the YouTube guys went through as they scaled up. All of this stems from an N = 1 mentality."

One company I can think of off the top of my head that didn't start with N=1 for storage is Google. I think that has to do with the fact that search is one of the few problem spaces for which a relational database isn't automatically deemed the right option. Most problems seem to be ones for which a database is reflexively "the answer"; later on, that requires evolution-in-production to deal with the limitations of centralized storage. This is especially so for social networking, where the graph of relations is the key data - what could be more natural than an RDBMS for that?

Hence we now have plenty of introductory material on how to scale physical n-tier architectures backed by relational databases to handle planetary-class traffic. An interesting takeaway is that it's clearly possible to re-architect data storage on super-busy production systems, seemingly no matter where you start from.

Well-defined programming models, transactions, easy master/slave replication, predictable scaling/cost characteristics, a big expertise pool, framework bias, solid available solutions, SQL, DML; these count heavily in favor of relational databases as the axiomatic choice for the data layer. But I think the juice is in the promise of data consistency. Getting people to compromise on the idea of consistent data and ACID semantics for something like high availability (HA) is a huge challenge. I suspect plenty of people don't realize that HA and ACID are in conflict for larger values of N and where the data is geographically distributed.
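To make the HA versus consistency trade-off concrete, here is a minimal sketch in Python. Everything in it is hypothetical and in-memory: the `Replica` and `EventuallyConsistentStore` classes, the replica names, and the key are all made up for illustration. A write is accepted by one replica and acknowledged immediately, which keeps the system available, but a read against another replica can return stale data until replication catches up.

```python
class Replica:
    """One copy of the data (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on one replica and propagate later."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = []  # updates not yet delivered to other replicas

    def write(self, replica_name, key, value):
        # Accept the write locally and return at once (favoring availability);
        # the other replicas only hear about it when replicate() runs.
        self.replicas[replica_name].data[key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def read(self, replica_name, key):
        # Reads are served from whichever replica is asked; no coordination.
        return self.replicas[replica_name].data.get(key)

    def replicate(self):
        # Deliver queued updates in the order they were written.
        for name, key, value in self.pending:
            self.replicas[name].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["dublin", "california"])
store.write("dublin", "user:1:name", "Joe")

stale = store.read("california", "user:1:name")  # replication has not run yet
store.replicate()
fresh = store.read("california", "user:1:name")  # replicas have converged
```

The sketch deliberately ignores conflicting writes to the same key on different replicas, which is where most of the real difficulty lives. An ACID system would instead hold the write until every copy agreed, and that is exactly the step that hurts availability when the copies are far apart.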

Another thing with getting people to relax database designs is that an available, scalable RDBMS system where N=1 will tend (I think) to have a physical model that looks like something an application developer threw together over a weekend, not the well designed model of a professional DBA. Professional DBAs don't start with redundant data models. They don't have columns for derived data. They don't manage joins in the application layer. They don't say normalization is for sissies. So the reaction from a data specialist to an app developer saying table joins are going to be too expensive to run is going to be skepticism at best, and I think this is understandable. "Crappy" databases become a nightmare to deal with the second the data has to be re-used. That said, it would be no harm if the DBAs stopped to take a look at the request volumes (especially reads) that app developers are expected to support, and just how much caching infrastructure is being deployed purely to stop database servers from being vaporised under load.
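For anyone who hasn't seen one, a join managed in the application layer looks roughly like the sketch below. It's Python, and all of it is assumed for illustration: the two user shards, the modulo partitioning rule, the `FRIENDS` table, and the helper names are hypothetical. Once users are spread across boxes, the database can no longer join friendships to users in one query, so the application fetches the friend ids and then looks up each row on whichever shard holds it.

```python
# Hypothetical user rows, partitioned across two "boxes" by id % 2.
USER_SHARDS = [
    {2: {"name": "bob"}, 4: {"name": "dee"}},      # even ids
    {1: {"name": "alice"}, 3: {"name": "carol"}},  # odd ids
]

# Hypothetical friendship data, held in a separate store from the user rows.
FRIENDS = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}


def shard_for(user_id):
    """Pick the shard holding this user (simple modulo partitioning)."""
    return USER_SHARDS[user_id % len(USER_SHARDS)]


def friend_names(user_id):
    """The 'join': one lookup for the friend ids, then one lookup per friend."""
    names = []
    for friend_id in FRIENDS.get(user_id, []):
        row = shard_for(friend_id).get(friend_id)
        if row is not None:  # the two stores may disagree; skip missing rows
            names.append(row["name"])
    return names


result = friend_names(1)
```

The `if row is not None` check is the part a DBA will wince at: with no foreign key spanning the stores, nothing guarantees that a friend id still points at a user, so the application has to decide what a dangling reference means.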

SOA, incidentally, is also committed to trading off data consistency for some other desirable characteristic, probably partitioning (in the guise of separation of business concerns) rather than HA. In SOA the consistency workarounds are called "orchestration", which sounds a lot more palatable than "application level joins".

July 22, 2007 11:07 PM

