Amazon Announces SimpleDB, Momentum Builds for Simple Databases
Adding to the momentum behind non-relational simple databases, Amazon has announced its SimpleDB web service. As described on the SimpleDB main page,
Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.
SimpleDB has a lot in common with CouchDB. Both provide a simple RESTful API, although SimpleDB looks more XML-based whereas CouchDB uses JSON. Both provide access to an ad-hoc database of items (or documents, in CouchDB parlance) that consist of key/value pairs, and each provides a mechanism to query the items by their contents. And both are implemented in Erlang (CouchDB for sure, SimpleDB according to this post on Inside Looking Out).
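To make that shared data model concrete, here is a minimal in-memory sketch in Python. It is not the SimpleDB or CouchDB API; the `ItemStore` class and its `put`, `get`, and `query` methods are hypothetical names used only to illustrate schema-free items made of key/value pairs that can be queried by their contents.

```python
class ItemStore:
    """Toy store: each item is a name mapped to free-form key/value pairs."""

    def __init__(self):
        self._items = {}  # item name -> dict of attributes

    def put(self, name, attributes):
        # Create the item if needed, then merge in the new attributes.
        self._items.setdefault(name, {}).update(attributes)

    def get(self, name):
        # Return a copy so callers cannot mutate the stored item.
        return dict(self._items.get(name, {}))

    def query(self, **criteria):
        """Return sorted names of items matching every key/value criterion."""
        return sorted(
            name
            for name, item in self._items.items()
            if all(item.get(key) == value for key, value in criteria.items())
        )


store = ItemStore()
# Items do not have to share a schema.
store.put("item1", {"type": "book", "author": "Armstrong"})
store.put("item2", {"type": "book", "author": "Knuth", "volumes": "4"})
store.put("item3", {"type": "film"})

books = store.query(type="book")
knuth_books = store.query(type="book", author="Knuth")
```

A real service adds persistence, indexing, and a network API on top, but the shape of the data is roughly this simple.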
While the similarity between SimpleDB and CouchDB is quite evident, it wasn't until I read over the Detailed Description section of the SimpleDB main page that I realized that document databases, where documents consist of key/value pairs, are really very close to Google's BigTable concept (you just have to turn your head and squint a bit). To get a feel for this, take a look at the description of HBase's data model (HBase is an open source, BigTable-like simple database that integrates with Hadoop) and compare it to SimpleDB and CouchDB.
Aside from the buzz, I've been thinking a lot recently about simple databases and how a fast, highly scalable, and flexible key/value store is an essential component of just about any serious web application. And I've been lamenting that an open source implementation that can be used as a building block of web-scale applications doesn't yet exist (although in time, projects like CouchDB, HBase, and ThruDB may fill this void). Despite being neither free nor open source, perhaps Amazon's SimpleDB is the building block I've been looking for. But I'm not sure.
For one thing, I'm wary of ending up with a web app that is tightly coupled to AWS. EC2 makes a lot of sense to me because the boundary between your app and AWS is clear: you can run your app on your own servers and deploy extra nodes to EC2 when you need more power. But without a local alternative to SimpleDB, one would have to be very careful not to end up with an app that can only run on AWS. That also complicates the development process, since you can't develop and test offline. The cost model of SimpleDB is attractive, so in the end I guess my concern boils down to not having a non-AWS, local-only solution...
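One way to limit that coupling, sketched below under my own assumptions, is to have the application talk to a small storage interface and keep a local implementation around for development and tests. The names `KeyValueStore`, `InMemoryStore`, and `register_user` are hypothetical; a hosted backend would be one more subclass, which is not shown here because it depends on the service's actual API.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Hypothetical interface the app codes against instead of a vendor API."""

    @abstractmethod
    def put(self, name, attributes):
        """Store or update the attributes of the named item."""

    @abstractmethod
    def get(self, name):
        """Return the item's attributes as a dict (empty if unknown)."""


class InMemoryStore(KeyValueStore):
    """Local stand-in for offline development and tests."""

    def __init__(self):
        self._items = {}

    def put(self, name, attributes):
        self._items.setdefault(name, {}).update(attributes)

    def get(self, name):
        return dict(self._items.get(name, {}))


def register_user(store, username, email):
    """Application code: depends only on the KeyValueStore interface."""
    key = f"user:{username}"
    store.put(key, {"email": email})
    return store.get(key)


local_store = InMemoryStore()
profile = register_user(local_store, "alice", "alice@example.com")
```

This doesn't remove the concern, since the local stand-in still has to mimic the hosted service's query semantics, but it keeps the boundary between the app and AWS explicit.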
Another aspect that I'm uncertain of is the choice of XML (or JSON for CouchDB). For real-time processing of large volumes of documents, I think it may make more sense to have a more compact data representation and interfaces that are more tightly integrated into the programming languages being used. For this, I really like what the Facebook developers have made available in the Thrift project. Although such an approach makes the schema somewhat less flexible, I'd really like to see a simple database that makes use of Thrift and focuses on speed and scalability. Sounds like a perfect project for diving into Erlang in the spare time I don't have :-)
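As a rough illustration of the size trade-off, the sketch below encodes the same record as JSON and as a fixed-layout binary struct. This is not Thrift: Python's standard `struct` module stands in for a schema-driven binary encoding, and the record fields and the `RECORD_FORMAT` layout are made up for the example.

```python
import json
import struct

# A made-up record with a fixed, known-in-advance schema.
record = {"id": 12345, "score": 0.5, "active": True}

# Self-describing text encoding: field names travel with every record.
as_json = json.dumps(record).encode("utf-8")

# Schema-driven binary encoding: little-endian, no padding.
# I = 4-byte unsigned int, d = 8-byte double, ? = 1-byte bool (13 bytes total).
RECORD_FORMAT = "<Id?"
as_binary = struct.pack(
    RECORD_FORMAT, record["id"], record["score"], record["active"]
)

# Decoding requires knowing the schema, which is the loss of flexibility.
decoded_id, decoded_score, decoded_active = struct.unpack(RECORD_FORMAT, as_binary)
```

The binary form is 13 bytes here, well under the JSON encoding, but only a reader that already knows the layout can make sense of it.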
Edited to add: This post has a more detailed SimpleDB vs CouchDB comparison.