It has been just over a couple of weeks since the launch of membase.org, along with NorthScale’s partners at Zynga and NHN. In that time, we’ve been steadily adding to the wiki and responding to questions on the mailing list, the XMPP chat and the IRC channel. When questions come up, they tend to be about how membase compares to other open source projects, what kind of client one would use, or what the pieces are when deployed.
The world of NoSQL
Generally I think people get that, at a high level, membase is a distributed key-value database management system which is designed to scale both up and down, doing so without interrupting data services. It is designed to deliver the kind of performance apps need when the database sits in the critical path of getting data to the user. Looking at the world of NoSQL, in my opinion, this was an underserved area. As a new area, NoSQL is seeing a lot of experimentation. Some are experimenting with a mix of online and analytics, others are experimenting with looser or eventual consistency, others are adding more data structure primitives on K/V stores and yet others are looking at data in more of a document-oriented way.
We were aware of and even experimented with a number of these, but we ended up on a different path with membase, as we were trying to solve some very specific problems along with our partners. Apps had been built around memcached. Some portion of those apps absolutely needed SQL; for that, they already had Drizzle, MySQL, PostgreSQL or Cubrid (Cubrid is big at NHN). Another portion of the same apps really didn’t need SQL, but they did need the replication, persistence and data management that most of the SQL-based RDBMSs provided, albeit with more complexity and management than was typically desired. Enter membase. We could take the existing infrastructure apps had been built around (the memcached protocol and clients), add the requisite level of durability, add rules that let admins and developers control how that durability happens per item, and get smart about how it would run in a distributed fashion.
How does the Rubber Meet the Road?
We needed to inject just a bit of intelligence into the system. The beauty of memcached is that the intelligence is, for the most part, on the client so the server can just be fast and dumb. We didn’t want to stray too far from that, but if you expect the system to be non-volatile, have replication and be able to grow and shrink while online, you need the system to have some concept of where things live. That led to vbuckets, which Dustin’s excellent blog covers. With a way to know where things live, we still need durability and replication along with those buckets. That’s where the membase engine comes in. The membase engine can be told to replicate a set of data from an alternate node. It can also be told, via a configuration map, who is authoritative for a given bucket. Further up the stack though, the clients won’t know about vbuckets. They know modulus or consistent hashing to connect to servers, so we needed something compatible. We already had moxi, which gave very simple clients intelligence around operation deduplication, intelligent connection sharing, what to do in the case of failures and even some non-coherent caching to speed things up even further. It wouldn’t take much more to teach moxi and its underlying configuration engine, libconflate, to know what to do with the same vbucket map. This would allow existing clients, and even existing applications, to get the fast, distributed key/value database they need, so that’s what we did.
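To make the idea of "knowing where things live" a bit more concrete, here is a minimal Python sketch of a vbucket-style lookup. It is an illustration only, not membase's or moxi's actual code: the CRC32 hash, the map of eight vbuckets, the server addresses and all function names are assumptions made up for this example.

```python
import zlib

# Hypothetical vbucket map: the list index is the vbucket id and the value
# is the server currently authoritative for it. Real maps are far larger;
# eight entries keep the example readable.
VBUCKET_MAP = [
    "10.0.0.1:11211", "10.0.0.2:11211",
    "10.0.0.1:11211", "10.0.0.2:11211",
    "10.0.0.1:11211", "10.0.0.2:11211",
    "10.0.0.1:11211", "10.0.0.2:11211",
]


def vbucket_for_key(key, num_vbuckets):
    """Hash a key to a vbucket id (CRC32 is an assumption for this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def server_for_key(key, vbucket_map):
    """Resolve a key to a server in two steps: key -> vbucket -> server."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


def migrate_vbucket(vbucket_map, vbucket_id, new_server):
    """Return a new map in which one vbucket has moved to another server.

    The number of vbuckets does not change, so every key still hashes to
    the same vbucket; only the owner of the moved vbucket is different.
    """
    new_map = list(vbucket_map)
    new_map[vbucket_id] = new_server
    return new_map


if __name__ == "__main__":
    key = "user:42"
    vb = vbucket_for_key(key, len(VBUCKET_MAP))
    print(key, "-> vbucket", vb, "->", server_for_key(key, VBUCKET_MAP))

    grown = migrate_vbucket(VBUCKET_MAP, vb, "10.0.0.3:11211")
    print(key, "-> vbucket", vb, "->", server_for_key(key, grown))
```

The point of the extra level of indirection is that the key-to-vbucket step never changes when servers are added or removed. Only the map changes, which is what lets a proxy that understands the map keep simple clients pointed at the right place.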
At its core, then, membase is the membase engine, which implements persistence, replication and vbuckets so it can grow and shrink dynamically. To bring vbuckets to clients that don’t have that slight extra bit of intelligence, there is moxi. To migrate data between nodes without interrupting service, there is the vbucketmigrator. Keep an eye on membase.org as we fill out more details, or come by my talk if you’re attending OSCON 2010!