June 23, 2010

NorthScale Unleashes Membase Server

Today is an exciting day for us at NorthScale. Along with our partners Zynga and NHN, we announced the formation of the membase open source project. Membase is a simple, fast, elastic "NoSQL" (absolutely hate that term, and we'll happily support anyone who can rally the world around a better moniker) database management system. Membase is currently serving data for some of the busiest web applications on the planet. The project is hosted at membase.org, where source code, documentation and community information are now available.

In addition to the launch of membase.org, we announced beta availability of NorthScale Membase Server, our commercially supported distribution of the open source software. Beta 1 is available as an RPM package (certified on 32- and 64-bit Red Hat and CentOS operating systems, releases 5.2 and 5.4). Beta 2 will add support for Ubuntu, followed by support for Microsoft Windows client and server operating systems in Beta 3.

There is a ton of detailed project and product information available on membase.org and northscale.com, so rather than repeating it here, I'll use this blog post to provide some historical color on the project and the motivation behind it.

In early 2009, NorthScale was founded by leaders of the memcached open source software project to respond to a drumbeat of questions and requests we'd heard in that community. They made it increasingly clear that users of memcached liked the technology so much that they wanted to start using it (and, in many cases, were using it) for stuff it was never intended to be used for. Most of that "stuff" revolved around treating memcached like it was a database, rather than the very simple, distributed cache that it is.

Memcached is a cache (go figure). It is used to transiently cache data, in memory, spread evenly across a cluster of commodity servers. If a server fills up, memcached will eject the least recently used data object from memory to make room for "hotter" data. If a server is unplugged, the data, which is cached in volatile memory, will naturally be lost. That's all well and good with memcached, because applications using memcached are supposed to be in a position to reconstruct any needed data, at any point in time, from a durable database. Applications typically do this by storing the data in a relational database management system. If data is not found in the cache (or if the cache itself is not found!) then the application can simply query the database.
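For readers who think better in code, here is a minimal sketch of that pattern. It is not memcached code: the `LRUCache` class, the `read_through` helper and the plain dict standing in for the durable database are all hypothetical names made up for illustration, and a real deployment would talk to a memcached cluster through a client library instead.

```python
from collections import OrderedDict


class LRUCache:
    """Toy stand-in for one cache server (hypothetical, not memcached itself)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of objects held in memory
        self.items = OrderedDict()    # ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None               # cache miss
        self.items.move_to_end(key)   # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # eject the least recently used object


def read_through(cache, database, key):
    """Try the cache first; on a miss, rebuild the value from the durable store."""
    value = cache.get(key)
    if value is None:
        value = database[key]         # assumption: the database always has the data
        cache.set(key, value)
    return value


database = {"a": 1, "b": 2, "c": 3}   # stand-in for a relational database
cache = LRUCache(capacity=2)
for key in ("a", "b", "a", "c"):      # reading "c" fills the cache and ejects "b"
    read_through(cache, database, key)
```

The point of the sketch is that nothing is lost when "b" gets ejected, because the application can always go back to the database. That safety net is exactly what disappears when the cache is treated as the only copy.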

In some cases unknowingly, but in many cases with full knowledge of the above, applications were storing data in memcached like it was going to be there for all time. Some organizations went to great lengths to protect themselves - ensuring there was always excess memory capacity on each node and investing in power management systems to ensure power outages would not take memory down. It raises the question: why on earth would smart people go to those lengths to use something that is actively trying to lose their data, to store their data?

Three words: simple, fast, elastic. People like memcached because it represents a practically boundless place to easily cache data, at very low cost and with predictably stellar performance. No schemas, no tables, no sharding, no normalizing, no tuning. You want to put something in memcached, you put it in there. Why put it in two places!? Memcached is a breath of fresh air. And there are substantial economic benefits: a memcached cluster scales out (just add more commodity boxes to grow capacity), with linear cost and constant aggregate performance. To infinity, for all practical purposes. Attractive. Like a siren song.

Enter membase. Without ever compromising the simple, fast, elastic part, and while guaranteeing 100% on-the-wire compatibility with memcached (now, and into the future, given our direct leverage of the memcached front end code), membase adds:

  • persistence - storing data to SSD and spinning media, on- or off-node
  • replication - providing high availability by copying data to multiple cluster members and supporting rapid fail-over
  • dynamic cluster configuration - add and remove servers, and rebalance data on a live cluster without impacting running applications

For the tens of thousands of memcached applications already running in the wild, and without changing a single line of code, membase provides a simple, fast, elastic place to store data. While relational database technology will always make sense for some classes of data, the observed desire to use memcached as a database made it clear there is a hunger for something that can store data more easily, more cost-effectively and with higher performance across the entire scaling spectrum. So enjoy the siren song. Membase is rock-free.
