March 25, 2010

When it comes to database technology, NorthScale is pro-choice

On Monday, analyst Matt Aslett posted “How will pro-SQL respond to NoSQL?” on The 451 Group’s “Too Much Information” blog. Good read. The gist of the post: a number of individuals and companies are running around claiming that their particular flavor of SQL database technology, memcached, or “NoSQL” database technology is “best.” The title implies that there is a “pro-SQL” camp and a NoSQL camp at odds with each other, battling for some prize. He concludes very practically: this should not be an “us versus them” kind of thing. We couldn’t agree more. NorthScale is neither pro-SQL nor NoSQL; we’re pro-choice.

None of these technologies is “best” in some absolute sense. But each can be the best choice for a particular application, use case and environment. Relational database technology is the best choice for many applications; relational database technology combined with memcached is the best choice for many other applications; and for others, alternative database technologies represent the best choice. For some applications, a combination of these approaches is indicated.

In addition to Matt’s conclusion that “much will depend on the workload in question,” it is also true that this is not strictly a technology decision. Much will also depend on developer and operations skill sets, tool investments, integration requirements, hardware infrastructure and many other factors. Inevitably these too represent inputs into the decision as to which approach, or approaches, should be used for storing the operational data behind a particular software system.

Our goal is to provide users with guidance and a choice. NorthScale Memcached Server deploys alongside relational database technology where appropriate; NorthScale Membase Server provides an elastic database technology that is an appropriate choice for a large class of applications and data, and a perfect fit for cloud computing environments. We also believe it is important to provide a clear and seamless path from relational, to relational + memcached, to relational + memcached + membase.

I can’t conclude this without also commenting on the opening remarks in Matt’s post. As Matt points out, memcached is a cache (thus the name) and not a key-value store. While some may dismiss that assertion as a sneer, clouding the issue by overloading the name can lead (and has led) to user confusion and data loss.
One thing is clear, however: many users have expressed a desire for something that has some of the characteristics of memcached (simple, fast, and horizontally scalable), with full memcached client and API compatibility, but with the additional semantics and guarantees of a data store. The memcached community has been working on a storage engine framework to enable a sane response to just such a request.

The storage engine framework has been developed openly as a published development branch, and after further review will be added to the main community development branch. It allows the “front end” of memcached (listener, packet inspector, protocol decoder, some threading support and statistics) to be paired with a back end that performs data operations under the direction of the front end. These back-end storage engines can provide persistence, replication and countless other capabilities that would be patently counterproductive to force into a monolithic “memcached distribution.”

NorthScale, along with Zynga and NHN, recently announced Membase Server. This project leverages the memcached storage engine interface to build a key-value store, fully compatible with memcached, without creating a Frankenstein offering (a sorta-kinda memcached that sorta-kinda stores stuff). Probably the most important advantage of this approach is that we are fully compatible with the memcached protocol, and built from the latest code base. Users of our distribution will thus continue to benefit from the efforts of the community (of which NorthScale is the primary contributor of source code).

Instead of working in the context of the community, others took a point-in-time snapshot of the memcached project and began privately making changes. The problem is that the resulting systems are no longer, to quote one of the leaders of the memcached project, “distributions of memcached.” They are a private fork of what once was memcached: a snapshot of an old version of memcached with proprietary changes to that old (and getting older) code. There is a serious downside to that approach. Users of those systems miss out on what makes open source such a successful model for infrastructure software development: a rich community of innovators evolving and enhancing the software, and stressing and fixing problems in the code.

Because NorthScale Memcached Server is a distribution of memcached, users benefit from complete compatibility with the memcached protocol, from SASL authentication, from the bucket engine which enables secure multi-tenancy, and from many other current and future enhancements to the project. Users are not locked into an old code base. And because we are using community-supported APIs and interfaces, we are able to build solutions that are compatible with memcached without creating Frankenstein offerings that are sorta-kinda memcached that sorta-kinda store data. And we are not placed in what must be a really uncomfortable position of trying to convince the world that a cache is a database.
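
For readers who want to see the front-end/back-end split in concrete terms, here is a minimal sketch in Python. It is an illustration only: the names StorageEngine, CacheEngine, StoreEngine and FrontEnd are hypothetical, the "set key value" / "get key" command format and the "MISS" reply are simplifications rather than the memcached wire protocol, and none of this reproduces the actual memcached engine interface. It assumes only what is described above: one front end that decodes requests, and interchangeable back ends that decide what happens to the data.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict
from typing import Dict, Optional


class StorageEngine(ABC):
    """Hypothetical back-end interface (not the real memcached engine API)."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...

    @abstractmethod
    def set(self, key: str, value: bytes) -> None:
        ...


class CacheEngine(StorageEngine):
    """Bounded in-memory engine: evicts the least recently used item when full."""

    def __init__(self, max_items: int) -> None:
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._max_items = max_items

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # silently drop the oldest item


class StoreEngine(StorageEngine):
    """Stand-in for a data store: it never discards a key on its own.

    A real store would add persistence and replication; this one only
    models the guarantee that stored data stays retrievable.
    """

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = value


class FrontEnd:
    """Decodes a simplified text command and delegates to the engine."""

    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine

    def handle(self, line: str) -> str:
        parts = line.split(maxsplit=2)
        if len(parts) == 3 and parts[0] == "set":
            self._engine.set(parts[1], parts[2].encode())
            return "STORED"
        if len(parts) == 2 and parts[0] == "get":
            value = self._engine.get(parts[1])
            return "MISS" if value is None else value.decode()
        return "ERROR"
```

The same FrontEnd class serves both engines unchanged, which is the point of the split. Build one as FrontEnd(CacheEngine(max_items=2)) and another as FrontEnd(StoreEngine()), send each three "set" commands with different keys, and then ask for the first key: the cache-backed front end no longer has it, while the store-backed one does. Clients speak the same commands either way; only the guarantees behind them differ.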