December 12, 2012

NoSQL and the next wave in the evolution of data management

It is an exciting time to be at Couchbase, and I get to kick off my first blog post with the announcement of the 2.0 release of our document database. It is my privilege to be part of a team that is bridging the world of distributed, clustered systems with the management of large unstructured and semi-structured datasets, and doing so within the demanding constraints the Internet imposes: always-on, highly scalable, fast database technology for millions of users, at consumer scale, who expect sub-millisecond response times. The launch of 2.0 is a significant milestone for us, made possible through the work of a talented team and a fantastic community.

Yet, while we make this big step forward, it is humbling to realize that we are still early in this space. Yes, it is just an early inning in a long game of NoSQL technology adoption – and on a grander scale, it is an early phase in what I see as the next major evolutionary wave in data management.

Many years ago, the introduction of the relational database model sparked a vigorous community debate pitting network (CODASYL) database technology against the new order of relational database technology. In spite of the skepticism of the “old order,” here we are, some 35 years later, with relational databases playing a pivotal role in managing business data. History tends to repeat itself, and this time the “new order” is about managing an explosion of data on the web differently than you would traditional business data. There is a sense of déjà vu. But the old battling the new is not all bad; in fact, we should welcome it! I have learned along the way that this battle is necessary to maintain equilibrium and sustain evolution.

Today, we have an opportunity to create and experience a generational change as new applications and systems re-shape how we define and use data. As I see it, data is being challenged and re-cast in the crucible of the Internet.

Data, which is the characterization of different forms of abstraction, has been with us from very early times – from ancient writings on cave walls to vast libraries of bound books to posts on today’s social networks. The digitization of such data and its universal access via the Internet has created an inflection point, because this data – along with data about the people creating and using it – is largely characterized as unstructured or semi-structured. Consequently, applications need the flexibility to dynamically interpret the structure and semantics of that data rather than have the database rigidly impose an inflexible data structure that does not fit the application.
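To make that last point concrete, here is a minimal sketch in plain Python of an application interpreting semi-structured documents at read time. It uses only the standard library, not the Couchbase API, and the document shapes, field names, and the `summarize` helper are all hypothetical, invented purely for illustration.

```python
import json

# Hypothetical semi-structured documents: same "bucket", different shapes.
RAW_DOCS = [
    '{"type": "post", "author": "ana", "title": "Hello", "tags": ["intro"]}',
    '{"type": "post", "author": "bo", "body": "No title here"}',
    '{"type": "profile", "name": "Cy", "links": {"blog": "example.com"}}',
]


def summarize(doc: dict) -> str:
    """Interpret a document by the fields it actually carries,
    falling back to defaults instead of requiring a fixed schema."""
    kind = doc.get("type", "unknown")
    if kind == "post":
        author = doc.get("author", "anonymous")
        title = doc.get("title", "(untitled)")
        tags = ", ".join(doc.get("tags", [])) or "none"
        return f"post by {author}: {title} [tags: {tags}]"
    if kind == "profile":
        name = doc.get("name", "unknown")
        link_count = len(doc.get("links", {}))
        return f"profile for {name} with {link_count} link(s)"
    # Unknown shapes are still readable; the application decides what to do.
    return f"unrecognized document with fields: {sorted(doc)}"


summaries = [summarize(json.loads(raw)) for raw in RAW_DOCS]

if __name__ == "__main__":
    for line in summaries:
        print(line)
```

The second post has no title or tags and the profile has an entirely different shape, yet nothing needs to be migrated: the application, not the store, decides how to read each document.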

We see a proliferation of such user-facing Internet applications and systems, encompassing a large range of industries and use cases including social networks, games, e-commerce, and CRM. They evolve rapidly, are very flexible, and support millions of users. At the data level, they tend to be somewhat forgiving when it comes to immediate consistency, atomicity or isolation. Publishing a blog post or a train schedule is not subject to the exacting transactionality that a payroll or financial application demands.

In contrast to user-facing web and mobile applications, which are agile, fast evolving, support unstructured data, and can forgo immediate data consistency, back-office applications are inherently designed to be solid, slow moving, precise and structured, with the need to be stable and foundational. They are a great fit for the traditional relational database models. You would not want to deal with a payroll system or credit card system whose processing semantics change weekly. This tension between “old” and “new” is a good example of Stewart Brand’s pace layering. While these “old” and “new” applications must harmoniously co-exist, they need the freedom to evolve at different rates of change. And clearly, different data management technologies are required to support each.

Document database technology such as Couchbase Server 2.0 is a great fit for the growing wave of interactive web and mobile applications. It supports the need for rapid application development and modification, while providing the scalability, performance and availability needed for these applications to be ready to operate in prime time. It’s a thrill to be part of bringing this exciting technology to market.

That said, there is a lot of work ahead of us as we continue to evolve not only our own technology but the state of the database industry itself. Query languages with formal but easy alternate definitions of transactionality based around durability. Support for ultra-large datasets with higher data density per node. Search. Support for ultra-large clusters. Expanding on true master-master replication as an intrinsic property of the cluster, leading to an innovative solution to high availability. And that’s just the tip of the iceberg.

The team is all fired up for the challenge. And, yes, we are hiring.

Come join the evolution!