October 12, 2010

Membase and Cloudera Integration

Today is an exciting day for Membase. A number of us are attending Hadoop World 2010 in New York City, and if the event reception tonight is any indication of things to come tomorrow, it is going to be an event I’d have hated to miss. The crowd is a very smart group of data scientists on the leading edge of applying Hadoop, and Membase, to solve some extremely interesting and diverse application and data management problems.

We’ve been working very closely with Cloudera over the last year in a number of customer environments where Membase and Cloudera have been jointly evaluated and deployed. Along the way, Mike Olson, CEO of Cloudera and formerly CEO of Sleepycat Software (which distributed Berkeley DB and was acquired by Oracle in 2006), joined our advisory board, where he has been an invaluable contributor and a friend.

Today, we announced the culmination of all our joint work. Naturally, we chose to announce it together at Hadoop World. There are three components to our announcement: technology integration, go-to-market relationship and joint customer success stories.

On the technology integration front, we have built and are making available to customers two mechanisms for integrating Membase and Cloudera Distribution for Hadoop (CDH). The first is a Membase NodeCode module that can stream data from Membase to CDH in real time. As new operational data enters Membase, it can be massaged in real time and pumped into a CDH cluster for processing. The second is a Sqoop-derived batch loader utility that enables loading of data from Membase to CDH, and vice versa.

On the business front, we have been working very closely with Cloudera in customer environments where Membase and Hadoop have been useful in concert. A number of specific use cases have emerged from this joint work, and we’ve formalized a program to wrap those joint solutions into offerings we will jointly market and sell. The use cases include ad, offer and content targeting; log and event stream capture and analysis; and social gaming. In each of these customer scenarios, Membase and Hadoop combine to solve a problem that would be impossible to solve with either solution on its own.
 
But the most interesting part of all this, for me, is the two joint customers we announced that are successfully using Membase in concert with CDH: Aol and ShareThis. They have built two of the world’s most advanced ad targeting systems, which use CDH to boil down large quantities of event information tied to users (cookies) into user profiles. These profiles are fed into Membase, where they can be served with sub-millisecond latency. This pairing of high-powered analytics with raw real-time performance has made a huge impact for these users.
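
For readers who want a feel for the shape of that pipeline, here is a minimal sketch in Python. It is not Aol's or ShareThis's code, and it uses no Membase or Hadoop API: the event data, the `build_profiles` function and the `ProfileStore` class are all hypothetical stand-ins, with a plain in-memory dictionary playing the role of the key-value serving tier and a simple loop playing the role of the batch job.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (cookie_id, content_category) pairs.
EVENTS = [
    ("cookie-a", "sports"),
    ("cookie-b", "travel"),
    ("cookie-a", "sports"),
    ("cookie-a", "autos"),
    ("cookie-b", "travel"),
]


def build_profiles(events, top_n=2):
    """Batch step (stand-in for the analytics job): reduce raw events
    to each cookie's most frequent categories."""
    counts = defaultdict(Counter)
    for cookie_id, category in events:
        counts[cookie_id][category] += 1
    return {
        cookie_id: [category for category, _ in counter.most_common(top_n)]
        for cookie_id, counter in counts.items()
    }


class ProfileStore:
    """In-memory stand-in for the key-value serving tier (assumption:
    profiles are stored and fetched by cookie id)."""

    def __init__(self):
        self._data = {}

    def load(self, profiles):
        # Bulk-load the output of the batch step.
        self._data.update(profiles)

    def get(self, cookie_id):
        # Single key lookup at serving time; unknown cookies get no profile.
        return self._data.get(cookie_id, [])


store = ProfileStore()
store.load(build_profiles(EVENTS))
print(store.get("cookie-a"))
```

The point of the split is that the expensive aggregation happens offline over the whole event log, while the ad-serving path only ever does a single key lookup.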

There are many more users in the pipeline who see tremendous advantage in a joint Membase-CDH deployment.

If you happen to be at Hadoop World, drop by the session at 1:45 this afternoon to hear both Aol and ShareThis talk about their experience combining Membase with Cloudera Distribution for Hadoop.
