MongoDB + Big Data, Almost There

Shane Johnson of Couchbase

When it comes to big data, MongoDB is almost there. It's a start, and that's a good thing.

Background

I joined Couchbase in December of 2013. However, I've been passionate about big data technology for years. I started writing about NoSQL in September of 2009 (link), and I wrote about big data while I was a Red Hatter (link).

I thought MongoDB positioned itself as a big data solution. I attended a Gartner presentation on big data and NoSQL. There was a NoSQL database listed on the big data ecosystem slide. It was MongoDB. There is a NoSQL database listed on the Wikipedia big data page. It's MongoDB. Honestly, I'm surprised Matt was "blistered" by people who thought MongoDB and Hadoop were competitors (link).

Couchbase Server is not positioned as a big data solution. However, there is a Cloudera-certified Hadoop connector for it (link). We have customers leveraging Couchbase Server with Hadoop. We took a thoughtful approach to big data.

  • Hadoop is the foundation of big data solutions.
  • Couchbase Server is a NoSQL database.
  • Couchbase Server is not an alternative to Hadoop.
  • There is a place for NoSQL in big data.

MongoDB + Cloudera

As of April 29th, 2014, I don't think MongoDB is positioning itself as a big data solution. In fact, they're planning proper integration with Hadoop. That's a great thing for the NoSQL / big data community. However, it represents a first-generation big data solution. It relies on importing and exporting data via batch processes. In a first-generation big data solution, operational performance and scalability requirements are not a concern, nor are the requirements for real-time analysis.

Matt cited a use case in which Hadoop analyzes the crowd and a NoSQL database interacts with the individuals. The individual interactions are fed to Hadoop, and the crowd analysis is fed back to the NoSQL database. For Couchbase, this isn't just a use case. It's a customer reference. AOL leverages Hadoop and Couchbase Server to enable targeted advertising (link).
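To make the loop concrete, here is a minimal Python sketch of the batch cycle described above. Plain in-memory dicts stand in for the NoSQL database and for the Hadoop job; the names `operational_store`, `export_interactions`, `analyze_crowd` and `import_results` are hypothetical, and no Couchbase, MongoDB or Hadoop API is used.

```python
from collections import Counter

# Hypothetical stand-in for the operational NoSQL database:
# one document per user, holding that individual's interactions.
operational_store = {
    "u1": {"clicks": ["sports", "news"]},
    "u2": {"clicks": ["sports"]},
    "u3": {"clicks": ["travel", "sports"]},
}

def export_interactions(store):
    """Batch export: dump every individual interaction for offline analysis."""
    return [(user, topic) for user, doc in store.items() for topic in doc["clicks"]]

def analyze_crowd(interactions):
    """Batch analysis of the crowd (a stand-in for a Hadoop job): topic popularity."""
    return Counter(topic for _, topic in interactions)

def import_results(store, popularity, top_n=1):
    """Batch import: write the crowd-level result back onto each individual's document."""
    top_topics = [topic for topic, _ in popularity.most_common(top_n)]
    for doc in store.values():
        doc["recommended"] = top_topics

popularity = analyze_crowd(export_interactions(operational_store))
import_results(operational_store, popularity)
print(operational_store["u2"]["recommended"])  # ['sports']
```

Every step runs as a batch (export, analyze, import), so the recommendations served to individuals are only as fresh as the last completed cycle. That is the first-generation pattern.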

Big Data Central

Big Data Central went live on April 14th, 2014.

We believe the role of NoSQL is to enable the enterprise to meet both operational and analytical requirements, both offline and in real-time. Its role is to enable second-generation big data solutions. The Hadoop ecosystem fulfills analytical requirements. NoSQL fulfills operational requirements. A second-generation big data solution relies on integration with Elasticsearch, Storm, and more. It enables real-time analysis and search while meeting operational requirements. It requires a scalable, high-performance NoSQL database.

LivePerson has integrated Hadoop, Storm and Couchbase Server to create a second-generation big data solution (link). The architecture includes both batch-oriented processing and real-time processing. LivePerson evaluated NoSQL databases from Couchbase, MongoDB, and DataStax. However, only Couchbase Server was able to meet its high-throughput requirements.
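As a rough illustration of combining batch-oriented and real-time processing, here is a generic Python sketch (not LivePerson's implementation). A batch view, which a Hadoop job would produce, is merged at query time with increments that a stream processor such as Storm would apply as events arrive. All names (`batch_view`, `realtime_view`, `on_event`, `query`, `on_batch_complete`) are hypothetical, and plain `Counter` objects stand in for the NoSQL database.

```python
from collections import Counter

# Hypothetical batch view: topic counts from the last completed batch job.
batch_view = Counter({"sports": 3, "news": 1})

# Hypothetical real-time view: counts for events that arrived after that batch,
# updated per event (the role a stream processor would play).
realtime_view = Counter()

def on_event(topic):
    """Real-time path: update the incremental view as each event arrives."""
    realtime_view[topic] += 1

def query(topic):
    """Serve a read by merging the batch view with the real-time increments."""
    return batch_view[topic] + realtime_view[topic]

def on_batch_complete(new_batch_view):
    """Swap in a fresh batch view and reset the increments.

    Simplification: assumes the new batch already covers every event seen so far.
    """
    batch_view.clear()
    batch_view.update(new_batch_view)
    realtime_view.clear()

for topic in ["news", "news", "travel"]:
    on_event(topic)

print(query("news"))    # 3
print(query("travel"))  # 1
```

The point of the design is that reads reflect new events immediately instead of waiting for the next batch run, which is why the operational database must sustain a high write and read rate.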

A NoSQL database that is limited to a single lock per database per node (link) and/or is difficult to scale will fail to enable a second-generation big data solution. That's the difference between MongoDB and Couchbase Server. MongoDB is suitable for first-generation big data solutions. Couchbase Server is ideal for both first-generation and second-generation big data solutions.

More Information

Big Data Central

Big Data Central is a place for the big data community to explore use cases, technologies and architectures. Discover how Couchbase customers such as LivePerson, AOL and PayPal are leveraging NoSQL and Hadoop in both first-generation and second-generation big data solutions.