Big Data

Apache Hadoop is the big data platform. It was designed to derive value from volume. It can store and process a lot of data at rest, i.e. big data. It was designed for analytics. It was not designed for velocity.

It’s a warehouse. It is efficient to add and remove many items from a warehouse. It is not efficient to add and remove a single item from a warehouse.

Data sets are stored. Information is generated from historical data, and you can retrieve it. Pure Volume

Fast Data

Apache Storm is the stream processing platform. It was designed to derive value from velocity. It can process data in motion, i.e. fast data. It was not designed for volume.

It’s a conveyor belt. Items are placed on the conveyor belt, where they can be processed until they are removed from it. Items do not stay on the conveyor belt indefinitely. They are placed on it. They are removed from it.

Data items are piped. Information is generated from current data, but you cannot retrieve it. Pure Velocity

The Gap

However, there is something missing. How do items placed on a conveyor belt end up in a warehouse?

Couchbase Server is the enterprise NoSQL database. It is designed to derive value from a combination of volume and velocity (and variety).

It’s a box. At the end of the conveyor belt, items are added to boxes. It is efficient to add and remove items from a box. It is efficient to add and remove boxes from a warehouse.

Data items are stored and retrieved. Volume + Velocity + Variety

The Solution

A real-time big data architecture includes a stream processor such as Apache Storm, an enterprise NoSQL database such as Couchbase Server, and a big data platform such as Apache Hadoop.

Option #1

Applications read from and write to Couchbase Server, and they also write data to Apache Storm. Apache Storm analyzes streams of data and writes the results to Couchbase Server using a plugin (i.e. a bolt). The data is imported into Apache Hadoop from Couchbase Server using a Sqoop plugin.
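To make the bolt idea concrete, here is a minimal sketch of a Storm bolt that persists analysis results to Couchbase Server. It assumes Storm 2.x package names and the Couchbase Java SDK 3.x (both newer than this article), and the bucket name and tuple fields are made up for illustration; the actual Couchbase plugin referenced above handles this for you.

```java
// Sketch only: a Storm bolt that writes analyzed results to Couchbase Server.
// The bucket "results" and the tuple fields "id"/"score" are assumptions.
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;

public class CouchbaseWriterBolt extends BaseRichBolt {
    private transient Cluster cluster;
    private transient Collection collection;
    private transient OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
        // Connection details would normally come from the topology config.
        cluster = Cluster.connect("127.0.0.1", "username", "password");
        collection = cluster.bucket("results").defaultCollection();
    }

    @Override
    public void execute(Tuple input) {
        // Persist one analyzed result as a JSON document keyed by its id.
        String id = input.getStringByField("id");
        double score = input.getDoubleByField("score");
        collection.upsert(id, JsonObject.create().put("score", score));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing is emitted downstream.
    }
}
```

Note that because bolt instances are serialized and shipped to worker nodes, the non-serializable Couchbase client is created in prepare() rather than in the constructor.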

Option #2

Applications write data to Apache Storm and read data from Couchbase Server. Apache Storm writes both the data (input) and the information (output) to Couchbase Server. The data is imported into Apache Hadoop from Couchbase Server using a Sqoop plugin.
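For illustration, here is a hedged sketch of how Option #2’s topology might be wired, with the spout’s raw tuples and the analysis results each flowing into a Couchbase writer bolt. EventSpout and AnalysisBolt are hypothetical placeholders, and CouchbaseWriterBolt is the sketch from Option #1.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class Option2Topology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spout that receives the data written by applications.
        builder.setSpout("events", new EventSpout());

        // Branch 1: persist the data (input) as-is to Couchbase Server.
        builder.setBolt("store-raw", new CouchbaseWriterBolt())
               .shuffleGrouping("events");

        // Branch 2: analyze the stream, then persist the information (output).
        builder.setBolt("analyze", new AnalysisBolt())
               .shuffleGrouping("events");
        builder.setBolt("store-results", new CouchbaseWriterBolt())
               .shuffleGrouping("analyze");

        StormSubmitter.submitTopology("option-2", new Config(),
                builder.createTopology());
    }
}
```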

Option #3

Applications write data to Apache Storm and read data from Couchbase Server. Apache Storm writes the data (input) to both Couchbase Server and Apache Hadoop. In addition, Apache Storm writes the information (output) to both Couchbase Server and Apache Hadoop.
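The fan-out in Option #3 could look like the following sketch, which pairs the hypothetical Couchbase writer bolt with the HdfsBolt from Storm’s storm-hdfs module. The filesystem URL, paths, and component names are illustrative assumptions.

```java
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.topology.TopologyBuilder;

public class Option3Topology {

    // Builds an HdfsBolt that writes delimited records under the given path.
    // The namenode URL and rotation/sync settings are assumptions.
    private static HdfsBolt hdfsBolt(String path) {
        return new HdfsBolt()
                .withFsUrl("hdfs://namenode:8020")
                .withFileNameFormat(new DefaultFileNameFormat().withPath(path))
                .withRecordFormat(new DelimitedRecordFormat().withFieldDelimiter(","))
                .withRotationPolicy(new FileSizeRotationPolicy(128.0f, Units.MB))
                .withSyncPolicy(new CountSyncPolicy(1000));
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new EventSpout());
        builder.setBolt("analyze", new AnalysisBolt()).shuffleGrouping("events");

        // The data (input) goes to both Couchbase Server and Apache Hadoop ...
        builder.setBolt("raw-to-couchbase", new CouchbaseWriterBolt())
               .shuffleGrouping("events");
        builder.setBolt("raw-to-hdfs", hdfsBolt("/data/raw/"))
               .shuffleGrouping("events");

        // ... and so does the information (output).
        builder.setBolt("results-to-couchbase", new CouchbaseWriterBolt())
               .shuffleGrouping("analyze");
        builder.setBolt("results-to-hdfs", hdfsBolt("/data/results/"))
               .shuffleGrouping("analyze");

        // Submit with StormSubmitter as in the Option #2 sketch.
    }
}
```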

Summary

This article describes three real-time big data architectures. However, the best thing about designing a real-time big data architecture is that it is like playing with Legos. The components come in many shapes and sizes, and it is up to the architect(s) to select and connect the pieces necessary to build the most efficient and effective solution possible. It is an exciting challenge.

Join the conversation over at reddit (link).
Join the conversation over at Hacker News (link).

Examples

See how these enterprise customers are leveraging Apache Hadoop, Apache Storm, and more with Couchbase Server.

LivePerson – Apache Hadoop + Apache Storm + Couchbase Server (Presentation)
PayPal – Apache Hadoop + Elasticsearch + Couchbase Server (Presentation)
QuestPoint – Apache Hadoop + Couchbase Server (Presentation)
McGraw-Hill Education – Elasticsearch + Couchbase Server (Presentation)

AOL – Apache Hadoop + Couchbase Server (White Paper)
AdAction – Apache Hadoop + Couchbase Server (Case Study)

Reference

Couchbase Server Connectors (link)

Posted by Shane Johnson, Director, Product Marketing, Couchbase

Shane K Johnson was the Director of Product Marketing at Couchbase. Prior to Couchbase, he held various development and evangelism roles, with a background in Java and distributed systems. He has consulted with organizations in the financial, retail, telecommunications, and media industries to draft and implement architectures that relied on distributed systems for data and analysis.

Comments

  1. Thank you, very good read. It seems to me the 2nd option is the cleanest approach, but they all are plausible.

    1. Thanks. Another approach would be to configure Apache Storm to write the analyzed data (output) in real time to Couchbase Server while writing the raw data (input) to Apache Hadoop via batch writes.
