That’s right. A modern big data solution requires more than Hadoop. Welcome to the data, it’s all big and fast.

Welcome to Big Data Central

I’m excited to announce that Big Data Central is live!

It represents my big data story for Couchbase. It’s about the role of NoSQL databases in a world of big data.

There was a time when big data was Hadoop. It was offline analytics. That’s no longer the case. Big data is now a solution, one that includes Hadoop but is not just Hadoop. It’s a solution that meets both real-time and offline analytical requirements. It’s a solution that meets both analytical and operational requirements.

The big data ecosystem now includes Storm for real-time processing, Couchbase Server for high performance data access, Hadoop for offline analytics, and more!

There are three big data challenges:

  1. The amount of data being generated, data volume.
  2. The rate at which data is being generated, data velocity.
  3. The rate at which information must be generated, information velocity.

Hadoop addresses data volume. It can store and process a lot of data, later. It scales out to store and process more data. Hadoop does not address data velocity. However, it meets offline analytical requirements.

Couchbase Server addresses data velocity. It is a high performance NoSQL database that can store a lot of data, now. It scales out to store a lot of data, faster. Couchbase Server does not address information velocity. It can store and process data at rest. However, it meets operational requirements.

Storm addresses information velocity. It can process a real-time stream of data. It scales out to process streams of data, faster. Storm does not address volume or data velocity. It does not store data. It processes data in motion. However, it meets real-time analytical requirements.

All three big data challenges can be met by integrating Storm, Couchbase Server, and Hadoop. By integrating Couchbase Server with Storm, a real-time stream of data can be processed and stored. By integrating Couchbase Server with Hadoop, a lot of data can be processed offline.
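To make the division of labor concrete, here is a minimal sketch in Python. It does not use the Storm, Couchbase Server, or Hadoop APIs. The classes `StreamProcessor` and `OperationalStore`, the function `batch_job`, and the event fields are hypothetical in-memory stand-ins, assumed only to illustrate which layer does what: one processes data in motion, one stores data for fast access, and one analyzes data at rest later.

```python
from collections import Counter


class StreamProcessor:
    """Hypothetical stand-in for the real-time layer (Storm's role).

    It processes each event as it arrives and stores nothing itself.
    """

    def process(self, event):
        # Derive information from a single event while it is in motion.
        return {
            "user": event["user"],
            "action": event["action"],
            "is_purchase": event["action"] == "purchase",
        }


class OperationalStore:
    """Hypothetical stand-in for the operational database (Couchbase Server's role).

    A plain dict plays the part of a key-value document store.
    """

    def __init__(self):
        self._docs = {}

    def upsert(self, key, doc):
        self._docs[key] = doc

    def get(self, key):
        return self._docs.get(key)

    def all_docs(self):
        return list(self._docs.values())


def batch_job(docs):
    """Hypothetical stand-in for offline analytics (Hadoop's role).

    Counts actions across all stored documents, i.e. data at rest.
    """
    return Counter(doc["action"] for doc in docs)


def run_pipeline(events):
    processor = StreamProcessor()
    store = OperationalStore()
    for i, event in enumerate(events):
        enriched = processor.process(event)    # real-time processing
        store.upsert(f"event::{i}", enriched)  # operational storage
    # Later, and independently of the stream: offline analysis.
    return store, batch_job(store.all_docs())


if __name__ == "__main__":
    sample = [
        {"user": "a", "action": "view"},
        {"user": "b", "action": "purchase"},
        {"user": "a", "action": "purchase"},
    ]
    store, counts = run_pipeline(sample)
    print(store.get("event::1"))
    print(dict(counts))
```

In a real deployment each stand-in would be replaced by the corresponding system and its own client library. The sketch only shows the flow described above: stream in, store, then analyze offline.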

Author

Posted by Shane Johnson, Director, Product Marketing, Couchbase

Shane K Johnson was the Director of Product Marketing at Couchbase. Prior to Couchbase, he occupied various roles in development and evangelism with a background in Java and distributed systems. He has consulted with organizations in the financial, retail, telecommunications, and media industries to draft and implement architectures that relied on distributed systems for data and analysis.

5 Comments

  1. Shane, very nice article on Big Data. With the explosion of big data, companies are faced with data challenges in three different areas. First, you know the type of results you want from your data but it’s computationally difficult to obtain. Second, you know the questions to ask but struggle with the answers and need to do data mining to help find those answers. And third is in the area of data exploration where you need to reveal the unknowns and look through the data for patterns and hidden relationships. The open source HPCC Systems big data processing platform can help companies with these challenges by deriving insights from massive data sets quickly and simply. Designed by data scientists, it is a complete integrated solution from data ingestion and data processing to data delivery. Its built-in Machine Learning Library and Matrix processing algorithms can assist with business intelligence and predictive analytics. More at http://hpccsystems.com

  2. Yes, to understand modern Hadoop, everyone needs to learn Apache Storm, Spark, MapReduce, HBase, etc.

    Apache Storm is an open source engine which can process data in realtime using its distributed architecture. Storm is simple and flexible. It can be used with any programming language of your choice.

    Let’s look at the various components of a Storm Cluster:

    1 – Nimbus node. The master node (Similar to JobTracker)

    2 – Supervisor nodes. Starts/stops workers & communicates with Nimbus through Zookeeper

    3 – ZooKeeper nodes. Coordinates the Storm cluster

    Both Spark and Storm can operate in a Hadoop cluster and access Hadoop storage. Storm-YARN is Yahoo’s open source implementation of Storm and Hadoop convergence. Spark provides native integration with Hadoop. Integration with Hadoop is achieved through YARN (NextGen MapReduce). Integrating real-time analytics with Hadoop-based systems allows for better utilization of cluster resources through computational elasticity, and being in the same cluster means that network transfers can be minimal.

    I can’t share complete information related to Hadoop, Spark, and Storm, so please visit the links below for informative tutorials.

    For topics to learn or understand:- http://intellipaat.com/hadoop-

    For YouTube Tutorials :- https://www.youtube.com/user/i

  3. Thanks for sharing such wonderful information.
    To learn Hadoop online, please go through the link for details:
    http://www.leadonlinetraining….

  4. MindsMapped Consulting October 16, 2015 at 9:24 pm

    Good article. Loved reading about the three challenges. http://www.mindsmapped.com/big
