June 2, 2014

The Internet of Things, it's not Big Data

That's right. The Internet of Things is not big data. It's continuous data. If big data is an ocean, continuous data is a tributary. And...

"A tributary does not flow directly into a sea or ocean." (Wikipedia)

Nor does continuous data flow directly into a big data platform. A big data platform is about volume. It's not about velocity, and it's not about variety. Continuous data flows first into a stream processor and / or a database, relational or NoSQL, before it flows into a big data platform.

Continuous data presents two challenges for the database:

  • The rate of data flow.
  • The number of data flows.

The Rate of Data Flow

A wind turbine does not read and write to a big data platform. A big data platform is engineered for discrete, unstructured data. A wind turbine generates continuous, semi-structured data. It generates thousands of data points per second. It could append sensor data to a local file, and the file could be imported into a big data platform. However, by then it's no longer real-time data. It fails to enable operational agility.

The Number of Data Flows

There are 14 billion things connected to the Internet. There are 50 billion sensors feeding data to those things. That's a lot of data flows.

What does this have to do with Couchbase?

It's the smart refrigerator. I want one. When I drink the last of the milk, I want my refrigerator to know it. I want it to maintain a grocery list for me. I'm willing to scan the barcode on an empty gallon of milk with a scanner on the refrigerator door before I throw it in the trash. When I go to the grocery store, I want to display my grocery list on a mobile phone. Perhaps it's my mobile phone. Perhaps it's my wife's mobile phone.

Semi-Structured Data

The data should be semi-structured. Why? It's a list. It's simple. It could be stored in rows and columns. However, what if the application is updated to track inventory? I want my refrigerator to know how many bottles of water I have left. Should the developer have to submit a change request to the database administrator to modify the schema? No. What if the application is updated to show me the price of bottled water at different grocery stores so that I can add it to a specific grocery list? Should the developer have to submit a second change request to the database administrator to modify the schema, again? No. That's why the intelligent enterprise relies on Couchbase Server. The flexible data model increases developer productivity, reduces development costs and reduces time to market. It increases market agility.
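Here is a minimal sketch, in plain Python, of how the grocery list document might evolve as the application is updated. The field names (`items`, `on_hand`, `prices`) and the store names are assumptions made up for illustration, not a Couchbase schema, and no Couchbase SDK calls are used. The documents are simply serialized as JSON, the kind of semi-structured document a document database stores.

```python
import copy
import json

# Version 1 of the application: a grocery list is just a list of item names.
list_v1 = {"type": "grocery_list", "items": [{"name": "milk"}]}


def with_inventory(doc, name, on_hand):
    """Return a copy of doc in which the named item also tracks inventory."""
    updated = copy.deepcopy(doc)
    for item in updated["items"]:
        if item["name"] == name:
            item["on_hand"] = on_hand
            return updated
    # The item is not on the list yet, so add it with its inventory count.
    updated["items"].append({"name": name, "on_hand": on_hand})
    return updated


def with_prices(doc, name, prices):
    """Return a copy of doc in which the named item carries per-store prices."""
    updated = copy.deepcopy(doc)
    for item in updated["items"]:
        if item["name"] == name:
            item["prices"] = dict(prices)
    return updated


# Version 2 of the application tracks inventory (hypothetical field: on_hand).
list_v2 = with_inventory(list_v1, "bottled water", 6)

# Version 3 adds prices per store (hypothetical field: prices).
list_v3 = with_prices(list_v2, "bottled water", {"Store A": 4.99, "Store B": 5.49})

# All three document shapes can be stored side by side; no schema change needed.
for doc in (list_v1, list_v2, list_v3):
    print(json.dumps(doc, sort_keys=True))
```

The older documents stay valid after each update. The application only has to tolerate fields that may be missing, which is a code change and not a change request to the database administrator.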

Scalability

I'm assuming that everyone wants a smart refrigerator. I'll be the first customer, but what happens when there are a thousand customers, and then tens of thousands of customers, and finally millions of customers? It's going to be the best thing since sliced bread. However, how will database administrators scale a relational database to support millions of customers and billions of data points? How difficult will it be? Too difficult. How much time and effort will it require? Too much. That's why the intelligent enterprise relies on Couchbase Server. The distributed, shared-nothing architecture increases operational efficiency and reduces operational costs. It increases operational agility.

When I say everyone wants a smart refrigerator, I mean everyone. I mean consumers in North America, LATAM, EMEA, APAC and more. It's one thing to scale a database within a data center. It's another thing to scale a database to multiple data centers.

Why scale beyond the data center?

Data Locality

  • A smart refrigerator in California should read and write to a database in the US.
  • A smart refrigerator in Dublin should read and write to a database in Ireland.
  • A smart refrigerator in Tokyo should read and write to a database in Japan.
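The three examples above amount to a lookup from a refrigerator's location to the nearest database. A minimal sketch in Python follows, assuming one cluster per country. The hostnames, the country codes used as keys, and the `cluster_for` function are all hypothetical placeholders, not real Couchbase endpoints or API calls.

```python
# Hypothetical mapping from country code to the cluster in that country.
# The hostnames are placeholders invented for this example.
CLUSTERS = {
    "US": "db.us.example.com",  # serves the refrigerator in California
    "IE": "db.ie.example.com",  # serves the refrigerator in Dublin
    "JP": "db.jp.example.com",  # serves the refrigerator in Tokyo
}


def cluster_for(country_code, default="US"):
    """Return the cluster in the device's country, or the default cluster."""
    return CLUSTERS.get(country_code, CLUSTERS[default])


print(cluster_for("IE"))
print(cluster_for("JP"))
```

A country without its own cluster falls back to the default in this sketch. A real deployment would more likely fall back to the closest region.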

High Availability

  • If a node fails, the database should remain available.
  • If a rack fails, the database should remain available.
  • If a data center fails, the database should remain available.
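The three requirements above can be read as a check on where the copies of the data live. The following toy model in Python is only a sketch: the `Node` type, the `survives` function, and the node, rack and data center names are assumptions for illustration, and it does not describe how Couchbase Server actually places replicas.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    rack: str  # rack names are assumed to be unique across data centers
    data_center: str


def survives(copies, level, failed):
    """True if at least one copy sits outside the failed node, rack or data center.

    level is one of "name", "rack" or "data_center".
    """
    return any(getattr(node, level) != failed for node in copies)


# Copies spread across nodes, racks and data centers.
spread = [
    Node("n1", "dc1-rack1", "dc1"),
    Node("n2", "dc1-rack2", "dc1"),
    Node("n3", "dc2-rack1", "dc2"),
]

# Copies on two nodes that share one rack.
same_rack = [
    Node("n1", "dc1-rack1", "dc1"),
    Node("n4", "dc1-rack1", "dc1"),
]

print(survives(spread, "name", "n1"))
print(survives(spread, "rack", "dc1-rack1"))
print(survives(spread, "data_center", "dc1"))
print(survives(same_rack, "rack", "dc1-rack1"))
```

The second placement survives a node failure but not a rack failure, which is why the failure levels are listed separately.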

Summary

Couchbase Server supports global data locality with cross data center replication (XDCR). It supports global availability with rack awareness.

It increases global reach.
