We’re excited to announce the first Developer Preview of Couchbase Analytics, which adds parallel data management to Couchbase Server. This broadens the range of queries that Couchbase Server can handle without compromising the basic design principles of being agile, fast, and elastic.

Analytics Developer Preview 1 is an early sneak peek: the earliest build we could release that shows the basic functionality and interface. There’s a lot yet to come, including integration with Couchbase Server so that Analytics behaves like a proper Couchbase service in the sense of Multidimensional Scaling (MDS). For now, Analytics runs alongside a Couchbase Server instance and synchronizes with the data service using DCP, but it’s otherwise standalone.

  • Couchbase Analytics Documentation – LINK

  • Download – LINK (See “Extensions”)

Hello, Analytics!

Couchbase Analytics adds parallel data management to Couchbase Server to complement the capabilities offered by the Query and Index services. Couchbase Analytics is designed to efficiently run complex queries over many records. By complex queries, we mean large ad hoc join, set, aggregation, and grouping operations, any of which may result in long-running queries, high CPU usage, high memory consumption, and excessive network latency in data fetching and cross-node coordination. Analytics can satisfy queries so big that they require query processing from multiple nodes working together.
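To make "multiple nodes working together" a little more concrete, here is a minimal Python sketch of a two-phase grouped aggregation: each worker aggregates its own partition of the data, and the partial results are then merged. This is only an illustration of the general technique, not a description of how Couchbase Analytics is implemented. The `orders` records, the field names, and the functions `local_aggregate` and `parallel_group_sum` are all hypothetical, and threads stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample documents; the field names are assumptions for illustration.
orders = [
    {"city": "Austin", "amount": 20},
    {"city": "Berlin", "amount": 5},
    {"city": "Austin", "amount": 30},
    {"city": "Cairo", "amount": 7},
    {"city": "Berlin", "amount": 10},
]


def local_aggregate(partition):
    """Phase 1: each worker sums amounts per city over its own partition."""
    totals = Counter()
    for rec in partition:
        totals[rec["city"]] += rec["amount"]
    return totals


def parallel_group_sum(records, workers=3):
    """Spread records across workers, aggregate locally, then merge."""
    # Round-robin partitioning: worker i gets records i, i + workers, ...
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    # Phase 2: merge the partial sums; Counter.update adds values per key.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    print(parallel_group_sum(orders))
```

The point of the two phases is that only small partial aggregates, rather than every record, need to be exchanged between workers before the final merge.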

Regardless of the technology used, analytic queries might be predetermined or ad hoc, and might be cheap or expensive depending on how much data processing they need. Performance challenges can arise when queries access large numbers of documents and when queries are not supported by a secondary index, as often happens with ad hoc analytics of the type users perform using data visualization and exploration tools.

Couchbase Analytics is designed to support truly ad hoc queries in a reasonable amount of time, even when scans are required. Because Analytics supports efficient parallel query processing and bulk data handling, Couchbase Analytics is still preferred for expensive queries, even when those queries are predetermined and might therefore be supported by an index.

The Couchbase Analytics approach has significant advantages compared to alternatives:

  • Common data model: Couchbase Analytics natively supports the same rich, flexible-schema document data model used in Couchbase Server, rather than trying to force your data into an RDBMS model.
  • Workload isolation: Operational query latency and throughput are protected from slowdowns due to analytical query workloads – without the complexity of operating a separate analytical database.
  • High data freshness: Couchbase Analytics uses DCP, a fast memory-to-memory protocol that Couchbase Server nodes use to synchronize data among themselves. Because of this, analytics run on data that’s extremely current, without hassles or delays from ETL (extract, transform, load).

SQL++ Query Language

Couchbase Analytics is programmed using the SQL++ query language, which is a next-generation declarative query language. SQL++ has much in common with SQL, but it also includes a small number of extensions that address the different data models the two languages were designed to serve. Compared to SQL, SQL++ is much newer and targets the nested, schema-optional or even schema-less world of modern NoSQL systems.
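As a rough illustration of what "nested, schema-optional" data looks like in practice, the Python sketch below flattens a nested array inside each document and then aggregates over the flattened rows, while tolerating documents that lack the array altogether. It is written in plain Python rather than SQL++, so it says nothing about SQL++ syntax. The sample documents and the helper names `unnest_items` and `qty_by_sku` are hypothetical.

```python
# Hypothetical order documents: fields vary from one document to the next.
orders = [
    {"id": 1, "customer": "ann",
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"id": 2, "customer": "bob",
     "items": [{"sku": "A", "qty": 5}], "coupon": "SAVE10"},
    {"id": 3, "customer": "cy"},  # no "items" field at all
]


def unnest_items(docs):
    """Yield one flat row per nested item, skipping documents without items."""
    for doc in docs:
        for item in doc.get("items", []):
            yield {"order_id": doc["id"], "sku": item["sku"], "qty": item["qty"]}


def qty_by_sku(docs):
    """Total quantity per SKU across all nested items."""
    totals = {}
    for row in unnest_items(docs):
        totals[row["sku"]] = totals.get(row["sku"], 0) + row["qty"]
    return totals


if __name__ == "__main__":
    print(qty_by_sku(orders))
```

A flat relational table would need either a separate items table or a fixed set of columns here; a query language aimed at document data has to handle the nesting and the missing fields directly.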

You may wonder why Couchbase Analytics uses a query language other than N1QL, the query language used by Couchbase Server’s query service. Don’t worry, this is a temporary situation. SQL++ and N1QL are close to each other; in the long term, the two query languages will merge so that Couchbase Server can be queried using a single query language. In the meantime, if you’re familiar with N1QL, you should find yourself right at home in SQL++.

You can find out all about the language supported by Couchbase Analytics by consulting the SQL++ Language Reference.

Join us at Couchbase Connect

We invite you to join us at Couchbase Connect for more on analytics. We welcome your feedback. Want to learn more? Drop by and visit us at the kiosks, or catch a session:

  • SQL++: SQL for NoSQL by Professor Yannis Papakonstantinou, University of California, San Diego (Wednesday, November 9, 3:10 pm – 4:00 pm)
  • From SQL to NoSQL: the fourth time’s the charm by Professor Mike Carey, University of California, Irvine (Tuesday, November 8, 9:00 am – 9:50 am)
  • Sneak peek: Couchbase Analytics by Till Westmann & Yingyi Bu, Couchbase (Wednesday, November 9, 2:00 pm – 2:50 pm)

 

We hope to see you there!

Author

Posted by Will Gardella, Director, Product Management, Couchbase

Will Gardella is Director of Product Management for analytics at Couchbase. Previously, he was a product manager in the big data platform team at HP, a senior director of product management for SAP HANA, and the senior director of SAP Research's global Big Data program focused on big data and machine learning.
