Couchbase Engagement Database Platform
Woo-Hoo!!! Have you heard of Engagement Databases? I am sure, “No”, not until Couchbase announced it with 5.0 release. In today’s digital economy, success of a business is dependent on its ability to leverage successful digital transformation to provide superior customer engagement & experience. DNA of all this lies in data, and that calls for a powerful database platform which can tackle the data problems in Systems of Engagement at scale, at ease, and at the roots. These problems are pretty unique and hence the solutions should be too. Superior understanding of the problems in this space and elegant & efficient solutions to those problems led to the discovery of a new category of databases, we call Engagement Databases. Such a database is complementary to the Transactional and Analytical databases you might have in your data fabric infrastructure. Following picture shows how they fit together.
N1QLis the SQL extended query language designed to enable your applications leverage the power of Engagement Databases & JSON. N1QL has all sophistication and bells & whistles necessary to deal with flexible schemas and the hierarchically structured JSON data. You can write simple declarative queries using N1QL to naturally deal with such data.
Couchbase 5.0 brings a number of rich query functionality and performance optimizations in N1QL to meet the demands of customer engagement focussed applications. Many of the functionality and performance improvements also include corresponding enhancements in GSI Indexes to help N1QL fully leverage the optimizations. See Couchbase Server 5.0 what’s new for complete list and details.
Adaptive Indexes are special type of indexes in Couchbase that can index all or a set of specified fields of a document in extremely adaptable way. Unlike composite GSI indexes, such an index is generic in nature. Let me explain.. Adaptive Indexes can efficiently lookup any of the index-key values without restrictions such as prefix index-key ordering, in typical composite indexes. This enables efficient ad-hoc search queries that can use WHERE clause predicates on arbitrary fields without requiring to create multiple composite indexes or different combinations of index keys.
For example, consider a user profile use case. A person’s profile may need to be searched based on any of the personal attributes such as first name, last name, age, city, address, job, title, company, foaf, etc. Similarly, a hotel room availability may be searched based on wide criteria, such as room facilities, amenities, price, location, distance from specific points of interest etc. In such scenarios, traditional secondary composite indexes can’t be used effectively (see section Contrast with Composite Indexes to understand the issues). Adaptive indexes solve this problem by creating one generic index that can run such ad hoc search queries effectively and efficiently.
For example, following figure(1) depicts how three different indexes I1, I2, I3 are needed for the 3 queries Q1, Q2, Q3. Further, a query on any other non-indexed fields is not possible without creating respective indexes. Moreover, a query on just age, such as Q4, will not be able to use the index I2, as it needs the prefix index-key id aswell. Figure(2) shows how a single adaptive index can help queries on all fields irrespective of the index-key order.
Note that, adaptive indexes is not a panacea. These figures may give an impression that adaptive indexes are better than simple/composite indexes. It is not entirely true. If you know queries (and index-keys) apriori, then it is always better to use normal secondary indexes. Because, they would be faster than adaptive indexes. Remember that, adaptive indexes can be much bigger in size as they index multiple/all index keys and corresponding values. So, they may not perform as fast as simple secondary indexes. However, adaptive indexes solve the problems that secondary indexes can’t with little overhead.
Life is all about trade-offs!! For more details and examples, refer to the Adaptive Index documentation.
Powerful Subquery expressions
N1QL is a powerful query language. It is SQL for JSON, and it has all necessary language extensions and expressions to deal with many subtleties in the JSON data modeling, such as:
- Flexible schema
- Missing fields and data values.
- Multi values array values
- Nested data where a field of a document can be another fully formed JSON object
These kind of things don’t exist in relational databases, and hence SQL din’t have to deal with it. However, by design, N1QL is required to address all nuances in working with JSON data. In fact, these are the kind of complex data characteristics and flexibility, and scalability, and simplicity, that has gravitated modern day applications to an Engagement Database Platform like Couchbase.
One such functionality to elegantly simplify working with array data is provided by N1QL’s support for Subquery expressions on Arrays. This allows you to treat an array as a keyspace of documents, and enable you to use all power of N1QL on the array treating the array elements as documents of the keyspace. You may not realize, but that’s much more powerful than how it reads 😉
For example, consider following customer order data:
Here, the array field
products contains the order details about all the products ordered. Now, let’s try to write a query to find the total value of the order. The query should run through each element of the
products array, and compute the sum of
quantity * price. Without using this feature (for ex: when using Couchbase 4.5), the query would look like:
((SELECT RAW SUM(d2u.price * d2u.quantity)
FROM default d2
USE KEYS meta(d1).id
UNNEST d2.products d2u
WHERE d2.customerId = d1.customerId
GROUP BY meta(d2).id)) AS total
FROM default d1 ;
This query itself is made possible by the wonderful UNNEST clause in N1QL’s SELECT statement. However, same query can be rewritten as below, using the Subquery Expression on Arrays:
(SELECT RAW SUM(p.price * p.quantity)
FROM d.products AS p) AS total
FROM default AS d;
Isn’t it ridiculously simple and intuitive!! Infact, it would perform better too. That’s how N1QL empowers application developers, and an Engagement Database enables customers tackle their data challenges in Systems of Engagement applications. There is a lot more to talk about this wonderful feature.. See documentation and this blog.
Btw, imagine modeling this simple data with an RDBMS and write equivalent query in SQL, or another NoSql database. I am sure that can be a fun problem to solve if you wanna lose your weekend!!
External Data Access & Federated Queries
One of the data explosion problems is that customer engaging applications may need to find & integrate with relevant data from multiple internal & external sources. It is more and more important for data-centric applications to stitch their data with external data sources provided by:
Data providers such as www.data.gov
Micro services such as Couchbase Full Text Search, other Couchbase Clusters, or any other data sources that can provide JSON data.
Couchbase Server 5.0 adds a new N1QL function CURL(), which enables N1QL queries to interact and integrate with external JSON data sources available over HTTP/REST. This allows federated N1QL queries against external data sources. For example:
You can access Google geocoding API to get more location details about a given location stored in your database.
You can use Yahoo finance data API to fetch more details about a given stock symbol stored in your database.
SELECT meta().id, to_number(hdp_low) AS hdp_low
LET hdp_low = curl("https://query.yahooapis.com/v1/public/yql",
WHERE to_number(hdp_low) < $min_threshold;
Similarly, you can use FTS indexes in your N1QL query for full-text-searching, or access data from other Couchbase Clusters or other JSON data sources in your N1QL queries (using all the power of N1QL expressions on the external data). For more information, see CURL Function.
RBAC for N1QL Statements
Role Based Access Control (RBAC) is vital for security administration in enterprise deployments. Couchbase Server 5.0 brings RBAC for applications whereby users can be added to the system with various predefined roles. Query service adds support for RBAC with following major features:
Statement level roles are added for executing various N1QL statements such as SELECT, UPDATE, INSERT, and to access system keyspaces.
New N1QL statements GRANT and REVOKE are added aligned with the SQL standard.
New system keyspaces user_info, my_user_info, and applicable_roles are added to access various role assignments details.
Find more information here.
Monitoring and Profiling N1QL Queries
This feature enables fine grain and detailed query monitoring and profiling. This provides detailed information about various query phases, query operators involved in the processing of the query, and corresponding execution timings and characteristics (such as intermediate results). All monitoring information is also captured in various system keyspaces such as system:active_requests and system:completed_requests. This helps in diagnosing troubled queries and performance issues. Following picture shows the visual plan and query profile details in the query work bench.
New query parameters are provided to enable, disable, and control the monitoring capabilities, and can be configured for a query engine or individual queries. See Monitoring N1QL Queries for details.
N1QL Performance Improvements
In Couchbase Server 5.0, N1QL and the Global Secondary Indexes together enable many performance optimizations.
Complex Predicate Pushdown: Supports efficient and exact pushdown of complex AND/OR predicates to index. This helps in:
avoiding spurious results from Indexer in certain queries where predicates have range scans on prefixing index-keys.
efficiently using multiple indexes for different conditions of an OR predicate etc.
Index Projection: In the version 5.0, N1QL adds support to request only the required information from Index, and thus avoid any unnecessary processing.
Support is added to create indexes with ASC/DESC specified for individual index keys, and leverage index-order for queries with aligned ORDER BY clause.
Operators such as OFFSET, MAX(), MIN(), DISTINCT, COUNT (DISTINCT field) are pushed down to indexer.
For more details, see Index Pushdown Optimizations.
I am sure this is pretty exciting and overwhelming. We are super excited to share our vision on Engagement Databases and innovate N1QL to help you gear up!! In this article we talked about many rich N1QL feature and performance enhancements in the Couchbase Server 5.0 Engagement Database Platform. Drop a line if you have any comments or questions. Cheers!!