SQL Gene in NoSQL: Introducing N1QL
NoSQL have been a reaction to relational movement that required rigid schema. Relational database bundled SQL query language to powerfully search and reshape the results to compensate. With version 4.0, Couchbase Server contains both the SQL gene and the NoSQL gene in its gene-pool! Couchbase Server with SQL has both the flexible data model with JSON Documents AND it has a powerful SQL language – N1QL, that is by far the most expressive language for JSON data. The combination of both JSON and SQL produces a powerful pairing to allow simplified programmability and powerful new enterprise applications that empower the new digital economy.
Why add a SQL Gene to Couchbase Server?
This is an easy question to answer:
SQL is the most powerful way to query data. SQL has been around for more than 4 decades. While most other languages in its age group are no longer in demand, SQL continues to be the language of choice in many modern data processing platforms.
SQL is the de-facto standard for querying data between various data tools and platforms. SQL sits at the center of wide range of tools from data visualization tools, enterprise reporting environments, analytics apps to ETL platforms.
It also helps that, with its long life, SQL is already a tool in many developers toolchest. It may be the only thing both you and your computer-scientist parent have in common.
|Download Couchbase Server 4.0 Release Candidate|
First Glance at N1QL
With the new SQL gene in Couchbase Server, you are getting an expressive query language and query engine – N1QL, combined with a brand new powerful indexing mechanism – the new global secondary indexes.
This should look familiar to many of you. This is N1QL:
<strong><em>SELECT * FROM bucket
WHERE bucket.quantity>100 OR
bucket.price < 9.95</em></strong>
Cannot tell the difference between the relational SQL you know and Couchbase N1QL? Don't worry… They are very close cousins. However underneath there is a lot that is unique to N1QL.
N1QL is SQL+JSON: Couchbase Server is a document database with flexible schema built on the JSON foundation. With N1QL + JSON development gets simplified. N1QL works with JSON but does not compromise on the completeness of the SQL language by supporting cross document operations like JOINs or with clauses that can pivot JSON data like NEST and UNNEST.
N1QL gives you Performance with a new powerful indexer – GSI: Global secondary indexes enables unmatched low latency and scalability. Global secondary indexes are built to provide the best scan latency and throughput for demanding modern big data applications.
N1QL Scales with the new Multi-dimensional Scaling Model: N1QL query environment and the new global secondary indexes are architected to be independently scaled from the Core Data Operations, removing the common trade offs of other NoSQL and relational databases of choosing a unified scale model for all workloads from core data operations to indexing to queries.
Lets take a more detailed look at these 3 unique properties of N1QL.
Relational SQL operates over tables and columns and needs a strict schema. With N1QL, all queries are executed over JSON data in Couchbase Server. You don't have to deal with a strict tables and columns structure OR worry about changing schema as your data model evolves.
N1QL will adapt to the schema the JSON documents present as they get created and updated in your Couchbase Server bucket. Unlike SQL in relational databases, attributes like quantity or price that are referenced above can be missing. You can express your query either to disqualify documents that do not contain the attribute OR you can query the attribute only when it exists.
<strong><em>SELECT * FROM bucket b1
WHERE b1.quantity>100 OR
(b1.price IS NOT MISSING AND b1.price < 9.95)</em></strong>
Another important difference is that N1QL delivers JSON result-sets an lets you shape the JSON with powerful operators like JOINs NEST and UNNEST.
"beer_name": "Benchwarmer Porter",
"brewery_name": "Cooperstown Brewing Company",
"country": "United States"
"beer_name": "Old Jubilation Ale",
"brewery_name": "Avery Brewing Company",
"country": "United States"
For those of you using classic reporting, data visualization or ETL (extract-transform-load) tools: good news is that you can get high fidelity ODBC and JDBC drivers that give you seamless integration with a “rows and columns” style resultset from N1QL. Thanks to Simba ODBC and JDBC drivers, you can use your favorite tools like Tableau, Excel or PowerBI tools with the ODBC/JDBC drivers to integrate your full data lifecycle story together.
High Performance Queries with the New Global Secondary Indexing (GSI)
GSI bring in a new powerful indexer in addition to the View and Spatial indexers with incremental map-reduce in Couchbase Server 4.0. GSI capability is purpose built for N1QL and for big data applications with high rate of queries. GSIs achieves the great performance boost with a unique architecture which minimize the effects of wide scatter-gathers for processing queries.
GSIs come with independent partitioning and scalability with Multi-dimensional Scaling (more on MDS later). GSI is also built on a foundation of a strong storage and caching engine called ForestDB.
In the figure below, the left and right side shows the rough steps of execution of a common query.
Query Without GSI: The case on the left depicts how many existing products work to execute queries in a distributed environment. In absence of GSI, indexes are distributed aligned to the data partitioning scheme over N nodes. The execution plan for a typical query entail a wide scatter to N nodes, a gather from N nodes to capture the results from each node, and a final execution that compiles the final results of the query.
Query with GSI: Execution steps with GSI does not need to scatter-gather. Query can hit the index that exist on a single node and return the results.
The way to manage GSI should be very familiar if you have worked with relational indexes before.
<strong><em>CREATE INDEX friends_index
WHERE type=”user profile”
New Scalability Model: Multidimensional Scaling
To explain multi-dimensional scaling, lets start from the high level server architecture: With Couchbase Server 4.0, each node contains a cluster manager, data, index and query service components as well as the storage engine and managed cache.
Here are the details on each component:
Cluster Manager is responsible for governing cluster operations. Nodes joining and leaving the cluster or operations like rebalance are performed by the cluster manager.
Data Service is responsible for core data operations through get/set API which allows you to interact with a given key. Data Service also contains the View engine: the incremental Map-Reduce engine that existed in previous versions.
Index Service manages the new indexes that are purpose built for N1QL queries. The new global secondary indexes are maintained and can be queries through the index service.
Query Service manages the N1QL query interactions. It receives and executes the query to generate the final result for the applications.
All Services comes with core services that manage their storage and caching needs.
All smart clients and ODBC and JDBC drivers come with a “cluster map” that describe which nodes to go to for the given API calls for most efficient execution.
Even though every node contains the identical set of components, with the new scalability model, you can elastically turn on/off the services in blue on each node and create “zones” that independently scale services such as data, index and query operations and pick the best HW design per zone based on your workload.
You can still deploy the cluster the way you do today with Couchbase Server 3.0 – pick a single HW setup for all nodes, enable all the services on all the nodes and distribute the load across all the nodes. This is your good old uniform horizontal scale out. However the work needed to be done on the nodes to do core data operations like fast GET/SETs vs maintain GSI Indexes vs Process N1QL Queries all have competing requirements. Sharing the same node does not always make sense if you are sensitive to latencies of your operations.
The added model of multi-dimensional scalability allow you to divide the cluster into “zones” that run individual services (data, query and index) and in turn, allow independently choice of the HW for each zone. You can choose the best HW that goes with the type of indexing and querying you want to do and still maintain low latency in your core GET/SET operations.
In the figure below, the cluster has 3 zones that separate data, index and query services. Gray boxes represent services that are disabled on a given node.
There is more… much more…
We scratched the surface of N1QL and for a deeper view, you can download and play with the Release Candidate version of Couchbase Server 4.0.