Business applications have requirements: take customer orders, deliver customer orders, track shipping, generate inventory report, end of the day/month/quarter business report, generate business dashboards and more.  These requirements evolve slowly. They remain even when you choose a NoSQL database.

On NoSQL databases, challenges are addressed by a multitude of technologies and workarounds. Here are some of them:

  1. Copying the data with a different key to make the scans easier.
  2. Fetch all of the data needed to the application and then generate reports
  3. Load the data into relational databases to generate the report.
  4. Products provide workarounds with map-reduce views, APIs, etc.
  5. Finally, SQL itself has been unreasonably effective for both structured and semi-structured data. NoSQL databases like Couchbase, Cassandra, CosmosDB have extended SQL for JSON and wide-column data model.

NoSQL has evolved from “NO SQL” to Not Only SQL.  If you’re interested in some of the evolutionary background combined with a history of SQL evolution to support semi-structured data, I suggest the following interviews.

  1. Ravi Mayuram’s interview with Don Chamberlin[2017]:  https://youtu.be/-U_UjqnhMBI?t=3492
  2. A panel discussion between Ravi Mayuram, Don Chamberlin and Prof. Mike Carey [2018]:  https://www.youtube.com/watch?v=LAlDe1w7wxc

A lot of NoSQL databases claim “SQL Support”. The SQL standard is wide and deep, covered in nine voluminous books. No one, neither Oracle nor SQL Server, supports everything in the standard despite decades of work. So, the NoSQL databases have a long way to go in catching up. So, a detailed evaluation of the SQL support is worth your while. 

Here are the criteria you should use to evaluate the SQL support in NoSQL databases.

  1. Language support: What statements, data types, operations (joins, grouping, aggregation, windowing, pagination, etc)
  2. Indexing support:  Indexes are key to performance, especially in the workloads for interactive applications.
  3. Optimizer: Query rewrite, choosing the right access path, creating the optimal query execution path is what makes SQL a successful 4GL.  Some have a rule-based optimizer, some have a cost-based optimizer, some others have both.  Evaluating the quality of the optimizer is critical.  Typical benchmarks (TPC-C, TPC-DS, YCSB, YCSB-JSON) won’t help you here.
  4. As the saying goes: ” There are three things important in databases: performance, performance, and performance”. It’s important to measure the performance of your workload.  YCSB and the extended YCSB-JSON will make this evaluation easier.
  5. SDKs: Rich SDKs and language support speed up your development.
  6.  BI tool support: For large data analysis, support from BI tools usually via standard database connectivity drivers is important.

In this article, I’ll compare and contrast the SQL language support in Cassandra, CosmosDB, Couchbase, and MongoDB via their respective implementations.  To be sure, MongoDB doesn’t support SQL but have some comparative commands.

I’ve divided the analysis into multiple sections.  WordPress formatting makes these tables too big.  Here is a PDF version that’s compact and easy to read. [Click on the image to view the PDF]

Summary of the SQL Support in Cassandra, CosmosDB, Couchbase, and MongoDB.

 

 

 

 

 

SQL Support approach:  

SQL SQL is a declarative language, select-join-project operations serving as the foundation.
Cassandra CQL: SQL inspired language for Cassandra.
CosmosDB Supports SQL along with MongoDB API, Gremlin (for the graph), etc. Simple select-from-where-order-by support.  
Couchbase N1QL: SQL for JSON. 

Couchbase has two implementations of N1QL: query service and analytics service.

MongoDB Javascript based, simplified SQL command based query.

INPUT and OUTPUT

SQL INPUT: Set of rows (tuples)

OUTPUT: A set of rows (tuples)

Cassandra INPUT: Sets of rows

OUTPUT: Set of rows

CosmosDB INPUT: Sets of JSON

OUTPUT: Set of JSON

Couchbase INPUT: Sets of JSON

OUTPUT: Set of JSON

MongoDB INPUT: Sets of JSON

OUTPUT: Set of JSON

SELECT: FROM Clause

SQL Specifies the datasource tables (relations)
Cassandra FROM clause with only one table allowed.  Now joins, subqueries or expressions are allowed.  From clause interpretation is same as SQL.
CosmosDB FROM clause supports a single collection and self joins (same as UNNEST in Couchbase).   
Couchbase FROM clause with multiple keyspaces (subset of bucket), subqueries, expressions.  Same as SQL
MongoDB db.t1.find() for single table SELECT.

db.t1.aggregate() is used as the generalized query framework. aggregate() can join with additional collections using $lookup operator.  There can be multiple $lookup operators in the aggregation pipeline framework making it the closest cousin of the SQL FROM clause.

SELECT: WHERE Clause

SQL Criteria for selecting a row
Cassandra Standard boolean expressions. No subqueries.
CosmosDB Same as SQL
Couchbase Standard boolean expressions and subqueries.
MongoDB db.t1.find({x:10});

aggregate() has the $match clause.

SELECT: SELECT Clause

SQL Projection clause
Cassandra SELECT clause is same as SQL.
CosmosDB SELECT clause is same as SQL.
Couchbase SELECT clause is same as SQL.
MongoDB db.t1.find({x:10}, {a:1, b:1})

$project operator in the aggregation pipeline

SELECT: CTE – Common Table Expression

SQL Dynamically defined data source (table, resultset)
Cassandra Unsupported
CosmosDB Unsupported
Couchbase WITH clause; same as SQL (in v6.5). Recursive CTE is unsupported
MongoDB Unsupported

SELECT: Subquery

SQL Subquery: Subqueries in the FROM clause, WHERE clause, anywhere an expression is allowed.
Cassandra Unsupported
Cosmosdb Unsupported
Couchbase Supports both correlated and non-correlated subqueries.
MongoDB Unsupported in find(). Can add $match in the pipeline, but not exactly an equivalent of a subquery.

SELECT: GROUP BY

SQL Group the rows based on one or more expressions. Quite useful in reporting and aggregation for groups.
Cassandra Supported; Same as SQL.
Cosmosdb Unsupported. Can only do the aggregation on the whole resultset.
Couchbase Supported; Similar as SQL.
MongoDB $group operator in aggregate() pipeline

SELECT: HAVING clause

SQL Filtering after the aggregation.
Cassandra Unsupported
CosmosDB Unsupported
Couchbase HAVING clause; Same as SQL
MongoDB $match after the grouping and aggregation.

SELECT: ORDER BY Clause

SQL The final order of the results produced by the query block
Cassandra ORDER BY clause; Same as SQL.
Cosmosdb ORDER BY clause; Same as SQL.
Couchbase ORDER BY clause; Same as SQL.
MongoDB db.t1.find().sort({a:1, b:-1});

aggregate() has $sort to specify the result order.

SELECT: LIMIT, OFFSET Clause

SQL Used for pagination of the resultset
Cassandra “LIMIT is supported.

OFFSET is unsupported.”

CosmosDB LIMIT (TOP) and OFFSET clause; Similar to SQL
Couchbase LIMIT and OFFSET clause; Same as SQL
MongoDB skip(), limit() methods with find(). $offset, $limit with aggregate().

SELECT: JOIN clause

SQL INNER JOIN, LEFT/RIGHT/FULL outer joins.
Cassandra Joins are unsupported.  Applications will have to model the data to avoid joins or do the joins in the application layer.
Cosmosdb Only self JOINs.  No INNER/LEFT/RIGHT/etc joins.
Couchbase Supports INNER, LEFT OUTER, NEST, UNNEST and limited RIGHT outer.  Same syntax as SQL. FULL OUTER join is unsupported.
MongoDB Limited LEFT OUTER JOIN only via $lookup operator.  No join on array elements or expressions.

SELECT: Aggregation

SQL Aggregation
Cassandra Simple aggregation on the whole result is supported.  Aggregation with GROUP BY is unsupported.
CosmosDB Simple aggregation on the whole result is supported.  Aggregation with GROUP BY is unsupported.
Couchbase SUM, AVG, COUNT, MAX, MIN, VARIANCE: same as SQL
MongoDB $sum, $count, $avg with grouping support

SELECT: aggregate functions

SQL Simple aggregation on the whole result is supported.  Aggregation with GROUP BY is unsupported.
Cassandra Unsupported
Cosmosdb Unsupported
Couchbase Supports SQL Standard window analytical functions in 6.5.
MongoDB Unsupported

SELECT : Window (analytics/aggregation) functions

SQL Window functions for running totals using the OVER() clause
Cassandra Unsupported
CosmosDB Unsupported
Couchbase Supports SQL Standard window analytical functions in 6.5.

See details: https://www.couchbase.com/blog/json-to-insights-fast-and-easy/

https://www.couchbase.com/blog/get-a-bigger-picture-with-n1ql-window-functions-and-cte/

MongoDB Unsupported

INSERT:  Single row/document insert.

SQL Insert a single row
Cassandra INSERT statement
CosmosDB API Insert
Couchbase INSERT statement
MongoDB db.t1.save()

INSERT:  Multiple row/document insert.

SQL INSERT statement
Cassandra Unsupported
Cosmosdb Unsupported
Couchbase INSERT with Multiple documents
MongoDB db.t1.insert()

DELETE Statement

SQL Delete one or more documents
Cassandra DELETE statement; Same as SQL
Cosmosdb API delete
Couchbase DELETE statement; Same as SQL
MongoDB db.t1.delete()

UPSERT statement

SQL INSERT.  UPDATE if exists.
Cassandra Unsupported
Cosmosdb Unsupported
Couchbase UPSERT statement.
MongoDB Unsupported

UPDATE Statement

SQL
Cassandra UPDATE; Same as SQL
CosmosDB API update
Couchbase UPDATE; Same as SQL
MongoDB db.t1.update()

MERGE: Merge one relation (set of rows) to another.

SQL Merge a set of rows (documents) into another.
Cassandra Unsupported
CosmosDB Unsupported
Couchbase MERGE statement, same as SQL.
MongoDB Unsupported

PREPARE statement

SQL Parse, analyze and create an execution plan.
Cassandra Supported.   I see excamples of preparedStatement() in Java SDK.dd
CosmosDB Unsupported
Couchbase Supported; PREPARE
MongoDB Unsupported

EXECUTE

SQL Execute an ad-hoc or prepared statement.
Cassandra Supported in Java.
Cosmosdb Unsupported
Couchbase Supported, similar to SQL.
MongoDB Unsupported

GRANT/REVOKE

SQL Grant/REVOKE permissions for specific operation on the data set
Cassandra GRANT, REVOKE
CosmosDB API support
Couchbase GRANT ROLE, REVOKE ROLE
MongoDB TBD??

DESCRIBE statement

SQL Describes the schema of a table
Cassandra DESCRIBE
Cosmosdb Unsupported
Couchbase INFER describes the schema of the documents
MongoDB Compass tool — graphical only.

TRUNCATE statement

SQL Truncates the data in the table without altering security or physical schema.
Cassandra TRUNCATE
CosmosDB Unsupported
Couchbase FLUSH operation
MongoDB Unsupported.  Workaround via remove collection, recreate with the same security settings.

Value Logic (boolean values)

SQL True, False, NULL (Unknown)
Cassandra True, False, NULL (Unknown)
Cosmosdb True, False, NULL (Unknown)
Couchbase True, False, NULL (Unknown), Missing

https://docs.couchbase.com/server/4.0/n1ql/n1ql-language-reference/booleanlogic.html

MongoDB True, False, NULL (Unknown)

Query Optimizer: Type of Optimizer

SQL Rule-based and cost-based optimizer.  Does query rewrites, index selection, join ordering, join type selection and position of the tablers (inner/outer, hash-table build/probe)
Cassandra Rule-based optimizer.  Index selection for the single table is done since there are no joins.
CosmosDB Rule-based optimizer mainly does index selection.
Couchbase Rule-based optimizer, index(es) selection.  Block Nested Loop join by default but supports hash join via user hint in the query.
MongoDB “Shape-based” optimizer according to the docs.  Each new query is matched with a query based on the “shape”.  The first time a query is run, the optimizer does index selection, but when there are multiple candidates, it’ll run multiple queries concurrently to see who returns the results first.

Query Optimizer: Index Selection

SQL Yes
Cassandra Yes
CosmosDB Yes
Couchbase Yes
MongoDB Yes

Query Optimizer: Query Rewrite

SQL Rewrite parts of the query to logical equivalent to better performance. E.g. Subquery rewrite, view folding, join type conversion, constant expression evaluation, etc.
Cassandra None
Cosmosdb No known rewrites
Couchbase Basic query rewrite.  LEFT OUTER to INNER when applicable, constant expression evaluation.
MongoDB None

Query Optimization: JOIN Type

SQL Choose the most efficient index, from the available join types.
Cassandra Not applicable since joins are unsupported.
CosmosDB Not clearly documented.
Couchbase Nested loop by default. Hash join by user hint.
MongoDB Just the nested loop is supported.

TRANSACTION support.  

SQL ACID support with multi-row and multi-statement support.
Cassandra No
Cosmosdb Yes
Couchbase No
MongoDB Yes, in 4.0

Indexes

SQL Data structures maintained to speed up the query performance.  Indexes are
Cassandra Supports primary, secondary, array-indexes.  Need to install and index the data on SOLR for a search index.
CosmosDB Indexes everything by default: scalars, arrays.  No support for the search index.
Couchbase Supports primary, secondary, composite, functional, adaptive, search, spatial, partitioned and replica index.  Indexes are eventually consistent.
MongoDB Supports primary, secondary, composite, search, spatial, partitioned and replica index.  The search index is simplistically created on a B-Tree.

SQL: Datatype support.

SQL Extensive numerical, character, date-time data type support.
Cassandra Numeric, decimal, double. Int, float, varint, Timestamp, collection (set, list)
CosmosDB JSON data types: numeric, string, boolean, object, arrays
Couchbase JSON data types: numeric, string, boolean, object, arrays
MongoDB JSON data types: numeric, string, boolean, object, arrays and custom extensions for the timestamp datatype.

Conclusion:

The popular NoSQL databases have tried to extend and support the declarative SQL for the respective data model and architectures. So, it’s important to understand the capabilities, limitations of the features and architecture during the evaluation.

References:

  1. The Unreasonable Effectiveness of SQL

Author

Posted by Keshav Murthy

Keshav Murthy is a Vice President at Couchbase R&D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design & development. He lead the SQL and NoSQL R&D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India, holds ten US patents and has three US patents pending.

One Comment

  1. […] transactions and their semantics. SQL as a query language has been unreasonably effective even in NoSQL database systems. However, few NoSQL database systems support transactions. The ones that support come with a long […]

Leave a reply