SQL is the only 22nd-century language available for developers today.

ABSTRACT

In relational database systems, SQL is more than a declarative query language. It includes procedural language (T-SQL, PL/SQL, etc) and defines transactions and its semantics.  SQL as a query language has been unreasonably effective even in NoSQL database systems. However, few NoSQL database systems support transactions.  The ones that support come with a long list of limitations and/or were unable to support SQL operations within the transaction.  We introduce and explain transactions in Couchbase N1QL: SQL for JSON.  N1QL transactions are multi-everything: multi-document, multi-bucket, multi-scope, multi collection, and multi-DML-statement.

INTRODUCTION

N1QL is a declarative language to manipulate JSON. Couchbase stores all the documents in the data service.  The query service orchestrates the query execution optimizing the query, creating an execution plan, and then executing it using data, indexing, and FTS.  The Couchbase SDK and query interaction protocol is built via REST over HTTP/S.   N1QL DML statements include SELECT, INSERT, UPDATE, UPSERT, DELETE, and MERGE.  

N1QL TRANSACTIONS

Here’s an example of transactions in RDBMS and Couchbase N1QL.

Transactions
MySQL Database (Statements are same/similar in Oracle, SQL Server, Informix & DB2)
Couchbase Database (Cheshire-Cat)
Insert data. Tuples in MySQL, JSON documents in Couchbase
INSERT INTO customer(cid, name, balance) VALUES(4872, “John Doe”, 724.23);
INSERT INTO customer(cid, name, balance) VALUES(1924, “Bob Stanton”, 2735.48);
INSERT INTO customer
VALUES(“cx4872”, {“cid”: 4872, “name”:”John Doe”, “balance”:724.23});
INSERT INTO customer
VALUES(“cx1924”, {“cid”: 1924, “name”:”Bob Stanton”, “balance”:2735.48});
Simple transaction, debit, and credit. Intermediate selects have to be read their own updates (RYOW)
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
COMMIT ;
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
COMMIT ;
The second transaction with partial rollback.
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
SAVEPOINT s1;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
ROLLBACK WORK TO SAVEPOINT s1;
SELECT cid, name, balance from customer;
COMMIT ;
START TRANSACTION;
UPDATE customer SET balance = balance + 100 WHERE cid = 4872;
SELECT cid, name, balance  from customer;
SAVEPOINT s1;
UPDATE customer SET balance = balance – 100 WHERE cid = 1924;
SELECT cid, name, balance from customer;
ROLLBACK WORK TO SAVEPOINT s1;
SELECT cid, name, balance from customer;
COMMIT ;

If you didn’t see much difference, that’s because, there isn’t. 

N1QL TRANSACTIONS STATEMENTS

N1QL transactions a set of transactions that include any of the DML statements in all forms: no restrictions. Transactional protection is issued from new statements: BEGIN/START, COMMIT, ROLLBACK, SAVEPOINT.

START TRANSACTION  (same as BEGIN WORK)

This statement starts a new transaction, assigns a new transaction ID, and returns the transaction ID to the caller. There are two rules the SDKs, tools (e.g. CBQ shell) follow to successfully execute the rest of the transaction.

  1. Send this transaction ID as a parameter every subsequent statement within the transaction.  This is how the query service knows the statement should be run as part of a particular transaction.
  2. Couchbase can have multiple query service nodes, but a single transaction is executed on a single query node. You can start a new transaction in ANY QUERY NODE. However, the rest of the statements FOR THAT SINGLE TRANSACTION should be sent to the VERY SAME query node.
COMMIT TRANSACTION or COMMIT WORK;

This commits all the changes in the transaction to the data store. This is a distributed commit of the transaction into the Couchbase key-value data store.  The commit supports all of the Couchbase durability options. This is still a distributed system — just like astronomy, rare things happen often. On any failure, the complete transaction is rolled back automatically and the application needs to retry the transaction. Failures can occur for various reasons: network failure, node failure, node overloaded, and write-write conflict.  Just like direct Couchbase WRITEs are optimistic and failures can occur due to concurrent writes resulting in CAS conflicts, N1QL transactions can also fail due to write conflicts. We implement a form of optimistic concurrency approach to transactions.

ROLLBACK TRANSACTION or ROLLBACK WORK;

On an application issued rollback, all of the modifications done within the transaction are rolled back.

As you’ve seen from the examples above, N1QL also supports savepoints and rollbacks to the savepoints within the transaction.  From the application perspective, these work the same as the RDBMS counterparts.

TRANSACTIONAL FEATURES

The transaction is more than just the statements — it’s all about the semantics and guarantees.  Hence the ACID definition. We talked about atomicity earlier wrt to COMMIT.  Let’s talk a bit more on this.

ATOMICITY is required for both the whole transaction and each statement.  The DML statements will atomically rollback on any failure, but the transaction itself is open and can be continued. An example of a failure is a document key conflict on insert.

CONSISTENCY ensures the constraints are applied consistently for each statement. The only constraint in Couchbase is the unique constraint on the document key.  N1QL checks for the pre-existence of each of the key inserted and rolls back the statement on any conflict.  Remember we use optimistic concurrency control.  That means, even after the INSERT is successful,  the commit stage can still run into a write-write conflict because some other session could have inserted between the insert and commit.   You’ll have to retry the transaction on such failures.

ISOLATION

We support the COMMITTED READ isolation level. All the data that’s read and evaluated is committed data in the index and data store. By default, we use the stringent request_plus consistency on the index reads.  This means, for a given predicate, we use the latest data in the index to qualify the documents to qualify select/update/delete.  We then go the extra step to fetch the documents from the KV store and re-apply the predicates to ensure the latest committed version of the document is qualified and updated.

If performance wasn’t a consideration, everyone would have used serializable transactions 😉 You can change the scan consistency to unbounded for improved index scan performance.

DURABILITY

N1QL supports all of the durability options and features with the Couchbase data store to ensure durability on our distributed database.

CONCURRENCYBroadly speaking, database transactions use either pessimistic or optimistic concurrency control. Traditional single node databases follow the pessimistic concurrency control to avoid conflicts.  This approach is also applied to some of the multi-node implementations like Oracle RAC, DB2 Sysplex. Multi-node implementations are possible but require expensive Infiniband, custom hardware, etc.

Optimistic concurrency control version each base unit of tuple (rows in rdbms, documents in Couchbase), remember the version they read to modify and check if the version has changed during the write.  If there’s indeed a conflict, the whole transaction has to be retried. The advantage of this approach is, in a well-designed application, there should be little conflicts: you won’t withdraw cash and transfer money between accounts at the same nanosecond.  On the rare occasion you do, the retry is tolerated.

Concurrency in the N1QL Query Service

Couchbase N1QL uses optimistic concurrency control. Each transaction reads the documents it needs to update, updates them, and keeps the updated documents in its private per-transaction cache. When you issue a subsequent statement, the query service is aware of the updated documents within the transactions and uses that version instead of the older version reflected in the index/data.  This is how it provides READ-YOUR-OWN-WRITE support.  This is modeled so all the DML statements, all operations (select, join, project, aggregate, nest, unnesst, etc) will all get this RYOW benefit providing a consistent and crucial feature of the transaction. Even while the application is doing transactions (doing both reads and writes), within the transaction we’re reading and caching the updates until the commit time. This is the READ phase of the transaction, And because of this approach, there’s no coordination between multiple transactions or multiple query nodes within a transaction until commit (WRITE phase). This ensures the performance and scalability of distributed transactions in Couchbase.  And, before you ask,  all these work concurrently with Couchbase distributed transactions we released in 6.5.

Coordination is the bane of scalable systems — Peter Bailis

NEXT STEPS

We just announced that Couchbase 7.0 Beta will be in November 2020. Stay tuned for details.

This is a short overview of what’s coming up with Couchbase Transactions.  In the upcoming series of articles, we’ll dive further into implementation, usage, SDK support, Lambda Transactions, Spring support, etc, and more.

ACKNOWLEDGMENTS

I’m happy to announce N1QL transactions here.  This is the result of intense work and collaboration in Couchbase query, SDK, and QE teams to design and implement.  Thank you!

Posted by Keshav Murthy

Keshav Murthy is a Vice President at Couchbase R&D. Previously, he was at MapR, IBM, Informix, Sybase, with more than 20 years of experience in database design & development. He lead the SQL and NoSQL R&D team at IBM Informix. He has received two President's Club awards at Couchbase, two Outstanding Technical Achievement Awards at IBM. Keshav has a bachelor's degree in Computer Science and Engineering from the University of Mysore, India, holds ten US patents and has three US patents pending.

Leave a reply