ACID properties are a topic that I get asked about a lot. Generally, people ask in the context of transactions: “Are there NoSQL transactions?”, “Can I use ACID transactions in Couchbase?”, and so on. But today’s distributed applications do not always expect or need all of the ACID properties from their database. I’ll dig deeper into the ACID properties in this post.

Couchbase and Transactions

Scaling, performance, and flexibility have been the core focus of the Couchbase data platform, and distributed, multi-document transactions are often at odds with these characteristics.

Part 1 of this 2-part blog post is going to cover the building blocks of ACID that are available in Couchbase. You can use these “primitives” without sacrificing the overall scaling, performance, and flexibility of Couchbase. Depending on your use case, these primitives might be adequate on their own, or in concert with each other.

Currently, Couchbase supports many ACID properties on single documents, but does not provide multi-document transaction support. In part 2 of this series, I will show an approach that uses these building blocks to make something like a multi-document transaction with Couchbase. But first, it’s important to have a complete understanding of the building blocks.

What are the ACID properties?

ACID stands for: Atomicity, Consistency, Isolation, Durability. It was coined in the 1980s, but has existed since the 1970s in traditional, non-distributed, relational databases. It describes a class of database that is capable of providing operations with guarantees.

Applications have evolved from monolithic all the way to distributed micro-service based applications. Microservices still expect certain aspects of transactionality such as an atomic commit or rollback, but not necessarily full ACID behavior. Full ACID behavior may still be important, but in modern web and mobile software, it is often lower priority to performance and scalability. However, it is not an “either/or” situation. Couchbase provides some tools and capabilities to help you balance ACID properties and performance.

A is for Atomicity

“Atomicity” means that a group of operations either all succeed or all fail. Couchbase provides atomicity for single documents. An operation to get or set a document either succeeds or fails. Compared to an RDBMS guarantee of multiple operations succeeding or failing together, this may not seem like much.

But consider that data modeling is very different between document databases and RDBMS.

In a relational database, you typically normalize the data. For instance, to store a a shopping cart with 3 items, you would need:

  • 1 row in a ShoppingCart table

  • 3 rows in a ShoppingCartItems table

Shopping Cart example for ACID properties

If you want to create a shopping cart with 3 items, this requires four INSERT statements. So, you might need your relational database to treat those 4 statements atomically.

Now consider a document database. You would create a single shopping cart JSON document, which would itself contain 3 items.

Getting or setting this document is a single atomic operation in Couchbase. A fully normalized model isn’t often needed. If I were to extend this model to include address, and my address changes a month after I ship an order, does the order to which the address was shipped actually change? Normalization partly aims to reduce data duplication. This is the correct and efficient thing to do in some situations, but not all.

If you can model your data like this, then you immediately reduce your need for a multi-operation transaction (if you’re concerned about ballooning document size, don’t worry, you can use a subdocument or N1QL operation to operate on a portion of the document).

C is for Consistency

“Consistency” means generally that an operation “won’t violate declared system integrity constraints”. Any constraints in the database about the data remain consistent before and after an ACID operation. For instance, if an atomic set of operations fails, then the data remains consistent with what it was before the operation (because it can be rolled back). There are other types of consistency constraints that a database can impose: in a relational database, for instance, you cannot insert 6 columns of data into a 5 column table.

Instead of a schema, Couchbase uses JSON. Every document implicitly contains its own schema. If you perform an insert operation with JSON, you must supply valid JSON (the Couchbase SDKs will typically handle this for you). Couchbase only enforces this at the document level. Couchbase cannot prevent you from using a different JSON schema from document to document. For instance, consider these two documents:

A Couchbase bucket allows both of those documents. Note that the field names are not consistent with each other. This means that there is more responsibility at the application level to ensure the documents have consistent naming (but typically, this will be derived from your data model, and thus is handled automatically).

Another constraint that Couchbase enforces is the key. Each document must have a unique key. If you try to insert another document with a key of “user::mgroves”, for instance, an error will occur. So if you need to enforce a unique constraint, using the document key is one way to achieve this. (It is also to your advantage to use a natural key when possible so that a Couchbase cluster can perform a direct lookup).

Query Consistency

Finally, I want to mention index consistency. Couchbase is able to index fields or combinations of fields in JSON documents (just as relational databases can index columns or combinations of columns). However, Couchbase updates indexes asynchronously to provide better performance than other types of databases. When executing a N1QL query in Couchbase, the default behavior is “Not Bounded”. This means that the query engine will return results based on the current state of the system. So, hypothetically, if you create a new document and immediately run a query, that document might not be in the results.

Fortunately, there are two other options to tweak the consistency of queries: RequestPlus and AtPlus.

RequestPlus is at the opposite end of the consistency spectrum. It will wait for any documents currently known to the cluster to be part of index recalculations before processing the query. The trade-off here is latency, of course. A RequestPlus query will possibly take longer to execute.

AtPlus is in the middle. Instead of waiting for an entire index to complete, AtPlus will only wait for indexing of the documents known to a particular instance of your application. This provides lower latency with a narrow window of consistency, but it requires more work for the SDK. The query request also needs to be handled by the same instance.

Note: The “Consistency” in ACID is different than the “Consistency” in the CAP theorem. In terms of CAP, Couchbase is a strongly consistent distributed database. This means that each document has a single correct document per cluster, and there cannot be a “conflicting” or “sibling” document with the same key elsewhere in the cluster. A strongly consistent database does not imply anything about transactions or ACID properties.

I is for Isolation

“Isolation” is the ability for an operation to occur only after another operation on the same data has been completed. In this way, each operation is independent of other operations. This is very important when a database is handling concurrent data access. It should appear that the database is handling only one operation at a time. In order to accomplish this, the data that’s being updated must be locked out of being modified (and/or viewed) until the operation is complete.

Couchbase provides read committed isolation by default at (again) the single document level.

To get stricter isolation, there are two types of locks that you can do in Couchbase: pessimistic locking and optimistic locking.

Optimistic locking

“Optimistic” locking hinges on a value in Couchbase called CAS (compare-and-swap). Every document has a CAS value, which is some opaque value. Every time that document changes, it gets a new CAS value. When you attempt to update a document, you pass a CAS value as part of the operation. If the CAS values match, Couchbase allows the operation. If not, the operation is not allowed, and Couchbase returns an error instead.

As an example, let’s suppose there are two processes: A and B. A and B both make a “get” request to Couchbase to fetch a document. Couchbase returns the document, along with a CAS value. Then, A and B both send a “set” request to Couchbase, passing along the CAS value they previously received. Processing will race, occurring on one of them first. Let’s say it’s A. When it’s B’s turn, the CAS value of the document has already changed, and so the operation fails.

Here’s an example in .NET. Let’s suppose we’re working on a mobile game, and we want to keep track of a player’s Sword Level. The higher the level, the more damage it does. In this example, A is trying to upgrade to level 2 and B is trying to upgrade to level 3.

At the end of this process, B fails, and the player’s sword stays at level 2. If you want B to succeed, then one solution is to re-fetch the document, get the latest CAS value, and try again.

Of course, that solution could fail again. And again. But this is why it’s called “optimistic”. It assumes that the document is not going to be under heavy contention, and it will eventually succeed. There is no actual locking by the servers required: just a check of values.

Pessimistic locking

You can use pessimistic locking to actually set a lock. This can be useful if you want to lock a document graph to mutate multiple documents.

There is an atomic operation available in Couchbase called “GetAndLock”. This operation returns the document and a CAS value. At this point, the document is considered “locked”. No more locks can be made on it by other processes, and only the CAS value can unlock the document.

Here’s a C# example of a pessimistic lock in action:

Further, when using GetAndLock, you must set a timeout period. After the timeout period, Couchbase releases the lock automatically. Here’s an example of the timeout in action.

When running this example, the following output would occur:

Pessimistic lock time out

Using these locks, you can obtain isolation of an individual document, to make sure that changes happen in the order that you expect.

D is for Durability

“Durability” traditionally means that when an operation completes successfully, that the disk stores the changes made by the operation. In a distributed database, durability can mean that the disk and/or the memory in other nodes store the changes. Replication to other nodes is the preferred mechanism for durability in Couchbase as the network is much faster than disk. Ultimately, to a developer, it means that even if a system failure occurs, the change still takes place.

Couchbase has a “memory-first” architecture. This means that the results of write operations are acknowledged when received to memory, and then put into a queue to be asynchronously written to disk or replicated to another node soon after. So, if an operation writes to memory, and the system shuts down immediately, then that operation is not durable. This is the default trade-off that Couchbase makes: speed over durability.

However, Couchbase allows you to override that default configuration and specify a stronger level of durability with Durability Requirements. This will swing the pendulum away from performance and towards the ACID properties. The benefit of this design is that the application developer knows and decides when it’s important to pay the extra cost.

The default behavior is that writing the document to memory is considered a success. Couchbase will still persist and replicate according to your Couchbase cluster configuration, but the method call will proceed after the operation is acknowledged.

I can specify the number of nodes to replicate to that I consider to be a successfully durable operation by using ReplicateTo:

And I can also specify a combination of persistence to other nodes and replication to other nodes that I consider to be durable “enough” by using PersistTo as well:

Each of these method calls will block until the desired Durability Requirements are met, and will allow your application to perform additional error handling.

Note that if the durability requirements fail, then Couchbase may still save the document and eventually distribute it across the cluster. All we know is that it didn’t succeed as far as the SDK knows. You can choose to act on this information to introduce more ACID properties into your application.

Final Notes

This blog post talked about the various primitives available to you in Couchbase to build ACID-like guarantees into your application. While these primitives do not satisfy full ACID definition, they are sufficient for what the vast majority of modern microservices-based applications need. For the small percentage of use cases that need additional transactional guarantees, Couchbase will continue to innovate further.

In the next post, we’ll look at techniques and sample code you might leverage to build a multi-document transaction in Couchbase.

If you have any questions, be sure to check out the Couchbase Forums. You can find the same code used in this blog post on Github.

Special thanks to all the co-authors of this blog post: Shivani Gupta, Ravid Mayuram, John Liang, Chin Hong, Matt Ingenthron, Michael Nitschinger (and I’m sure I’m missing a few others).

Posted by Matthew Groves, Developer Advocate

Matthew is a Developer Advocate for Couchbase, and lives in the Central Ohio area. He has experience as a web developer as a consultant, in-house developer, and product developer. He has been a regular speaker at conferences and user groups all over the United States, and he has written AOP in .NET for Manning Books. He has experience in C# and .NET, but also with other web-related tools and technologies like JavaScript and PHP. You can find him on Twitter at @mgroves.

3 Comments

  1. Perry Krug, Principal Architect, Couchbase May 21, 2018 at 7:58 am

    Matt, great post! I think it would also be helpful to discuss “consistency” within Couchbase in terms of document references and JOINs. i.e. that you can avoid typical problems of denormalization by referencing one document from within another. That way, when you make a write to either document, it is still ACID.

  2. Thanks, Perry. I did discuss exactly that, except under ‘atomicity’ as I think it’s more applicable.

    1. Thanks Matt, I do see that initial discussion there but I think it’s worth highlighting how multiple documents can still be updated “consistently” by showing the linkage between two documents and using a N1QL JOIN when reading them back. i.e. document1 contains most of my user profile and a pointer/reference to document2 which contains my list of orders. When I update document2, it is still consistent with the whole user profile when I read it back (via JOIN) even though I didn’t have to atomically update both document1 and document2. It’s this combination of normalisation and denormalisation that makes Couchbase so powerful and easy to achieve atomicity and consistency. Document1 is denormalised to contain multiple tables, but document1+document2 is normalised to prevent data bloat and improve concurrency…yet the combination is still atomic and consistent when updating either document (or any in a chain). Does that make more sense?

Leave a reply