Blog Post

CAP Theorem and Couchbase Server... But this time with XDCR

Cihan Biyikoglu of Couchbase Published

CAP is well known to many, so I won't spend time on the introductory material here. Instead, I want to correct a misconception that has come up a few times in recent conversations. Here is the punchline for this post: the 'CAP' behavior of Couchbase Server as a single cluster is different from that of Couchbase Server with XDCR.

Let's start with a single-cluster Couchbase Server deployment. CAP is generally too high level to describe all the colors of a system, but a single Couchbase Server cluster is mainly referred to as a CP system, with options to relax C in favor of A (for example, auto-failover).

However, in a multi-cluster deployment with XDCR, Couchbase Server provides you AP. You can write to any of the clusters, and we'll detect and resolve conflicts, giving you eventual consistency between the clusters (I strongly recommend understanding how we detect and resolve these conflicts, so you can be sure the effect is what you expect). In other words, Couchbase Server can be a more CP or a more AP system depending on the deployment topology.
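To make the conflict-resolution point concrete, here is a minimal Python sketch. It assumes a simplified rule in which the copy with more mutations wins and a CAS-like value breaks ties. The `DocVersion` class, its fields, and the `resolve` function are hypothetical names for illustration, not Couchbase APIs, and the actual rules for your Couchbase Server version should be checked in the documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocVersion:
    """Hypothetical view of one cluster's copy of a document."""
    value: str
    rev_count: int  # number of mutations applied to this copy (assumed field)
    cas: int        # tie-breaker; larger means "later" in this sketch (assumed field)


def resolve(a: DocVersion, b: DocVersion) -> DocVersion:
    """Pick the same winner on both clusters so they converge.

    Assumed rule: higher rev_count wins, then higher cas.
    Ties on both fields are not handled in this sketch.
    """
    return max(a, b, key=lambda d: (d.rev_count, d.cas))


# The same key was mutated on two clusters before replication caught up.
east = DocVersion(value="cart:[book]", rev_count=3, cas=1001)
west = DocVersion(value="cart:[book, pen]", rev_count=2, cas=1005)

winner = resolve(east, west)
print(winner.value)
```

Under this assumed rule the `east` copy wins even though the `west` copy carries the larger `cas`, because `east` has seen more mutations. That is the kind of effect worth understanding before you rely on writes at more than one cluster.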

I'll inject one warning, and this discussion also comes up among the various opinions we have in engineering here: CAP is a high-level definition. It is a good first introduction to a system's approach, but it isn't great for describing all the colors of the system. With that caveat, the table below should give you some more detail on how Couchbase Server's CAP balance works. A quick tour of the columns: the table looks at various deployment topologies with XDCR. Fault Domain describes the failure domain for the topology in question. CAP Balance describes the AP-ness vs. CP-ness of the system. Use the table with caution.

Finally, if you have not looked at XDCR as an availability facility, the following table is a good starting point. If you'd like to read more about XDCR and its behavior, I'd also recommend downloading the following paper: Developing Applications with XDCR

Thanks for reading and as always comments welcome.

Cihan Biyikoglu - Product Management @ Couchbase

| Deployment Topology | Fault Domain | CAP Balance | Comments |
| --- | --- | --- | --- |
| Single Couchbase Server Cluster | Node-level failure domain (for example, HW failures, communication failures between nodes) | Can be configured as CP, and can be tuned for availability through auto-failover, replica reads, and more. | Couchbase Server allows reading from active or replica vBuckets, and can be tuned with auto-failover to give you write availability after a short failover timeout. |
| Couchbase Server Clusters with Uni-Directional XDCR for HA/DR | Node and cluster-wide failures (for example, DC failures caused by natural disasters) | AP across clusters with Uni-Directional XDCR, with protection against cluster-wide or node failures. Same 'CAP balance' within each cluster for node failures. | Uni-directional with passive computational capacity at a second site/data center. The destination cluster can be used for eventually consistent reads during steady state and can be promoted to read/write when the source cluster fails. |
| Couchbase Server Clusters with Bi-Directional XDCR for HA/DR | Node and cluster-wide failures (for example, DC failures caused by natural disasters) | AP across clusters with Bi-Directional XDCR, with protection against cluster-wide or node failures. Same 'CAP balance' within each cluster for node failures. | Bi-directional with active/split computational capacity across sites/data centers. The destination cluster can be used for eventually consistent reads and writes during steady state. You can experience write conflicts if the same key is mutated at both clusters. Many customers minimize write conflicts by segmenting write traffic into non-overlapping key ranges across the source and destination clusters. |
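
The write segmentation mentioned for the bi-directional topology can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key prefixes, the cluster names, and the `write_cluster` helper are hypothetical, and in a real application the routing would sit in front of whichever client you use to talk to each cluster.

```python
# Hypothetical ownership map: each key prefix is writable at exactly one cluster.
WRITE_OWNER = {
    "us::": "cluster-us",
    "eu::": "cluster-eu",
}


def write_cluster(key: str) -> str:
    """Return the only cluster that should accept writes for this key."""
    for prefix, cluster in WRITE_OWNER.items():
        if key.startswith(prefix):
            return cluster
    # Refuse keys with no owner rather than guessing, so that two clusters
    # never end up accepting writes for the same key.
    raise ValueError(f"no write owner configured for key {key!r}")


print(write_cluster("us::user::42"))
print(write_cluster("eu::user::7"))
```

Because every key has a single write owner, the same key is never mutated at both clusters during steady state, while reads can still be served from either side with eventual consistency.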