Introduction

Cross Datacenter Replication (XDCR)  is an important core feature of Couchbase that helps users in disaster recovery and data locality. Conflict resolution is an inevitable challenge faced by XDCR when a document is modified in two different locations before it has been synchronized between the locations.

Until 4.6, Couchbase only supported a revision ID-based strategy to handle conflict resolution. In this strategy, a document’s revision ID, which is updated every time it is modified, is used as the first field to decide the winner. If the revision ID of both contestants are same, then CAS, TTL, and flags are used in the same order to resolve the conflict. This strategy works best for applications designed to work based on a “Most updates is best” policy. For example, a ticker app used by conductors in a train which updates a counter stored by cb server to count the number of passengers will work best with this policy and hence perform accurately with revision ID-based conflict resolution.

Starting with 4.6, Couchbase will be supporting an additional strategy called timestamp-based conflict resolution. Here, the timestamp of a document which is stored in CAS is used as the first field to decide the winner. In order to keep consistent ordering of mutations, Couchbase uses a hybrid logical clock (HLC), which is a combination of a physical clock and a logical clock. If the timestamp of both the contestants are same, then revision ID, TTL, and flags are used in the same order to resolve conflicts. This strategy is adapted to facilitate applications which are designed based on a “Most recent update is best” policy. For example, a flight tracking app that stores the estimated arrival time of a flight in Couchbase Server will perform accurately with this conflict resolution. Precisely, this mechanism can be summarized as “Last write wins”.

One should understand that the most updated document need not essentially be the most recent document and vice versa. So the user really needs to understand the application’s design, needs, and data pattern before deciding on which conflict resolution mechanism to use. For the same reason, Couchbase has designed the conflict resolution mechanism as a bucket level parameter. The users need to decide and select the strategy they wish to follow while creating the buckets. Once a bucket is created with a particular conflict resolution mechanism via UI, Rest API, or CLI, it cannot be changed. The user will have to delete and recreate the bucket to change the strategy. Also to avoid confusions and complications, Couchbase has restricted XDCR from being set up in mixed mode, i.e source and destination buckets cannot have different conflict resolution strategies selected. They both have to use either revision ID-based, or timestamp based conflict resolution. If the user tries to set up otherwise via UI, Rest API, or CLI, there will be an error message displayed.

Timestamp-based conflict resolution Use Cases

High Availability with Cluster Failover

Here, all database operations go to Datacenter A and are replicated via XDCR to Datacenter B. If the cluster located in Datacenter A fails then the application fails all traffic over to Datacenter B.

Datacenter Locality

Here, two active clusters operate on discrete sets of documents. This ensures no conflicts are generated during normal operation. A bi-directional XDCR relationship is configured to replicate their updates to each other. When one cluster fails, application traffic can be failed over to the remaining active cluster.

How does Timestamp-based conflict resolution ensure safe failover?

Timestamp-based conflict resolution requires that applications only allow traffic to the other Datacenter after the maximum of the following two time periods has elapsed:

  1. The replication latency between A and B. This allows any mutations in-flight to be received by Datacenter B.
  2. The absolute time skew between Datacenter A and Datacenter B. This ensures that any writes to Datacenter B occur after the last write to Datacenter A, after the calculated delay, at which point all database operations would go to Datacenter B.

When availability is restored to Datacenter A, applications must observe the same time period before redirecting their traffic. For both of the use cases described above, using timestamp-based conflict resolution ensures that the most recent version of any document will be preserved.

How to configure NTP for Timestamp-based conflict resolution?

A prerequisite that users should keep in mind before opting for timestamp-based conflict resolution is that they need to use synchronized clocks to ensure the accuracy of this strategy. Couchbase advises them to use Network Time Protocol (NTP) to synchronize time across multiple servers. The users will have to configure their clusters to periodically synchronize their wall clocks with a particular NTP server or a pool of NTP peers to ensure availability. Clock synchronization is key to the accuracy of the Hybrid Logical Clock used by Couchbase to resolve conflicts based on timestamps.

As a QE, testing timestamp-based conflict resolution was a good learning experience. One of the major challenges was learning how NTP works. The default setup for all the testcases is to enable NTP, start the service, sync up the wall clock with 0.north-america.pool.ntp.org, and then proceed with the test. These steps were achieved using the following commands in setup:

~$ chkconfig ntpd on

~$ /etc/init.d/ntpd start

~$ ntpdate -q 0.north-america.pool.ntp.org

Once the test is done and results are verified, NTP service is stopped and disabled using the following commands:

~$ chkconfig ntpd off

~$ /etc/init.d/ntpd stop

This is vanilla setup where all the individual nodes sync up their wall clock with 0.north-america.pool.ntp.org. It was interesting to automate test cases where nodes sync up their wall clock with a pool of NTP peers, source and destination cluster sync with different NTP pools (A (0.north-america.pool.ntp.org) -> B (3.north-america.pool.ntp.org)) and each cluster in a chain topology of length 3 (A (EST)  -> B (CST) -> C (PST)) are in different timezones. We had to manually configure these scenarios, observe the behaviour and then automate it out.

How did we test NTP based negative scenarios?

Next challenge was to test scenarios where NTP is not running on the Couchbase nodes and there is a time skew between the source and destination. Time skew might also occur If the wall clock time difference across clusters is high. Any time synchronization mechanism will take some time to sync the clocks resulting in a time skewed window. Note that Couchbase only gives an advisory warning while creating a bucket with timestamp-based conflict resolution stating that the user should ensure a time synchronization mechanism is in place in all the nodes. It does not validate and restrict users from creating such a bucket if a time synchronization mechanism as such is not present. So it is quite possible that the user might ignore this warning, create a bucket with timestamp based conflict resolution and see weird behaviour when there is a time skew.

Let us consider one such situation here:

  1. Create default bucket on source and target cluster with timestamp based conflict resolution
  2. Setup XDCR from source to target
  3. Disable NTP on both clusters
  4. Make wall clock of target cluster slower than source cluster by 5 minutes
  5. Pause replication
  6. Create a doc D1 at time T1 in target cluster
  7. Create a doc D2 with same key at time T2 in source cluster
  8. Update D1 in target cluster at time T3
  9. Resume replication
  10. Observe that D2 overwrites D1 even though T1 > T2 > T3 and last update to D1 in target cluster should have won

Here last write by timeline did not win as the clocks were skewed and not in sync leading to incorrect doc being declared as the winner. This shows how important time synchronization is for the timestamp based conflict resolution strategy. Figuring out all such scenarios and automating them was indeed a challenge.

How did we test complex scenarios with Timestamp-based Conflict Resolution?

Up next was determining a way to validate the correctness of this timestamp based conflict resolution against revision ID based strategy. We needed to perform the same steps in a XDCR setup and verify that the results were different based on the bucket’s conflict resolution strategy. In order to achieve this, we created two different buckets, one configured to use revID based conflict resolution and other to use timestamp based. Now follow these steps on both buckets parallely:

  1. Setup XDCR and pause replication
  2. Create doc D1 in target at time T1
  3. Create doc D2 with same key in source at time T2
  4. Update doc D2 in source at time T3
  5. Update doc D2 in source again at time T4
  6. Update doc D1 in target at time T5
  7. Resume replication

In the first bucket which is configured to use revID based conflict resolution, doc D1 at target will be overwritten by D2 as it has been mutated the most. Whereas in the second bucket which is configured to use timestamp based conflict resolution, doc D1 at target will be declared winner and retained as it is the latest to be mutated. Figuring out such scenarios and automating them made our regression exhaustive and robust.

How did we test HLC correctness?

Final challenge was to test the monotonicity of the hybrid logical clock (HLC) used by Couchbase in timestamp based conflict resolution. Apart from verifying that the HLC remained the same between an active vbucket and its replica, we had some interesting scenarios as follows:

  1. C1 (slower) -> C2 (faster) – mutations made in C1 will lose based on timestamp and C2 will always win – so HLC of C2 should not change after replication
  2. C1 (faster) -> C2 (slower) – mutations made in C1 will always win based on timestamp – so HLC of C2 should be greater than what it was before replication due to monotonicity
  3. Same scenario as 1, even though HLC of C2 did not change due to replication, any updates on C2 should increase its HLC owing to monotonicity
  4. Similarly, for scenario described in 2, apart from C2’s HLC being greater than what it was before replication, more updates to docs on C2 should keep its HLC increasing owing to monotonicity

Thus, all these challenges made testing timestamp based conflict resolution a rewarding QE feat.

Posted by Arunkumar Senthilnathan, Sr Software Engineer, Couchbase

Leave a reply