Fast failover is one of the many improvements that come with the release of Couchbase Server 5.0 (now available for download).

Failover is an important concept to understand when it comes to distributed databases. The CAP theorem states that when a network partition occurs, a distributed database can’t be both available and consistent. Couchbase Server’s architecture is designed to be always consistent and partition tolerant. With fast failover, Couchbase Server is closing the gap on high availability.

In this blog post, I’m going to demonstrate failover in action. I’ll be using Docker to create a cluster of 3 Couchbase nodes on my local machine.

You can follow along with the code sample in this blog post: it is available on GitHub.

Fast failover overview

You’ll need a bit of setup and preparation.

First, create a Couchbase Server cluster with at least 3 nodes. There are a number of ways to do this, including Vagrant, virtual machines, physical machines, Azure, and more.

I chose to use Docker. I blogged about how to create a Couchbase Cluster on Docker and access it with a .NET Core application (don’t forget the bridge network!). So, I just followed those same instructions again. The only difference is that I used a console application instead of an ASP.NET application (which you can read more about later in this post).

Three Couchbase Server nodes

I used the Couchbase Server 5.0.0-beta2 image from Docker Hub, but by the time you read this, an official release of Couchbase Server 5.0 should be available in the official Couchbase repository on Docker Hub.

Next, I created a bucket called “mybucket”. Make sure to enable replicas so that additional copies of the data are kept within the same cluster.

Couchbase Server bucket

After that, create a user (I called mine “myuser”) with at least Data Writer and Data Reader permissions for “mybucket”. If you aren’t familiar yet with Couchbase Server Role-Based Access Control (RBAC), start with this blog post on Authentication with RBAC and .NET.

Couchbase Server user

Finally, turn on automatic fast failover. From the Couchbase Console, go to Settings, and then Auto-Failover. Check the box to “Enable auto-failover”. As of version 5.0, you can set the Timeout value to as low as 5 (seconds). Previously, the value had to be at least 30 seconds.
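If you would rather script this step than click through the console, the same setting can be changed through the Couchbase REST API. The following is a minimal sketch, not part of the original sample. It assumes the cluster's admin port is reachable at localhost:8091, and "Administrator" / "password" are placeholder credentials that you would replace with your own.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class EnableAutoFailover
{
    static async Task Main()
    {
        // Assumption: localhost:8091 is the cluster's admin port, and
        // "Administrator:password" is a placeholder for real admin credentials.
        using (var http = new HttpClient { BaseAddress = new Uri("http://localhost:8091") })
        {
            var credentials = Convert.ToBase64String(
                Encoding.UTF8.GetBytes("Administrator:password"));
            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", credentials);

            // enabled=true turns auto-failover on; timeout is in seconds.
            // 5 is the lowest value mentioned above for version 5.0.
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["enabled"] = "true",
                ["timeout"] = "5"
            });

            var response = await http.PostAsync("/settings/autoFailover", form);
            Console.WriteLine(
                $"Auto-failover settings update: {(int)response.StatusCode} {response.ReasonPhrase}");
        }
    }
}
```

A 200 status indicates the setting was accepted. Anything else is worth checking against the cluster's logs before relying on auto-failover.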

Enable fast failover

There is a reason that auto-failover is off by default. Please review the full documentation on automatic failover to make sure that it’s the right fit for you.

.NET Example

Now that you have a 3-node cluster running inside of your Docker host, it’s time to write a demonstration application. I decided to write a console application that would continuously perform reads against Couchbase. At some point, I will “pull the plug” on one of the nodes to show automatic fast failover in action.

Connecting to the cluster

After creating a new .NET Core console application in Visual Studio, I added the Couchbase .NET SDK (currently version 2.5.1) using NuGet.

Then, I created a configuration to connect to the 3-node cluster, authenticate to “myuser”, and open up “mybucket”.

Those IP addresses are internal to the Docker host. This .NET Core application will also be running inside the Docker host, where those IP addresses will resolve. From outside the Docker host, only “localhost:8091” will resolve (assuming you are following the tutorial I linked to earlier). If you are not using Docker, use the IP addresses of your Azure machines, VMs, etc. instead.

Next, PasswordAuthenticator is used to authenticate as the user that has access to the bucket.

Finally, get a bucket object using OpenBucket.
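Put together, the connection steps could look something like the sketch below, written against the Couchbase .NET SDK 2.x API. The three IP addresses and the password are placeholders (assumptions for illustration): use whatever addresses your Docker bridge network assigned and the password you chose for “myuser”.

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Authentication;
using Couchbase.Configuration.Client;

class Program
{
    static void Main()
    {
        // Placeholder addresses: substitute the internal IPs of your three nodes.
        var cluster = new Cluster(new ClientConfiguration
        {
            Servers = new List<Uri>
            {
                new Uri("http://172.17.0.2:8091"),
                new Uri("http://172.17.0.3:8091"),
                new Uri("http://172.17.0.4:8091")
            }
        });

        // Authenticate as the RBAC user created earlier ("password" is a placeholder).
        cluster.Authenticate(new PasswordAuthenticator("myuser", "password"));

        // Open the bucket; dispose of it (and the cluster) when finished.
        using (var bucket = cluster.OpenBucket("mybucket"))
        {
            Console.WriteLine($"Opened bucket: {bucket.Name}");
        }

        cluster.Dispose();
    }
}
```

Listing all three nodes means the SDK can still bootstrap if one of them is down when the application starts.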

Setting up documents

For this demonstration, I want to set up a bunch of documents that I will later read from repeatedly. First, I wrote a loop to create some arbitrary number of documents that each have a key like “documentKey[num]” (e.g. “documentKey1”, “documentKey2”, etc.).

In my code, numDocuments is set to 50. But if you are following along, feel free to set it to another number and see what happens.
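As a rough sketch of that setup loop (assuming SDK 2.x and an already-opened IBucket), something like the following would work. The class and method names, the 1-based numbering, and the document body are all made up for illustration. Only the “documentKey[num]” key pattern and the numDocuments value come from this post.

```csharp
using System;
using Couchbase.Core;

static class DocumentSetup
{
    // Creates (or overwrites) numDocuments documents with predictable keys.
    public static void CreateDocuments(IBucket bucket, int numDocuments = 50)
    {
        for (var i = 1; i <= numDocuments; i++)
        {
            var key = "documentKey" + i;

            // Hypothetical document body; any serializable object will do.
            var result = bucket.Upsert(key, new { name = "Document " + i });

            if (!result.Success)
            {
                Console.WriteLine($"Could not write {key}: {result.Message}");
            }
        }
    }
}
```

Upsert is used here so that the program can be run repeatedly without failing on keys that already exist.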

Reading documents

At this point, there are 50 documents with well-known keys. The rest of the program loops continuously, and each iteration attempts to retrieve all 50 documents.
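A minimal sketch of such a read loop, again assuming SDK 2.x, might look like this. The class name, the callback parameter, and the two-second pause between passes are assumptions for illustration, and the keys are assumed to run from “documentKey1” to “documentKey50”.

```csharp
using System;
using System.Threading;
using Couchbase;
using Couchbase.Core;

static class ReadLoop
{
    // Reads every document over and over, handing each result to a callback
    // that decides what to print.
    public static void Run(
        IBucket bucket,
        int numDocuments,
        Action<string, IOperationResult<dynamic>> showResult)
    {
        while (true)
        {
            // Inner loop: one Get per well-known key.
            for (var i = 1; i <= numDocuments; i++)
            {
                var key = "documentKey" + i;
                var result = bucket.Get<dynamic>(key);
                showResult(key, result);
            }

            Console.WriteLine();
            Thread.Sleep(2000); // arbitrary pause so the output stays readable
        }
    }
}
```

For a quick terse display, the callback could be as simple as `(key, r) => Console.Write(r.Success ? "S" : "?")`.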

First, notice that there’s a loop within the loop. The inner loop runs 50 times to perform a Get on each document. ShowResult then outputs what’s going on to the console. ShowResultTerse does the same thing in a much more compact fashion. ShowResult is below, but later screenshots use ShowResultTerse.

The comments will help you follow along, but ShowResult does three checks:

  1. Was the read successful? If so, output that. Done! Otherwise…
  2. Try to get a replica (from another node). Was THAT successful? If so, output that. Done! Otherwise…
  3. The application was unable to read the document or one of its replicas. In this example, that’s going to be very rare. In reality, it could mean that the document doesn’t exist, or replication isn’t configured correctly, or something else has gone wrong.
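The three checks above can be sketched as follows. This is an approximation rather than the exact code from the sample: it assumes SDK 2.x, where IBucket exposes GetFromReplica, and the class name and message wording are invented for illustration.

```csharp
using System;
using Couchbase;
using Couchbase.Core;

static class ResultReporter
{
    public static void ShowResult(IBucket bucket, string key, IOperationResult<dynamic> result)
    {
        // 1. The read from the active copy worked.
        if (result.Success)
        {
            Console.WriteLine($"{key}: read from active copy");
            return;
        }

        // 2. The active read failed, so try a replica on another node.
        var replica = bucket.GetFromReplica<dynamic>(key);
        if (replica.Success)
        {
            Console.WriteLine($"{key}: read from replica");
            return;
        }

        // 3. Neither the active copy nor a replica could be read.
        Console.WriteLine($"{key}: unavailable ({replica.Message})");
    }
}
```

Keep in mind that a replica read may return slightly stale data, which is an acceptable trade-off for a read-only demonstration like this one.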

So, you’re ready to run the application. If you’re using Docker, don’t forget to run this application in Docker (which is easy to do from Visual Studio), and make sure to connect the .NET Core application container to the Docker bridge network.

Pull the plug!

Before pulling the plug on one of the nodes, let’s take a look at what the “normal” output is when running the above .NET Core application.

In the below GIF, you’ll see:

  • A three node Couchbase Server cluster
  • Switch over to Visual Studio
  • Build and start the Docker container with CTRL+F5
  • The (terse) console output of the Docker container

Console output from Docker

(I’ve sped up the animation a bit). Notice that “S” is being shown 50 times. This means that each document was (s)uccessfully retrieved.

Next, let’s show fast failover in action. I’m going to “pull the plug” on one of the nodes. With Docker, I can execute docker stop db2, for example.

There is a lot to keep track of at one time, so I’ve created a short video that demonstrates what’s going on.

Video: https://www.youtube.com/watch?v=KbU5eG2R9XU

What you’re seeing in that video is:

  1. Normal operation (all “S” for success)
  2. A node being stopped (with Docker)
  3. Couchbase detecting a node being down.
  4. Couchbase initiating fast failover to activate replicas.
  5. During that failover period, it’s no longer all “S”. There are some “R” for replicas (which are read only) in there too.
  6. When the failover is complete, the results go back to all “S” again.

The goal of fast failover is to reduce the period of time where not all documents are entirely available.

Summary

Couchbase Server 5.0 has improved failover with a “fast failover” option that can be useful for environments with solid networking in place.

This blog post shows off a console app that’s meant to demonstrate fast failover. It’s not a very useful app outside of that, but you can take the principles and apply them to an ASP.NET or ASP.NET Core website.

Check out Couchbase Server 5.0 today for this and other great new features.

Special thanks to Jeff Morris and the SDK team for helping out with this blog post!

If you have questions or comments on failover, make sure to check out the Couchbase forums.

Please leave your questions and comments on all things .NET and Couchbase or find me on Twitter @mgroves.

Author

Posted by Matthew Groves

Matthew D. Groves is a guy who loves to code. It doesn't matter if it's C#, jQuery, or PHP: he'll submit pull requests for anything. He has been coding professionally ever since he wrote a QuickBASIC point-of-sale app for his parent's pizza shop back in the 90s. He currently works as a Senior Product Marketing Manager for Couchbase. His free time is spent with his family, watching the Reds, and getting involved in the developer community. He is the author of AOP in .NET, Pro Microservices in .NET, a Pluralsight author, and a Microsoft MVP.
