We are excited to announce the release of Couchbase Autonomous Operator 1.2. This is a landmark release that delivers several features requested by customers, mainly:

  • Automated Upgrade of Couchbase Clusters
  • Integrated CouchbaseCluster Resource Validation via Admission Controller
  • Helm Support
  • Public Connectivity for Couchbase Clients
  • Rolling Upgrade of Kubernetes Clusters
  • TLS x509 Certificate Rotation
  • Unified log collection experience for stateful and stateless deployments
  • Support for Public Kubernetes Services on GKE, AKS and EKS

Kubernetes running on public cloud has worked since day one, but with Autonomous Operator 1.2 we support it in an official capacity. For this blog we will use GKE: we will set up a Kubernetes 1.12 cluster, deploy the Autonomous Operator, and finally deploy a Couchbase cluster with Server Groups, persistent volumes, and x509 TLS certificates.

Overall steps that we will be doing in this blog are as follows:

  1. Initialize gcloud utils
  2. Deploy Kubernetes cluster (v1.12+) with 2 nodes in each availability zone
  3. Deploy Autonomous Operator 1.2
  4. Deploy Couchbase Cluster
  5. Perform Server Group Autofailover

Pre-requisites

  • kubectl (gcloud components install kubectl)
  • A GCP account with the right credentials
Initialize gcloud utils

Download the gcloud SDK for the OS of your choice from this URL.

You will need Google Cloud credentials to initialize the gcloud CLI.
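Initialization might look like this (the region is an example; `gcloud init` walks you through login and project selection interactively):

```shell
gcloud init                                  # interactive: login + default project
gcloud config set compute/region us-east1    # region used in the rest of this blog
```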

Deploy Kubernetes cluster (v1.12+) with 2 nodes in each availability zone

Deploying a Kubernetes cluster on GKE is fairly straightforward. To deploy a resilient cluster, it is a good idea to spread nodes across all available zones within a given region. That way we can use the Server Group (also known as Rack Zone or Availability Zone (AZ)) awareness feature of Couchbase Server: if we lose an entire AZ, Couchbase can fail over that AZ and the application stays active, since it still has the working data set.
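As a sketch, a regional cluster with two nodes per zone can be created like this (the cluster name, region, and machine type are examples; pick a 1.12+ version that GKE currently offers):

```shell
# Regional clusters place --num-nodes in EACH zone of the region
gcloud container clusters create cb-gke \
  --region us-east1 \
  --num-nodes 2 \
  --machine-type n1-standard-4 \
  --cluster-version latest
```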

More machine types can be found here.

At this point, the k8s cluster with the required number of nodes should be up and running.

Details of the k8s cluster can be retrieved as shown below.
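For example (the cluster name and region are examples):

```shell
kubectl get nodes -o wide
gcloud container clusters describe cb-gke --region us-east1
```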

Deploy Autonomous Operator 1.2

GKE supports RBAC in order to limit permissions. Since the Couchbase Operator creates resources in our GKE cluster, we will need to grant it the permission to do so.
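Granting our own Google account cluster-admin, so that it can in turn create the Operator's RBAC roles, might look like this (the binding name is an example):

```shell
kubectl create clusterrolebinding my-cluster-admin \
  --clusterrole cluster-admin \
  --user "$(gcloud config get-value account)"
```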

Download the appropriate Operator package for your environment, untar it, and deploy the admission controller.
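The filenames below are illustrative of the 1.2 Linux package; adjust for your platform and version:

```shell
tar -xzf couchbase-autonomous-operator-kubernetes_1.2.0-linux-x86_64.tar.gz
cd couchbase-autonomous-operator-kubernetes_1.2.0-linux-x86_64
kubectl create -f admission.yaml
```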

Check the status of the admission controller.
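For example (the exact deployment name may differ slightly by package version):

```shell
kubectl get deployments
kubectl get pods | grep admission
```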

The following diagram illustrates how the admission controller works in concert with the Operator and the Couchbase cluster.

The next steps are to create the CRD, the Operator role, and the Operator 1.2 deployment itself.
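A sketch of these steps (filenames are illustrative of the 1.2 package layout; the package may also ship a service account and role binding to create):

```shell
kubectl create -f crd.yaml
kubectl create -f operator-role.yaml
kubectl create -f operator-deployment.yaml
```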

Once deployed, the Operator becomes ready and available within seconds.
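Readiness can be checked like this (the deployment name is an assumption):

```shell
kubectl get deployments
kubectl describe deployment couchbase-operator
```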

Deploy Couchbase Cluster

The Couchbase cluster will be deployed with the following features:

  • TLS certificates
  • Server Groups (each server group in one AZ)
  • Persistent Volumes (which are AZ aware)
  • Server Group auto-failover
TLS certificates

It is fairly easy to generate TLS certificates; detailed steps can be found here.
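The official docs walk through this with easy-rsa; a minimal openssl equivalent might look like the sketch below. All names are examples, and for brevity this sketch omits the SAN extension, which in practice must cover your cluster's pod DNS names:

```shell
# Create a CA, then a server key/cert signed by that CA
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -days 3650 -sha256 \
  -subj "/CN=Couchbase CA" -out ca.pem
openssl genrsa -out pkey.key 2048
openssl req -new -key pkey.key -subj "/CN=couchbase-server" -out server.csr
openssl x509 -req -in server.csr -CA ca.pem -CAkey ca.key -CAcreateserial \
  -days 365 -sha256 -out chain.pem
```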

Once deployed, the TLS secrets can be listed with kubectl as shown below.
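Creating and listing the secrets might look like this (the secret names follow the convention used in the Operator TLS docs; the file names come from the certificate-generation step):

```shell
kubectl create secret generic couchbase-server-tls \
  --from-file=chain.pem --from-file=pkey.key
kubectl create secret generic couchbase-operator-tls \
  --from-file=ca.pem
kubectl get secrets
```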

Server Groups

Setting up Server Groups is also straightforward; we will discuss it in the following sections when we walk through the Couchbase cluster YAML file.

Persistent Volumes

Persistent volumes provide a reliable way to run stateful applications, and creating them on a public cloud is a one-click operation.

First, we can check which storage classes are available for use.
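For example ("standard" is GKE's default storage class):

```shell
kubectl get storageclass
kubectl describe storageclass standard
```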

All the worker nodes in the k8s cluster should have failure-domain labels like the ones below.
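The zone labels can be shown as an extra column (this is the label key used by Kubernetes 1.12):

```shell
kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
```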

NOTE: I did not have to add any failure-domain labels; GKE added them automatically.

Create PV for each AZ

The YAML file, svrgp-pv.yaml, can be found here.
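The file defines zone-pinned storage so each server group's volumes land in its own AZ; a sketch of one such StorageClass (the name and zone are examples, repeated once per AZ) might look like:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-us-east1-b
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zone: us-east1-b
```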

Create a secret for accessing the Couchbase UI.
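The Operator expects the secret to carry `username` and `password` keys; the name and credentials below are examples:

```shell
kubectl create secret generic cb-example-auth \
  --from-literal=username=Administrator \
  --from-literal=password=password
```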

Finally, deploy the Couchbase cluster with TLS support, Server Groups (which are AZ aware), and persistent volumes (which are also AZ aware).

The YAML file, couchbase-persistent-tls-svrgps.yaml, can be found here.
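An abridged sketch of what such a spec looks like in the 1.x API (names, sizes, and quotas are examples; only one server group's `servers` entry is shown, repeated per zone in the real file):

```yaml
apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  baseImage: couchbase/server
  version: enterprise-6.0.1
  authSecret: cb-example-auth
  tls:
    static:
      serverSecret: couchbase-server-tls
      operatorSecret: couchbase-operator-tls
  serverGroups:
    - us-east1-b
    - us-east1-c
    - us-east1-d
  cluster:
    dataServiceMemoryQuota: 512
    autoFailoverTimeout: 30
    autoFailoverServerGroup: true
  servers:
    - name: data-east1-b      # one entry like this per zone
      size: 1
      services:
        - data
      serverGroups:
        - us-east1-b
      pod:
        volumeMounts:
          default: couchbase
  volumeClaimTemplates:
    - metadata:
        name: couchbase
      spec:
        storageClassName: sc-us-east1-b
        resources:
          requests:
            storage: 10Gi
```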

Give it a few minutes and the Couchbase cluster will come up; it should look like this:
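For example:

```shell
kubectl get pods -o wide
kubectl get couchbaseclusters
```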

A quick check on the persistent volume claims can be done as shown below.
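For example:

```shell
kubectl get pvc
kubectl get pv
```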

To access the Couchbase cluster UI, we can either port-forward port 8091 of any pod (or of the service itself) to the local machine, or expose it via a load balancer.

Port-forward any pod as shown below.
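The pod name below is an example; the UI is then available at http://localhost:8091:

```shell
kubectl port-forward cb-example-0000 8091:8091
```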

At this point Couchbase Server is up and running and we have a way to access it.

Perform Server Group Autofailover

When a Couchbase cluster node fails, it can be failed over automatically: the whole working set remains available, no user intervention is needed, and the application sees no downtime.

If the Couchbase cluster is set up to be Server Group (SG), AZ, or Rack Zone (RZ) aware, then even if we lose an entire SG, the whole server group fails over, the working set remains available, and again no user intervention is needed and the application sees no downtime.

For disaster recovery, XDCR can be used to replicate Couchbase data to another Couchbase cluster. If the entire source data center or region is lost, applications can cut over to the remote site and see no downtime.

Let's take down a Server Group. Before that, let's see what the cluster looks like.

Delete all the pods in group us-east1-b. Once the pods are deleted, the Couchbase cluster will see that those nodes are down.
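One way to delete every Couchbase pod scheduled in that zone (the cluster label and zone are examples):

```shell
for pod in $(kubectl get pods -l couchbase_cluster=cb-example \
    -o jsonpath='{.items[*].metadata.name}'); do
  node=$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')
  zone=$(kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone}')
  [ "$zone" = "us-east1-b" ] && kubectl delete pod "$pod"
done
```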

The Operator is constantly watching the cluster definition, so it sees that the server group is lost. It spins up three new pods, re-establishes the claims on the persistent volumes, performs delta-node recovery, and eventually runs a rebalance so the cluster is healthy again, all with no user intervention whatsoever.

After some time, the cluster is back up and running.

From the operator logs,

we can see that the cluster is automatically rebalanced.

Epilogue

Sustained differentiation is key to our technology, and we have added quite a number of new features in this release. These enterprise-grade supportability features enable end users to find issues faster and to operationalize the Couchbase Operator in their environments more efficiently. We are very excited about this release, so feel free to give it a try!

 

References:

https://docs.couchbase.com/operator/1.2/whats-new.html

https://www.couchbase.com/downloads

https://docs.couchbase.com/server/6.0/manage/manage-groups/manage-groups.html

K8s Autonomous Operator Book from @AnilKumar

https://info.couchbase.com/rs/302-GJY-034/images/Kubernetes_ebook.pdf

https://docs.couchbase.com/operator/1.2/tls.html

All YAML files and helper files used for this blog can be found here.

Posted by Ram Dhakne

Ram Dhakne is a Solutions Consultant - US West at Couchbase. He currently helps enterprise customers with their digital innovation journeys and with adopting NoSQL technologies. His current interests are running persistent applications like Couchbase Server on Kubernetes clusters on AKS, GKE, ACS, and OpenShift, and securing Kubernetes end to end. In a past life he worked on IaaS platforms (AWS, GCP, Azure, and private clouds), enterprise backup target products, and backup applications.
