Couchbase Server is frequently used in public and private cloud deployments and in SaaS applications, and the tenancy model question comes up often. I want to explain a couple of the options for setting up multi-tenancy with Couchbase Server.
Obviously, this is a wide topic and many tenancy models are possible. Depending on application needs, what defines a tenant can also vary greatly from system to system. To set some limits, I define tenants as entities that share roughly the same application binaries and logic but each store their own data in the system. Some applications may call each user or subscriber a tenant; others may call a company or a group of users a tenant; or the tenant may not be a user at all but a device or an appliance. So there is wide variability here. However, one common property is that you always have an identifier for the tenant: let's call that tenantID.
At a high level, you can place your tenantID at the key level, the bucket level or the cluster level with Couchbase Server. Obviously, all these options come with various benefits and compromises when compared to each other. Let's take a look at a few important parameters to consider when setting up a tenancy model.
- The first parameter is the size and scale targets for your tenants. For example, how many tenants do you need to support: tens, thousands or millions? What is the variance in tenant data size (a few thousand, millions or many billions of items) and in throughput needs (e.g. tens, thousands, millions or hundreds of millions of operations per second)?
- Another important aspect is the server-side user and administrative security isolation required between tenants. For example, can the application tier provide security isolation for users, or do your tenants need to self-service their administrative settings in the environment?
- You should also consider server-side resource governance and isolation requirements. For example, does your application need to manage variance in tenant workload and prevent a tenant from monopolizing all resources, or are the tenant workloads fairly even, with no need for complex resource governance? Putting strict walls between tenants and reserving capacity may bring more predictability, but it can also have a big downside: you may end up limiting the experience of one tenant even when other tenants are idle and not using system resources.
- It is also important to declare the desired packing density for your tenancy model. Can you afford to have a few tenants, or even less than one tenant, per server, or do you need many tenants per node?
Note that some of these may be inversely correlated. For example, in some cases, the higher the ‘isolation’ requirements, the lower the ‘packing density’.
Let's dive into these multi-tenancy options in light of these properties.
Single Couchbase Bucket for All Tenants
In this model, your keys contain the tenantID as a prefix, making your effective key tenantID+your_key, and you create a single bucket to house all your tenants.
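The key scheme can be sketched in a few lines of Python. This is only a minimal illustration, not something the Couchbase SDK provides: the `::` separator and the helper names `tenant_key`, `split_tenant_key` and `belongs_to` are my own assumptions, and the helpers work on plain strings without touching a server.

```python
from typing import Tuple

# Assumed delimiter between tenantID and the application key.
# The scheme above only requires that the tenantID comes first.
SEPARATOR = "::"


def tenant_key(tenant_id: str, key: str) -> str:
    """Build the effective key: tenantID + separator + application key."""
    if not tenant_id or SEPARATOR in tenant_id:
        raise ValueError("tenant_id must be non-empty and must not contain the separator")
    return f"{tenant_id}{SEPARATOR}{key}"


def split_tenant_key(full_key: str) -> Tuple[str, str]:
    """Recover (tenantID, application key) by splitting at the first separator."""
    tenant_id, sep, key = full_key.partition(SEPARATOR)
    if not sep or not tenant_id:
        raise ValueError(f"not a tenant-prefixed key: {full_key!r}")
    return tenant_id, key


def belongs_to(full_key: str, tenant_id: str) -> bool:
    """Application-side check that a key is owned by the given tenant."""
    return split_tenant_key(full_key)[0] == tenant_id
```

Since the server does not separate tenants in this model, a check like `belongs_to` has to live in the application tier and run before every read or write made on behalf of a tenant.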
In terms of packing density, this does a great job of reducing overhead per tenant. By pooling all your tenants into a single resource pool, you also get better resource sharing to handle variations in tenant workload distribution. This option also maximizes the number of tenants you can support: essentially each key can be a separate tenant, so billions of tenants are no problem! However, you don’t get any security isolation or resource governance between tenants from the server with this setup. Any bucket-level security applies to the entire bucket, which means your application will need to take on more of the security and governance among tenants.
With this and all the other options below, you can spread subsets of your tenants over multiple clusters for geo-distribution of tenants, for fault isolation, or to support very large numbers of tenants where a single bucket or a single cluster for all tenants becomes impractical. You can also use a combination of these techniques. Let's say you provide silver, gold and platinum packages to your tenants. You could make each package a bucket and place each tenant in the bucket for the package they purchased. There are many more ways to slice and dice things, and I am sure you can come up with more examples.
One common pattern in all this, however, is that you will need a tenant distribution map to know which cluster/bucket to connect to. This information can be served to the application in many different ways, but the important thing is that it stays available under failures. Personally, I recommend that you put the tenant distribution map into a bucket on each Couchbase cluster and replicate it through XDCR (cross datacenter replication) between all clusters. There is a great deal of convenience that comes with that. For example, updates to the tenant map automatically get replicated, and the tenant map is protected against many failure domains through the magic of XDCR.
Let's get back to our main topic and look at other options for multi-tenancy with Couchbase Server.
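To make the tenant distribution map concrete, here is a minimal sketch of the lookup the application would perform. An in-memory dict stands in for the documents you would keep in the replicated bucket; the cluster names, bucket names, tenant IDs and the `Placement` and `locate` helpers are all hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Placement:
    """Where a tenant's data lives (hypothetical record shape)."""
    cluster: str
    bucket: str


# Stand-in for the tenant map stored in an XDCR-replicated bucket.
# Buckets here follow the silver/gold/platinum package example.
TENANT_MAP: Dict[str, Placement] = {
    "tenant-001": Placement(cluster="cluster-us-east", bucket="gold"),
    "tenant-002": Placement(cluster="cluster-us-east", bucket="silver"),
    "tenant-003": Placement(cluster="cluster-eu-west", bucket="platinum"),
}


def locate(tenant_id: str, tenant_map: Dict[str, Placement] = TENANT_MAP) -> Placement:
    """Return the cluster/bucket to connect to for a tenant."""
    try:
        return tenant_map[tenant_id]
    except KeyError:
        raise LookupError(f"no placement recorded for tenant {tenant_id!r}") from None
```

The application would call `locate("tenant-003")` before opening a connection and then talk to the returned cluster and bucket. Moving a tenant then comes down to updating one entry in the map, after its data has been copied.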
Couchbase Bucket per Tenant
In this model, each tenant gets a bucket. The tenants still share the same set of Couchbase Server nodes across multiple buckets in the cluster. This setup makes it easy to isolate tenants with bucket-level security and allows you to isolate resource allocation through bucket resource configuration parameters such as memory quota, replica counts, and IO bandwidth allocation through read-write concurrency settings. However, the number of tenants you can support through this setup is limited. Couchbase Server supports up to 10 buckets, and although that number can be dialed up, the overhead of creating too many buckets drastically limits the density you can achieve.
Just like with the previous option, you can fire up more clusters to support a larger number of tenants, but it is easy to see that this option does not give you a great deal of density.
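The two limits of the bucket-per-tenant model, the bucket count and the memory you can hand out, are easy to check before you provision anything. The sketch below is plain arithmetic and not a Couchbase API: `check_bucket_plan`, the tenant names and the quota figures are assumptions for illustration, and the default of 10 buckets is simply the figure quoted above.

```python
from typing import Dict, List

MAX_BUCKETS = 10  # figure quoted in the text; the server setting can be raised


def check_bucket_plan(
    quotas_mb: Dict[str, int],
    cluster_quota_mb: int,
    max_buckets: int = MAX_BUCKETS,
) -> List[str]:
    """Return a list of problems with a one-bucket-per-tenant plan (empty if none)."""
    problems: List[str] = []
    if len(quotas_mb) > max_buckets:
        problems.append(
            f"{len(quotas_mb)} tenant buckets requested, limit is {max_buckets}"
        )
    total_mb = sum(quotas_mb.values())
    if total_mb > cluster_quota_mb:
        problems.append(
            f"bucket quotas total {total_mb} MB, cluster has {cluster_quota_mb} MB"
        )
    return problems
```

For example, `check_bucket_plan({"tenant-a": 1024, "tenant-b": 2048}, cluster_quota_mb=4096)` returns an empty list, while a plan with eleven tenants fails the bucket-count check no matter how small each quota is.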
Couchbase Cluster per Tenant
In this model, you create a Couchbase cluster per tenant. The tenants do not share the same set of nodes. In return, this model brings even more control over tenant isolation. Tenants can even get administrative rights for their clusters, so you get full security isolation. You can also fully isolate tenant workloads from each other by allocating nodes to each cluster: tenant #1 can have 40 nodes while tenant #3 can be at 4. However, packing density can’t beat options #1 and #2 above: your smallest tenants still need 3 nodes, and that is not a small amount of horsepower.
I should note here that you can virtualize Couchbase Server nodes and deploy multiple clusters to the same subset of machines to improve density. However, you would be introducing unpredictability into your performance as multiple virtual machines compete for the same resources. For completeness, I should also mention that installing multiple instances on the same machine is another option. Multiple instances are not recommended unless your tenants are using the environment for development (a.k.a. non-production) workloads.
To recap, here is a quick summary of the options and properties we discussed:
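A quick back-of-the-envelope calculation shows why the cluster-per-tenant model packs poorly. The sketch below assumes, purely for illustration, that clusters are sized by data volume alone and that every node holds the same amount of data; the 3-node floor is the figure mentioned above, and the function names and tenant sizes are hypothetical.

```python
import math
from typing import Dict

MIN_NODES_PER_CLUSTER = 3  # smallest cluster size mentioned in the text


def nodes_for_tenant(data_gb: float, gb_per_node: float) -> int:
    """Nodes needed for one tenant's cluster, never below the 3-node floor."""
    needed = math.ceil(data_gb / gb_per_node)
    return max(MIN_NODES_PER_CLUSTER, needed)


def total_nodes(tenant_data_gb: Dict[str, float], gb_per_node: float) -> int:
    """Total nodes across all tenants when each tenant gets its own cluster."""
    return sum(nodes_for_tenant(gb, gb_per_node) for gb in tenant_data_gb.values())
```

With 100 GB per node, a tenant holding 10 GB still occupies 3 nodes, the same footprint as a tenant with 300 GB. The floor is what dominates the cost when you have many small tenants.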
Options: Single Bucket for All Tenants | Bucket per Tenant | Cluster per Tenant
Property: Number of Tenants
As always, I welcome all comments.
Cihan Biyikoglu – Product Management