Couchbase 101: Q & A
In our ongoing training series, a number of questions come up each time. I list them below with their respective answers!
Couchbase 101 - Architecture, Installation and Configuration
My Ruby-based load generator can be downloaded here: https://github.com/scalabl3/ruby-couchbase-loadgen
Q: We're using 2.0 in production, what's the best practice to upgrade to 2.2?
A: You can upgrade your cluster in three different ways. The first is the online Swap Rebalance upgrade, which is a great way to keep your cluster up while you upgrade. To do a swap rebalance, add as many new nodes running Couchbase 2.2 as there are nodes in your current cluster, then, before you rebalance, mark the Couchbase 2.0 nodes for removal. Because you are adding and removing the same number of nodes, the rebalance is as efficient as it can be. Read more about it here: http://docs.couchbase.com/couchbase-manual-2.0/#swap-rebalance You can also do an offline upgrade by using cbbackup/cbrestore from the old cluster to the new one, or you can use cbtransfer (but you have to cease operations that create data before transferring!).
Q: Is Couchbase Server free or do you need licenses?
A: Couchbase Community is free for development and production on any number of nodes. Couchbase Enterprise is free for development on any number of nodes, and for up to 2 nodes in production. Going beyond 2 nodes in production requires a license, but our licensing also includes Enterprise Support bundled with it!
Q: Are the application servers physical devices or can they be VMs?
A: Application servers and Couchbase servers can both be physical machines or virtual machines.
Q: Could Couchbase be used as an alternative to Clearquest/Clearcase or is this product strictly used for document control?
A: Couchbase is a data store that you build applications on top of, whereas Clearquest/Clearcase are applications that are built on data stores. Comparing them with Couchbase is not really "apples-to-apples".
Q: Can Couchbase be monitored with SNMP? Is it possible to integrate Couchbase monitoring with Solarwinds?
A: You can use SNMP to monitor the server itself just like any other machine, but Couchbase itself doesn't have SNMP integration as of now. I am not aware of a standard out-of-the-box integration with Solarwinds, but I imagine it wouldn't be difficult if you can extend Solarwinds to poll HTTP/JSON for information and have custom triggers.
Q: Does couchbase support fail-over to a standby couchbase node? How will the stored data synchronize with a standby couchbase cluster?
A: We don't have an auto-failover to a standby node, mostly because failover involves promoting replica partitions to active partitions. If you were to use a standby node, it would have to hold a copy of all the data in the cluster, because you don't know which node (and which partitions) are going to become inaccessible or fail. That wouldn't make sense to do. Instead we have replica partitions, and in the case of a failure, failover promotes replica partitions to active. If you are maintaining an entire separate cluster on standby (and using Cross Data Center Replication (XDCR) to replicate your active cluster data to it), you would have to script your own logic for deciding when to swap clusters.
Q: Can the metadata values (i.e id's) be automated by the CB system (i.e. - auto count for id's), or does it pass that responsibility to the application?
A: All IDs (keys) are the application's responsibility; there aren't mechanisms built into Couchbase for generating IDs. However, you can use Atomic Counters to act like IDENTITY columns in an RDBMS. Check out the Couchbase 103 webinar for more info on some patterns.
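Here is a minimal sketch of the counter-as-IDENTITY pattern. It does not use the Couchbase SDK: InMemoryStore is a hypothetical stand-in for a bucket, and the key names ("user::count", "user::<n>") are just one possible naming convention.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for a bucket; this is NOT the Couchbase SDK."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1, initial=1):
        # Atomically create the counter or increment it, and return the new value.
        with self._lock:
            if key not in self._data:
                self._data[key] = initial
            else:
                self._data[key] += delta
            return self._data[key]

    def add(self, key, value):
        # Store only if the key does not exist yet (keys are unique per bucket).
        with self._lock:
            if key in self._data:
                raise KeyError(f"key already exists: {key}")
            self._data[key] = value

    def get(self, key):
        return self._data[key]


def create_user(store, name):
    # The application, not the data store, builds the key from the counter value.
    user_id = store.incr("user::count")
    key = f"user::{user_id}"
    store.add(key, {"name": name})
    return key
```

Calling create_user twice on a fresh store yields the keys "user::1" and "user::2". With a real cluster the increment would be the SDK's atomic counter operation, so concurrent app servers never receive the same number.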
Q: Can we store mp3 or any Audio or Video files in Couchbase?
A: Of course! You can store anything in Couchbase: simple data types, JSON, and binary data of any type (MP3, JPEG, PNG, etc.). The only limit is 20MB per "document". However, video files tend to be quite large, in which case you'd be better served by a CDN system designed for large files and for streaming them to large audiences, while storing the asset metadata for the file (like its title, the URL to stream it, etc.) in Couchbase.
Q: How much RAM should we leave for OS?
A: It depends. If you are using Views heavily, it would be prudent to leave more RAM for the filesystem cache. We generally recommend leaving about 40% of available RAM to the OS (so configure Couchbase to use 60%), which gives good performance all around. If you are not using Views, you can allocate more RAM to Couchbase. It might be good to reference the sizing guidelines in this blog post: http://blog.couchbase.com/how-many-nodes-part-1-introduction-sizing-couchbase-server-20-cluster
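As a rough worked example of that 60/40 split (the function name and the 16GB figure are only for illustration, and the right percentage depends on your workload):

```python
def couchbase_ram_quota_mb(total_ram_mb, os_percent=40):
    # Reserve os_percent of RAM for the OS and filesystem cache;
    # the remainder is the per-node quota to configure for Couchbase.
    if not 0 <= os_percent < 100:
        raise ValueError("os_percent must be in the range [0, 100)")
    return total_ram_mb * (100 - os_percent) // 100


def cluster_ram_quota_mb(total_ram_mb_per_node, node_count, os_percent=40):
    # The cluster-wide quota is simply the per-node quota times the number of nodes.
    return couchbase_ram_quota_mb(total_ram_mb_per_node, os_percent) * node_count
```

For a 16GB node (16384 MB) this gives a Couchbase quota of 9830 MB, and 39320 MB across a 4-node cluster.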
Q: What is a high number of OPS on a 16GB 4 core system?
A: Another "it depends". It's going to be strongly influenced by the network speed and the ability to send binary operations over the wire to Couchbase. A 16GB 4-core box on Amazon AWS won't be the same as a physical box connected to the load generator(s) with 10GigE. You won't see the kind of performance described in this post on AWS: http://blog.couchbase.com/understanding-performance-benchmark-published-cisco-and-solarflare-using-couchbase-server But it goes to show that it's not really Couchbase itself limiting the ops/s, but rather the networking and the ability to deliver operations to Couchbase through the binary sockets.
Q: How many replicas do you recommend?
A: Generally most people are comfortable with just 1 replica. There are some who want the safety of 2 replicas, and I am not aware of a customer that uses 3 replicas. Of course, more replicas mean you need to beef up your servers with more allocated RAM, and potentially more CPU if you are going to index the replicas as well. Ultimately that decision has to be yours, as you are most aware of your data and how important it is to protect it against any data loss.
Q: For SDK client connections, do we need to add all the server IPs manually?
A: There are a number of ways to handle this. You can use DNS, for instance, where your app servers always connect to a CNAME or A record, and you list all the cluster machines (or auto-register them) under that record. Or you can put the IPs in a config file that is updated across the servers (or centrally located), or type the IPs into your application startup code, etc.
Q: Concerning the OS, it's available for Ubuntu. Is there any problem with using another distribution like Debian?
A: I believe this is fine, but I haven't tried it myself. The OSes that are listed on the download page are also the ones that are heavily tested. I know one of our engineers got Couchbase working on Joyent SmartOS, but it's not an official download.
Q: Is metadata not persisted?
A: Metadata is also persisted to disk, of course, but it is always kept in RAM as well. Document values will be in RAM if there is enough RAM available in the bucket (across the cluster) to contain them. If not, a Not Recently Used (NRU) policy is used to eject document values from RAM; they remain on disk.
Q: What is the use of IO workers? Why is it divided when we add new nodes?
A: IO workers are used to read from and write to disk. You can increase the number of threads (workers) in your bucket configuration depending on your capacity to do so (if you can only handle 4 workers, setting it to 8 won't change performance). When you add more nodes to your cluster, you increase your IO workers linearly, with each new node adding the same number of IO workers (for its own IO). They are not "divided"; they are allocated per node.
Q: Did you say, "High water mark has changed recently from 80% to 90%"? What about the low water mark? Is it still 60%, or has it changed?
A: These are actually configurable parameters at the bucket level. The default settings are roughly 80% for the low watermark and 90% for the high watermark. At the low watermark, ejection of replica partition data from RAM will begin, and at the high watermark, ejection of active partition data from RAM will occur. See: http://docs.couchbase.com/couchbase-manual-2.2/#changing-thresholds-for-ejection
Q: How can I delete a data bucket?
A: In the Admin interface you can delete a bucket by clicking on Data Buckets in the top nav, clicking on the triangle next to the bucket name to expand it, and clicking the Edit button on the right side. At the bottom of the modal dialog that pops up there is a Delete button. You can also delete buckets programmatically from the SDKs.
Q: How does Couchbase relate to Apache CouchDB?
A: The founders of CouchDB (Damien Katz and JChris Anderson) left the Apache CouchDB project and joined/merged with NorthScale/Membase as Founders of Couchbase, along with Steve Yen and Dustin Sallings (of NorthScale/Membase). There are many similarities in the Views Map-Reduce style and query syntax that will be familiar to CouchDB users; however, there are also many important differences. Couchbase is an independent, for-profit open-source company with its own independent code base that is not tied to CouchDB in any way. The binary CRUD operations, however, resemble memcached/Membase rather than anything CouchDB related. They certainly could have picked a less confusing name...
Q: How does the architecture deal with out-of-balance partitions?
A: Actually, our hashing and partitioning strategy has proven over many years to distribute data very evenly, which is why we are still using it.
Q: How are node failures handled?
A: There are two ways to handle node failures. The first is to enable auto-failover: if a node is unreachable for 30 seconds, that node is automatically failed over and the replicas of its partitions are promoted to active. The alternative is to fail over a node manually (or script it to be automatic, based on your own monitoring solution), which gives you the flexibility to decide all the parameters and timing for node failovers.
Q: I tried to install Couchbase Enterprise 2.2.0 server on Fedora 17 but got failures for libcrypto.so and libssl.so. How do I fix this?
A: Yes, this is because of a dependency in the Erlang core library. You need to run: yum install openssl098e
Q: What does ACID mean for Couchbase?
A: Couchbase supports ACID "transactions" at the per-document level. You can use either CAS (Check and Set/Compare and Swap) for optimistic concurrency, or GetAndLock to actually lock a document for pessimistic concurrency scenarios. Transactions are generally much more necessary in normalized RDBMS data stores. The reason is normalization itself: data structures are often broken up across many different tables, and without transactions, data integrity collapses quickly. In NoSQL scenarios like Couchbase, since data is far less normalized, transactions are generally less necessary than in the RDBMS world. Using this concurrency model, you can build transactions yourself. We also have durability operations where you can ensure that data has made it to a replica and/or to disk.
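To make the optimistic (CAS) side of that concrete, here is a small sketch of a read-modify-write loop. InMemoryBucket, CasMismatch and update_with_retry are hypothetical names for illustration and are not the Couchbase SDK; the point is only the shape of the retry loop.

```python
class CasMismatch(Exception):
    """The document changed since it was read (hypothetical error type)."""


class InMemoryBucket:
    """Hypothetical stand-in for a bucket; this is NOT the Couchbase SDK."""

    def __init__(self):
        self._docs = {}      # key -> (value, cas)
        self._next_cas = 1

    def set(self, key, value):
        # Every successful write gives the document a new CAS value.
        cas = self._next_cas
        self._next_cas += 1
        self._docs[key] = (value, cas)
        return cas

    def get(self, key):
        # Returns (value, cas) so the caller can pass the CAS back on write.
        return self._docs[key]

    def replace(self, key, value, cas):
        # Write only if nobody else has modified the document in between.
        _, current_cas = self._docs[key]
        if current_cas != cas:
            raise CasMismatch(key)
        return self.set(key, value)


def update_with_retry(bucket, key, mutate, max_attempts=5):
    # Optimistic concurrency: read, modify, write with CAS, retry on conflict.
    for _ in range(max_attempts):
        value, cas = bucket.get(key)
        try:
            return bucket.replace(key, mutate(value), cas)
        except CasMismatch:
            continue
    raise RuntimeError(f"gave up updating {key} after {max_attempts} attempts")
```

For example, update_with_retry(bucket, "acct::1", lambda d: {**d, "balance": d["balance"] + 10}) re-reads the document and tries again whenever another writer got in first, so no update is silently lost.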
Q: If one of the nodes in the cluster goes down, how does the map on the client app server get updated, and what happens to the data on that node?
A: When a failover is triggered (either automatically or manually), the replica partitions are promoted to active. If a node gets a CRUD operation for a partition number that it does not own, it returns a "Not My VBucket" error to the SDK client, and the SDK client knows how to handle this. The error indicates that the client's cluster map is out of date/out of sync, so the client automatically requests a new one over its persistent HTTP connection. There are only 2 scenarios where a cluster map changes: rebalances and failovers. Those are the times when the topology changes and the cluster map is changed, and the client will see that on the first operation that returns the "Not My VBucket" error.
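The refresh-and-retry behavior can be sketched with a toy model. Everything here is hypothetical (the Node, Cluster and Client classes, the CRC32-modulo hash, the round-robin map), and the real SDKs do this internally, so treat it as an illustration of the idea rather than the actual protocol.

```python
import zlib

NUM_PARTITIONS = 1024


class NotMyVBucket(Exception):
    """Raised by a node that is asked about a partition it does not own."""


def partition_for(key):
    # Simplified stand-in for the real key hash.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class Node:
    def __init__(self, name):
        self.name = name
        self.partitions = {}  # partition number -> {key: value}

    def get(self, partition, key):
        if partition not in self.partitions:
            raise NotMyVBucket(partition)
        return self.partitions[partition][key]


class Cluster:
    def __init__(self, names):
        ordered = sorted(names)
        self.nodes = {name: Node(name) for name in ordered}
        # Toy cluster map: partition number -> name of the owning node.
        self.cluster_map = {p: ordered[p % len(ordered)] for p in range(NUM_PARTITIONS)}
        for p, owner in self.cluster_map.items():
            self.nodes[owner].partitions[p] = {}

    def move_partition(self, partition, new_owner):
        # Topology change: the partition (and its data) now lives on new_owner.
        old_owner = self.cluster_map[partition]
        moved = self.nodes[old_owner].partitions.pop(partition)
        self.nodes[new_owner].partitions[partition] = moved
        self.cluster_map[partition] = new_owner


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cluster_map = dict(cluster.cluster_map)  # local copy, may go stale
        self.refreshes = 0

    def get(self, key):
        partition = partition_for(key)
        for _ in range(2):
            node = self.cluster.nodes[self.cluster_map[partition]]
            try:
                return node.get(partition, key)
            except NotMyVBucket:
                # Our map is out of date: fetch a fresh one and retry once.
                self.cluster_map = dict(self.cluster.cluster_map)
                self.refreshes += 1
        raise RuntimeError("partition not found even after refreshing the map")
```

After move_partition changes the owner of a key's partition, the next client.get for that key hits the old node, gets NotMyVBucket, refreshes the map once, and succeeds on the retry. Later reads go straight to the new owner.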
Q: Is 20 GB a hard limit on disk for each server?
A: There is no storage limit. What you might have misheard is that there is a 20MB limit per document value in Couchbase.
Q: Are the append-only disk writes similar to the journaling that OS file systems use?
A: Yes, it is a similar strategy, but in our own format.
Q: Is there a way to monitor free disk size? will it send pager or email notification when it reaches thresholds (say 80% full)? Same case for CPU / MEM usage?
A: We don't have this sort of notification system built in, but it's certainly trivial to integrate your own system for doing this. These are more like VM/computer monitoring concerns than Couchbase-specific ones, but you could easily create an integration for your own custom parameters. All the information in the graphs of the Admin console is available as JSON via HTTP requests.
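A poll-and-compare integration could look roughly like the sketch below. The metric names (disk_used_pct and so on) and the flat shape of the stats dictionary are assumptions made for the example, not the actual layout of the JSON that Couchbase returns; you would map the real fields into this shape yourself and pass in the URL of whichever stats endpoint you poll.

```python
import json
import urllib.request


def fetch_stats(url, timeout=5):
    # Fetch a JSON document over HTTP; the caller supplies the URL to poll.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_thresholds(stats, thresholds):
    # Return (metric, value, limit) for every metric at or above its limit.
    # Metrics missing from the stats are skipped rather than treated as alerts.
    alerts = []
    for metric, limit in sorted(thresholds.items()):
        value = stats.get(metric)
        if value is not None and value >= limit:
            alerts.append((metric, value, limit))
    return alerts
```

With stats of {"disk_used_pct": 85, "cpu_pct": 20} and limits of 80 for disk and 90 for CPU, check_thresholds returns a single alert for disk_used_pct, which your own script can then turn into an email or a page.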
Q: What is a bucket?
A: A bucket is a "database", a collection of data. It is also a namespace for the data, so all keys need to be unique within a bucket. It also acts as a namespace for Views: design documents and views can only access data within the bucket they are defined in.
Q: When there are multiple Couchbase servers and multiple application servers, can all the application servers connect to the same Couchbase server (same IP address)? In that case, is the load balancing done automatically by Couchbase, or is it better to distribute connections between application servers?
A: Great question! It's important to actually NOT put a load balancer between the application servers and Couchbase. Because of the key-hash partitioning, data is already automatically distributed across the Couchbase cluster. The app servers will connect to and interact with the Couchbase cluster nodes directly and, because of the partitioning, will already be "load balanced" in the sense that they will be doing CRUD operations across the cluster based on the hashing of the keys. The app servers and SDK clients will maintain open connections to each node in the Couchbase cluster, and generally, a single shared connection is all that is needed for each app server.
Q: What happens when a document is accessed by client apps while a rebalance is in progress on the bucket containing that document?
A: All operations continue as normal during rebalance, and normal operations are prioritized over rebalancing. This means that it will continue to rebalance while you are doing operations. Of course, generally speaking, it's best not to initiate a rebalance during peak usage! However, it depends on what kind of usage level and configuration whether it might cause a slowdown or not. In most cases it's unnoticeable.
Q: What is the equivalent of the BLOB type in Couchbase?
A: You can store binary data directly as a value using the SDK. Since we don't have a defined or enforced schema, you don't need to specify that it is a BLOB. The BLOB is simply the value of the "document".
Q: When I perform a set operation, the document first goes to RAM, and from there it is sent to the disk write queue and the replication queue. What if the node crashes while the document is only in RAM?
A: Whenever there is a failure in any system of any type (any type of database, app server, mobile phone, etc.) there is always a potential for data loss. All strategies are attempts to minimize it as best as possible, but in this exact scenario, yes, it is possible for a document to make it into RAM but not into the disk write queue and/or replication queue if the machine crashes hard at that exact moment between storing in RAM and inserting into the queues.
Q: When you add a server is it sharding?
A: Yes, we use "Hash Sharding" for all data. That means that for every key-value pair, we hash the string key to get the number of the partition (a logical container) it should live in. We look up the partition number in the cluster map to see where that partition is located in the Couchbase cluster, and then do CRUD directly with the Couchbase node responsible for that partition. When you add a new server (or more than one), we simply redistribute the partitions evenly across the new number of nodes.
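The lookup described above (key to partition number, partition number to node) can be sketched in a few lines. This is a simplification under stated assumptions: plain CRC32 modulo 1024 stands in for the real hash function, and the round-robin layout in build_cluster_map stands in for the real partition placement, which also tries to move as few partitions as possible during a rebalance.

```python
import zlib

NUM_PARTITIONS = 1024  # fixed, regardless of how many nodes are in the cluster


def partition_for(key):
    # Hash the string key to a partition number in the range 0..1023.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_cluster_map(nodes):
    # Spread the fixed set of partitions evenly across the given nodes.
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key, cluster_map):
    # The client talks directly to the node that owns the key's partition.
    return cluster_map[partition_for(key)]


def partitions_per_node(cluster_map):
    # Count how many partitions each node owns.
    counts = {}
    for owner in cluster_map.values():
        counts[owner] = counts.get(owner, 0) + 1
    return counts
```

With three nodes the partitions split 342/341/341, and after adding a fourth node and rebuilding the map each node owns 256. The partition number for a given key never changes; only the node that partition lives on does.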
Q: Do we have to add servers for replica alone?
A: Replicas, for effectiveness in a failover situation, need to be located on different nodes so that nodes can be "failed over". For one replica, there needs to be at least two Couchbase nodes. For two replicas there needs to be at least three Couchbase nodes, and for three replicas there needs to be at least four Couchbase nodes. Of course increasing the number of replicas also increases the need for bigger resources for optimal performance.
Q: Are all the docs with the same hash value stored on the same partition?
A: Yes. Our hash function simply turns a string key into a number in the range [0...1023], and all keys that hash to the number 2 will live within partition container #2. That partition will reside on a particular Couchbase node in the cluster. If you add more servers and rebalance, that partition may be moved to another server, but there is only ever one node that is the "master" for that partition. The "replica" for that partition will of course live on a separate node.
Q: In your horizontal scale example, the total partitions are always 1024. Is 1024 the max partitions in a cluster?
A: When first learning about our sharding and distribution scheme, many developers look at this number as a knob to tweak, naturally, since we are all tinkerers. However, the number of partitions doesn't change performance characteristics. We feel 1024 is a good number of partitions for even distribution of data, it's not really something that needs changing, and is not a configuration parameter. The number very easily could be 10,000 or 990, and it wouldn't change the architecture of how we distribute data (it would just increase or reduce the number of data files).
Q: How Many Buckets can be installed on a single virtual machine?
A: This is a difficult question to give a rule of thumb, because the resources allocated to that virtual machine can vary greatly, but generally for a sizable machine of 8+ cores, about 10-12 Buckets is feasible. For each Bucket we allocate resources to manage and monitor, so having huge numbers of buckets increases CPU overhead and is generally not recommended.
Q: Does rebalancing of the nodes have any effect on the efficiency of Couchbase?
A: We prioritize primary operations over rebalancing operations in Couchbase, but with that said, there is an increase of CPU and network traffic between Couchbase nodes during a rebalance. It's recommended to do rebalancing in off-peak usage times of course, the same goes for doing backups, just like any other system. Doing a rebalance at peak usage times will certainly slow down the rebalance timing if peak usage is very heavy.
Q: When you rebalance are partitions actually moved to another node or are they duplicated?
A: They are copied until the rebalance is complete, in case there is a failure during rebalance. The original master node remains the master node until the rebalance finishes completely. Upon rebalance completion, the new master node for that partition becomes the master and the old one removes its data. The cluster maps are then updated accordingly across the cluster, and after that on the client SDKs.
Q: What tools are available for backup/restore and to help with disaster recovery?
A: We have the cbbackup and cbrestore command line tools just for this purpose. You can read about them here: http://www.couchbase.com/docs//couchbase-manual-2.0/couchbase-backup-restore-backup-cbbackup.html