10 Things Developers Should Know about Couchbase
As a developer, I’ve been using Couchbase Server for a couple of months now and I love it. Having written several apps myself, I’ve come to learn many (but not all) of the ins and outs of Couchbase. To be a good Couchbase developer, it’s not enough just to know how to use the APIs; it takes a little bit more.
To give you a quick overview of what developers can get out of Couchbase, we’ve put together this Top 10 list. The items are in no particular order, but together they cover what you should know if you’re building your app with Couchbase.
Here we go...
#10. Document access in Couchbase is strongly consistent, query access is eventually consistent
Couchbase guarantees strong consistency by making sure that all reads and writes for a particular document go to a single node in a cluster. This applies to document (key/value) access. Views are eventually consistent with respect to the underlying stored documents.
#9. Writes are asynchronous by default but can be controlled
By default, writes in Couchbase are asynchronous: the update is stored in memory and the client is notified of success or failure, while the update is flushed to disk and replicated to other Couchbase nodes in the background.
Using the APIs with durability constraints within the application, you can choose to have the update replicated to other nodes or persisted to disk, before the client responds back to the app.
#8. Couchbase has atomic operations for counting and appending
Couchbase supports atomic incr/decr and append operations for blobs.
x = cb.incr("mykey")
puts x #=> 2
incr both writes the new value and returns it.
The update operation occurs on the server and is provided at the protocol level. This means that it is atomic on the cluster, and executed by the server. Instead of a two-stage operation, it is a single atomic operation.
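To see why a single atomic operation matters, here is a minimal sketch. It does not talk to Couchbase at all: FakeBucket is a hypothetical in-memory stand-in I made up for illustration, and its incr only imitates the idea of a server-side read-modify-write. The first half shows how a two-stage get-then-set can lose an update; the second half shows the same work done with incr.

```ruby
# Hypothetical in-memory stand-in for a bucket (NOT the Couchbase client).
class FakeBucket
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def get(key)
    @lock.synchronize { @data[key] }
  end

  def set(key, value)
    @lock.synchronize { @data[key] = value }
  end

  # The read, the add and the write happen as one step, and the new value is returned.
  def incr(key, delta = 1)
    @lock.synchronize { @data[key] = @data.fetch(key, 0) + delta }
  end
end

bucket = FakeBucket.new

# Two-stage update: both "clients" read before either one writes.
bucket.set("hits", 0)
a = bucket.get("hits")        # client A reads 0
b = bucket.get("hits")        # client B also reads 0
bucket.set("hits", a + 1)     # A writes 1
bucket.set("hits", b + 1)     # B overwrites with 1, so A's update is lost
two_stage_result = bucket.get("hits")

# Atomic update: each increment is a single operation, so nothing is lost.
bucket.set("hits", 0)
threads = 2.times.map { Thread.new { bucket.incr("hits") } }
threads.each(&:join)
atomic_result = bucket.get("hits")

puts "two-stage: #{two_stage_result}, atomic: #{atomic_result}"
```

With a real cluster the same reasoning applies across application servers, which is where the lost update is hardest to spot.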
#7. Start with everything in one bucket
A bucket is equivalent to a database. You can store objects with different characteristics or attributes in the same bucket. So if you are moving from an RDBMS, you should store records from multiple tables in a single bucket.
Remember to create a “type” attribute that will help you differentiate the various objects stored in the bucket and create indexes on them. It is recommended to start with one bucket and grow to more buckets when necessary.
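As a rough illustration of the “type” attribute, here is a small sketch. The plain Ruby Hash stands in for a bucket, and the key names (user::1, order::1001) and fields are made up for the example; in a real app the filtering by type would usually be done by an index rather than by scanning.

```ruby
require 'json'

# Two different kinds of records that would live in separate tables in an RDBMS.
user  = { "type" => "user",  "name" => "Ada", "email" => "ada@example.com" }
order = { "type" => "order", "user_id" => "user::1", "total" => 42.5 }

# A plain Hash standing in for a single bucket (hypothetical, not the Couchbase client).
bucket = {
  "user::1"     => JSON.generate(user),
  "order::1001" => JSON.generate(order)
}

# The "type" attribute is what lets you tell the documents apart later.
docs   = bucket.values.map { |raw| JSON.parse(raw) }
orders = docs.select { |doc| doc["type"] == "order" }

puts orders.length
```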
#6. Try to use 5 or fewer buckets in Couchbase. Never more than 10.
Documents don’t have a fixed schema, so multiple documents with different schemas can live in the same bucket. Most deployments have a low number of buckets (usually 2 or 3) and only a few go upwards of 5. Although there is no hard limit in the software, the maximum of 10 buckets comes from some known CPU and disk IO overhead of the persistence engine and the fact that we allocate a specific amount of memory to each bucket. We certainly plan to reduce this overhead in future releases, but that still wouldn't change our recommendation of only having a few buckets.
#5. Use CAS over GetL almost always
Optimistic or pessimistic locking, which one should you pick? If your app needs locking, first consider using CAS (optimistic locking) before using GetL (pessimistic locking).
But remember, locking might not be good for all cases: your application can run into trouble if there is lock contention. A thread can hold a lock and be de-scheduled by the OS. Then all the threads that want to acquire this lock will be blocked. One option is to avoid locking altogether where possible by using atomic operations. These APIs can be very helpful on heavily contested data.
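The usual pattern with CAS is a small retry loop: read the document together with its CAS value, compute the change, and write it back only if the CAS value is still the same. Here is a minimal sketch of that loop. FakeCasStore, CasConflict and update_with_cas are hypothetical names I invented for an in-memory stand-in; they only mimic the semantics and are not the Couchbase client API.

```ruby
# Hypothetical in-memory stand-in that mimics CAS semantics (NOT the Couchbase client).
class CasConflict < StandardError; end

class FakeCasStore
  def initialize
    @docs = {}              # key => [value, cas]
    @lock = Mutex.new
  end

  # Unconditional write; every write bumps the CAS value.
  def set(key, value)
    @lock.synchronize do
      cas = (@docs[key] ? @docs[key][1] : 0) + 1
      @docs[key] = [value, cas]
      cas
    end
  end

  def get_with_cas(key)
    @lock.synchronize { @docs.fetch(key).dup }
  end

  # Conditional write; fails if someone else changed the document in the meantime.
  def replace(key, value, cas)
    @lock.synchronize do
      raise CasConflict, key unless @docs.fetch(key)[1] == cas
      @docs[key] = [value, cas + 1]
      cas + 1
    end
  end
end

# Optimistic update: re-read and retry on conflict instead of holding a lock.
def update_with_cas(store, key, max_attempts = 5)
  max_attempts.times do
    value, cas = store.get_with_cas(key)
    begin
      return store.replace(key, yield(value), cas)
    rescue CasConflict
      # Someone else won the race; loop around and try again with fresh data.
    end
  end
  raise CasConflict, "gave up on #{key}"
end

store = FakeCasStore.new
store.set("user::1", { "visits" => 1 })

# Simulate another writer sneaking in once between our read and our write.
interfered = false
update_with_cas(store, "user::1") do |doc|
  unless interfered
    interfered = true
    store.set("user::1", { "visits" => 10 })
  end
  doc.merge("visits" => doc["visits"] + 1)
end

final_doc, final_cas = store.get_with_cas("user::1")
puts final_doc["visits"]
```

Nobody is ever blocked here: the losing writer simply redoes its work against the fresh document, which is why this tends to behave better than GetL when conflicts are rare.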
#4. Use multi-get operations
Once your client application has a list of document IDs, the highest-performance way to retrieve the items in bulk is a multi-GET request. This performs better than a serial loop that GETs each item individually, one after the other.
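The difference is easiest to see by counting requests. In the sketch below, FakeClient is a hypothetical stub that just counts how many times it is called; I am assuming a client whose get accepts several keys at once, so check your SDK's documentation for the exact multi-get call. The stub says nothing about real timings, it only shows the shape of the two approaches.

```ruby
# Hypothetical stub client that counts calls (NOT the Couchbase client).
class FakeClient
  attr_reader :round_trips

  def initialize(data)
    @data = data
    @round_trips = 0
  end

  # One call counts as one round trip, however many keys it carries.
  def get(*keys)
    @round_trips += 1
    values = keys.map { |key| @data[key] }
    keys.length == 1 ? values.first : values
  end
end

data = { "user::1" => "Ada", "user::2" => "Grace", "user::3" => "Linus" }
ids  = data.keys

# Serial loop: one request per document ID.
serial_client = FakeClient.new(data)
serial_values = ids.map { |id| serial_client.get(id) }

# Multi-get: all document IDs in a single request.
batch_client = FakeClient.new(data)
batch_values = batch_client.get(*ids)

puts "serial: #{serial_client.round_trips} requests, batch: #{batch_client.round_trips} request"
```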
#3. Keep your client libraries up-to-date
#2. Model your data using JSON documents
Couchbase Server supports JSON and binary document formats. First, try modeling your data using JSON. JSON documents can be indexed and queried. You can store binary blobs and range query off of the key names. Start by creating documents from application-level objects. Documents that grow continuously or are under high write contention should be split.
Use primary key access as much as possible. Couchbase keeps keys and metadata in memory, so data access by key is fast. Use secondary indexes for less performance-sensitive paths or for analytics. Start with 4 design documents and fewer than 10 views per design document. Create a few “long” indexes that can be used for multiple queries and use creative filtering. Construct indexes to “emit” the smallest amount of data possible: use “null” for the value if you do not have a reduce function.
Just 10 things? No, of course not! Couchbase is a NoSQL database system, and after you try it you will find that there’s a lot more to learn. If you feel that I missed something important that should be in the top 10 list, feel free to add it in the comments below.