Couchbase Server 2.0: Most Common Questions (and Answers)
I just finished up a nine-week technical webinar series highlighting the features of our upcoming release of Couchbase Server 2.0. It was such a blast interacting with the hundreds of participants, and I was blown away by the level of excitement, engagement and anticipation for this new product.
(By the way, if you missed the series, all nine sessions are available for replay.) There were some great questions generated by users throughout the webinar series, and my original plan was to use this blog entry to highlight them all. I quickly realized there were too many to expect anyone to read through all of them, so I’ve taken a different tack. This blog will feature the most common/important/interesting questions and answer them here for everyone’s benefit. Before diving in, I’ll answer the question that was by far the most commonly asked: “How long until the GA of Couchbase Server 2.0?” We are currently on track to release it before the end of the year. In the meantime, please feel free to experiment with the Developer Preview that is already available. As for the rest of the questions, here goes!
Q: What are the primary benefits of incorporating Membase and CouchDB into a single product?
A: Membase is a very fast key-value store known for its performance and scalability. CouchDB, on the other hand, is a great document database with powerful indexing and querying capabilities. Combining these two products brings together the best of both worlds to create a high-performance, highly elastic NoSQL database that scales out linearly while providing querying, indexing and document-oriented features.
Q: Does Couchbase speed up access to a database document by automatically caching it in memory?
A: Absolutely! That’s one of the great features of Couchbase Server 2.0, and it comes from the vast experience we have with memcached. All access to documents goes through our integrated RAM caching layer (built out of memcached) to provide extremely low and, more importantly, predictable latency under extremely heavy loads. For instance, we regularly see customers running well over 100k operations/sec across a cluster and have taken single nodes to over 200k operations/sec in our own testing environments. This RAM caching layer also allows us to handle spikes in write (and read) load without affecting the performance of the application.
Q: I see in your forums that Couchbase Server 2.0 uses the memcached protocol for accessing data as this is compatible for existing Membase users and also for the much higher performance. Is there a way to use REST APIs akin to CouchDB’s to access the documents in Couchbase Server 2.0?
A: The first version of Couchbase Server 2.0 uses the memcached protocol for document access, and the CouchDB HTTP protocol for accessing views. Over time, these two will merge even closer. In the meantime, we have provided a number of client libraries that abstract these two access methods away from the developer.
Q: Is Couchbase Server 2.0 going to be open source?
A: It already is! As a company, Couchbase is fully committed to furthering the open source communities that exist and are being built around our various products. While our focus is on providing enterprise-class software to our paying customers, we embrace the free flow of ideas and wide adoption that an open source project allows for, and we believe very strongly that there is a place for both.
Q: All I need is a simple secondary index, not map/reduce...how do I do that?
A: Currently, all of our indexes are built using a map function (the reduce is totally optional and can be ignored here). This is really just another syntax for creating an index, and there are a variety of examples available discussing how to create very simple indexes. The very simplest form would involve just putting "emit(doc.<field>)" in your map function, where <field> is what you want to index on. This will create a list of all documents containing that field, sorted by that field. Of course there are more complex scenarios, but it can be made quite simple if that is what is needed.
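To make the result of such a map function more concrete, here is a minimal Python sketch that simulates the index it would produce. This is not Couchbase code (real map functions are written in JavaScript inside a design document); `build_index`, the sample documents and the `city` field are all hypothetical.

```python
# Models the effect of a map function that calls emit(doc.city) for
# documents containing a `city` field. Names and data are illustrative.

def build_index(docs, field):
    """Return (key, doc_id) rows for every doc that has `field`, sorted by key."""
    rows = []
    for doc_id, doc in docs.items():
        if field in doc:               # documents without the field are skipped
            rows.append((doc[field], doc_id))
    return sorted(rows)                # view rows are kept sorted by key


docs = {
    "user::1": {"name": "Ann", "city": "Oslo"},
    "user::2": {"name": "Bob"},        # no "city", so it is not indexed
    "user::3": {"name": "Cy", "city": "Lima"},
}

index = build_index(docs, "city")
print(index)
```

The printed rows are ordered by the indexed field, which is what makes range and lookup queries against the view cheap.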
Q: How does dealing with Couchbase Server 2.0 views differ from CouchDB and Couchbase Single Server?
A: Not at all...the format, the syntax, everything is the same. Additionally, all the options for querying are supported. You can literally copy-paste the view code from one to another. Multiple design docs are also supported.
Q: Does Couchbase Server 2.0 support ad-hoc querying?
A: At the moment, all querying to Couchbase Server (like CouchDB) must be done against pre-materialized views. In general, this is the only way of providing reliable performance when making those queries. We also understand the need for more on-demand/ad-hoc querying and are working diligently to provide that as well. Couchbase has already begun to take an industry-leading approach to creating a language specifically for unstructured data that can be used across the NoSQL landscape. Take a look at http://unqlspec.org to see what we're working on!
Q: Which SDKs and client libraries are supported?
A: At a base level, Couchbase Server 2.0 supports any library that implements the memcached protocol (and there are MANY of those). For the additional functionality that we have added (extended protocol commands and view access) Couchbase provides client libraries for a variety of languages (Java, .NET, PHP, Python, Ruby, C/C++) as well as instructions for how to extend libraries for other languages.
Q: Is there any chance of dogpiling with stale=update_after? If you get 30 requests simultaneously for a view with stale=update_after, will they generate several requests simultaneously for updating the index?
A: To recap, “stale” tells the server that this query request should be returned as quickly as possible, knowing that some data that has already been written may not be included in the view. By putting “update_after” in the request as well, the client is telling the server to rematerialize the index in the background…after returning the initial request as quickly as possible. Once this rematerialization is started, subsequent requests will not cause anything different to happen so there’s no worry of “dogpiling” or “stampeding herd” issues.
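The "no dogpiling" behaviour can be pictured with a small Python model. It is a toy sketch under stated assumptions, not Couchbase's implementation: the `ViewIndex` class, its attributes and the `finish_update` hook are hypothetical, and only the `update_after` case is modelled.

```python
class ViewIndex:
    """Toy model of a view index that schedules at most one background update."""

    def __init__(self):
        self.rows = ["old-row"]        # what is currently materialized
        self.update_pending = False    # True while a background update is queued
        self.updates_started = 0

    def query(self, stale=None):
        result = list(self.rows)       # answer immediately from the existing index
        if stale == "update_after" and not self.update_pending:
            self.update_pending = True # only the first request schedules the update
            self.updates_started += 1
        return result

    def finish_update(self, new_rows):
        """Called when the background rematerialization completes."""
        self.rows = new_rows
        self.update_pending = False


view = ViewIndex()
results = [view.query(stale="update_after") for _ in range(30)]
print(view.updates_started)
```

All 30 requests are answered from the existing index, and the counter shows that a single update was started on their behalf.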
Q: How does the client know when to pull the updated server/vbucket maps?
A: All clients (whether they be our “smart” clients or are going through our Moxi process) will maintain a streaming connection to a Couchbase Server. When the topology of the cluster changes (add/remove/failover nodes), the clients will be automatically updated with a new vbucket map over this connection. The clients can also request this map on demand, and do so every time they start up. Additionally, each node of the cluster knows which vbuckets it is responsible for and will only return data for those vbuckets. This way, even if a client is temporarily out of sync with the cluster, it will never be vulnerable to inconsistent data.
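The following Python sketch illustrates why a client with an outdated map cannot read inconsistent data. It is a simplified model: the `Node` class, `client_get` and the single retry after refreshing the map are assumptions made for illustration, not a description of the actual client libraries.

```python
NOT_MY_VBUCKET = "NOT_MY_VBUCKET"   # illustrative status value


class Node:
    def __init__(self, name, owned_vbuckets):
        self.name = name
        self.owned = set(owned_vbuckets)
        self.data = {}

    def get(self, vbucket, key):
        if vbucket not in self.owned:      # refuse requests for vbuckets it does not own
            return NOT_MY_VBUCKET
        return self.data.get(key)


def client_get(key, vbucket, client_map, fetch_map, nodes):
    """Try the node the client's map points at; refresh the map once if refused."""
    reply = nodes[client_map[vbucket]].get(vbucket, key)
    if reply == NOT_MY_VBUCKET:
        client_map.update(fetch_map())     # on-demand refresh of the vbucket map
        reply = nodes[client_map[vbucket]].get(vbucket, key)
    return reply


# vbucket 7 has moved from node "a" to node "b", but the client has not heard yet.
nodes = {"a": Node("a", []), "b": Node("b", [7])}
nodes["b"].data["k"] = "v"
stale_map = {7: "a"}
value = client_get("k", 7, stale_map, lambda: {7: "b"}, nodes)
print(value)
```

Because node "a" refuses the request instead of answering from leftover data, the worst case for an out-of-date client is an extra round trip.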
Development/Production View Usage
Q: Why the extra effort of creating a view in “development” mode and then pushing it to production?
A: We wanted to provide the ability to do view development on a live dataset, but didn’t want to have that development impact the currently running application. Thus, a “development” mode was created so that users could create and edit views on “real” data. In order to speed up the iterations of development, the default is to materialize a view over a subset of the data. When the development is complete, the user can opt to materialize the view over the whole cluster right before pushing it to production. This gives the added benefit of materializing the view so that it is immediately ready for the application to use. Lastly, this “development” mode can be used to edit views that are currently in production without affecting the application’s access to them (by making a copy). When the edits are complete, the view can then be materialized and swapped with the original into production.
Q: How do you control what the development data set is?
A: Currently, the development dataset is automatically decided by the software depending on how much data exists. For small datasets, the software will actually materialize the view across the whole thing. As that gets larger, the software will automatically scale it down to provide a quicker response time while developing. Once the view is finalized, the user has the option to run it over the whole dataset manually (by clicking the tab “Full Cluster Dataset”) both for the purposes of final verification and to prepare it for production use.
Q: For a bucket with replica and auto-failover, will a server failure without rebalance cause retrieval/update errors on that bucket?
A: When a server initially fails (for whatever reason: hardware, network, software) the application will briefly get errors for any data which that server was responsible for. Requests for data on other servers will not be impacted. These errors will continue until the node is “failed over”, which activates the replica data (vbuckets) elsewhere in the cluster. The amount of time will vary depending on whether you are using automatic or manual failover…but once the failover is triggered there is no more delay. You might ask “but why can’t I read from the replica data that already exists?” The answer is two-fold. First, we specifically disallow access to the replica data (while it is “replica”) to preserve the very strong consistency that our system provides. Under normal operation, you are guaranteed to “read your own writes” and this is done by only providing one location for accessing any given piece of data. By allowing unrestricted reading of replicas, you might have a situation where one client writes a piece of data to the active copy and another client immediately tries to read that data from the replica…leading to possible inconsistency. Now, the second part of this answer is that we are currently working on a feature to allow for reading from these replicas. It will be a new operation that is explicitly invoked by the application so that there won’t be any confusion about which copy is being read from. You’ll still want to fail over the node as quickly as possible since writes will continue to fail. This is one example of the many features we have added as a direct response to our customers’ and users’ demands…you speak, and we listen (and then do something about it too)!
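The sequence described above (errors for the failed server's data, no impact elsewhere, recovery once replicas are activated) can be sketched in a few lines of Python. The `Cluster` class, the two-node layout and the node names are hypothetical and exist only to illustrate the idea.

```python
class Cluster:
    def __init__(self):
        # vbucket -> node holding the active copy / the replica copy
        self.active = {0: "node1", 1: "node2"}
        self.replica = {0: "node2", 1: "node1"}
        self.down = set()

    def read(self, vbucket):
        node = self.active[vbucket]
        if node in self.down:
            raise ConnectionError(f"{node} is down")   # replicas are never consulted
        return f"vbucket {vbucket} served by {node}"

    def fail_over(self, node):
        """Promote the replica of every vbucket whose active copy was on `node`."""
        for vb, owner in list(self.active.items()):
            if owner == node:
                self.active[vb] = self.replica[vb]


cluster = Cluster()
cluster.down.add("node1")
try:
    cluster.read(0)
except ConnectionError as err:
    print("before failover:", err)
print(cluster.read(1))            # data on other servers is unaffected
cluster.fail_over("node1")
print(cluster.read(0))            # the promoted replica now answers
```

Note that in this model a read never falls back to the replica on its own; only the explicit failover step changes which copy is active, which mirrors the single-location rule that gives "read your own writes".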
Q: Is there any effect/risk/time when rebalancing a system under heavy write loads? Is it best to add nodes during quiet times?
A: By design, the rebalance operation is done asynchronously so as to have as little impact as possible on the performance of the cluster. However, the reality is that rebalancing puts an increased load on the cluster and requires resources (network, disk, RAM, CPU). If the cluster is already close to capacity, any increased load may impact the application’s performance. While it is safe to rebalance at any time, we highly recommend performing your own tests in your own environment to characterize what impact, if any, a rebalance will have. Typically our customers perform these at low or quiet times, but the main advantage is that you don’t need to take the application completely offline as you continue to scale.
Q: What’s a vbucket?
A: A vbucket is our way of logically partitioning data so that it can be spread across all the nodes within a cluster. Every Couchbase-type bucket that gets created on the cluster is automatically (and transparently) split up into a static set of slices (the vbuckets). These are then “mapped” to individual servers. When a node is added or removed, it is these slices that get moved around and re-mapped to provide linear and non-disruptive scaling. While totally abstracted from the application and user, it’s important to realize that vbuckets exist “under-the-hood” to provide much of the wonderful capabilities that Couchbase Server has. You can learn more about the vbucket concept here: http://www.couchbase.org/wiki/display/membase/vBuckets
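As a rough illustration of the two-step lookup (key to vbucket, then vbucket to server), consider the Python sketch below. The hash function, the count of 16 vbuckets and the round-robin assignment are simplifying assumptions chosen for readability; a real cluster uses its own hashing, a much larger fixed vbucket count and a placement that moves as few vbuckets as possible.

```python
import zlib

NUM_VBUCKETS = 16   # illustrative; fixed for the lifetime of the bucket


def vbucket_for(key):
    """Hash the key to one of a static set of vbuckets (hash choice is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    """Assign vbuckets to servers round-robin; returns vbucket -> server."""
    return {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}


before = build_map(["s1", "s2"])
after = build_map(["s1", "s2", "s3"])     # a node is added and vbuckets are re-mapped

key = "user::42"
vb = vbucket_for(key)
print(key, "-> vbucket", vb, "->", before[vb], "then", after[vb])
```

The key always hashes to the same vbucket; only the vbucket-to-server map changes when the cluster grows, which is why the application never has to re-shard its own keys.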
Q: Is the Couchbase Server Web UI the only method of monitoring a Couchbase Server cluster?
A: Not necessarily, no. All that you see and can do in the Web UI is actually driven by our REST interface (http://www.couchbase.org/wiki/display/membase/Membase+Management+REST+API) that is programmatically accessible externally. Additionally, each individual server (and each individual bucket on that server) provides its own “raw” statistics that are used by the REST API. These raw statistics are available externally as well: http://www.couchbase.org/wiki/display/membase/Monitoring+Membase. It is our goal to provide as much information as possible about the system so that our users can effectively monitor it both from a capacity planning perspective and a diagnostic/troubleshooting perspective when things start to go wrong (or to prevent things from going wrong in the first place).
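For example, an external monitoring script might poll the REST interface and flag nodes that are not healthy. The Python sketch below is only a starting point: the `/pools/default` path and the `nodes`, `hostname` and `status` fields reflect our understanding of the response shape and should be checked against the REST API documentation linked above, authentication is left out, and the sample payload is hand-written so the example runs without a cluster.

```python
import json
import urllib.request


def fetch_pool_info(base_url):
    """GET <base_url>/pools/default and decode the JSON body (no auth handling)."""
    with urllib.request.urlopen(base_url + "/pools/default") as resp:
        return json.loads(resp.read().decode("utf-8"))


def unhealthy_nodes(pool_info):
    """Return hostnames whose reported status is anything other than 'healthy'."""
    return [n["hostname"] for n in pool_info.get("nodes", [])
            if n.get("status") != "healthy"]


# Hand-written sample in the assumed shape, so the sketch runs without a cluster.
sample = {"nodes": [{"hostname": "10.0.0.1:8091", "status": "healthy"},
                    {"hostname": "10.0.0.2:8091", "status": "unhealthy"}]}
print(unhealthy_nodes(sample))
```

Against a live cluster you would pass the result of `fetch_pool_info` (with a base URL for one of your nodes) to `unhealthy_nodes` and feed the output into whatever alerting system you already use.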
Q: What kind of alerting does Couchbase Server provide?
A: Technically, we are not a company that makes alerting software. In our minds, our job is to provide an interface for other systems to make use of. Most larger organizations would not want each piece of technology in their stack sending out a differently formatted set of alerts. That is why we have made it so easy to plug our statistics and monitoring data into any other system. However, we also realize that some smaller environments may in fact want our software to provide this out of the box. We are working on extending our capabilities here and already provide alerts for when nodes go down.
Q: If you abort the compaction at the end of the time period, is the compaction done up until that point still saved or is all compaction done thus far lost?
A: Normally, a compaction is all-or-nothing and so aborting it will lose the progress that has been made so far. However, within Couchbase Server, we are performing the compaction on a per-vbucket (see above) basis and so the whole dataset can actually be compacted incrementally without losing all of the progress it has made when aborted.
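The benefit of per-vbucket compaction can be shown with a small Python model. The `compact` function and its `budget` parameter (counted in vbuckets rather than in time) are hypothetical simplifications; the point is only that finished vbuckets stay finished when a run is cut short.

```python
def compact(vbuckets, already_done, budget):
    """Compact vbuckets one at a time until the budget (a count of vbuckets) runs out.

    Returns the updated set of compacted vbuckets.
    """
    done = set(already_done)
    for vb in vbuckets:
        if budget == 0:
            break                    # window closed: abort, keeping finished vbuckets
        if vb in done:
            continue
        done.add(vb)                 # stands in for compacting this vbucket's file
        budget -= 1
    return done


all_vbuckets = range(8)
first_night = compact(all_vbuckets, set(), budget=5)
second_night = compact(all_vbuckets, first_night, budget=5)
print(sorted(first_night), sorted(second_night))
```

The first run is aborted after five vbuckets, and the second run picks up the remaining three instead of starting over.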
Q: Why is a delay imposed before the cluster will automatically failover a downed node?
A: By default, the software is configured with a 30-second minimum before automatic failover will kick in. This is designed to prevent the software from doing the “wrong thing”. For example, if a node is simply slow to respond, or there is a brief network hiccup, you wouldn’t want it to be failed over and so the cluster will wait to ensure that the node is actually down. There are a few other situations to be taken into consideration and you can read more about these and our design decisions to handle them here.
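In its simplest form, the waiting rule amounts to a check like the one in this Python sketch. The function name and the use of plain second counters are assumptions for illustration; the actual decision also takes the other situations mentioned above into account.

```python
AUTO_FAILOVER_TIMEOUT = 30   # seconds, the default minimum described above


def should_fail_over(last_seen, now, timeout=AUTO_FAILOVER_TIMEOUT):
    """True only once the node has been silent for at least `timeout` seconds."""
    return (now - last_seen) >= timeout


print(should_fail_over(last_seen=100, now=105))   # brief hiccup
print(should_fail_over(last_seen=100, now=131))   # silent for 31 seconds
```

A five-second hiccup does not trigger a failover, while a node that has been silent for longer than the timeout does.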
To get even more information, you can view the 25-30 minute videos of each week's webinar by going here. Topics include:
- Cross-cluster synchronization (aka cross-data center replication)
- Backup/Restore with Couchbase Server 2.0
- Upgrading from Membase 1.7
- And more!