Couchbase Server 2.0: Most Common Questions

I just finished up a nine-week technical webinar series highlighting the features of our upcoming release of Couchbase Server 2.0. It was such a blast interacting with the hundreds of participants, and I was blown away by the level of excitement, engagement and anticipation for this new product.

(By the way, if you missed the series, all nine sessions are available for replay.) There were some great questions generated by users throughout the webinar series, and my original plan was to use this blog entry to highlight them all. I quickly realized there were too many to expect anyone to read through all of them, so I’ve taken a different tack. This blog will feature the most common/important/interesting questions and answer them here for everyone’s benefit. Before diving in, I’ll answer the question that was by far the most commonly asked: “How long until the GA of Couchbase Server 2.0?” We are currently on track to release it before the end of the year. In the meantime, please feel free to experiment with the Developer Preview that is already available. As for the rest of the questions, here goes!

Q: What are the primary benefits of incorporating Membase and CouchDB into a single product?

A: Membase is a very fast key-value store known for its performance and scalability. CouchDB, on the other hand, is a great document database with powerful indexing and querying capabilities. Combining the two brings together the best of both worlds: a high-performance, highly elastic NoSQL database that scales out linearly while providing querying, indexing and document-oriented features.

Q: Does Couchbase speed up access to a database document by automatically caching it in memory?

A: Absolutely! That’s one of the great features of Couchbase Server 2.0, and it comes from the vast experience we have with memcached. All access to documents goes through our integrated RAM caching layer (built out of memcached) to provide extremely low and, more importantly, predictable latency under extremely heavy loads. For instance, we regularly see customers running well over 100k operations/sec across a cluster and have taken single nodes to over 200k operations/sec in our own testing environments. This RAM caching layer also allows us to handle spikes in write (and read) load without affecting the performance of the application.

Q: I see in your forums that Couchbase Server 2.0 uses the memcached protocol for accessing data, both because it is compatible with existing Membase users and because it offers much higher performance. Is there a way to use REST APIs akin to CouchDB’s to access the documents in Couchbase Server 2.0?

A: The first version of Couchbase Server 2.0 uses the memcached protocol for document access, and the CouchDB HTTP protocol for accessing views. Over time, these two will merge even closer. In the meantime, we have provided a number of client libraries that abstract these two access methods away from the developer.

Q: Is Couchbase Server 2.0 going to be open source?

A: It already is! As a company, Couchbase is fully committed to the furthering of the open source communities that exist and are being built around our various products. While our focus is on providing enterprise-class software to our paying customers, we embrace the free-flow of ideas and wide adoption that an open source project allows for and believe very strongly that there is a place for both.

Indexing/Querying

Q: All I need is a simple secondary index, not map/reduce…how do I do that?

A: Currently, all of our indexes are built using a map function (the reduce is totally optional and can be ignored here). This is really just another syntax for creating an index, and there are a variety of examples available discussing how to create very simple indexes. The very simplest form would involve just putting “emit(doc.<field>)” in your map function, where <field> is what you want to index on. This will create a list of all documents containing that field, sorted by that field. Of course there are more complex scenarios, but it can be made quite simple if that is what is needed.
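As a rough illustration of that simplest case, the sketch below shows a map function that indexes documents by a single field. The field name lastName, the sample documents and the small buildIndex harness (a local stand-in for the server's emit and sorting) are assumptions made up for this example; only the map function itself is the kind of code that would go into a view.

```javascript
// Map function in the CouchDB style the post describes: one emit per document.
// "lastName" is a made-up field name used only for this illustration.
function map(doc) {
  if (doc.lastName) {
    emit(doc.lastName, null);
  }
}

// Everything below is a local stand-in for the server, only so the sketch runs.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

function buildIndex(docs) {
  rows = [];
  for (const doc of docs) {
    map(doc);
  }
  // View rows come back ordered by key; plain string comparison is used here.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const index = buildIndex([
  { lastName: "Smith" },
  { firstName: "NoLastName" },
  { lastName: "Jones" },
]);
console.log(index.map((row) => row.key));
```

Documents without the field are simply left out of the index, which is why the second sample document produces no row.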

Q: How does dealing with Couchbase Server 2.0 views differ from CouchDB and Couchbase Single Server?

A: Not at all…the format, the syntax, everything is the same. Additionally, all the options for querying are supported. You can literally copy-paste the view code from one to another. Multiple design docs are also supported.

Q: Does Couchbase Server 2.0 support ad-hoc querying?

A: At the moment, all querying to Couchbase Server (like CouchDB) must be done against pre-materialized views. In general, this is the only way of providing reliable performance when making those queries. We also understand the need for more on-demand/ad-hoc querying and are working diligently to provide that as well. Couchbase has already begun to take an industry-leading approach to creating a language specifically for unstructured data that can be used across the NoSQL landscape. Take a look at http://unqlspec.org to see what we’re working on!

SDKs/Client Libraries

Q: Which SDKs and client libraries are supported?

A: At a base level, Couchbase Server 2.0 supports any library that implements the memcached protocol (and there are MANY of those). For the additional functionality that we have added (extended protocol commands and view access) Couchbase provides client libraries for a variety of languages (Java, .NET, PHP, Python, Ruby, C/C++) as well as instructions for how to extend libraries for other languages.

Q: Is there any chance of dogpiling with stale=update_after? If you get 30 requests simultaneously for a view with stale=update_after, will they generate several requests simultaneously for updating the index?

A: To recap, the “stale” parameter tells the server that this query should be answered as quickly as possible, knowing that some data that has already been written may not be included in the view. Setting it to “update_after” additionally tells the server to rematerialize the index in the background…after returning the initial request as quickly as possible. Once this rematerialization has started, subsequent requests will not trigger another one, so there’s no worry of “dogpiling” or “stampeding herd” issues.
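To make the “no dogpiling” point more concrete, here is a toy model of the behaviour described above: every query is answered from the existing index, and only the first one schedules a background update. This is not Couchbase code; the ToyView class and all of its fields are invented purely to illustrate the coalescing idea.

```javascript
// Toy model of "answer from the stale index, then refresh once in the background".
// Not Couchbase's implementation; names and structure are assumptions.
class ToyView {
  constructor() {
    this.indexedRows = 0; // rows currently in the materialized index
    this.pendingRows = 0; // rows written but not yet indexed
    this.updateScheduled = false;
    this.updatesRun = 0;
  }

  write() {
    this.pendingRows += 1;
  }

  // Models a query issued with stale=update_after.
  queryUpdateAfter() {
    const result = this.indexedRows; // answer immediately from the existing index
    if (!this.updateScheduled) {
      this.updateScheduled = true; // only the first caller schedules the update
    }
    return result;
  }

  // Models the background indexer completing one run.
  runBackgroundUpdate() {
    if (!this.updateScheduled) return;
    this.indexedRows += this.pendingRows;
    this.pendingRows = 0;
    this.updateScheduled = false;
    this.updatesRun += 1;
  }
}

const view = new ToyView();
view.write();
view.write();

const answers = [];
for (let i = 0; i < 30; i++) {
  answers.push(view.queryUpdateAfter()); // 30 "simultaneous" requests
}
view.runBackgroundUpdate();

console.log(answers.length, view.updatesRun, view.indexedRows);
```

In this model the 30 requests all see the old index, and a single update brings it up to date afterwards.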

Q: How does the client know when to pull updated server/vbucket maps?

A: All clients (whether they be our “smart” clients or are going through our Moxi process) will maintain a streaming connection to a Couchbase Server. When the topology of the cluster changes (add/remove/failover nodes), the clients will be automatically updated with a new vbucket map over this connection. The clients can also request this map on demand, and do so every time they start up. Additionally, each node of the cluster knows which vbuckets it is responsible for and will only return data for those vbuckets. This way, even if a client is temporarily out of sync with the cluster, it will never be vulnerable to inconsistent data.
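The last point can be sketched with a toy cluster in which a node refuses requests for vbuckets it does not own, and a client holding an outdated map refreshes it and retries. The ToyNode class, the status strings and the clientGet helper are all hypothetical names for this illustration, not the actual client library or wire protocol.

```javascript
// Toy node: it only answers for the vbuckets it currently owns.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.owned = new Map(); // vbucketId -> Map(key -> value)
  }

  get(vbucketId, key) {
    const store = this.owned.get(vbucketId);
    if (!store) {
      return { status: "NOT_MY_VBUCKET" }; // refuse rather than return wrong data
    }
    return { status: "OK", value: store.get(key) };
  }
}

const nodeA = new ToyNode("A");
const nodeB = new ToyNode("B");

// After a topology change, vbucket 7 lives on node B.
nodeB.owned.set(7, new Map([["user::1", "alice"]]));
const currentMap = { 7: nodeB }; // what the cluster would hand out now
const client = { vbucketMap: { 7: nodeA } }; // this client still has the old map

function clientGet(client, vbucketId, key) {
  let reply = client.vbucketMap[vbucketId].get(vbucketId, key);
  let refreshed = false;
  if (reply.status === "NOT_MY_VBUCKET") {
    client.vbucketMap = { ...currentMap }; // stand-in for fetching the new map
    refreshed = true;
    reply = client.vbucketMap[vbucketId].get(vbucketId, key);
  }
  return { ...reply, refreshed };
}

const result = clientGet(client, 7, "user::1");
console.log(result);
```

The out-of-date client pays for one extra round trip, but it never reads data from a node that is no longer responsible for it.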

Development/Production View Usage

Q: Why the extra effort of creating a view in “development” mode and then pushing it to production?

A: We wanted to provide the ability to do view development on a live dataset, but didn’t want to have that development impact the currently running application. Thus, a “development” mode was created so that users could create and edit views on “real” data. In order to speed up the iterations of development, the default is to materialize a view over a subset of the data. When the development is complete, the user can opt to materialize the view over the whole cluster right before pushing it to production. This gives the added benefit of materializing the view so that it is immediately ready for the application to use. Lastly, this “development” mode can be used to edit views that are currently in production, without affecting the application’s access to them (by making a copy). When the edits are complete, the view can then be materialized and swapped with the original into production.

Q: How do you control what the development data set is?

A: Currently, the development dataset is automatically decided by the software depending on how much data exists. For small datasets, the software will actually materialize the view across the whole thing. As that gets larger, the software will automatically scale it down to provide a quicker response time while developing. Once the view is finalized, the user has the option to run it over the whole dataset manually (by clicking the tab “Full Cluster Dataset”) both for the purposes of final verification and to prepare it for production use.

Clustering

Q: For a bucket with replica and auto-failover, will a server failure without rebalance cause retrieval/update errors on that bucket?

A: When a server initially fails (for whatever reason: hardware, network, software) the application will briefly get errors for any data which that server was responsible for. Requests for data on other servers will not be impacted. These errors will continue until the node is “failed over”, which activates the replica data (vbuckets) elsewhere in the cluster. The amount of time will vary depending on whether you are using automatic or manual failover…but once the failover is triggered there is no more delay.

You might ask “but why can’t I read from the replica data that already exists?” The answer is two-fold. First, we specifically disallow access to the replica data (while it is “replica”) to preserve the very strong consistency that our system provides. Under normal operation, you are guaranteed to “read your own writes” and this is done by only providing one location for accessing any given piece of data. By allowing unrestricted reading of replicas, you might have a situation where one client writes a piece of data to the active copy and another client immediately tries to read that data from the replica…leading to possible inconsistency.

Now, the second part of this answer is that we are currently working on a feature to allow for reading from these replicas. It will be a new operation that is explicitly invoked by the application so that there won’t be any confusion about which copy is being read from. You’ll still want to fail over the node as quickly as possible since writes will continue to fail. This is one example of the many features we have added as a direct response to our customers’ and users’ demands…you speak, and we listen (and then do something about it too)!

Q: Is there any effect/risk/time when rebalancing a system under heavy write loads? Is it best to add nodes during quiet times?

A: By design, the rebalance operation is done asynchronously so as to have as little impact as possible on the performance of the cluster. However, the reality is that rebalancing puts an increased load on the cluster and requires resources (network, disk, RAM, CPU) to do so. If the cluster is already close to capacity, any increased load may impact the application’s performance. While a rebalance is safe to do at any time, we highly recommend performing your own tests in your own environment to characterize what impact, if any, it will have. Typically our customers perform rebalances at low or quiet times, but the main advantage is that you don’t need to take the application offline as you continue to scale.

Q: What’s a vbucket?

A: A vbucket is our way of logically partitioning data so that it can be spread across all the nodes within a cluster. Every Couchbase-type bucket that gets created on the cluster is automatically (and transparently) split up into a static set of slices (the vbuckets). These are then “mapped” to individual servers. When a node is added or removed, it is these slices that get moved around and re-mapped to provide linear and non-disruptive scaling. While totally abstracted from the application and user, it’s important to realize that vbuckets exist “under the hood” to provide many of the capabilities that Couchbase Server has. You can learn more about the vbucket concept.
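A small sketch may help show the two levels of mapping: a key always hashes to the same vbucket, and only the vbucket-to-server map changes when nodes come and go. The hash function and the round-robin layout below are deliberate simplifications chosen for this example (they are not what the server actually uses), and the node names are made up. The figure of 1024 vbuckets per bucket is the one quoted in the comments on this post.

```javascript
const NUM_VBUCKETS = 1024; // static number of slices per bucket

// Stand-in hash, NOT the one Couchbase uses; any stable hash illustrates the idea.
function toyHash(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Step 1: key -> vbucket. This never changes for a given key.
function vbucketFor(key) {
  return toyHash(key) % NUM_VBUCKETS;
}

// Step 2: vbucket -> server. Round-robin is a simplification; a real
// rebalance would move as few vbuckets as it can.
function buildMap(servers) {
  const map = [];
  for (let vb = 0; vb < NUM_VBUCKETS; vb++) {
    map.push(servers[vb % servers.length]);
  }
  return map;
}

function serverFor(key, map) {
  return map[vbucketFor(key)];
}

const before = buildMap(["node1", "node2"]);
const after = buildMap(["node1", "node2", "node3"]);
const key = "user::1001";

console.log(vbucketFor(key), serverFor(key, before), serverFor(key, after));
```

Because applications only ever deal with keys, re-mapping the slices is invisible to them.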

Monitoring

Q: Is the Couchbase Server Web UI the only method of monitoring a Couchbase Server cluster?

A: Not necessarily, no. All that you see and can do in the Web UI is actually driven by our REST interface that is programmatically accessible externally. Additionally, each individual server (and each individual bucket on that server) provides its own “raw” statistics that are used by the REST API. These raw statistics are available externally as well.

It is our goal to provide as much information as possible about the system so that our users can effectively monitor it both from a capacity planning perspective and a diagnostic/troubleshooting perspective when things start to go wrong (or to prevent things from going wrong in the first place).
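As one hedged example of consuming raw statistics from an external system, the sketch below parses a reply in the memcached text-protocol “stats” format (a series of “STAT name value” lines terminated by “END”). It assumes the statistics are being read over that protocol; the sample reply and its values are invented for illustration, and parseStats is a hypothetical helper, not part of any Couchbase library.

```javascript
// Parse a memcached text-protocol "stats" reply into a plain object.
function parseStats(reply) {
  const stats = {};
  for (const line of reply.split("\r\n")) {
    if (line === "END") break; // END terminates the reply
    const match = /^STAT (\S+) (.*)$/.exec(line);
    if (match) {
      stats[match[1]] = match[2]; // values arrive as strings
    }
  }
  return stats;
}

// Invented sample reply; real servers return many more statistics.
const sampleReply =
  "STAT curr_items 1200\r\n" +
  "STAT cmd_get 5000\r\n" +
  "STAT get_hits 4500\r\n" +
  "END\r\n";

const stats = parseStats(sampleReply);
const hitRatio = Number(stats.get_hits) / Number(stats.cmd_get);

console.log(stats.curr_items, hitRatio);
```

A monitoring system could poll numbers like these on a schedule and apply its own thresholds and alert formats.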

Q: What kind of alerting does Couchbase Server provide?

A: Technically, we are not a company that makes alerting software. In our minds, our job is to provide an interface for other systems to make use of. Most larger organizations would not want each piece of technology in their stack sending out a differently formatted set of alerts. That is why we have made it so easy to plug our statistics and monitoring data into any other system. However, we also realize that some smaller environments may in fact want our software to provide this out of the box. We are working on extending our capabilities here and already provide alerts for when nodes go down.

Autocompaction

Q: If you abort the compaction at the end of the time period, is the compaction done up until that point still saved or is all compaction done thus far lost?

A: Normally, a compaction is all-or-nothing and so aborting it will lose the progress that has been made so far. However, within Couchbase Server, we are performing the compaction on a per-vbucket (see above) basis and so the whole dataset can actually be compacted incrementally without losing all of the progress it has made when aborted.
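The incremental behaviour can be pictured with a toy model in which each vbucket is compacted as its own unit, so whatever finished before the window closed stays done. The function name, the abstract “cost” units and the ten sample vbuckets are assumptions for this sketch only.

```javascript
// Compact as many not-yet-compacted vbuckets as fit into the time budget.
// Returns how many vbuckets are compacted in total after this window.
function compactWithinWindow(vbuckets, budget, costOf) {
  let spent = 0;
  for (const vb of vbuckets) {
    if (vb.compacted) continue; // finished in an earlier window, nothing to redo
    const cost = costOf(vb);
    if (spent + cost > budget) break; // window closes: stop here
    spent += cost;
    vb.compacted = true; // completed work on this vbucket is kept
  }
  return vbuckets.filter((vb) => vb.compacted).length;
}

// Ten toy vbuckets, each costing one unit of time to compact.
const vbuckets = Array.from({ length: 10 }, (_, id) => ({ id, compacted: false }));
const unitCost = () => 1;

const afterFirstWindow = compactWithinWindow(vbuckets, 4, unitCost);
const afterSecondWindow = compactWithinWindow(vbuckets, 4, unitCost);
const afterThirdWindow = compactWithinWindow(vbuckets, 4, unitCost);

console.log(afterFirstWindow, afterSecondWindow, afterThirdWindow);
```

Each window picks up where the previous one stopped instead of starting over.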

Autofailover

Q: Why is a delay imposed before the cluster will automatically failover a downed node?

A: By default, the software is configured with a 30-second minimum before automatic failover will kick in. This is designed to prevent the software from doing the “wrong thing”. For example, if a node is simply slow to respond, or there is a brief network hiccup, you wouldn’t want it to be failed over and so the cluster will wait to ensure that the node is actually down.
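The reasoning behind the delay can be sketched as a check that a node has been continuously unresponsive for the whole timeout before anything happens. The heartbeat samples and the failoverTime helper below are hypothetical; only the 30-second default comes from the answer above.

```javascript
const FAILOVER_TIMEOUT_MS = 30000; // the default minimum mentioned above

// samples: [{ timeMs, ok }] in time order.
// Returns the time at which failover would trigger, or null if it never does.
function failoverTime(samples, timeoutMs = FAILOVER_TIMEOUT_MS) {
  let downSince = null;
  for (const sample of samples) {
    if (sample.ok) {
      downSince = null; // any successful response resets the clock
      continue;
    }
    if (downSince === null) {
      downSince = sample.timeMs;
    }
    if (sample.timeMs - downSince >= timeoutMs) {
      return sample.timeMs;
    }
  }
  return null;
}

// A brief network hiccup: the node recovers well within the timeout.
const hiccup = [
  { timeMs: 0, ok: true },
  { timeMs: 5000, ok: false },
  { timeMs: 10000, ok: false },
  { timeMs: 15000, ok: true },
  { timeMs: 20000, ok: true },
];

// A real outage: the node stays unresponsive.
const outage = [
  { timeMs: 0, ok: true },
  { timeMs: 5000, ok: false },
  { timeMs: 15000, ok: false },
  { timeMs: 25000, ok: false },
  { timeMs: 35000, ok: false },
];

console.log(failoverTime(hiccup), failoverTime(outage));
```

Only the sustained outage reaches the timeout; the hiccup never triggers a failover.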

To get even more information, you can view the 25-30 minute videos of each week’s webinar by going here. And the authoritative place for all information regarding Couchbase Server 2.0 can be found here. While this series may have come to a conclusion, we are already planning on starting up another one to highlight not only the features of Couchbase Server 2.0, but also Couchbase Mobile, our SDKs/client libraries and more! Some of the topics will include:

Cross-cluster synchronization (aka cross-data center replication)
Backup/Restore with Couchbase Server 2.0
Upgrading from Membase 1.7
And more!

To make it even better, I’m asking you to participate! Please comment here (or send me an email directly at perry@couchbase.com) with any topics that you think we need to cover more and we’ll do our best to include them in an upcoming webinar.

Perry Krug


Posted by Perry Krug

Perry Krug is an Architect in the Office of the CTO focused on customer solutions. He has been with Couchbase for over 8 years and has been working with high-performance caching and database systems for over 12 years.


22 Comments

Richard Stanford October 13, 2011 at 10:22 pm

I wasn’t able to attend the advanced querying seminar, but had a quick question on group-level (hopefully I’ve got the terminology correct). I’ve attached the slide. The reduce function in this case returned “7”, the count of rows. Is there any way to combine that reduce function with the grouping behaviour, so as to return for group level one (please forgive the formatting if it’s incorrect):

["a"] 3
["b"] 2
["c"] 2

And for level two:

["a","1"] 1
["a","3"] 2
["b","2"] 2
["c","1"] 1
["c","4"] 1

That would seem to follow in the spirit of the group behavior described, and would actually be phenomenally useful for some of our use-cases (specifically being able to do this dynamically with the roll-up rather than creating separate views for every case), but I’m not sure if it’s impossible or just omitted for space/clarity. Thanks!

1. Matthew Kane Parker October 14, 2011 at 12:56 am
  
  hi richard, not only is it possible to combine group with reduce, it’s required! group/group-level can only be used in combination with reduce. your examples are correct. read up more in the couchdb view api wiki: http://wiki.apache.org/couchdb… and in the couchdb definitive guide: http://guide.couchdb.org/draft…
  
2. Perry Krug October 14, 2011 at 5:57 pm
  
  What Matthew said is correct; I’ll elaborate a bit more.
  
  -You’ll have whatever map function you want to output the index, making sure that the “key” that gets emitted is an array (in your case, ["a","1","maybesomethingelse"])
  -You’ll also have a reduce function (most commonly the built-in _count)
  -Now, you can query the view with reduce=false and get the full index
  -You can also query (without reduce=false) with group_level=1 to get your first output, and group_level=2 to get your second
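For readers following along, the grouping described in this thread can be sketched locally. The countByGroupLevel function below is a made-up stand-in that mimics what a _count reduce returns at a given group level; the keys are the seven rows from Richard’s example, assumed to be already in sorted order as they would be in an index.

```javascript
// Mimic a _count reduce at a given group level over sorted array keys.
function countByGroupLevel(keys, groupLevel) {
  const counts = new Map(); // insertion order follows the sorted keys
  for (const key of keys) {
    // Level 0 means "no grouping": everything rolls up under a null key.
    const prefix = JSON.stringify(groupLevel === 0 ? null : key.slice(0, groupLevel));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return [...counts].map(([prefix, value]) => ({ key: JSON.parse(prefix), value }));
}

// The seven emitted keys from the example in this thread.
const keys = [
  ["a", "1"],
  ["a", "3"],
  ["a", "3"],
  ["b", "2"],
  ["b", "2"],
  ["c", "1"],
  ["c", "4"],
];

const total = countByGroupLevel(keys, 0);
const levelOne = countByGroupLevel(keys, 1);
const levelTwo = countByGroupLevel(keys, 2);

console.log(JSON.stringify(total));
console.log(JSON.stringify(levelOne));
console.log(JSON.stringify(levelTwo));
```

Level one groups on the first array element and level two on the first two, which gives the two outputs asked about above.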
  
  I’d appreciate your feedback on how to make that slide better, since I thought that I was detailing exactly the case you are asking for.
  
  Perry
  
  1. Richard Stanford October 14, 2011 at 8:19 pm
    
    It’s obvious from the slide that it’s outputting the keys, but not that it’s actually including a value. Probably is obvious after you’ve used the functionality, of course! I would either have the slide include the group counts or, alternately, not include the example “7” count of the entire set.
    
    1. Perry Krug October 14, 2011 at 8:21 pm
      
      Oh duh, I’m sorry and you’re right. :-)
      
Patrick Durusau October 14, 2011 at 2:48 pm

Perry, why not post all the questions wiki style and let the community join in answering the questions? Would be a good community activity and what may seem like a common question to one person may seem of little interest to another. Having all the questions would be a way to smooth that out.

1. Perry Krug October 14, 2011 at 5:59 pm
  
  Patrick, I’m all for openness and transparency, but I had literally almost 200 questions come out of this webinar series. A few were totally unrelated (“how many people are on this webinar”), a few unrelated to the content (“what about Couchbase Mobile”) and it was turning into a lot of work to group, prune and format all of them.
  
  If you’re referring to simply taking the questions above into a wiki format, I can see that being beneficial, but I also do want to have a “little” bit of control over the messaging and content…I’m sure you can understand that given the amount of confusion that already exists out there ;-)
  
  I’m interested in hearing your feedback though…and I am very much encouraging you and anyone else to send me other questions that you feel need addressing.
  
  Perry
  
Richard Stanford October 14, 2011 at 8:22 pm

One webinar or recording that I’d be very interested in would be a “best practices” cloud setup walkthrough. Start with an “empty” AWS account and end up with an EBS-based persistent Couchbase installation that will scale by adding/removing servers. I’m sure it wouldn’t be perfect for anyone, but it would be “pretty good” for many people.

This could cover things that may be changing between old-and-2.0, such as what the “common” best practice connections are (still using an elastic load balancer?), since there’s still a real mix of membase/couchdb/couchbase answers out there, many of which differ.

1. Perry Krug October 14, 2011 at 9:52 pm
  
  Thanks Richard, we’ve actually been having similar discussions internally. I think it will most likely come out in written form rather than webinar, but it’s certainly on the general (and long) list of topics needing discussion. I’ll see about bumping it up a few notches ;-)
  
  1. Richard Stanford October 14, 2011 at 10:04 pm
    
    Cool. BTW, as a generic documentation comment (and one that may be out of date by now), I know I’ve run into the issue of not knowing if a particular website entry or documentation entry is referring to 2.0 or to the “legacy” product, especially when describing suggested practices. If updating everything is going to take a while, then just adding a “this section refers to” tag would be incredibly helpful for new users.
    
    1. Frank October 15, 2011 at 2:37 am
      
      Richard, we are actually moving to having nice polished version specific manuals that will include admin stuff etc. So that will make it much clearer what applies to a specific version. We actually have the current DRAFT for the Couchbase Server 2.0 manual already: http://docs.couchbase.org/couc… It is a work in progress and still needs a bunch of polish and updates but you can see where we are heading.
      
Suraj B December 14, 2012 at 10:45 am

Hello Friends,

I have a question about Couchbase views. I have a group of 1000 records (i.e. documents) in a bucket, and I want to retrieve a single record from them by passing a parameter from the C# side. Please tell me the solution as soon as possible; it would be a great help to me.

Thanks in advance

Suraj B

1. Perry Krug December 14, 2012 at 6:32 pm
  
  Hi Suraj, you’ll want to take a look at http://www.couchbase.com/devel… to find out all the different ways you can get data out of Couchbase Server using C#. If you have other questions, it would be best to ask the larger community at: http://www.couchbase.com/forum…
  
  Perry
  
2. jzablocki December 18, 2012 at 1:34 pm
  
  Hi Suraj,
  
  You can see examples of passing parameters to views in C# at http://www.couchbase.com/docs/…. As a quick example, if you’re searching for a key, you call client.GetView("design_doc", "view_name").Key("Your Key Here");
  
Gj July 22, 2013 at 6:53 pm

Where exactly are the vbucket map and vbucket-server maps stored in Couchbase? Is it on every node, or at the cluster level? Also, are vbuckets slices of the bucket, i.e. does each bucket have around 1024 vbuckets? What is the hierarchy in Couchbase … Cluster -> Nodes -> Buckets -> Vbuckets?

1. Matt Ingenthron July 22, 2013 at 7:02 pm
  
  Each bucket has 1024 vbuckets. The vbuckets are mapped to nodes by the cluster manager, so you can look at the vbucket as a logical slice of a bucket and the nodes as places those vbuckets are mapped to. So, it’d be Cluster -> Buckets -> vbuckets where the nodes are simply resources that elements are mapped to.
  
  Storage of the vbucket map is internal to the cluster manager, but it’s accessible through every node of the cluster through an HTTP request for the configuration for a given bucket.
  
  Hope that helps!
  
kman July 30, 2013 at 7:33 pm

Hi my name is Kevin and I recently started using couchbase for a task of sorting through documents and finding useful data. My map-reduce function works perfectly on the development time subset of my data, but returns empty when I try to run it on the full cluster data set. Any insight as to why this might be? Thank you!

1. kmanc July 30, 2013 at 7:34 pm
  
  as* jeeze sorry
  
2. Matt Ingenthron July 30, 2013 at 8:26 pm
  
  It could be that your map function is hitting an error on a document that doesn’t exist in your dev time subset. Check to be sure that you have if guards for each of the elements you’ll reference over the course of execution and check the logs for execution errors.
  
  http://www.couchbase.com/docs/…
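As a hedged sketch of what such guards can look like, the example below contrasts a map function that assumes a nested field is always present with one that checks first. The address.city field, the sample documents and the local emit stand-in are all invented for illustration.

```javascript
// Local stand-in for the server-side emit, only so the sketch runs.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Unguarded: throws a TypeError when doc.address is missing.
function unguardedMap(doc) {
  emit(doc.address.city, null);
}

// Guarded: documents without the expected fields are skipped.
function guardedMap(doc) {
  if (doc.address && doc.address.city) {
    emit(doc.address.city, null);
  }
}

const docs = [{ address: { city: "Oslo" } }, { name: "no address here" }];

let unguardedFailed = false;
try {
  docs.forEach((doc) => unguardedMap(doc));
} catch (err) {
  unguardedFailed = err instanceof TypeError;
}

emitted.length = 0; // discard rows from the failed attempt
docs.forEach((doc) => guardedMap(doc));

console.log(unguardedFailed, JSON.stringify(emitted));
```

A document shape that never appears in a small development subset can easily show up in the full dataset, which is where the guard matters.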
  
  1. kmanc July 31, 2013 at 12:02 pm
    
    Thanks Matt!
    
Razz12 October 15, 2018 at 6:57 am

Hey!! I ran into an issue. My requirement was to copy a document from one bucket to another bucket. The problem is the source and destination buckets are in different clusters and their VPNs are different. I wrote a Java program that replicates the document from one bucket to another. Since the VPNs are different, it is throwing ConfigurationException while trying to open the source bucket. Below is the piece of code.

CouchbaseEnvironment sourceEnv = DefaultCouchbaseEnvironment.builder()
    .bootstrapHttpDirectPort(8091)
    .kvTimeout(10000)
    .continuousKeepAliveEnabled(true)
    .connectTimeout(TimeUnit.SECONDS.toMillis(10000))
    .socketConnectTimeout(10000)
    .build();
Cluster sourceCluster = CouchbaseCluster.create(sourceEnv, couchbaseHost);
Bucket sourceBucket = sourceCluster.openBucket(bucketName, bucketPwd);

Any help will be greatly appreciated!!!

Perry Krug October 15, 2018 at 8:14 am

Hi Razz12, it would really be best for you to post this on our actual forums so that the engineers and rest of the community can help identify whatever issue you’re having: https://www.couchbase.com/forums/
