Hello everyone, Perry Krug hailing you from the Couchbase headquarters in Mountain View, CA, where we’ve just released the next major version of Membase Server: 1.7.

For a visual demonstration of the new UI in 1.7, including a look at our 100-node cluster, please watch a brief (5-minute) video.


For details of the release, read on!

Along with the usual stability improvements and bug fixes, we’ve added a whole slew of new features and capabilities:

  • New commands for working with expiration times and synchronous replication
  • Greatly enhanced clustering, replication and rebalancing
  • Basic alerting on error conditions and upgrade availability
  • Redesigned UI for better and more granular monitoring

It’s this last point that I want to focus on today.  See Frank’s blog for more details on the other new additions and Dustin’s blog for the new commands.

Older versions of Membase Server (1.6.x) were managed through a streamlined UI built on top of our REST API. We received many accolades for the statistics we presented, and for the fact that you could see up-to-the-second, real-time traffic statistics from a live Membase node or cluster. But there were some limitations: you could view only 30 statistics; those statistics were available only as numbers aggregated across all the nodes in a cluster; and the “up” or “down” indication for an individual server was limited and sometimes confusing.

Through our various production and test deployments, we gained an immense amount of experience about how to better monitor and troubleshoot Membase Server, and that experience is reflected in the new release. We’ve introduced a variety of new areas of monitoring and have improved upon the existing capabilities:

Per-server statistics

With 1.7, you can now use a drop-down menu to select an individual server. With clusters growing larger by the minute, you can also type part or all of the server name (either IP address or hostname, depending on your configuration) and get immediate results without having to scan through a long list.
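To make the type-to-filter idea concrete, here is a minimal sketch that assumes a simple case-insensitive substring match on the server name. This is only an illustration of the behavior. It is not the actual 1.7 UI code, and the function name and sample addresses are hypothetical.

```python
def filter_servers(servers: list[str], fragment: str) -> list[str]:
    """Return the servers whose name contains the typed fragment (case-insensitive).

    Hypothetical helper: assumes plain substring matching on IP or hostname.
    """
    needle = fragment.strip().lower()
    if not needle:
        # Nothing typed yet: show the full list.
        return list(servers)
    return [s for s in servers if needle in s.lower()]


# Made-up server names for illustration.
servers = ["10.1.2.31", "10.1.2.32", "10.1.3.7", "cache-east-01.example.com"]
print(filter_servers(servers, "2.3"))  # ['10.1.2.31', '10.1.2.32']
```

Because a fragment like "2.3" can match anywhere in the name, typing a few characters is usually enough to narrow a long list to a handful of nodes.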

When you view an individual server, you’ll also see new stats for RAM, swap and CPU usage (we learned that it is not really useful to aggregate these).

Slicing and dicing this in a different way, 1.7 also allows any individual statistic (say, “disk write queue”) to be displayed across all servers. Simply click on the blue arrow within the mini-graph of any statistic and you’ll see what I mean.
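Conceptually, the “one statistic across all servers” view is a pivot of the per-server data. The following minimal sketch assumes a hypothetical snapshot shaped as {server: {stat: value}}. The stat names and numbers are made up for illustration and are not Membase’s internal format.

```python
from collections import defaultdict


def by_statistic(per_server: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Pivot {server: {stat: value}} into {stat: {server: value}}."""
    pivoted: dict[str, dict[str, float]] = defaultdict(dict)
    for server, stats in per_server.items():
        for stat, value in stats.items():
            pivoted[stat][server] = value
    return dict(pivoted)


# Made-up snapshot; stat names are hypothetical.
snapshot = {
    "10.1.2.31": {"disk_write_queue": 120, "ops_per_sec": 4500},
    "10.1.2.32": {"disk_write_queue": 9800, "ops_per_sec": 4300},
}
view = by_statistic(snapshot)
print(view["disk_write_queue"])  # {'10.1.2.31': 120, '10.1.2.32': 9800}
```

Laid out this way, a single node with an unusually deep disk write queue stands out immediately, which is hard to spot in a cluster-wide total.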

Expanded field of statistics

Prior to 1.7, Membase Server exposed 16 statistics by default, with the ability to choose another 14 to display, for a total of 30. That is nothing to sneeze at. However, with 1.7 you have roughly 87 (at last count) to choose from…that’s almost a 3x increase! (I say “roughly” because a few statistics are duplicated in various places to make them easier to find.)

The statistics are now broken down into a few sections for easier viewing:

  • “Server Resources” – RAM/swap/CPU usage of an individual server
  • “Summary” – quick-glance statistics, including ops per second, item counts, etc. These 12 stats provide continuity between the 1.6.x versions and 1.7.
  • “vbucket Resources” – Much more detailed information about the status, count, activity and memory utilization of our active, replica and pending vbuckets. A total column is also provided.
  • “Disk Queues” – Expanding on the singular “disk write queue” of 1.6.x, this new section breaks out the item counts, drain/fill rates and average age of data in the various disk queues for active, replica, and pending vbuckets.
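One way to read the item count and drain/fill rates together is a back-of-the-envelope estimate of how long a queue will take to empty. The sketch below is our own illustration and assumes the current rates stay constant. It is not a calculation that Membase performs, and the function name and numbers are hypothetical.

```python
import math


def seconds_to_drain(queue_items: int, fill_per_sec: float, drain_per_sec: float) -> float:
    """Rough time for a disk queue to empty if the current rates hold.

    Returns math.inf when items arrive at least as fast as they are written out.
    """
    if queue_items == 0:
        return 0.0
    net = drain_per_sec - fill_per_sec  # items removed per second, net of new arrivals
    if net <= 0:
        return math.inf
    return queue_items / net


# Made-up numbers: 9,800 queued items, 1,500/s arriving, 2,200/s written to disk.
print(seconds_to_drain(9800, fill_per_sec=1500, drain_per_sec=2200))  # 14.0
```

A queue whose fill rate stays at or above its drain rate will keep growing, which is exactly the kind of trend these per-queue graphs help you spot.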
  • “TAP Queues” – Never before seen graphs and monitoring of the various inter-node communication queues related to replication and rebalancing. We also have a section for “client” TAP queues representing the data flowing to external consumers of our TAP data.

All of these statistics are available per bucket, either aggregated across the whole cluster or for an individual node.
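The cluster-versus-node choice can be pictured as the following minimal sketch. It assumes a hypothetical {node: {stat: value}} layout for a single bucket, and it assumes the statistic is additive (item counts, ops per second). Non-additive stats such as average ages could not simply be summed. The names and numbers are illustrative only.

```python
from typing import Optional


def stat_value(bucket_stats_by_node: dict[str, dict[str, float]],
               stat: str, node: Optional[str] = None) -> float:
    """Return one bucket statistic for a single node, or summed over the cluster.

    Assumes the statistic is additive (e.g. item counts, ops/sec).
    """
    if node is not None:
        # Per-node view: just that server's value.
        return bucket_stats_by_node[node][stat]
    # Cluster view: add up every node's contribution.
    return sum(stats.get(stat, 0) for stats in bucket_stats_by_node.values())


# Made-up per-node numbers for one bucket.
bucket = {"10.1.2.31": {"curr_items": 1000}, "10.1.2.32": {"curr_items": 500}}
print(stat_value(bucket, "curr_items"))                    # 1500
print(stat_value(bucket, "curr_items", node="10.1.2.32"))  # 500
```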

Improved Server Status monitoring

In 1.6.x, our only indication of the status of a server was either “up” or “down”…and that status was derived strictly from Erlang’s view. We ran into cases where the UI reported the server as “up” when in reality it was not.

Membase Server 1.7 enhances the up/down monitoring by adding a third status, “pending” (denoted by a yellow box). This status indicates that the memcached process (responsible for actually serving data) is not functioning properly. The process may have recently crashed and be warming up (indicated by an increasing item count), or it may be unresponsive to stats requests (rare, but possibly a sign of a hung process).
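The three-way status can be summarized as a small decision function. This is a hypothetical sketch modeled only on the description above. The inputs (whether Erlang can see the node, whether memcached answers stats, whether it is warming up) are assumptions about what such a check would look at, not the actual 1.7 implementation.

```python
from enum import Enum


class Status(Enum):
    UP = "up"
    PENDING = "pending"  # the yellow box in the 1.7 UI
    DOWN = "down"


def node_status(erlang_reachable: bool, memcached_responding: bool,
                warming_up: bool) -> Status:
    """Hypothetical three-way status check modeled on the description above."""
    if not erlang_reachable:
        # The cluster manager cannot see the node at all.
        return Status.DOWN
    if not memcached_responding or warming_up:
        # Node is reachable, but the data-serving process is not healthy yet.
        return Status.PENDING
    return Status.UP


print(node_status(erlang_reachable=True, memcached_responding=True, warming_up=True))
```

The key difference from 1.6.x is the middle branch. A node that Erlang considers alive is no longer reported as “up” unless its data-serving process is also healthy.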

This new status indicator helps to shed more light on the level of function each node has within the cluster.

We also added a pie chart to each bucket in the “Manage Buckets” screen to indicate how much of the data within that bucket is available. If a node fails, some percentage of the data may be unavailable, and we wanted to make that clear.
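One way to think about the number behind that pie chart is shown in the sketch below. It assumes data is spread evenly across vbuckets, so the available share is simply the fraction of vbuckets whose active copy lives on a node that is still up. This is our illustration, not the formula the UI uses, and the vbucket map is made up.

```python
def percent_available(active_owner: list[str], failed: set[str]) -> float:
    """Share of vbuckets whose active copy lives on a node that is still up.

    active_owner[i] is the node holding the active copy of vbucket i.
    Assumes keys hash evenly, so every vbucket holds an equal slice of data.
    """
    if not active_owner:
        return 100.0
    ok = sum(1 for node in active_owner if node not in failed)
    return 100.0 * ok / len(active_owner)


# Illustrative map: 1024 vbuckets spread round-robin over four nodes.
vbucket_map = ["a", "b", "c", "d"] * 256
print(percent_available(vbucket_map, failed={"c"}))  # 75.0
```

In this picture, promoting the replicas of the failed node’s vbuckets moves the number back toward 100%.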

Monitor Server Screen

This is a brand-new UI page that shows a concise list of all nodes within the cluster. While this information is also available elsewhere, we felt it was important to present some key statistics about each node in an easy-to-view, searchable list.

This new screen includes an up/down/pending status marker, swap/RAM/CPU usage and Active/Replica item counts for each server. You can also click on the server name to get the detailed graphs for each bucket on that server.

All this work was done with our customers, users and support team in mind, and we made sure to keep the best parts:

  • Simple, clean and FAST interface for managing and monitoring a Membase Server cluster
  • Per-second statistics for live traffic monitoring
  • Completely RESTful API allowing for external monitoring (in case our UI isn’t enough!)
  • “All nodes created equal” means that any node of a cluster can serve the UI and has a consistent view of the cluster
  • Easily selectable mini-graphs to display on the “main graph”
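The REST API and the “all nodes created equal” design combine nicely for external monitoring. Because every node has a consistent view of the cluster, a monitoring script can ask any node and fall back to another if one is unreachable. The sketch below illustrates that pattern. The `fetch` callable is a hypothetical stand-in for an HTTP GET against the REST API; take the actual endpoints from the Membase REST API documentation.

```python
from typing import Any, Callable, Iterable


def fetch_from_any_node(nodes: Iterable[str], fetch: Callable[[str], Any]) -> Any:
    """Ask each node in turn and return the first successful answer.

    Since every node serves the same cluster view, any healthy node will do.
    `fetch` is caller-supplied (e.g. an HTTP GET against the REST API).
    """
    errors = []
    for node in nodes:
        try:
            return fetch(node)
        except OSError as exc:  # connection refused, timeouts, URL errors, etc.
            errors.append(f"{node}: {exc}")
    raise RuntimeError("no node answered: " + "; ".join(errors))


# Stand-in fetcher for illustration: the first node is down.
def fake_fetch(node: str) -> dict:
    if node == "10.1.2.31":
        raise ConnectionRefusedError("connection refused")
    return {"served_by": node}


print(fetch_from_any_node(["10.1.2.31", "10.1.2.32"], fake_fetch))
```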

Thanks for reading, thanks for your interest, thanks for your support. Until next time…

Perry Krug
Solutions Architect

Posted by Perry Krug

Perry Krug is an Architect in the Office of the CTO focused on customer solutions. He has been with Couchbase for over 8 years and has been working with high-performance caching and database systems for over 12 years.
