There are two simple Linux OS-level settings that I keep seeing set incorrectly on production systems. These are documented elsewhere, but they keep coming up, so they seem worth a quick review here. They are not super secret settings or magic-bullet performance fixes, but on a production Couchbase DB they should be set as described below and incorporated into whatever system or process you use to bootstrap your Couchbase nodes. They help with memcached performance, rebalance performance, and in some cases stability.

Please make sure you test these changes in a test environment before moving them to production.

Swappiness should be turned off

This one is pretty straightforward if you know about the Linux virtual memory system. The swappiness value tells the virtual memory subsystem how aggressively it should swap memory pages out to disk. The catch is that the system may swap items out of memory even when there is plenty of RAM available. The OS default is usually 60, which is a little aggressive IMO. You can see what value your system is set to by running the following command:

cat /proc/sys/vm/swappiness

Couchbase is tuned to operate in memory as much as possible, so you can gain, or at minimum not lose, performance by changing the swappiness value to 0. In non-tech talk, this tells the virtual memory subsystem of the OS not to swap items from RAM to disk unless it really, really has to, which should not be needed if you have sized your nodes correctly. To set this, run the following commands with sudo, or just become root if you ride in the wild west.

# Set the value for the running system
sudo sh -c 'echo 0 > /proc/sys/vm/swappiness'

# Backup sysctl.conf
sudo cp -p /etc/sysctl.conf /etc/sysctl.conf.$(date +%Y%m%d-%H:%M)

# Set the value in /etc/sysctl.conf so it stays after reboot.
sudo sh -c 'echo "" >> /etc/sysctl.conf'
sudo sh -c 'echo "#Set swappiness to 0 to avoid swapping" >> /etc/sysctl.conf'
sudo sh -c 'echo "vm.swappiness = 0" >> /etc/sysctl.conf'
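To confirm the change took, something like the following could be dropped into a build or monitoring script. This is only a minimal sketch: the function name swappiness_ok and its two arguments are made up for illustration and are not part of any Couchbase or Linux tooling. It reads the number from a file and succeeds only if that number is at or below the maximum you pass in.

```shell
#!/bin/sh
# swappiness_ok [FILE] [MAX]
# Hypothetical helper: succeed when the integer stored in FILE is <= MAX.
# FILE defaults to the live kernel setting and MAX defaults to 0.
swappiness_ok() {
    file=${1:-/proc/sys/vm/swappiness}
    max=${2:-0}
    # Reading fails if the file is missing; report that as status 2.
    current=$(cat "$file") || return 2
    [ "$current" -le "$max" ]
}
```

Calling swappiness_ok with no arguments checks the running system against 0, and a non-zero exit status can be used to fail a build step or raise an alert.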

Make sure the process that builds your OS images already does this, or modify it so that it does. This is especially critical for public/private clouds, where it is so easy to bring up new instances. You need to make this part of your build process for a Couchbase node.
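One thing to watch when automating this: appending with echo on every build run would leave duplicate vm.swappiness lines behind. Here is a rough sketch of an idempotent alternative. The helper name set_sysctl_value is hypothetical, and it assumes simple keys and values (no "#" or "&" characters), which is the case for this setting.

```shell
#!/bin/sh
# set_sysctl_value FILE KEY VALUE
# Hypothetical helper: make sure "KEY = VALUE" is set in FILE.
# An existing uncommented line for KEY is rewritten in place;
# otherwise a new line is appended. Safe to run repeatedly.
set_sysctl_value() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        # Write through cat so the original file keeps its owner and mode.
        sed -E "s#^[[:space:]]*${pattern}[[:space:]]*=.*#${key} = ${value}#" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return "$status"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root, a bootstrap script could then call set_sysctl_value /etc/sysctl.conf vm.swappiness 0 as many times as it likes and still end up with a single entry.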

Disable Transparent Huge Pages (THP)

Starting in Red Hat Enterprise Linux (RHEL) version 6, which includes CentOS 6 and 7 too, a new default method of managing huge pages was implemented in the OS. Ubuntu has this setting as well starting in 12.04, so it will need to be changed there too. THP combines smaller memory pages into huge pages without the running processes knowing. The idea is to reduce the number of TLB lookups required and therefore increase performance. Basically, it adds a layer of abstraction that automates the management of huge pages. Couchbase Engineering has determined that under some conditions, Couchbase Server can be negatively impacted by severe page allocation delays when THP is enabled. Couchbase therefore recommends that THP be disabled on all Couchbase Server nodes.

Confirm whether THP needs to be disabled

Check the status of THP by issuing the following commands:

cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

On some Red Hat systems or Red Hat variants, you might have to use these paths instead:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

If the output from one or both files looks like this, you need the procedure below:

[always] madvise never
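The word in square brackets is the mode that is currently active. If you would rather check that from a script than by eye, a small sketch like this pulls the bracketed word out. The function name thp_active_mode is an invention for this example; it takes the line of text as its argument rather than reading the file itself.

```shell
#!/bin/sh
# thp_active_mode LINE
# Hypothetical helper: print the bracketed (active) word from a line such as
# "[always] madvise never". Prints nothing if no bracketed word is found.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z+]*\)\].*/\1/p'
}
```

For example, thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" prints always on a system that still needs the procedure below.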

 

Copy the Init Script

The init script is designed to make sure the changes are applied at boot, around the same time as Couchbase is started.

#!/bin/bash
### BEGIN INIT INFO
# Provides: disable-thp
# Required-Start: $local_fs
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable THP
# Description: disables Transparent Huge Pages (THP) on boot
### END INIT INFO

 

case "$1" in
  start)
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
      echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
      echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
      echo 'never' > /sys/kernel/mm/redhat_transparent_hugepage/enabled
      echo 'never' > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    else
      # Neither THP directory exists, so there is nothing to disable.
      # Use exit here: return is only valid in a function or a sourced script.
      exit 0
    fi
    ;;
esac

 

How to Register the Code in the OS

Do the following:

Create a file with the above code

$ sudo vi /etc/init.d/disable-thp

 

Make the file executable

$ sudo chmod 755 /etc/init.d/disable-thp

 

Execute it so it takes effect right now

$ sudo service disable-thp start

 

Make sure the init script starts at boot

Red Hat variants:

$ sudo chkconfig disable-thp on

 

Ubuntu:

$ sudo update-rc.d disable-thp defaults

 

Test the Process

Check the status of THP by issuing the following commands:

cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

 

On some Red Hat systems or Red Hat variants, you might have to use these paths instead:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

 

For both files, the output should be like this:

always madvise [never]
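If you want to fold this test into your node build, here is a minimal sketch. The function name thp_is_disabled is hypothetical; it takes the sysfs directory as an argument so it works for either of the two locations mentioned in this post, and it treats a missing file as "not disabled".

```shell
#!/bin/sh
# thp_is_disabled DIR
# Hypothetical helper: succeed only if both DIR/enabled and DIR/defrag
# show [never] as the active mode.
thp_is_disabled() {
    dir=$1
    for name in enabled defrag; do
        # A missing file or any other active mode counts as a failure.
        grep -q '\[never\]' "$dir/$name" 2>/dev/null || return 1
    done
    return 0
}

# Example call (the path depends on your distribution):
#   thp_is_disabled /sys/kernel/mm/transparent_hugepage && echo "THP is off"
```

A non-zero exit status from the function can be used to fail the build step, so a node never joins the cluster with THP still on.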

 

Note: There is a different way to do this that you will find elsewhere, which edits /etc/grub.conf. My problem with it is that it would get wiped out by each and every kernel update in the future. What I propose is easier to manage in the long run and easy to put into something like a Puppet module or Chef recipe that runs when you bootstrap a node.

THP is a great feature for some things, but it causes problems with applications like Couchbase, and Couchbase is not alone in this. If you search the Internet for transparent huge pages, you will find multiple documented issues from other DB and application vendors. Until a way is found to make the two work well together, it is best to just turn THP off.

Author

Posted by Kirk Kirkconnell, Senior Solutions Engineer, Couchbase

Kirk Kirkconnell was a Senior Solutions Engineer at Couchbase working with customers in multiple capacities to assist them in architecting, deploying, and managing Couchbase. His expertise is in operations, hosting, and support of large-scale application and database infrastructures.

9 Comments

  1. "sudo echo > file" doesn't do what you think it does.

    "sudo sh -c 'echo > file'" (or echo | sudo tee file) does.

    1. Good catch. Fixed. Thank you!

  2. While I agree that swappiness should be low, you might want to run through some tests with the 2.6.32 kernel. There was a modification to how vm.swappiness=0 behaves, and I have seen reports of the kernel killing services like MySQL due to OOM, even when some of the RAM was still being used for file cache. I run with vm.swappiness=1 instead of 0, which still means the kernel shouldn't swap unless it really really has to… but is able to if it has to, versus killing off processes.

    1. Cool. I shall take a look. Would you happen to have a link handy about the changes to 2.6.32 around this? If not, no worries. I will go search for it.

      1. Not sure if you still need this, but I found a blog post that sums up the modification and the resulting changes in behavior: http://www.percona.com/blog/20

        In case that post goes away at some point, here is some of the info in the post:

        Kernel code commit info:
        commit fe35004fbf9eaf67482b074a2e032abb9c89b1dd
        Author: Satoru Moriya <satoru.moriya@hds.com>
        Date: Tue May 29 15:06:47 2012 -0700
        mm: avoid swapping out with swappiness==0

        RHEL changelog info:
        * Mon Aug 27 2012 Jarod Wilson <jarod@redhat.com> [2.6.32-303.el6]

        - [mm] avoid swapping out with swappiness==0 (Satoru Moriya) [787885]

        1. I have been checking internally and there seems to be some debate on this one. The recommendation is still to set swappiness=0 from what I am hearing, though we do have customers that have gone to 1. I will dig deeper and do some more testing.

          1. I hear yeah.
            It's mostly a "how do you want it to fail and how quickly" type of question I suppose, and not a situation people should encounter often. Hopefully anyone would have RAM usage related alarms going off well before the kernel faces the choice of swapping or killing Couchbase :)

  3. Hi,

    I followed your advice and my rc.local file looks like this:

    ————————————————————–

    #!/bin/sh -e
    #
    # rc.local
    #
    # This script is executed at the end of each multiuser runlevel.
    # Make sure that the script will "exit 0" on success or any other
    # value on error.
    #
    # In order to enable or disable this script just change the execution
    # bits.
    #
    # By default this script does nothing.
    [ -x /sbin/initctl ] && initctl emit --no-wait google-rc-local-has-run || true
    exit 0

    if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
    fi

    if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
    fi

    ——————————————————————————-

    Now I'm really green at Linux, but doesn't the exit 0 mean that it would never get to transparent_hugepage tests? Please help me understand/fix this.

    Kind regards,
    David
