Lost your files to rm -rf *? No problem, you have a backup

I recently came across a video that describes how someone accidentally typed the ‘rm *’ Linux command during production of the Toy Story 2 movie, and all the graphics, artwork and scripts “vanished”. Luckily, they had a backup of the data and the files were restored. If you watch the video, you’ll find a few takeaways from this story (which happens to be real). The most important one, and the one I want to talk about, is “TAKE BACKUPS of your data”.

Like many of you, I have lost data a couple of times because of a mistyped command or worse circumstances. In this blog, I want to show how you can use online backup and restore (cbbackup and cbrestore) in Couchbase Server so that you do not lose information in the event of a serious hardware or installation failure, or an accidental ‘oops’ scenario.

cbbackup

The cbbackup command enables you to back up an entire Couchbase cluster, a single node or a single bucket into files that you can later restore into a Couchbase cluster. Both backup and restore can be performed on a live Couchbase cluster.

Backing up data to a file using cbbackup tool
 

To back up an entire cluster, consisting of all the buckets and all the node data:

shell> cbbackup http://HOST:8091 ~/backups \
-u Administrator -p password

‘~/backups’ is the backup path; ‘Administrator’ and ‘password’ are the server credentials.

Assuming you have three buckets in your Couchbase cluster (default, gamesim-sample and beer-sample), the backups root folder contains one folder per Couchbase Server bucket (bucket-beer-sample, bucket-default, bucket-gamesim-sample). design.json captures the design documents in each Couchbase bucket. The .cbb files contain the raw exported data.

To back up all the data for a single bucket, containing all of the information from the entire cluster:

shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
  -u Administrator -p password \
  -b default

‘/backups/backup-20120501’ is the backup path, ‘Administrator’ and ‘password’ are the server credentials, and ‘default’ is the bucket name.
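If you back up regularly, it helps to generate the date-stamped path instead of typing it each time. Here is a minimal sketch of how that could look. The names CBBACKUP, backup_path_for and backup_bucket are my own illustrative helpers, not part of Couchbase, and the backup-YYYYMMDD naming simply mirrors the path used in the example above.

```shell
#!/bin/sh
# Sketch: back up one bucket into a date-stamped directory.
# CBBACKUP, backup_path_for and backup_bucket are illustrative names,
# not part of Couchbase Server.

CBBACKUP="${CBBACKUP:-cbbackup}"   # set CBBACKUP=echo for a dry run

# Print a path such as /backups/backup-20120501.
# Usage: backup_path_for ROOT [YYYYMMDD]
backup_path_for() {
    root="$1"
    stamp="${2:-$(date +%Y%m%d)}"   # default: today's date
    printf '%s/backup-%s\n' "${root%/}" "$stamp"
}

# Run cbbackup for a single bucket with the same options as the example above.
# Usage: backup_bucket HOST USER PASSWORD BUCKET ROOT [YYYYMMDD]
backup_bucket() {
    target="$(backup_path_for "$5" "$6")"
    "$CBBACKUP" "http://$1:8091" "$target" -u "$2" -p "$3" -b "$4"
}
```

Calling backup_bucket HOST Administrator password default /backups with CBBACKUP=echo prints the command line that would be run, which is a cheap way to check the path before touching a live cluster.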

To back up all of the data stored on a single node across all of the different buckets:

shell> cbbackup http://HOST:8091 /backups/ \
  -u Administrator -p password \
  --single-node

‘/backups’ is the backup path, ‘Administrator’ and ‘password’ are the server credentials, and --single-node restricts the backup to the data on the node you connect to.

Using multiple instances of cbbackup to write different backup files in parallel

If you have large datasets, you can parallelize and speed up your backup by running one cbbackup process per Couchbase server.

To back up the data from a single bucket on a single node:

shell> cbbackup http://HOST:8091 /backups \
  -u Administrator -p password \
  --single-node \
  -b default

‘/backups’ is the backup path, ‘Administrator’ and ‘password’ are the server credentials, and ‘default’ is the bucket name.

Now that you have backed up your data, how do you restore it?
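To make the "one cbbackup process per server" idea concrete, here is a rough sketch that starts one --single-node backup per host in the background and waits for all of them. It rests on a few assumptions of mine: the processes are launched from one backup machine, each node writes to its own subdirectory named after the host, and backup_nodes_parallel and CBBACKUP are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: one cbbackup process per node, run in parallel.
# backup_nodes_parallel and CBBACKUP are illustrative names.

CBBACKUP="${CBBACKUP:-cbbackup}"   # set CBBACKUP=echo for a dry run

# Usage: backup_nodes_parallel USER PASSWORD ROOT HOST [HOST...]
backup_nodes_parallel() {
    user="$1"; pass="$2"; root="$3"
    shift 3
    pids=""
    for host in "$@"; do
        # Assumption: each node gets its own subdirectory under ROOT.
        "$CBBACKUP" "http://$host:8091" "${root%/}/$host" \
            -u "$user" -p "$pass" --single-node &
        pids="$pids $!"
    done
    # Wait for every background backup; remember if any of them failed.
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

The function returns non-zero if any of the per-node backups fails, so a wrapper script can refuse to rotate out older backups after a partial run.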

cbrestore

The cbrestore command takes the information that was backed up using cbbackup and streams the stored data back into the cluster.

To restore a single bucket of data to a cluster:

shell> cbrestore \
    ~/backups \
    http://Administrator:password@HOST:8091 \
    --bucket-source=XXX

To restore the bucket data to a different bucket on the cluster:

shell> cbrestore \
    ~/backups \
    http://Administrator:password@HOST:8091 \
    --bucket-source=XXX \
    --bucket-destination=YYY

‘~/backups’ is the backup path; ‘Administrator’ and ‘password’ are the server credentials. --bucket-source and --bucket-destination take the source and destination bucket names (XXX and YYY are placeholders).
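Since the commands above restore one bucket at a time, restoring a whole backup means repeating the command for each bucket. The sketch below derives the bucket names from the bucket-* folders in the backup root and runs one restore per bucket into a bucket of the same name. restore_all_buckets and CBRESTORE are hypothetical names of mine, and I am assuming the destination buckets already exist on the target cluster.

```shell
#!/bin/sh
# Sketch: restore every bucket found in a backup root, one cbrestore per bucket.
# restore_all_buckets and CBRESTORE are illustrative names.

CBRESTORE="${CBRESTORE:-cbrestore}"   # set CBRESTORE=echo for a dry run

# Usage: restore_all_buckets BACKUP_ROOT USER PASSWORD HOST
restore_all_buckets() {
    for dir in "${1%/}"/bucket-*/; do
        [ -d "$dir" ] || continue        # no bucket folders: nothing to do
        bucket="${dir%/}"                # drop the trailing slash
        bucket="${bucket##*/}"           # keep only the folder name
        bucket="${bucket#bucket-}"       # strip the "bucket-" prefix
        # Assumption: a bucket with the same name exists on the target.
        "$CBRESTORE" "$1" "http://$2:$3@$4:8091" \
            --bucket-source="$bucket" \
            --bucket-destination="$bucket" || return 1
    done
}
```

The loop stops at the first failed restore so that a problem with one bucket is not buried under the output of the others.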

Using multiple instances of cbrestore to restore multiple files in Couchbase Server

Things to keep in mind 

  • Back up your data frequently.
  • Use a separate machine for backing up data to reduce the strain on your live Couchbase cluster. Even though the backup and restore commands can be run while the system is online, consider running them from a separate machine during off-peak hours. Why is this important for performance? Backup is based on the TAP protocol. Depending on the amount of memory available and the resident document ratio, it may need to read the entire dataset from disk, which adds more strain to the system.
  • Finally, don’t forget to test your backup strategy to make sure that your backups work when you need them for data recovery.
  • Only the design documents are backed up and restored; indexes are not. Indexes are rebuilt incrementally after the data is restored.
  • During restore, you can specify the -a option in cbrestore to avoid overwriting existing items. This will insert items only if they don’t already exist.
  • You can filter the keys that are backed up using the -k backup option. For example, to back up information from a bucket where the keys have a prefix of 'object':

shell> cbbackup http://HOST:8091 /backups/backup-20120501 \
  -u Administrator -p password \
  -b default \
  -k '^object.*'
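On the point about testing your backups: a real test means restoring into a scratch cluster, but a quick layout check can catch an empty or half-written backup early. The sketch below relies on the folder structure described earlier (one bucket-* folder per bucket, a design.json, and .cbb data files). check_backup_layout is a hypothetical helper of mine. I am assuming design.json sits directly inside the bucket folder, and I treat a missing design.json as a warning only, because I have not confirmed that it is written for buckets without design documents.

```shell
#!/bin/sh
# Sketch: sanity-check the layout of a cbbackup output directory.
# check_backup_layout is an illustrative name, not a Couchbase tool.

# Usage: check_backup_layout BACKUP_ROOT
# Returns 0 if at least one bucket-* folder exists and each one
# contains at least one .cbb file somewhere beneath it.
check_backup_layout() {
    found=0
    for dir in "${1%/}"/bucket-*/; do
        [ -d "$dir" ] || continue
        found=1
        # Assumption: design.json lives directly in the bucket folder.
        if [ ! -f "${dir}design.json" ]; then
            echo "warning: no design.json in $dir" >&2
        fi
        if [ -z "$(find "$dir" -type f -name '*.cbb' | head -n 1)" ]; then
            echo "error: no .cbb files in $dir" >&2
            return 1
        fi
    done
    [ "$found" -eq 1 ]
}
```

This only proves that the files are there, not that they restore cleanly, so it complements a periodic test restore rather than replacing it.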

Enjoy!