June 19, 2014

DataStax FUD

This is not the first time I've addressed FUD (link). However, it is the first time I've addressed FUD directed at Couchbase, and that's a great thing. After all, I know what happens next.

First they ignore you, then they laugh at you, then they fight you, then you win. - Gandhi

I believe the best way to deal with FUD is to address it. Today, I'm addressing a blog post by DataStax (link).

Asynchronous Writes

Like MongoDB circa 2013, Couchbase performs asynchronous writes by default.

This is not true. Yes, MongoDB performed asynchronous writes by default: it did not acknowledge write requests (link). However, that is no longer the case as of MongoDB 2.6 (link). Couchbase Server does not perform asynchronous writes; it acknowledges write requests.

Persistence & Performance

Couchbase can be forced to persist writes to disk, but doing so kills performance; since there is no commitlog or journaling, each write must update Couchbase's btree and fsync.

This is misleading. Yes, Couchbase Server writes to memory first and then persists the in-memory data to the storage device. However, so does DataStax Enterprise (Apache Cassandra). By default, it does not fsync after writing to the commit log: the write goes to the OS page cache, which is memory, and the commit log is synced to disk periodically. In other words, it writes to memory first too. If DataStax Enterprise is configured to fsync after every commit log write, it kills performance as well.
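The distinction in this paragraph is between a write that has reached the OS page cache and a write that has been fsynced. The Python sketch below illustrates it generically; it is not code from either product, and the helper names (append_records, time_appends) and file layout are hypothetical. os.write returns once the bytes are in the page cache, while os.fsync blocks until the kernel reports them written to the device, which is why calling it after every write is typically much slower.

```python
import os
import tempfile
import time
from typing import Iterable, List


def append_records(path: str, records: Iterable[bytes], fsync_each: bool = False) -> None:
    """Append records to a log file (hypothetical helper).

    os.write returns once the bytes are in the OS page cache, i.e. memory;
    the kernel writes them to the device later. With fsync_each=True,
    os.fsync blocks after every record until the kernel reports the data
    written to stable storage.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for rec in records:
            view = memoryview(rec)
            while view:                 # handle partial writes
                n = os.write(fd, view)
                view = view[n:]
            if fsync_each:
                os.fsync(fd)            # force this record to disk
    finally:
        os.close(fd)


def time_appends(path: str, records: List[bytes], fsync_each: bool) -> float:
    """Return wall-clock seconds spent appending the records."""
    start = time.perf_counter()
    append_records(path, records, fsync_each)
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [b"key=%d\n" % i for i in range(200)]
    with tempfile.TemporaryDirectory() as d:
        cached = time_appends(os.path.join(d, "cached.log"), records, fsync_each=False)
        synced = time_appends(os.path.join(d, "synced.log"), records, fsync_each=True)
        print(f"page cache only: {cached:.4f}s  fsync per write: {synced:.4f}s")
```

Absolute timings depend heavily on the storage device and filesystem, so the script prints them rather than assuming a particular ratio.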

Buckets & Documents

Couchbase's storage engine has trouble dealing with more than five buckets (analogous to relational tables).

This is not true. A bucket is analogous to a database, not a table. There is no schema, and there are no tables. That's one of the benefits of a document database.

Consistency & Availability

Couchbase manages to be neither fully consistent, nor fully available: it cannot serve reads during failover or network partitions, but it can still serve stale data to reads.

This is not true, and it makes no sense. It cannot serve reads, yet it can still serve stale data to reads?

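Couchbase Server maintains strong consistency: each document has a single active copy, and reads and writes for that document are served by the node that holds it. By default, automatic failover is disabled. If a node is unavailable or unresponsive, its data is not available until the node recovers or is failed over. In CAP terms, Couchbase Server is CP. However, it can be configured for AP.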
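The CP behaviour described here can be shown with a toy model. The Python sketch below is not Couchbase Server's implementation: the ToyCluster class, the CRC32-modulo partitioning, the round-robin placement, and the immediate replica update are all simplifying assumptions. What it captures is the behaviour described above: while a node is down and has not been failed over, reads for its documents fail instead of returning a replica's possibly stale value.

```python
import zlib
from typing import Dict, List, Optional, Set


class ToyCluster:
    """Toy partitioned store (hypothetical, not Couchbase Server's code).

    Each partition has one active node, which serves every read and write
    for its keys, and one replica node that is never read directly.
    """

    def __init__(self, nodes: List[str], num_partitions: int = 8) -> None:
        self.num_partitions = num_partitions
        self.down: Set[str] = set()
        n = len(nodes)
        # Round-robin placement: active on node p % n, replica on the next node.
        self.active = {p: nodes[p % n] for p in range(num_partitions)}
        self.replica = {p: nodes[(p + 1) % n] for p in range(num_partitions)}
        self.store: Dict[str, Dict[str, str]] = {node: {} for node in nodes}

    def partition_of(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def owner_of(self, key: str) -> str:
        return self.active[self.partition_of(key)]

    def _require_active(self, key: str) -> str:
        node = self.owner_of(key)
        if node in self.down:
            # CP choice: refuse the request rather than answer from a replica
            raise ConnectionError(f"active node {node} is unavailable")
        return node

    def write(self, key: str, value: str) -> None:
        self.store[self._require_active(key)][key] = value
        replica = self.replica[self.partition_of(key)]
        if replica not in self.down:
            # Simplification: the replica is updated immediately.
            self.store[replica][key] = value

    def read(self, key: str) -> Optional[str]:
        return self.store[self._require_active(key)].get(key)

    def fail(self, node: str) -> None:
        self.down.add(node)

    def failover(self, node: str) -> None:
        """Promote the replica of every partition whose active copy was on node."""
        for p, owner in self.active.items():
            if owner == node:
                self.active[p] = self.replica[p]


if __name__ == "__main__":
    cluster = ToyCluster(["node-a", "node-b"])
    cluster.write("doc::1", "v1")
    owner = cluster.owner_of("doc::1")
    cluster.fail(owner)
    try:
        cluster.read("doc::1")
    except ConnectionError as err:
        print("read refused:", err)      # unavailable, but never stale
    cluster.failover(owner)              # manual failover (auto-failover is off by default)
    print(cluster.read("doc::1"))        # v1, now served by the promoted replica
```

The trade-off is availability for consistency: until an administrator (or automatic failover, if enabled) promotes the replica, the toy cluster answers with an error rather than with data it cannot vouch for.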

Is DataStax Enterprise fully consistent? You can find the answer here.

FUD dispelled.

Recommendations

DataStax published a blog post on how not to benchmark Apache Cassandra (link). It's a great start.

Don't perform distributed database benchmarks with virtual machines, shared storage, misconfigured drives, inadequate load and with small data sets.

I agree. However, the list is incomplete.

1. Don't perform distributed database benchmarks with a single node.

A distributed database has to make trade-offs between consistency and availability (link), and those trade-offs impact performance. If a benchmark is performed with a single node, it sidesteps those trade-offs (there is no replication and no coordination between nodes) and hides their performance impact. The results are unrealistic and thus useless.

2. Don't perform distributed database benchmarks with different configurations.

For example, don't configure Couchbase Server to fsync every write while configuring DataStax Enterprise to fsync every ten (10) seconds (link and link). The performance results are biased and thus invalid. It looks like DataStax set commitlog_sync to batch, which is fair. However, DataStax did not chart latency. If commitlog_sync is set to batch, Apache Cassandra does not complete a write operation until the commit log has been fsynced, and it may wait up to 50ms (commitlog_sync_batch_window_in_ms) to group other writes into the same fsync.
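The latency cost described here comes from group commit: batching lets one fsync cover many writes, which helps throughput, but every write waits for its batch. The Python sketch below is a simplified model for illustration, not Cassandra's commit log code. It assumes a batch closes a fixed window after its first write (Cassandra can sync sooner when no more writes are queued) and a fixed, hypothetical fsync cost; the names group_commit and GroupCommitResult are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupCommitResult:
    fsyncs: int                                               # fsync calls issued
    latencies_ms: List[float] = field(default_factory=list)  # per write, in sorted arrival order


def group_commit(arrivals_ms: List[float], window_ms: float = 50.0,
                 fsync_ms: float = 5.0) -> GroupCommitResult:
    """Simplified group-commit model (hypothetical, not Cassandra's code).

    A batch opens at the arrival of its first write and accepts every write
    arriving before window_ms has elapsed. The batch is then made durable
    with one fsync costing fsync_ms, and only then are its writes
    acknowledged. Batches are assumed not to block one another.
    """
    arrivals = sorted(arrivals_ms)
    result = GroupCommitResult(fsyncs=0)
    i = 0
    while i < len(arrivals):
        close_at = arrivals[i] + window_ms
        j = i + 1                       # the first write always joins its own batch
        while j < len(arrivals) and arrivals[j] < close_at:
            j += 1
        ack_at = close_at + fsync_ms    # every write in the batch is acked here
        result.fsyncs += 1
        result.latencies_ms.extend(ack_at - t for t in arrivals[i:j])
        i = j
    return result


if __name__ == "__main__":
    r = group_commit([0, 10, 40, 60, 200], window_ms=50.0, fsync_ms=5.0)
    print(r.fsyncs, r.latencies_ms)  # 3 [55.0, 45.0, 15.0, 55.0, 55.0]
```

In this toy run, three fsyncs cover five writes, but a lone write still waits the full window plus the fsync. A throughput chart alone does not show that cost; a latency chart does.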


Well, that's it for today.
