Let's dissect the Fall 2014 NoSQL Benchmark.
Apache Cassandra / DataStax Enterprise. MongoDB. Couchbase Server.
Go.
Hardware
Server (8) | Client (32) | |
Processor | Intel Xeon E5-2620 V2 (2) | Intel Core i5-4440 |
Memory | 64GB | 8GB |
Storage (Data) | 100GB SATA 6Gb/s SSD (2) | |
Storage (OS) | 80GB SATA 6Gb/s SSD |
Network: 96Gb/s (1Gb/s Switch with 48 Ports)
Wait, 96Gb/s? It's a full duplex matrix. (link)
Why not benchmark with dozens of servers?
I prefer benchmarks performd on bare-metal server, on-premise servers. It's not cheap.
Data
4 Nodes | 6 Nodes | 8 Nodes | |
Entries | 20M | 30M | 40M |
Copies | 2 | 2 | 2 |
Feilds | 10 | 10 | 10 |
Field | 150 bytes | 150 bytes | 150 bytes |
Entry | 1,500 bytes | 1,500 bytes | 1,500 bytes |
Data | 55.8GB | 83.8GB | 111.6GB |
Why not benchmark with terabytes of data?
The goal is to identify how well database perform and scale when the working sets in memory. The working set could have been 32GB, or it could have been 384GB. It wouldn't have mattered. It fit in memory. The working set could have been 10% of the data, or it could have been 90% of the data. It wouldn't have mattered. It fit in memory.
Consider customers streaming movies from Netflix. There are many, many customers. However, only a fraction of them are streaming movies at any given point in time. That's a working set. Consider web and mobile application users. I play Words with Friends with my wife. We play two to three games at a time. Every now and then I'll launch the app. I'll spend a few, if not several minutes, playing the right words to crush her. We're all about Amazon Prime. However, we only shop once in a while. When we do, we're part of a working set. Customers who are currently shopping. Not all customers. Then there is real time data. Consider click stream analysis and sensor data. The working set is users currently visiting the web site. Not all users. The working set is the last minute, the last hour, or the last day of sensor data. Not all sensor data.
While an in-memory working set improves read performance, it does not improve write performance.
All data was persistent.
Configuration
Consistency
Apache Cassandra, eventually consistent (ONE). MongoDB and Couchbase Server, consistent.
Durability / Persistence
Apache Cassandra and MongoDB and Couchbase Server, asynchronous persistence.
Note: The commit log was disable in Apache Cassandra.
Replication
There are two copies.
Topology
Apache Cassandra and Couchbase Server, one node per server. MongoDB, two nodes per server.
Note: MongoDB relies on a master / slave topology. To ensure every server responded to client read / write requests, one master and one slave (from different shards) were installed on every server.
Workloads
Read Heavy: 95% Reads / 5% Writes
Balanced: 50% Reads / 50% Writes
Database Nodes & Client Threads
The benchmarks were performed on deployments of four, six, and eight servers.
The benchmarks were performed with an increasing number of client threads.
Results
Read Heavy
Apache Cassandra
Throughput peaks between 248 and 496 client threads: 99K ops/s.
Adding nodes increases throughput.
Latency is never less than 1ms.
Adding nodes decreases latency.
MongoDB
Thoughput peaks between 132 and 198 client threads: 227K ops/s.
Adding nodes increases throughput.
Latency is less than 1ms up to 231 client threads: 200K ops/s.
Adding nodes decreases latency.
Couchbase Server
Throughput peaks at 2,970 client threads: 1.62M ops/s.
Adding nodes increases throughput.
Latency is less than 1ms up to 1,320 client threads: 1.44M ops/s.
Adding nodes decreases latency.
MongoDB v Couchbase Server
Let's compare the throughput and latency of MongoDB and Couchbase server under equivalent load.
Couchbase Server throughput increases under increasing load. MongoDB, not so much.
Couchbase Server latency is less than 0.6ms up to 264 client threads: 642K ops/s.
MongoDB is less than 1ms up to 231 client threads: 200K ops/s
Explanation
- MongoDB read latency benefits from memory mapped files. However, memory mapped files do not perform as well as direct memory. That, and MongoDB relies on an archaic, complex topology. In fact, the benchmark compared MongoDB with 16 nodes to Couchbase Server with 8 nodes.
Apache Cassandra v Couchbase Server
Couchbase Server throughput increases under increasing load. Apache Cassandra, not so much.
Couchbase Server latency is less than 0.7ms up to 792 client threads: 1.22M ops/s.
Apache Cassandra latency is never less than 1ms.
Explanation
Apache Cassandra relies on a poor, disabled by default row cache (JIRA).
Balanced Read / Write
Apache Cassandra
Throughput peaks between 272 and 544 client threads: 89K ops/s.
Adding nodes increases throughput.
Latency is less than 1ms up to 680 client threads: 79K ops/s.
Adding nodes decreases latency.
MongoDB
Throughput peaks at 256 client threads: 85K ops/s.
Adding nodes increases throughput.
Latency is never less than 1ms.
Adding nodes decreases latency.
Couchbase Server
Throughput peaks at 3,000 client threads: 1.18M ops/s.
Adding nodes increases throughput.
Latency is less than 1ms up to 480 client threads: 499K ops/s.
Adding nodes decreases latency.
MongoDB v Couchbase Server
Let's compare the throughput and latency of MongoDB and Couchbase Server under equivalent load.
Couchbase Server throughput increases under increasing load. MongoDB, not so much.
Couchbase Server latency is less than 0.8ms up to 360 client threads: 457K ops/s.
MongoDB latency is never less than 1ms.
Explanation
Well, a single lock doesn't help (manual | JIRA). Document level locking has been requested since 2010.
Apache Cassandra v Couchbase Server
Couchbase Server throughput increases under increasing load. Apache Cassandra, not so much.
Couchbase Server latency is less than 1ms up to 480 client threads: 499K ops/s.
Apache Cassandra latency is less than 1ms up to 680 client threads: 79K ops/s.
Explanation
Apache Cassandra write latency benefits from the memtable, but Couchbas Server was engineered for concurrency.
Conclusion
Reads | Writes | |||
Low Throughput | High Thoughput | Low Thoughput | High Throughput | |
Apache Cassandra | Poor | Poor | Great | Poor |
MongoDB | Great | Poor | Poor | Poor |
Couchbase Server | Great | Great | Great | Good |
Questions?
Comment or join the discussion over on Reddit or HN.
You can download the white paper here.