Distributed Databases and Replication Design

Shane Johnson of Couchbase Published

One of the most important elements of distributed database architecture is replication. In fact, it defines the database architecture: the replication design determines whether the data is consistent, available, or both.

Master / Slave

Writes are executed on master nodes and then replicated to slave nodes. If consistency is required, reads are executed on master nodes only. If it is not, reads can be executed on master nodes, slave nodes, or both.
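The routing rule above fits in a few lines. This is only a minimal sketch: the node names and the `route` function are made up for illustration, and it assumes a single master with two slaves.

```python
def route(op, consistent, master="master", slaves=("slave-1", "slave-2")):
    """Return the nodes allowed to serve a request in a master / slave design."""
    if op == "write":
        # Writes always go to the master, which then replicates to the slaves.
        return [master]
    if consistent:
        # A consistent read must see the latest write, so only the master qualifies.
        return [master]
    # Otherwise any copy will do, including slaves that may lag behind.
    return [master, *slaves]


print(route("write", consistent=False))  # ['master']
print(route("read", consistent=True))    # ['master']
print(route("read", consistent=False))   # ['master', 'slave-1', 'slave-2']
```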

What's the problem?

Let's assume a) there is one node per physical server and b) writes are replicated to two slaves.

The problem is... two-thirds of the nodes are passive. They do not respond to write requests, and they may not respond to read requests. The problem is... there are three times as many nodes as are needed to meet throughput / latency requirements.
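To put numbers on it, here is a small sketch of the arithmetic, assuming (as above) one node per server and two slaves per master. The function name and the example figure of four masters are illustrative, not taken from a real deployment.

```python
from fractions import Fraction


def master_slave_cluster(masters_needed, slaves_per_master=2):
    """Return (total_nodes, passive_nodes, passive_fraction)."""
    # Every master that carries load brings its slaves along with it.
    total = masters_needed * (1 + slaves_per_master)
    passive = total - masters_needed
    return total, passive, Fraction(passive, total)


# Suppose four masters are enough to meet the throughput / latency requirements.
total, passive, fraction = master_slave_cluster(4)
print(total, passive, fraction)  # 12 8 2/3
```

Four nodes do the work, but twelve have to be deployed.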

Masterless

Writes are executed on multiple nodes. If consistency is required, reads are executed on multiple nodes as well.

What's the problem?

Let's assume a) there is one node per physical server and b) writes are replicated to two nodes.

The problem is... consistent reads / writes are executed on multiple nodes. Each one requires multiple requests: one from the client to the server and two from the server to two other servers. The node receiving the write request must send write requests to the other two nodes, resulting in higher latency and lower throughput.
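The request count and the latency penalty can be sketched as follows. The latency model is an assumption made for illustration: the receiving node contacts the other nodes in parallel and waits for the slowest one before replying. The function names and millisecond figures are hypothetical.

```python
def requests_per_operation(other_nodes=2):
    """Count the requests behind one consistent read / write."""
    client_to_server = 1
    server_to_servers = other_nodes
    return client_to_server + server_to_servers


def operation_latency_ms(client_hop_ms, other_node_hops_ms):
    """Client hop plus the slowest of the parallel server-to-server hops."""
    return client_hop_ms + max(other_node_hops_ms, default=0.0)


print(requests_per_operation())               # 3
print(operation_latency_ms(1.0, [0.5, 2.0]))  # 3.0
print(operation_latency_ms(1.0, []))          # 1.0 (single node, for comparison)
```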

Primary Owner

It's the best of both: reads / writes for a given piece of data are executed on its primary owner and then replicated, yet reads / writes are executed on every node. Like a master / slave design, consistency is maintained by executing reads / writes on the primary node. Like a masterless design, every node does active work, because every node is the primary owner for a share of the data.
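Here is a minimal sketch of the idea, assuming three nodes and a simple hash-modulo mapping from key to owner. The mapping, the node names, and both function names are assumptions made for illustration; they are not a description of how any particular product assigns ownership.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]


def primary_owner(key, nodes=NODES):
    """Pick the one node that serves all reads / writes for this key."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def replica_nodes(key, nodes=NODES, copies=2):
    """Pick the nodes that receive replicated writes for this key."""
    start = nodes.index(primary_owner(key, nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(1, copies + 1)]


key = "user::42"
print(primary_owner(key), replica_nodes(key))
```

Each key has exactly one owner, so a single request is enough for a consistent read / write. Across many keys, ownership is spread over all three nodes, so none of them sits idle.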

One of these is a poor design. One of these is a good design. One of these is a great design.

What do you think?
