Over the last week or so I have been working on designing a simple autoscalable MongoDB cluster for hosting various projects. Seems like it should be easy, but a couple of details took way too much searching to figure out.

Single node replica sets

Common knowledge

A quick search turns up claims that a replica set requires at least two servers and is entirely different from standalone mode. Tutorials on setting up a replica set typically start by setting up three empty servers and then calling rs.initiate() with three addresses.

The truth

A replica set will work just fine with a single node.
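
As a minimal sketch (the hostname, port and data path are placeholders): start a single mongod with a replica set name, initiate the set with one member, and that node will elect itself primary and accept writes.

    # start the only node with a replica set name (shell)
    mongod --replSet rs0 --port 27017 --dbpath /data/db

    // then, in a mongo shell connected to that node
    rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "server1:27017" }] })
    rs.status()   // reports a one-member set, with this node as PRIMARY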

Why everyone says you can’t

In normal operation, a single-node replica set would be a bit silly: it uses more resources than standalone mode without the benefit of having a replacement server available, which is the whole point of a replica set.

Why it matters

Automating setup of the replica set is far easier if all nodes after the first are identical (a sketch of the shell commands follows the steps below):

On server 1:

  • Call rs.initiate with only one node configured

On server 2..n:

  • Connect to server 1
  • Call rs.add with its own address
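
A rough sketch of those steps in the mongo shell, with placeholder hostnames (real addresses would come from whatever provisioning tool builds the servers):

    // on server 1, once, in a mongo shell connected locally
    rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "server1:27017" }] })

    // on each later server, after starting mongod with the same --replSet name,
    // open a mongo shell against server 1 (the current primary) and add yourself
    rs.add("server2:27017")

The useful property is that every server after the first runs exactly the same logic; only the address it passes to rs.add changes.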

Even number of nodes

Common knowledge

There is no shortage of sources saying that a replica set should have an odd number of nodes, and that if you have an even number you need to add an arbiter to get back to an odd number.

The truth

Having an even number of nodes will not cause any problems. The issue is not the possibility of deadlocked elections or something similarly catastrophic, but that adding a fourth server usually provides no value until you add a fifth.

Why everyone says you can’t

The primary purpose of a MongoDB replica set is to be able to tolerate failing servers. This only works when more than half of the replica set survives the failure.

If you have two servers and one fails, there is no way to tell the difference between the primary failing (the secondary should become primary) and the secondary becoming disconnected from a functioning primary (the secondary should wait for reconnection). The cluster becomes read-only, which is far better than potentially allowing both servers to believe they are the survivor and start writing inconsistent data.

If you have three servers configured, you need two surviving servers to form a majority, so the replica set can survive a single server failing.

Add a fourth server, and you need three survivors to make a majority - you can still only tolerate a single failure.

Add a fifth server to get back to an odd number, and the majority requirement stays at three, so you gain the ability to handle an additional server failing.
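
The arithmetic behind this is simply that a majority is floor(n/2) + 1 votes. A quick illustration (shell JavaScript, nothing MongoDB-specific):

    // failures tolerated = nodes - majority
    for (let n = 1; n <= 7; n++) {
      const majority = Math.floor(n / 2) + 1;
      print(n + " nodes: majority " + majority + ", tolerates " + (n - majority) + " failure(s)");
    }

Among other things, this prints that three and four nodes both tolerate a single failure, while five nodes tolerate two.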

Note that the probability of 2/4 servers failing is slightly higher than the probability of 2/3 servers failing, so technically you are worse off with four than with three. However, your individual servers would have to be extremely flaky for this to become a significant problem.
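
To put rough numbers on that (purely illustrative, assuming each server fails independently with probability p):

    // chance that two or more of n servers fail, each independently with probability p
    function quorumLossChance(n, p) {
      const none = Math.pow(1 - p, n);              // zero failures
      const one  = n * p * Math.pow(1 - p, n - 1);  // exactly one failure
      return 1 - none - one;
    }
    print(quorumLossChance(3, 0.01));  // ~0.0003 with three servers
    print(quorumLossChance(4, 0.01));  // ~0.0006 with four servers

Both numbers are tiny unless p is large, which is why this rarely matters in practice.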

Why it matters

Scaling. My AWS setup treats all servers as disposable - they are never rebooted, only terminated and replaced. Deciding that there will normally be three servers in the cluster is easy. Ensuring that there will never be two or four is much harder.