Chained MariaDB replica

In the "miscellaneous" topics we treat, we sometimes have to build infrastructures that offer web and database servers redundancy.

Context

When building database redundancy, one "simple" way is to use the built-in redundancy/replication mechanism of MariaDB to configure two database servers with MASTER-MASTER replication, and then share a floating IP between both or use a dispatcher/load balancer to select which database is active. In this case, only *one* database server is considered active at any one time, while the other just acts as a fallback (or "fail-over") option if the main db server is down. This ensures that write operations only happen on one server and that you don't have to worry about conflicts of duplicate primary keys etc.

In some specific cases where a lot of heavy reporting is necessary, it might be good to have a third database server, in SLAVE mode only, that serves as the server to be queried while the others are used for common operations.

Issue

In this case, it is important to note that this SLAVE will have to be configured as a SLAVE of the *active* MASTER. If, somehow, you end up setting this last SLAVE as a SLAVE of the *passive* MASTER, the synchronisation will not work properly (unless specific mechanisms are put in place).

This is due to the fact that a SLAVE only executes requests from the "master log" files which it finds at its master's disk. So the secondary MASTER (the passive one) reads the master log from the primary (active) one, and replicates these operations locally. But that does not end into its own master log, as otherwise the primary master would read that later on and try to execute them again.

Solution

So when a new (passive) SLAVE is registered for stats, big queries, etc, it is important to make it point to either the floating IP or the load balancer (i.e. to always point to the active db server) as its MASTER. Otherwise you will end up watching and synchronizing *only* the genuine operations happening on the secondary (passive) master directly, which are usually... *none* (as it is passive most of the time, by definition).

Testing it out

What can happen if you are pointing to the passive MASTER?

Well... the "SHOW SLAVE STATUS\G" command will tell you that everything is fine and that "Seconds_Behind_Master: 0" and "Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates". But really, it means it is in tune with the master log of the passive MASTER, which does not contain anything...

You could check that by creating a new table on the primary (active) MASTER and see it replicate on the secondary (passive) MASTER, but *not* on the SLAVE connected to the secondary MASTER.

I found that confusing, and I hope this can help some people work it out.