[DRBD] Fixing a split-brain

For the most part DRBD is pretty resilient, but if a power failure hits both nodes, or if you botch an update on a Corosync cluster, you have a good chance of ending up with a split-brain. In that case DRBD automatically disconnects the resources and leaves you to fix the mess by hand.
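
If you suspect a split-brain, the kernel log is the first place to look, since DRBD reports the condition there when it drops the connection. A quick check (the exact message wording may vary between DRBD versions):

dmesg | grep -i "split-brain"
# typically something like: Split-Brain detected but unresolved, dropping connection!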

Checking node status

cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-12 23:52:00
 0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:76

The master isn’t happy: the resource is StandAlone and it no longer sees its peer.
The secondary node isn’t doing any better:

cat /proc/drbd
version: 8.4.0 (api:1/proto:86-100)
GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by gardner@, 2011-12-12 23:52:00
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
    ns:0 nr:0 dw:144 dr:4205 al:5 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:100
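
As an alternative to reading /proc/drbd, the same information can be queried per resource with drbdadm (these sub-commands are available in DRBD 8.4):

drbdadm cstate all   # connection state, e.g. StandAlone or WFConnection
drbdadm role all     # local/peer roles, e.g. Secondary/Unknown
drbdadm dstate all   # disk states, e.g. UpToDate/DUnknown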

Fixing the cluster

To repair the cluster we declare the data on one node “obsolete” (here we choose the secondary) and then reconnect the resources: the obsolete node discards its local changes and resynchronizes from the other one.

On the “obsolete” node:

drbdadm secondary all
drbdadm disconnect all
drbdadm -- --discard-my-data connect all
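
With DRBD 8.4 the option can also be passed directly to the sub-command; both spellings should do the same thing, the “--” form above being the older 8.3-compatible one:

drbdadm connect --discard-my-data all

After the connect, this node should sit in the WFConnection state waiting for its peer, which you can confirm with drbdadm cstate all.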

On the master node:

drbdadm primary all
drbdadm disconnect all
drbdadm connect all
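
Once both sides are reconnected, the node that discarded its data becomes the synchronization target: /proc/drbd should show the master go from WFConnection to SyncSource (SyncTarget on the other node) and finally to Connected with ds:UpToDate/UpToDate when the resync is finished. Keeping an eye on it is easy enough:

watch -n1 cat /proc/drbd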