Customer Question:
We have a 4-node replica set whose nodes each have a different priority. From time to time, when we suffer from network issues, the whole replica set goes down.
Why Does This Happen?
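Two of the nodes are in a single site and the other two are on two other sites. A replica set needs a majority of its voting members to elect a primary, which is 3 out of 4 here. When the first site goes offline, only 2 nodes remain, so no majority can be formed and the replica set is left without a primary. Therefore, you should remove one of the nodes in the first site to avoid these downtimes: with 3 nodes, one per site, the majority is 2, and losing any single site still leaves 2 nodes.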
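The majority arithmetic can be checked with a few lines of plain JavaScript. This is only a sketch: the site names and the survivesSiteLoss helper are made up for illustration, and it assumes every node has exactly one vote.

```javascript
// Votes needed to elect a primary: strictly more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// sites: hypothetical map of site name -> number of voting members in that site.
// Returns true if the members left after losing `lostSite` can still elect a primary.
function survivesSiteLoss(sites, lostSite) {
  const total = Object.values(sites).reduce((sum, n) => sum + n, 0);
  const remaining = total - (sites[lostSite] || 0);
  return remaining >= majority(total);
}

const fourNodes = { siteA: 2, siteB: 1, siteC: 1 };  // the layout from the question
const threeNodes = { siteA: 1, siteB: 1, siteC: 1 }; // after removing one node from siteA

console.log(survivesSiteLoss(fourNodes, "siteA"));  // false: 2 nodes left, 3 needed
console.log(survivesSiteLoss(threeNodes, "siteA")); // true: 2 nodes left, 2 needed
```

Note that the 4-node layout only fails when the two-node site is lost, which is exactly the case described above.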
Moreover, MongoDB takes priority into account when electing the primary. If the high-priority node is unstable, every time it goes offline a secondary is elected, with a possible few seconds of replica set downtime. When the high-priority node comes back online, it is elected primary again, causing another possible downtime.
Therefore, unless there is a good reason, avoid specifying different priorities.
How to Fix It?
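To see whether a replica set uses different priorities, one option is to look at the distinct priority values in its configuration. The sketch below assumes a config object shaped like the output of rs.conf(); the distinctPriorities helper and the host names are hypothetical, and members with priority 0 are skipped on the assumption that they can never become primary.

```javascript
// Returns the distinct priority values among electable members (priority > 0).
// A member without an explicit priority is treated as having the default of 1.
function distinctPriorities(cfg) {
  const values = cfg.members
    .map(m => (m.priority === undefined ? 1 : m.priority))
    .filter(p => p > 0);
  return [...new Set(values)];
}

// Hypothetical configuration, for illustration only.
const sampleCfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "node-a.example.net:27017", priority: 3 },
    { _id: 1, host: "node-b.example.net:27017", priority: 2 },
    { _id: 2, host: "node-c.example.net:27017", priority: 1 }
  ]
};

const found = distinctPriorities(sampleCfg);
console.log(found.length > 1 ? "mixed priorities" : "uniform priorities");
```

In the mongo shell, the same function could be called as distinctPriorities(rs.conf()).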
1. Perform the task during off peak hours
2. Remove the arbiter node with rs.remove(), passing the node's "host:port" string (run it on the primary)
3. Verify that the replica set is healthy by running rs.status()
4. Modify the configuration of the remaining nodes:
cfg = rs.conf()
cfg.members[0].priority = 1
cfg.members[1].priority = 1
cfg.members[2].priority = 1
rs.reconfig(cfg)
5. Verify again.
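The verification steps can be made less manual with a small helper that inspects the document returned by rs.status(). This is a sketch under assumptions: the checkReplicaSet name and the sample host names are hypothetical, and "okay" is taken to mean exactly one PRIMARY and every member reporting health 1.

```javascript
// status: an object shaped like the result of rs.status().
// Uses the members[].name, members[].health and members[].stateStr fields.
function checkReplicaSet(status) {
  const members = status.members || [];
  const primaries = members.filter(m => m.stateStr === "PRIMARY");
  const unhealthy = members.filter(m => m.health !== 1).map(m => m.name);
  return {
    ok: primaries.length === 1 && unhealthy.length === 0,
    primary: primaries.length === 1 ? primaries[0].name : null,
    unhealthy: unhealthy
  };
}

// Hypothetical status document, for illustration only.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "node-a.example.net:27017", health: 1, stateStr: "PRIMARY" },
    { name: "node-b.example.net:27017", health: 1, stateStr: "SECONDARY" },
    { name: "node-c.example.net:27017", health: 0, stateStr: "(not reachable/healthy)" }
  ]
};

console.log(JSON.stringify(checkReplicaSet(sampleStatus)));
```

In the mongo shell, the call would be checkReplicaSet(rs.status()), once after removing the node and once after the reconfig.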
Bottom Line
Keep it Simple :-)
Keep Performing,