mirror of
https://github.com/outbackdingo/patroni.git
synced 2026-01-27 10:20:10 +00:00
During the network split some Etcd nodes could become stale, however they are still available for read requests. It is not a problem for the old primary, because such etcd nodes are not writable and primary demotes when fails to update the leader lock. When the network split resolves and Etcd node connects back to the cluster, it may trigger the leader election (in Etcd), what typically results in some failures of client requests. Such state quickly resolves, and client requests could be retried. but it takes some time for the disconnected node to catch up. During this time it shows stale data for read requests without quorum requirement. It could happen that the current Patroni primary is impacted by failed client requests while Etcd cluster is doing leader elections and there is a chance that it may switch to the stale Etcd node, discover that someone else is a leader, and demote. To protect from this situation we will memorize the last known "term" of the Etcd cluster and when executing client requests we will compare the "term" reported by Etcd node with the term we memorized. It allows to detect stale Etcd nodes and temporary ignore them by switching to some other available Etcd nodes. An alternative approach to solve this problem would be using quorum/serializable reads for read requests, but it will increase resource usage on Etcd nodes. Close #3314