10 Commits

Author SHA1 Message Date
Alexander Kukushkin
416a0f7c8b Use names with "unusual" symbols in behave tests (#3162)
It'll hopefully prevent problems like #3142 in the future.
2024-09-16 09:35:22 +02:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.

To enforce the synchronous failover guarantee, Patroni stores the current synchronous replication state in the DCS using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.

We can't enable multiple standbys as synchronous and let PostgreSQL pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
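The ordering and failover rule described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not Patroni's code: `SyncState`, `InMemoryDCS`, `enable_sync_standby` and `may_acquire_leader_lock` are hypothetical names, the DCS is replaced by an in-memory object, and the behaviour when no sync state has been recorded at all is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncState:
    """Hypothetical record of the sync state published to the DCS."""
    leader: str
    sync_standby: Optional[str]  # None when synchronous replication is disabled


class InMemoryDCS:
    """Stand-in for a real DCS; holds only the published sync state."""
    def __init__(self) -> None:
        self.sync_state: Optional[SyncState] = None


def enable_sync_standby(pg_settings: dict, dcs: InMemoryDCS,
                        leader: str, standby: str) -> None:
    # Step 1: enable synchronous replication on the master first.
    pg_settings['synchronous_standby_names'] = standby
    # Step 2: only then publish the fact, so that a published standby
    # is guaranteed to have been synchronous already.
    dcs.sync_state = SyncState(leader=leader, sync_standby=standby)


def may_acquire_leader_lock(node: str, state: Optional[SyncState]) -> bool:
    # Assumption: with no recorded sync state there is no restriction.
    if state is None:
        return True
    # Only the published synchronous standby or the previous master qualifies;
    # with sync_standby=None this leaves the previous master alone.
    return node in (state.leader, state.sync_standby)
```

For example, after `enable_sync_standby(settings, dcs, 'pg1', 'pg2')`, the check passes for `pg2` and fails for an asynchronous replica `pg3`; with `SyncState('pg1', None)` only `pg1` passes.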
Alexander Kukushkin
10c7fa41f3 Exclude unhealthy nodes when choosing where to clone from (#313)
A node MUST have the tag clonefrom: true and be in the 'running' state;
also, a node should not try to clone from itself.
2016-09-21 09:42:48 +02:00
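The selection rule in the commit above might look roughly like this. It is a sketch under stated assumptions rather than Patroni's implementation: `Member`, `clone_candidates` and `pick_clone_source` are hypothetical names, and the final pick is assumed to be a uniform random choice, as the earlier 'clonefrom' commit in this log describes.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical view of a cluster member as stored in the DCS."""
    name: str
    state: str
    tags: dict = field(default_factory=dict)


def clone_candidates(members: List[Member], my_name: str) -> List[Member]:
    # Keep only nodes tagged clonefrom: true, in the 'running' state,
    # and different from the node that wants to clone.
    return [m for m in members
            if m.tags.get('clonefrom') is True
            and m.state == 'running'
            and m.name != my_name]


def pick_clone_source(members: List[Member], my_name: str) -> Optional[Member]:
    candidates = clone_candidates(members, my_name)
    # Returns None when no suitable node exists; what the caller does then
    # (for example, cloning from the leader) is outside this sketch.
    return random.choice(candidates) if candidates else None
```

Checking `is True` rather than truthiness reflects the requirement that the tag be boolean.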
Alexander Kukushkin
d57310bbc0 Fix one more corner-case
It could take up to 10 seconds to create a replication slot.
In addition, when a replica fails to connect to the master via
streaming replication, it doesn't retry immediately but after a
timeout (5 seconds). 10 + 5 == 15, which causes the replication check
scenarios to fail.
2016-04-13 14:09:45 +02:00
Alexander Kukushkin
ada50e418c Update scenario description 2016-03-31 17:13:29 +02:00
Alexander Kukushkin
db5999a639 Correct implementation of 'clonefrom' feature
According to https://github.com/zalando/patroni/issues/48 the 'clonefrom'
tag should be boolean and should be used to mark a node as suitable
for creating a new replica from. If there is more than one such node
in the cluster (with tag clonefrom=true), one of them will be chosen
randomly.
2016-03-30 11:30:05 +02:00
Alexander Kukushkin
62f11ab747 Attempt to export acceptance tests coverage results to coveralls 2016-03-13 09:09:31 +01:00
Oleksii Kliukin
6985df3aca Restore the test for the clone from the replica. 2016-03-11 16:59:35 +01:00
Alexander Kukushkin
c2d1eea7d0 disable clonefrom test 2016-03-10 17:19:43 +01:00
Oleksii Kliukin
998f0da3d8 Add cascading replication (backup from the replica) tests. 2016-03-10 16:05:06 +01:00