patroni

mirror of https://github.com/optim-enterprises-bv/patroni.git synced 2026-01-06 14:51:31 +00:00

Author	SHA1	Message	Date
Alexander Kukushkin	a5ff38a034	Improve behave tests (#1313). Hopefully this makes them less flaky.	2019-12-02 10:33:44 +01:00
Dmitry Dolgov	dd7c3c349f	[WIP] Standby cluster implementation (#679). Implementation of the "standby cluster" described in #657. A standby cluster consists of a "standby leader", which replicates from a "remote master" (not part of the current Patroni cluster; it can be anywhere), and cascading replicas, which replicate from the standby leader. The standby leader behaves much like a regular leader: it holds the leader lock in the DCS, and if it disappears, a new standby leader is elected. Such a cluster is defined in the "standby_cluster" section of the Patroni config file. The parameters in this section are applied only once, during bootstrap, and afterwards can be changed only through the DCS.	2018-09-07 10:10:56 +02:00
Alexander Kukushkin	4ca8a6e506	Make retries of calls to DCS consistent across implementations (#805). In addition, do a small refactoring of ZooKeeper and Consul and try to improve the stability of the acceptance tests (AT).	2018-09-06 08:37:26 +02:00
Alexander Kukushkin	a513a7bb68	Improve stability of acceptance tests (#780). Last time, tests were failing because Postgres/Patroni were slow to pick a sync standby.	2018-08-29 11:13:18 +02:00
Alexander Kukushkin	5668367181	Implement `/sync` and `/async` endpoints (#578). They respond with HTTP status code 200 only when the node is running as a synchronous or asynchronous replica, respectively. Fixes https://github.com/zalando/patroni/issues/189 Fixes https://github.com/zalando/patroni/issues/415	2018-01-05 15:28:40 +01:00
Alexander Kukushkin	4328c15010	Make Patroni Kubernetes native (#500). * Use ConfigMaps or Endpoints for leader elections and to keep the cluster state * Label pods with a postgres role * Change the behavior of pip install: from now on it does not install all dependencies, so you have to explicitly specify the DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`	2017-12-08 16:55:00 +01:00
Alexander Kukushkin	37c1552c0a	Smart pg_rewind (#417). Previously we ran pg_rewind only in a limited number of cases: * when we knew postgres was a master (no recovery.conf in the data dir) * when we were doing a manual switchover to a specific node (no guarantee that this node is the most up-to-date) * when a given node has the nofailover tag (it could be ahead of the new master) This approach worked in most cases, but sometimes we executed pg_rewind when it was not necessary, and in other cases we did not execute it although it was needed. The main idea of this PR is to first determine whether pg_rewind is really needed, by analyzing the timeline ID, LSN and history file on the master and replica, and to run it only if it is.	2017-05-19 16:32:06 +02:00
Alexander Kukushkin	d138a8db17	AT for master_start_timeout + minor fixes (#361)	2016-12-09 12:02:41 +01:00
Ants Aasma	7e53a604d4	Add synchronous replication support (#314). Adds a new configuration variable, synchronous_mode. When it is enabled, Patroni manages synchronous_standby_names to enable synchronous replication whenever healthy standbys are available. With synchronous mode enabled, Patroni automatically fails over only to a standby that was synchronously replicating at the time of the master failure. This effectively means zero lost user-visible transactions. To enforce the synchronous failover guarantee, Patroni stores the current synchronous replication state in the DCS using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over. We can't enable multiple standbys as synchronous and let PostgreSQL pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure, commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions. On graceful shutdown, standbys disable themselves by setting a nosync tag and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS. When the synchronous standby goes away or disconnects, a new one is picked and Patroni switches the master over to the new one. If no synchronous standby exists, Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock. Added acceptance tests and documentation. Implementation by @ants with extensive review by @CyberDem0n.	2016-10-19 16:12:51 +02:00
Alexander Kukushkin	7006a4ee14	Sometimes a replica can't attach to the master after pg_rewind. The reason is that it takes up to 10 seconds to create the replication slot, plus up to 5 seconds to start streaming and recover.	2016-04-13 14:28:00 +02:00
Alexander Kukushkin	d57310bbc0	Fix one more corner case. It can take up to 10 seconds to create a replication slot. In addition, when a replica fails to connect to the master via streaming replication, it doesn't retry immediately but after a timeout (5 seconds). 10 + 5 == 15, which causes the replication check scenarios to fail.	2016-04-13 14:09:45 +02:00
Alexander Kukushkin	01da5266a0	Give health checks time to run when promoting a replica	2016-04-13 13:32:39 +02:00
Alexander Kukushkin	24a2ea6cef	Refactor acceptance tests to make them work against ZooKeeper and make it easier to implement controllers for new DCSs, e.g. Consul	2016-04-10 10:37:43 +02:00
Alexander Kukushkin	62f11ab747	Attempt to export acceptance test coverage results to Coveralls	2016-03-13 09:09:31 +01:00
Alexander Kukushkin	42d798a3de	acceptance tests on travis	2016-03-10 17:19:10 +01:00
Oleksii Kliukin	c9b8c2d3a9	Bugfixes, add a function to kill patroni daemon, make the feature description more concise.	2016-02-24 19:22:42 +01:00
Oleksii Kliukin	6f03953268	Merge the basic failover and basic replication scenarios into one feature.	2016-02-24 17:12:45 +01:00
Oleksii Kliukin	6ec3523748	Collect test output, add basic failover test.	2016-02-24 16:30:52 +01:00
Oleksii Kliukin	38bd037d99	Add the first lettuce test for basic replication: check that the table inserted on the primary makes its way to the secondary.	2016-02-05 13:30:42 +01:00

19 Commits