A couple of times we have seen in the wild that the database for the permanent logical slots was changed in the Patroni config.
It resulted in the following situation.
On the primary:
1. The slot must be dropped before creating it in a different DB.
2. Patroni fails to drop it because the slot is in use.
On replicas:
1. Patroni notices that the slot exists in the wrong DB and successfully drops it.
2. Patroni copies the existing slot from the primary by its name and restarts Postgres.
The loop repeats for as long as the "wrong" slot exists on the primary.
Basically, replicas are continuously restarting, which badly affects availability.
In order to solve the problem, we will perform additional checks while copying replication slot files from the primary and discard them if `slot_type`, `database`, or `plugin` don't match our expectations.
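Below is a minimal sketch of that kind of validation, assuming the slot attributes have already been read into plain dicts; the helper name and the values are illustrative, not the actual Patroni code:

```python
# Hypothetical helper: compare the attributes of a slot found on the primary
# with the definition from the Patroni configuration before using it.
def slot_matches_definition(existing_slot, configured_slot):
    return all(existing_slot.get(key) == configured_slot.get(key)
               for key in ('slot_type', 'database', 'plugin'))


existing = {'slot_type': 'logical', 'database': 'orders', 'plugin': 'pgoutput'}
configured = {'slot_type': 'logical', 'database': 'billing', 'plugin': 'pgoutput'}

if not slot_matches_definition(existing, configured):
    print('discarding copied slot file: attributes do not match expectations')
```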
If the DCS becomes unavailable, Patroni restarts Postgres in read-only mode.
This causes pg_control to be updated with `Database cluster state: in archive recovery` and could also set the `MinRecoveryPoint`.
The next time Patroni is started it will assume that Postgres was running as a replica, conclude that a rewind isn't required, and try to start Postgres up. In this situation there is a chance that the start will be aborted with a FATAL error message that looks like `requested timeline 2 does not contain minimum recovery point 0/501E8B8 on timeline 1`.
On the next heartbeat Patroni will again notice that Postgres isn't running, which leads to another failed start attempt.
This loop is endless.
In order to mitigate the problem we do the following (sketched below):
1. While figuring out whether a rewind is required, we consider `in archive recovery` along with `shut down in recovery`.
2. If pg_rewind is required and the cluster state is `in archive recovery`, we also perform crash recovery in single-user mode.
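A minimal sketch of the state check that drives both points, assuming `pg_controldata` is on PATH and using an example data directory:

```python
import subprocess

def cluster_state(data_dir):
    """Return the 'Database cluster state' reported by pg_controldata."""
    out = subprocess.check_output(['pg_controldata', data_dir]).decode()
    for line in out.splitlines():
        if line.startswith('Database cluster state:'):
            return line.split(':', 1)[1].strip()

state = cluster_state('/home/postgres/pgdata')  # example path

# treat "in archive recovery" like "shut down in recovery" when deciding
# whether a rewind may be required; if a rewind is required and the state is
# "in archive recovery", additionally run crash recovery in single-user mode
rewind_candidate = state in ('shut down in recovery', 'in archive recovery')
needs_single_user_recovery = state == 'in archive recovery'
```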
Close https://github.com/zalando/patroni/issues/2242
While demoting due to a failure to update the leader lock, it could happen that the DCS goes completely down and the get_cluster() call raises an exception.
If not properly handled, this results in Postgres remaining stopped until the DCS recovers.
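A minimal sketch of the defensive handling, with hypothetical `dcs`, `postgresql`, and `logger` objects standing in for the real Patroni components:

```python
def demote_after_failed_leader_update(dcs, postgresql, logger):
    # A DCS outage while demoting must not leave Postgres stopped forever:
    # if get_cluster() fails, demote/follow without the cluster view.
    try:
        cluster = dcs.get_cluster()
    except Exception:
        logger.exception('get_cluster() failed, demoting without cluster view')
        cluster = None
    leader = cluster.leader if cluster and cluster.leader else None
    postgresql.follow(leader)  # start Postgres back up as a replica
```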
For various reasons, WAL archiving on the primary can get stuck or be significantly delayed. If we try to do a switchover or shut the primary down, the shutdown will take forever and will not finish until the whole backlog of WALs is processed.
In the meantime, Patroni keeps updating the leader lock, which prevents other nodes from starting the leader race even if it is known that they have received/applied all changes.
The `Database cluster state:` is changed to `"shut down"` after:
- all data is fsynced to disk and the latest checkpoint is written to WAL
- all streaming replicas confirmed that they received all changes (including the latest checkpoint)
At the same time, the archiver process continues to do its job and the postmaster process is still running.
In order to solve this problem and make the switchover more reliable and faster when `archive_command` is slow or failing, Patroni will remove the leader key immediately after `pg_controldata` starts reporting PGDATA as cleanly `"shut down"` and it has verified that there is at least one replica that received all changes. If there are no replicas that fulfill this condition, the leader key isn't removed and the old behavior is retained, i.e. Patroni keeps updating it.
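A minimal sketch of the release check, assuming the pg_controldata output has already been parsed into a dict and replica positions are `X/Y` LSN strings; the real check is more involved:

```python
def parse_lsn(lsn):
    """Convert an 'X/Y' LSN string into a 64-bit integer for comparison."""
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) + int(lo, 16)

def may_remove_leader_key(controldata, replica_received_lsns):
    # only after a clean shutdown...
    if controldata.get('Database cluster state') != 'shut down':
        return False
    checkpoint = parse_lsn(controldata['Latest checkpoint location'])
    # ...and only if at least one replica has already received everything
    # up to the shutdown checkpoint
    return any(parse_lsn(lsn) >= checkpoint for lsn in replica_received_lsns)
```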
This field records the last time (as a Unix epoch) a cluster member successfully communicated with the DCS. It is useful to identify and/or analyze network partitions.
Also, expose dcs_last_seen in the MemberStatus class and its from_api_response() method.
- Resolve Node IP for every connection attempt
- Handle exceptions caused by connection failures due to failed name resolution
- Align PySyncObj DNS cache timeouts with `loop_wait` and `ttl`
In addition to that, postpone the leader race for freshly started Raft nodes. This helps in the situation when the leader node was alone and demoted Postgres, and after that a replica arrives and quickly takes the leader lock without really performing the leader race.
Close https://github.com/zalando/patroni/issues/1930, https://github.com/zalando/patroni/issues/1931
#1527 introduced a feature of updating `/optime/leader` with the location of the last checkpoint after Postgres was shut down cleanly.
If WAL archiving is enabled, Postgres always switches the WAL file before writing the shutdown checkpoint record. Normally this is not an issue, but for databases with little write activity it can make the visible replication lag equal to the size of a single WAL file, even though the previous WAL file is mostly empty and contains only a few records.
Therefore it should be safe to report the LSN of the SWITCH record that precedes the shutdown checkpoint.
In order to do that, Patroni first gets the output of pg_controldata and, based on it, calls pg_waldump twice:
* The first call reads the checkpoint record (and verifies that this is really the shutdown checkpoint).
* The next call reads the previous record; if it is the 'xlog switch' record (for 9.3 and 9.4) or 'SWITCH' (for 9.5+), the LSN of the SWITCH record is written to `/optime/leader`.
In case of any mismatch, or a failure to call pg_waldump or to parse its output, the old behavior is retained, i.e. the `Latest checkpoint location` from pg_controldata is used.
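A minimal sketch of reading a single record with pg_waldump, assuming a PostgreSQL 10+ layout where the binary is called pg_waldump; paths and LSNs are examples and the parsing is simplified:

```python
import re
import subprocess

def read_wal_record(waldir, timeline, start_lsn):
    """Read a single WAL record and return (lsn, prev_lsn, description)."""
    out = subprocess.check_output(
        ['pg_waldump', '-p', waldir, '-t', str(timeline), '-s', start_lsn, '-n', '1']
    ).decode()
    match = re.search(r'lsn: ([0-9A-F/]+), prev ([0-9A-F/]+), desc: (\S+)', out)
    return match.groups() if match else None

# first call: the record must be the shutdown checkpoint; second call (at the
# 'prev' LSN) must be the SWITCH record, whose LSN goes to /optime/leader
record = read_wal_record('/home/postgres/pgdata/pg_wal', 1, '0/1560E28')
```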
Close https://github.com/zalando/patroni/issues/1860
1. When everything goes normally, only one line will be written for every run of the HA loop (see examples):
```
INFO: no action. I am (postgresql0) the leader with the lock
INFO: no action. I am a secondary (postgresql1) and following a leader (postgresql0)
```
2. The `does not have lock` message became a debug message.
3. The `Lock owner: postgresql0; I am postgresql1` line is shown only when streaming doesn't look normal.
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update the DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with the DCS, a new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility, `/optime/leader` will still be updated if there are members with an old Patroni version in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function (see the sketch after this list).
When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward.
3. Additional requirements:
In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots, and on cascading replicas if replicas with logical slots stream from them.
4. When logical slots are copied to the replica, there is a timeframe during which it might not be safe to use them after promotion. Right now there is no protection against promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
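A minimal sketch of the two building blocks from item 2, using psycopg2 directly; hostnames, credentials, the slot name, and the file path are examples, and the real logic lives in Patroni's slots handler:

```python
import psycopg2

# 1. copy the raw slot file from the primary with superuser/rewind credentials
with psycopg2.connect('host=primary-node dbname=postgres user=postgres') as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT pg_read_binary_file('pg_replslot/my_slot/state')")
        slot_file_content = cur.fetchone()[0]
# Patroni writes this content into pg_replslot/my_slot/state on the replica
# and restarts Postgres so that the slot becomes visible

# 2. once the slot exists on the replica, keep moving it forward (PG 11+)
replica = psycopg2.connect('host=localhost dbname=postgres user=postgres')
replica.autocommit = True
with replica.cursor() as cur:
    cur.execute("SELECT pg_replication_slot_advance('my_slot', %s)", ('0/3000060',))
```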
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
It is not very common, but the master Postgres might "crash" for different reasons, like OOM or running out of disk space. Of course, there are chances that the current node holds some unreplicated data, and therefore Patroni by default prefers to start Postgres on the leader node rather than doing a failover.
In order to be on the safe side, Patroni always starts Postgres in recovery, no matter whether the current node owns the leader lock or not. If Postgres wasn't shut down cleanly, starting in recovery might fail, therefore in some cases, as a workaround, Patroni executes crash recovery by starting Postgres in single-user mode.
A few times we ended up in the following situation:
1. The master Postgres crashed because it ran out of disk space.
2. Patroni started crash recovery in single-user mode.
3. While doing crash recovery, Patroni kept updating the leader lock.
Patroni gets stuck on step 3, and manual intervention is required to recover the cluster.
Patroni already has the `master_start_timeout` option, which controls how long Postgres is allowed to stay in the `starting` state; after that, Patroni might decide to release the leader lock if there are healthy replicas available that could take it over.
This PR makes the `master_start_timeout` option also work for crash recovery.
1. Don't call bootstrap if PGDATA is missing/empty, because it might be on purpose and someone/something might be working on it.
2. Consider Postgres running as a leader in pause unhealthy if the pg_control sysid doesn't match the /initialize key (an empty initialize key would allow the "race", and the leader would "restore" the initialize key).
3. Don't exit on sysid mismatch in pause, only log a warning.
4. Cover corner cases when Patroni is started in pause with an empty PGDATA that was restored by somebody else.
5. An empty string is a valid `recovery_target`.
An unhandled exception prevented demoting the primary.
In addition to that, wrap the update_leader call in the HA loop into a try..except block and implement a test case.
Fixes https://github.com/zalando/patroni/issues/1684
Call a fencing script after acquiring the leader lock. If the script doesn't finish successfully, don't promote but remove the leader key.
Close https://github.com/zalando/patroni/issues/1567
On K8s the `Cluster.leader` is a valid object even if the cluster has no leader, because we need to know the `resourceVersion` for a future CAS operation. Such a non-empty object broke the HA loop and made other nodes think that a leader is there.
The right way to identify a missing leader, which works reliably across all DCS implementations, is to check that the leader's name is empty.
The new `synchronous_node_count` parameter is used by Patroni to manage the number of synchronous standby databases. It is set to 1 by default and has no effect when `synchronous_mode` is off. When enabled, Patroni maintains the precise number of synchronous standby databases based on `synchronous_node_count` and adjusts the state in the DCS and `synchronous_standby_names` as members join and leave.
This functionality can be further extended in the future to support priority-based (FIRST n) and quorum-based (ANY n) synchronous replication.
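A minimal sketch of how the setting could translate into `synchronous_standby_names`; member names are examples, and the real selection also takes each member's replication state into account:

```python
def build_synchronous_standby_names(candidates, synchronous_node_count=1):
    # pick the requested number of candidates and render the Postgres setting
    picked = candidates[:synchronous_node_count]
    if not picked:
        return ''
    quoted = ', '.join('"{0}"'.format(name) for name in picked)
    return '{0} ({1})'.format(len(picked), quoted)

print(build_synchronous_standby_names(['node2', 'node3', 'node4'], 2))
# -> 2 ("node2", "node3")
```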
The only python-etcd3 client, which works directly via gRPC, still supports only a single endpoint, which is not great for high availability.
Since Patroni is already using a heavily hacked version of python-etcd with smart retries and auto-discovery out-of-the-box, I decided to enhance the existing code with limited support of v3 protocol via gRPC-gateway.
Unfortunately, watches via gRPC-gateway require us to open and keep a second connection to etcd.
Known limitations:
* The minimum supported version is 3.0.4. On earlier versions transactions don't work due to bugs in grpc-gateway, and without transactions we can't do atomic operations, i.e. leader locks.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using TLS Common Name. This is because gRPC-proxy terminates TLS from its client so all the clients share a cert of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
1. Put the `*` character into pgpass if the actual value is empty (see the sketch below).
2. Re-raise a fatal exception from the HA loop (we need to exit if, for example, cluster initialization failed).
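A minimal sketch of the pgpass handling from the first item (escaping of special characters is omitted; values are examples):

```python
def pgpass_line(hostname, port, dbname, user, password):
    # an empty field would break the line, so substitute '*', which acts as a
    # wildcard for the hostname/port/database/username fields
    fields = (hostname, port, dbname, user, password)
    return ':'.join(str(field) if field else '*' for field in fields)

print(pgpass_line('10.0.0.1', 5432, '', 'replicator', 'secret'))
# -> 10.0.0.1:5432:*:replicator:secret
```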
Close https://github.com/zalando/patroni/issues/1617
PostgreSQL 13 finally introduced the possibility to change `primary_conninfo` without a restart; a reload is enough. However, when the role changes from `replica` to `standby_leader` we want to call only the `on_role_change` callback and skip `on_reload`, because they would duplicate each other.
We don't need to rewind when:
1. the replayed location of the former replica is not ahead of the switchpoint
2. the end of the checkpoint record of the former primary is the same as the switchpoint
In order to get the end of the checkpoint record we use `pg_waldump` and parse its output.
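A minimal sketch of the two checks, with LSNs compared as 64-bit integers; the function names are illustrative:

```python
def parse_lsn(lsn):
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) + int(lo, 16)

# 1. the former replica: no rewind if it has not replayed past the switchpoint
def replica_requires_rewind(replayed_location, switchpoint):
    return parse_lsn(replayed_location) > parse_lsn(switchpoint)

# 2. the former primary: no rewind if its last checkpoint record ends exactly
#    at the switchpoint (the value comes from the parsed pg_waldump output)
def former_primary_requires_rewind(checkpoint_end, switchpoint):
    return parse_lsn(checkpoint_end) != parse_lsn(switchpoint)
```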
Close https://github.com/zalando/patroni/issues/1493
The standby cluster doesn't know about leader elections in the main cluster, and therefore the usual mechanisms of detecting divergence don't work. For example, it could happen that the standby cluster is ahead of the new primary of the main cluster and must be rewound.
There is a way to know that a new timeline has been created: check for the presence of a history file in pg_wal. If a new file is there, we start the usual procedure of making sure that we can continue streaming, or run pg_rewind.
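A minimal sketch of the history-file check, assuming a PostgreSQL 10+ layout with WAL in `pg_wal`; the data directory and timeline are examples:

```python
import os

def new_timeline_created(data_dir, current_timeline):
    """True if a history file for the next timeline already appeared in pg_wal."""
    history_file = '{0:08X}.history'.format(current_timeline + 1)
    return os.path.exists(os.path.join(data_dir, 'pg_wal', history_file))

if new_timeline_created('/home/postgres/pgdata', 1):
    print('timeline switch detected: verify streaming can continue or run pg_rewind')
```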
`touch_member()` could be called from the finally block of `_run_cycle()`. If it raised an exception, the whole Patroni process crashed.
In order to avoid future crashes we wrap `_run_cycle()` into a try..except block and ask the user to report a bug.
Close https://github.com/zalando/patroni/issues/1529
So far Patroni was parsing `recovery.conf` or querying `pg_settings` in order to get the current values of recovery parameters. On PostgreSQL versions earlier than 12 it could easily happen that the value of `primary_conninfo` in `recovery.conf` has nothing to do with reality. Luckily for us, PostgreSQL 9.6+ has the `pg_stat_wal_receiver` view, which contains the current values of `primary_conninfo` and `primary_slot_name`. The password field is masked though, but this is fine, because authentication happens only when the connection is opened. We compare all other parameters as usual.
Another advantage of `pg_stat_wal_receiver` is that it contains the current timeline, therefore on 9.6+ we don't need to use the replication connection trick if the walreceiver process is alive.
If there is no walreceiver process or it is not streaming, we stick to the old methods.
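A minimal sketch of the 9.6+ code path, using psycopg2 with an example DSN:

```python
import psycopg2

conn = psycopg2.connect('dbname=postgres')  # local connection, example DSN
with conn.cursor() as cur:
    cur.execute('SELECT status, conninfo, slot_name, received_tli'
                '  FROM pg_stat_wal_receiver')
    row = cur.fetchone()

if row and row[0] == 'streaming':
    status, conninfo, slot_name, timeline = row
    # compare conninfo/slot_name with the desired primary_conninfo and
    # primary_slot_name; the masked password is fine to ignore
else:
    pass  # no active walreceiver: fall back to recovery.conf / pg_settings
```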
It is safe to call pg_rewind on a replica only when pg_control on the primary contains information about the latest timeline. Postgres usually does an immediate checkpoint right after promote, and in most cases this works just fine. Unfortunately, we regularly receive complaints that it takes too long (minutes) until the checkpoint is done and replicas can't perform a rewind, while doing the checkpoint manually helps immediately. So Patroni starts doing the same: when the promotion has happened and Postgres is not running in recovery, we explicitly issue a CHECKPOINT.
We are intentionally not using the AsyncExecutor here, because we want the HA loop to continue its normal flow.
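A minimal sketch of the post-promotion checkpoint, using psycopg2 with an example local connection:

```python
import psycopg2

conn = psycopg2.connect('dbname=postgres')  # example local superuser connection
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute('SELECT pg_is_in_recovery()')
    if not cur.fetchone()[0]:
        # promotion already happened: force pg_control to carry the new
        # timeline so replicas can run pg_rewind against this node right away
        cur.execute('CHECKPOINT')
```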
## Feature: Postgres stop timeout
A switchover/failover operation hangs on the signal_stop (or checkpoint) call when the postmaster doesn't respond or hangs for some reason (issue described in [1371](https://github.com/zalando/patroni/issues/1371)). This leads to a loss of service for an extended period of time, until the hung postmaster starts responding or is killed by some other actor.
### master_stop_timeout
The number of seconds Patroni is allowed to wait when stopping Postgres; effective only when synchronous_mode is enabled. When set to a value > 0 and synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation runs for longer than master_stop_timeout. Set the value according to your durability/availability trade-off. If the parameter is not set or is set to a value <= 0, master_stop_timeout does not apply.
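A minimal sketch of the escalation logic, with a hypothetical `postmaster_is_running()` helper and illustrative signal handling:

```python
import os
import signal
import time

def stop_postmaster(pid, master_stop_timeout, synchronous_mode):
    os.kill(pid, signal.SIGINT)  # request a fast shutdown
    deadline = time.time() + (master_stop_timeout or 0)
    while postmaster_is_running(pid):  # hypothetical helper
        if synchronous_mode and (master_stop_timeout or 0) > 0 and time.time() > deadline:
            os.kill(pid, signal.SIGKILL)  # the stop took too long: escalate
            break
        time.sleep(1)
```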
That required a refactoring of the `Config` and `Patroni` classes. Now one has to explicitly create an instance of `Config` before creating `Patroni`.
The Config file can optionally call the validate function.
We will try to import only the DCS module that has a configuration section.
I.e., if there is only a zookeeper section in the config, Patroni will try to import only `patroni.dcs.zookeeper` and skip `etcd`, `consul`, and `kubernetes`.
This approach has two benefits:
1. When dependencies are not installed, Patroni was showing INFO messages like `Failed to import smth`, which look scary.
2. It reduces memory usage, because sometimes dependencies are heavy.
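A minimal sketch of the conditional import; the module names follow the `patroni.dcs` package layout, while the helper itself is illustrative:

```python
import importlib

DCS_MODULES = ('etcd', 'consul', 'zookeeper', 'kubernetes')

def load_dcs_module(config):
    # import only the implementation whose section is present in the config
    for name in DCS_MODULES:
        if name in config:
            return importlib.import_module('patroni.dcs.' + name)
    raise RuntimeError('no known DCS section found in the configuration')
```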
Starting from PostgreSQL 12 the following recovery parameters can be changed without a restart, but Patroni didn't yet support that:
* archive_cleanup_command
* promote_trigger_file
* recovery_end_command
* recovery_min_apply_delay
In future postgres releases this list will be extended and Patroni will support it automatically.
Specifically, there was a chance that `patronictl reinit --force` was overridden by recover, and we ended up in a situation where Patroni was trying to start Postgres while the basebackup was still running.
* make it possible to use client certificates with REST API
* define a separate PatroniRequest class which handles all communication
* refactor patronictl to use the new class
* make Ha use the new class instead of calling requests.get; the old call wasn't taking certificates and basic auth into account
Close #898
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package
This is just a careful refactoring, without code changes.
1. Use the default port 5432 when only standby_cluster.host is defined.
2. Check that a standby cluster replica can be bootstrapped without a connection to the standby cluster leader against the `create_replica_methods` defined in the `standby_cluster` config instead of the `postgresql` section.
3. Don't fall back to the create_replica_methods defined in the `postgresql` section when bootstrapping a member of the standby cluster.
4. Make sure we specify the database when connecting to the leader.
* expose the current patroni version in DCS
* expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed
* other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT
* define `postgresql.authentication.rewind` with credentials for pg_rewind in Patroni configuration files
* create a user for pg_rewind if Postgres is 11+
* grant execute on the functions required by pg_rewind to the rewind user (see the sketch below)
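A minimal sketch of the user creation and grants, using psycopg2; the function signatures are the ones listed in the pg_rewind documentation for PostgreSQL 11+, and the user name, password, and DSN are examples:

```python
import psycopg2

REWIND_FUNCTIONS = (
    'pg_ls_dir(text, boolean, boolean)',
    'pg_stat_file(text, boolean)',
    'pg_read_binary_file(text)',
    'pg_read_binary_file(text, bigint, bigint, boolean)',
)

conn = psycopg2.connect('dbname=postgres')  # example superuser connection
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE USER rewind_user WITH LOGIN PASSWORD 'rewind_password'")
    for func in REWIND_FUNCTIONS:
        cur.execute('GRANT EXECUTE ON FUNCTION pg_catalog.{0} TO rewind_user'.format(func))
```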
This functionality works similarly to the `pg_hba`:
If the `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf, however, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed from outside and not update the file.
First of all, this patch changes the behavior of the `on_start`/`on_restart` callbacks: they will be called only when Postgres is started or restarted without a role change. If the member is promoted or demoted, only the `on_role_change` callback will be executed.
Before that, `on_role_change` was never called for the standby leader; only `on_start`/`on_restart` were, and with a wrong role argument.
In addition to that, the REST API will return standby_leader role for the leader of the standby cluster.
Closes https://github.com/zalando/patroni/issues/988