High CPU load on Etcd nodes and K8s API servers created a very strange situation. A few clusters were running without a leader, and the pod that was ahead of the others was failing to take the leader lock because updates were failing with HTTP response code `409` (`resource_version` mismatch).
Effectively that means that TCP connections to K8s master nodes were alive (otherwise TCP keepalives would have resolved the problem), but no `UPDATE` events were arriving via these connections, resulting in a stale in-memory cache of the cluster state.
The only good way to prevent this situation is to intercept 409 HTTP responses and terminate existing TCP connections used for watches.
Now a few words about the implementation. Unfortunately, watch threads spend most of their time waiting in the read() call and there is no good way to interrupt them. However, `socket.shutdown()` seems to do the job; we already use this trick in the Etcd3 implementation.
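As a minimal standalone sketch (not the actual Patroni code), this is how shutting a socket down from another thread unblocks a reader stuck in a blocking read; a socketpair stands in for the watch connection to the API server:

```python
import socket
import threading
import time

# A pair of connected sockets stands in for the watch connection.
reader, writer = socket.socketpair()

def watch_loop(sock):
    while True:
        data = sock.recv(4096)   # watch threads spend most of their time blocked here
        if not data:             # shutdown() makes recv() return b'' immediately
            print('watch connection terminated, thread exits')
            return

thread = threading.Thread(target=watch_loop, args=(reader,))
thread.start()

time.sleep(1)
# Pretend we just intercepted a 409 response: force the blocked recv() to return.
reader.shutdown(socket.SHUT_RDWR)
thread.join()
```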
This approach helps to mitigate the issue of not having a leader, but replicas might still end up with stale cluster state cached and in the worst case will not stream from the leader. Non-streaming replicas are less dangerous: they can be covered by monitoring and partially mitigated by correctly configured `archive_command` and `restore_command`.
When starting as a replica, it may take some time before Postgres begins accepting new connections; meanwhile, the leader could have transitioned to a different member, requiring `primary_conninfo` to be updated.
Before v12, Patroni regularly checks `recovery.conf` in order to verify that recovery parameters match the expectation. Starting from v12, recovery parameters were converted to GUCs and Patroni gets their current values from the `pg_settings` view. The latter creates a problem when it takes Postgres more than a minute to start accepting new connections.
Since Patroni attempts to execute at least `pg_is_in_recovery()` on every HA loop, and that attempt raises an exception while Postgres isn't yet accepting connections, `check_recovery_conf()` was effectively unreachable until recovery finished; this changed when #2082 was introduced.
As a result of #2082 we got the following behavior:
1. Before v12 everything works as expected.
2. v12 and v13 - Patroni restarts Postgres after one minute of recovery.
3. v14+ - `check_recovery_conf()` is not executed because the `replay_paused()` method raises an exception.
In order to properly handle changes of recovery parameters, or a leader transition to a different node, on v12+ we will rely on the cached values of recovery parameters until Postgres becomes ready to execute queries.
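A hedged sketch of this fallback; the method and attribute names are illustrative, not the real Patroni API:

```python
RECOVERY_PARAMS = ('primary_conninfo', 'primary_slot_name',
                   'restore_command', 'recovery_target_timeline')

def effective_recovery_params(postgresql):
    if postgresql.accepts_connections():
        # Postgres is ready: read the authoritative values from pg_settings.
        return postgresql.query_pg_settings(RECOVERY_PARAMS)
    # Postgres is still starting up/recovering: fall back to the values
    # Patroni itself wrote to the configuration on the previous iteration.
    return postgresql.cached_recovery_params
```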
Close https://github.com/zalando/patroni/issues/2289
A logical slot on a replica is safe to use when the physical replication slot used by this replica on the primary:
1. has a nonzero/non-null `catalog_xmin`
2. has a `catalog_xmin` that is not newer (greater) than the `catalog_xmin` of any slot on the standby
3. has a `catalog_xmin` that is known to have overtaken the `catalog_xmin` of the logical slots on the primary observed during step 1
If condition 1 doesn't hold, Patroni runs an additional check of whether `hot_standby_feedback` is actually in effect and shows a warning if it is not.
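The conditions above can be summarized in a small sketch; the helper shape and argument names are illustrative, not the actual Patroni code:

```python
def logical_slot_safe(physical_catalog_xmin, standby_catalog_xmins,
                      primary_logical_catalog_xmin):
    # 1. the physical slot on the primary must have a usable catalog_xmin
    if not physical_catalog_xmin:
        return False
    # 2. it must not be newer (greater) than the catalog_xmin of any slot on the standby
    if any(physical_catalog_xmin > xmin for xmin in standby_catalog_xmins):
        return False
    # 3. it must have overtaken the catalog_xmin of the logical slots on the
    #    primary observed while condition 1 was checked
    return physical_catalog_xmin >= primary_logical_catalog_xmin
```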
Since Kubernetes v1.21, with the projected service account token feature, service account tokens expire in 1 hour. Kubernetes clients are expected to re-read the token file to refresh the token.
This patch re-reads the token file every minute for the in-cluster config.
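A minimal sketch of the idea, assuming the standard in-cluster token path; the class and constant names here are illustrative:

```python
import time

TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token'
TOKEN_REFRESH_INTERVAL = 60  # seconds

class ServiceAccountToken:

    def __init__(self, path=TOKEN_PATH):
        self._path = path
        self._token = None
        self._loaded_at = 0

    def get(self):
        now = time.monotonic()
        if self._token is None or now - self._loaded_at >= TOKEN_REFRESH_INTERVAL:
            # Re-read the projected token file so a rotated token is picked up.
            with open(self._path) as f:
                self._token = f.read().strip()
            self._loaded_at = now
        return self._token
```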
Fixes #2286
Signed-off-by: Haitao Li <hli@atlassian.com>
A couple of times we have seen in the wild that the database for the permanent logical slots was changed in the Patroni config.
It resulted in the following situation.
On the primary:
1. The slot must be dropped before creating it in a different DB.
2. Patroni fails to drop it because the slot is in use.
On a replica:
1. Patroni notices that the slot exists in the wrong DB and successfully drops it.
2. Patroni copies the existing slot from the primary by its name, which involves a Postgres restart.
The loop repeats as long as the "wrong" slot exists on the primary.
Basically, replicas are continuously restarting, which badly affects availability.
In order to solve the problem, we will perform additional checks while copying replication slot files from the primary and discard them if `slot_type`, `database`, or `plugin` don't match our expectations.
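A hedged sketch of that validation; the helper is illustrative, while the field names mirror the replication slot attributes:

```python
def copied_slot_matches(copied_slot, expected_slot):
    """Return True only if the slot copied from the primary is the one we expect."""
    return all(copied_slot.get(key) == expected_slot.get(key)
               for key in ('slot_type', 'database', 'plugin'))

# Usage: when the check fails, the copied slot file is discarded rather than
# installed on the replica, which breaks the restart loop described above.
```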
In case of DCS unavailability, Patroni restarts Postgres in read-only mode.
This causes pg_control to be updated with `Database cluster state: in archive recovery` and could also set the `MinRecoveryPoint`.
The next time Patroni is started, it will assume that Postgres was running as a replica, decide that a rewind isn't required, and try to start Postgres up. In this situation there is a chance that the start will be aborted with a FATAL error message like `requested timeline 2 does not contain minimum recovery point 0/501E8B8 on timeline 1`.
On the next heartbeat Patroni will again notice that Postgres isn't running, which leads to another start-and-fail attempt.
This loop is endless.
In order to mitigate the problem we do the following (see the sketch after the list):
1. While figuring out whether a rewind is required, we consider `in archive recovery` along with `shut down in recovery`.
2. If pg_rewind is required and the cluster state is `in archive recovery`, we also perform recovery in single-user mode.
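A simplified sketch of the two rules (not the actual Patroni code):

```python
def rewind_check_needed(cluster_state):
    # Rule 1: "in archive recovery" is treated like "shut down in recovery".
    return cluster_state in ('shut down in recovery', 'in archive recovery')

def single_user_recovery_needed(cluster_state, rewind_required):
    # Rule 2: crash-recover in single-user mode first, so pg_rewind works
    # against a consistent data directory.
    return rewind_required and cluster_state == 'in archive recovery'
```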
Close https://github.com/zalando/patroni/issues/2242
Patroni was falsely assuming that timelines had diverged.
For pg_rewind this didn't create any problem, but if pg_rewind is not allowed and `remove_data_directory_on_diverged_timelines` is set, it resulted in reinitializing the former leader.
Close https://github.com/zalando/patroni/issues/2220
This makes it possible to have multiple hosts in a `standby_cluster` and ensures that the standby leader follows the main cluster's new leader after a switchover.
Partially addresses #2189
When switching certificates there is a race condition with concurrent API requests. If one is active during the replacement, the replacement errors out with a port-in-use error and Patroni gets stuck in a state without an active API server.
The fix is to call server_close() after shutdown(), which waits for already running requests to complete before returning.
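A minimal sketch of the restart sequence, with Python's `ThreadingHTTPServer` standing in for Patroni's REST API server (the real classes differ):

```python
import threading
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()

def restart_api_server(old_server, address=('127.0.0.1', 8008)):
    old_server.shutdown()       # stop the serve_forever() loop
    old_server.server_close()   # wait for in-flight requests and release the port
    # Only now is it safe to bind the replacement server to the same address.
    new_server = ThreadingHTTPServer(address, Handler)
    threading.Thread(target=new_server.serve_forever, daemon=True).start()
    return new_server

server = ThreadingHTTPServer(('127.0.0.1', 8008), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
server = restart_api_server(server)   # e.g. after new certificates were loaded
```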
Close #2184
- Simplify setup.py: remove unneeded features and get rid of deprecation warnings
- Compatibility with Python 3.10: handle `threading.Event.isSet()` deprecation
- Make sure setup.py can run without `six`: move the Patroni class and the main function to `__main__.py`. The `__init__.py` will keep only a few functions used by the Patroni class and by setup.py
When `restore_command` is configured, Postgres tries to fetch and apply all available WAL segments and also fetches history files in order to select the correct timeline. This could result in a situation where the new history file is missing some timelines.
Example:
- node1 demotes/crashes on timeline 1
- node2 promotes to timeline 2 and archives `00000002.history` and crashes
- node1 recovers as a replica, "replays" `00000002.history` and promotes to timeline 3
As a result, `00000003.history` will not have a line for timeline 2, because node1 never replayed any WAL segment from it.
The `pg_rewind` tool is supposed to correctly handle such a case when rewinding node2 from node1, but when deciding whether a rewind should happen, Patroni was searching for the exact timeline in the history file from the new primary.
The solution is to assume that a rewind is required if the current replica timeline is missing from that history file.
In addition, this PR makes sure that the primary isn't running in recovery before starting the rewind check.
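A hedged sketch of the new decision; the history-file parsing is simplified and the actual implementation is more involved (it also looks at switchpoint LSNs):

```python
def parse_history_file(content):
    """Map timeline id -> switchpoint LSN from a Postgres .history file."""
    timelines = {}
    for line in content.splitlines():
        fields = line.split('\t')
        if len(fields) >= 2 and fields[0].strip().isdigit():
            timelines[int(fields[0])] = fields[1].strip()
    return timelines

def rewind_required(replica_timeline, primary_history_content):
    timelines = parse_history_file(primary_history_content)
    # If the replica's current timeline does not appear in the history file of
    # the new primary at all, we cannot prove the histories are compatible,
    # so assume divergence and rewind.
    return replica_timeline not in timelines
```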
Close https://github.com/zalando/patroni/issues/2118 and https://github.com/zalando/patroni/issues/2124
1. Avoid doing CHECKPOINT if `pg_control` is already updated.
2. Explicitly call ensure_checkpoint_after_promote() right after the bootstrap finished successfully.
When deciding whether the ZNode should be updated we rely on the cached version of the cluster, which is updated only when member ZNodes are deleted/created or the `/status`, `/sync`, `/failover`, `/config`, or `/history` ZNodes are updated.
That is, after an update of the current member ZNode succeeds, the cache becomes stale and all further updates are always performed even if the value didn't change. To solve this, we introduce a new attribute in the Zookeeper class and use it to memorize the actual value for later comparison.
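An illustrative sketch of the comparison; the attribute and method names are hypothetical, not the real `patroni.dcs.zookeeper` code:

```python
class MemberDataCache:

    def __init__(self):
        self._last_member_data = None   # the new attribute memorizing the last written value

    def touch_member(self, client, path, data):
        if data == self._last_member_data:
            return                      # nothing changed since the last write, skip the update
        client.set(path, data)          # kazoo KazooClient.set()
        self._last_member_data = data
```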
In some extreme cases Postgres could be so slow that the normal monitoring query doesn't finish within a few seconds. This results in an exception being raised from the `Postgresql._cluster_info_state_get()` method, which could lead to a situation where Postgres isn't demoted on time.
To make this reliable we will catch the exception and use the cached state of Postgres (`is_running()` and `role`) to determine whether Postgres is running as a primary.
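A hedged sketch of the fallback; the attribute names are illustrative:

```python
def postgres_is_primary(postgresql):
    try:
        # The normal monitoring query; may time out when Postgres is overloaded.
        return not postgresql.is_in_recovery()
    except Exception:
        # Fall back to the cached state: the process is running and the
        # last known role was primary.
        return postgresql.is_running() and postgresql.role == 'master'
```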
Close https://github.com/zalando/patroni/issues/2073
While demoting due to a failure to update the leader lock, it could happen that the DCS goes completely down and the get_cluster() call raises an exception.
If not handled properly, this results in Postgres remaining stopped until the DCS recovers.
For various reasons, WAL archiving on the primary can get stuck or be significantly delayed. If we try to do a switchover or shut the primary down, the shutdown will take forever and will not finish until the whole backlog of WALs is processed.
In the meantime, Patroni keeps updating the leader lock, which prevents other nodes from starting the leader race even if it is known that they received/applied all changes.
The `Database cluster state:` is changed to `"shut down"` after:
- all data is fsynced to disk and the latest checkpoint is written to WAL
- all streaming replicas confirmed that they received all changes (including the latest checkpoint)
At the same time, the archiver process continues to do its job and the postmaster process is still running.
In order to solve this problem and make the switchover faster and more reliable when `archive_command` is slow or failing, Patroni will remove the leader key immediately after `pg_controldata` starts reporting PGDATA as cleanly `"shut down"` and it has verified that at least one replica received all changes. If no replica fulfills this condition, the leader key isn't removed and the old behavior is retained, i.e. Patroni keeps updating it.
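A simplified sketch of the shutdown-time decision; the helper and field names are illustrative:

```python
def may_remove_leader_key(controldata, shutdown_checkpoint_lsn, replicas):
    if controldata.get('Database cluster state') != 'shut down':
        return False   # not (yet) a clean shutdown
    # At least one replica must have confirmed receiving everything,
    # including the shutdown checkpoint; otherwise keep the old behavior
    # and continue updating the leader key.
    return any(replica.received_lsn >= shutdown_checkpoint_lsn for replica in replicas)
```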
The new `dcs_last_seen` field reports the last time (as a unix epoch) a cluster member successfully communicated with the DCS. This is useful to identify and/or analyze network partitions.
Also, expose `dcs_last_seen` in the `MemberStatus` class and its `from_api_response()` method.
Add support for the Etcd SRV name suffix as described in the Etcd docs:
> The -discovery-srv-name flag additionally configures a suffix to the SRV name that is queried during discovery. Use this flag to differentiate between multiple etcd clusters under the same domain. For example, if discovery-srv=example.com and -discovery-srv-name=foo are set, the following DNS SRV queries are made:
>
> _etcd-server-ssl-foo._tcp.example.com
> _etcd-server-foo._tcp.example.com
All tests pass, but this has not been tested on a live Etcd system yet... Please take a look and send feedback.
Resolves #2028
If configured, only IPs that match the rules will be allowed to call unsafe endpoints.
In addition, it is possible to automatically include the IPs of cluster members in the list.
If neither of the above is configured the old behavior is retained.
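A hedged sketch of the check, assuming the allowlist contains IPs or CIDR networks; the names here are illustrative, not the actual configuration keys:

```python
import ipaddress

def caller_is_allowed(remote_ip, allowlist, member_ips, include_members):
    if not allowlist and not include_members:
        return True    # nothing configured: keep the old behavior
    networks = [ipaddress.ip_network(entry) for entry in allowlist]
    if include_members:
        # A bare member IP is interpreted as a /32 (or /128) network.
        networks += [ipaddress.ip_network(ip) for ip in member_ips]
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in networks)
```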
Partially address https://github.com/zalando/patroni/issues/1734
- Resolve Node IP for every connection attempt
- Handle exception with connection failures due to failed resolve
- Set PySyncObj DNS Cache timeouts aligned with `loop_wait` and `ttl`
In addition to that, postpone the leader race for freshly started Raft nodes. This helps with the situation when the leader node was alone and demoted Postgres, and after that a replica arrives and quickly takes the leader lock without really performing the leader race.
Close https://github.com/zalando/patroni/issues/1930, https://github.com/zalando/patroni/issues/1931
#1527 introduced a feature of updating `/optime/leader` with the location of the last checkpoint after Postgres was shut down cleanly.
If WAL archiving is enabled, Postgres always switches the WAL file before writing the shutdown checkpoint record. Normally this is not an issue, but for databases without much write activity it could lead to a situation where the visible replication lag becomes equal to the size of a single WAL file, even though the previous WAL file is mostly empty and contains only a few records.
Therefore it should be safe to report the LSN of the SWITCH record before the shutdown checkpoint.
In order to do that, Patroni first gets the output of pg_controldata and, based on it, calls pg_waldump two times:
* The first call reads the checkpoint record (and verifies that this is really the shutdown checkpoint).
* The next call reads the previous record; if it is an 'xlog switch' (for 9.3 and 9.4) or 'SWITCH' (for 9.5+) record, the LSN of the SWITCH record is written to `/optime/leader`.
In case of any mismatch, or a failure to call pg_waldump or parse its output, the old behavior is retained, i.e. the `Latest checkpoint location` from pg_controldata is used.
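A strongly simplified sketch of the pg_waldump part (the tool is called pg_xlogdump before Postgres 10, the record descriptions in the comments vary between versions, and the real parsing is more involved):

```python
import re
import subprocess

def read_one_record(data_dir, timeline, lsn):
    """Return the pg_waldump text for the single record starting at the given LSN."""
    result = subprocess.run(
        ['pg_waldump', '-p', data_dir, '-t', str(timeline), '-s', lsn, '-n', '1'],
        capture_output=True, text=True, check=True)
    return result.stdout

def switch_lsn_before_shutdown_checkpoint(data_dir, timeline, checkpoint_lsn):
    checkpoint = read_one_record(data_dir, timeline, checkpoint_lsn)
    if 'CHECKPOINT_SHUTDOWN' not in checkpoint:   # not really a shutdown checkpoint
        return None
    prev = re.search(r'prev ([0-9A-Fa-f]+/[0-9A-Fa-f]+)', checkpoint)
    if not prev:
        return None
    previous_record = read_one_record(data_dir, timeline, prev.group(1))
    # 'xlog switch' on 9.3/9.4, 'SWITCH' on 9.5+
    if 'SWITCH' in previous_record or 'xlog switch' in previous_record:
        return prev.group(1)
    return None   # fall back to "Latest checkpoint location"
```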
Close https://github.com/zalando/patroni/issues/1860
Old versions of `kazoo` immediately discarded all requests to Zookeeper if the connection was in the `SUSPENDED` state. This was absolutely fine because Patroni handles retries on its own.
Starting from 2.7, kazoo started queueing such requests instead of discarding them; as a result the Patroni HA loop was getting stuck until the connection to Zookeeper was re-established, and Postgres was not demoted.
In order to return to the old behavior we override the `KazooClient._call()` method.
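A simplified sketch of the override (the class name is illustrative and the real code in Patroni differs in details):

```python
from kazoo.client import KazooClient
from kazoo.exceptions import SessionExpiredError
from kazoo.protocol.states import KazooState

class SuspendedAwareKazooClient(KazooClient):

    def _call(self, request, async_object):
        # kazoo >= 2.7 would queue the request while the connection is
        # SUSPENDED; fail fast instead and let Patroni's own retry logic decide.
        if self.state == KazooState.SUSPENDED:
            async_object.set_exception(SessionExpiredError())
            return False
        return super(SuspendedAwareKazooClient, self)._call(request, async_object)
```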
In addition to that, we ensure that the `Postgresql.reset_cluster_info_state()` method is called even if the DCS request failed (the order of calls was changed in #1820).
Close https://github.com/zalando/patroni/issues/1981
When joining an already running Postgres, Patroni ensures that the config files are set according to expectations.
With recovery parameters converted to GUCs in Postgres v12 this became a bit of a problem, because when the `Postgresql` object is created it is not yet known where the given replica is supposed to stream from.
As a result, postgresql.conf was first written without recovery parameters, and on the next run of the HA loop Patroni noticed the inconsistency and updated the config one more time.
For Postgres v12 this is not a big issue, but for v13+ it resulted in an interruption of streaming replication.
PostgreSQL 14 changed the behavior of replicas when certain parameters (for example `max_connections`) are changed (increased): https://github.com/postgres/postgres/commit/15251c0a.
Instead of immediately exiting, Postgres 14 pauses replication and waits for action from the operator.
Since `pg_is_wal_replay_paused()` returning `True` is the only indicator of such a change, Patroni on the replica will call `pg_wal_replay_resume()`, which causes either replication to continue or a shutdown (as before).
So far Patroni has never called `pg_wal_replay_resume()` on its own; therefore, to remain backward compatible, it will call it only for PostgreSQL 14+.
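A hedged sketch of the version-gated call, assuming a psycopg2-style connection (not the actual Patroni method):

```python
def resume_wal_replay_if_needed(connection, server_version):
    # Only PostgreSQL 14+ pauses replay on an incompatible parameter change,
    # so the resume call is issued only there to remain backward compatible.
    if server_version < 140000:
        return
    with connection.cursor() as cursor:
        cursor.execute('SELECT pg_is_wal_replay_paused()')
        if cursor.fetchone()[0]:
            # Either replication continues, or Postgres shuts down as it
            # did before version 14.
            cursor.execute('SELECT pg_wal_replay_resume()')
```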
1. When everything goes normally, only one line is written for every run of the HA loop (see the examples):
```
INFO: no action. I am (postgresql0) the leader with the lock
INFO: no action. I am a secondary (postgresql1) and following a leader (postgresql0)
```
2. The `does not have lock` message became a debug message.
3. The `Lock owner: postgresql0; I am postgresql1` line will be shown only when something doesn't look normal.
Promoting the standby cluster requires updating load-balancer health checks, which is inconvenient and easy to forget.
To solve this, we change the behavior of the `/leader` health-check endpoint: it will return 200 regardless of whether PostgreSQL is running as the primary or as the standby leader.
It could happen that the replica is, for some reason, missing the WAL file required by the replication slot.
The nature of this phenomenon is a bit unclear; it might be that the WAL was recycled shortly before we copied the slot file, but we still need a solution to this problem. If `pg_replication_slot_advance()` fails with the `UndefinedFile` exception ("requested WAL segment pg_wal/... has already been removed"), the logical slot on the replica must be recreated.
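A hedged sketch of the detection, assuming psycopg2; the real recreation logic (copying the slot from the primary again) is more involved:

```python
from psycopg2 import errors

def advance_copied_slot(cursor, slot_name, target_lsn):
    """Return True on success, False when the slot must be recreated on the replica."""
    try:
        cursor.execute('SELECT pg_replication_slot_advance(%s, %s)',
                       (slot_name, target_lsn))
        return True
    except errors.UndefinedFile:
        # "requested WAL segment pg_wal/... has already been removed":
        # the copied slot is unusable; signal the caller to recreate it,
        # i.e. copy the slot from the primary again.
        return False
```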
When `unix_socket_directories` is not known, Patroni was immediately falling back to a TCP connection via localhost.
The bug was introduced in https://github.com/zalando/patroni/pull/1865
Also run Raft behave tests with encryption enabled.
Using the new `pysyncobj` release allowed us to get rid of a lot of hacks that accessed private properties and methods of the parent class, and to reduce the size of `raft.py`.
Close https://github.com/zalando/patroni/issues/1746