2020 Commits

Author SHA1 Message Date
Alexander Kukushkin
76ed5784bb arm64 compatibility 2022-06-24 11:36:20 +02:00
Lev Kozlov
b8a6387236 Bump Postgres version in Dockerfile to 14 (#2333) 2022-06-13 15:26:01 +02:00
monsterxx03
c7ee5f008d Handle expired token for etcd lease_grant (#2331) (#2332)
Close #2331
2022-06-13 14:58:11 +02:00
Michael Banck
a77fbb1912 Fix markup - the -status is part of the command (#2323) 2022-06-13 14:57:28 +02:00
Alexander Kukushkin
fb06af9adb Release 2.1.4 (#2322)
- bump version
- update release notes
- implement missing unit-tests
v2.1.4
2022-06-01 16:00:56 +02:00
Dennis4b
b42550aad4 Add /read-only-sync endpoint (#2305) (#2311)
`/read-only-sync` mirrors `/read-only`, but only returns `200` on a replica if this replica is a synchronous standby.
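The decision logic can be sketched roughly as follows (a hypothetical helper for illustration, not Patroni's actual handler):

```python
# Rough sketch of the /read-only-sync decision (hypothetical, simplified):
# only a replica that is also a synchronous standby answers 200.
def read_only_sync_status(is_leader: bool, is_sync_standby: bool) -> int:
    """HTTP status a health check would receive from /read-only-sync."""
    return 200 if (not is_leader and is_sync_standby) else 503
```

A load balancer can then route read-only traffic that must not lag behind the primary only to nodes answering `200`.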
2022-05-30 17:09:43 +02:00
Alexander Kukushkin
f67174d7cc Use replication credentials in divergence check only with v10 and older (#2308)
and document in which case pg_hba.conf should allow access to "postgres" database with replication credentials.

Close #2261
2022-05-20 10:24:49 +02:00
Alexander Kukushkin
4c14b302f6 Remove [] characters from IPv6 hosts when splitting to host and port (#2309)
Close #2269
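A minimal sketch of such splitting (hypothetical helper, not the actual Patroni function):

```python
# Split "host:port", stripping the [] brackets that IPv6 literals are wrapped in.
def split_host_port(value: str, default_port: int = 5432):
    if value.endswith(']'):              # bracketed IPv6 without a port
        return value.strip('[]'), default_port
    host, _, port = value.rpartition(':')
    if not host:                         # no colon: plain hostname or IPv4
        return value, default_port
    return host.strip('[]'), int(port)
```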
2022-05-20 10:24:25 +02:00
Alexander Kukushkin
729f1dddc8 Compatibility with PostgreSQL 15 beta1 (#2299)
* update postgresql/validator.py
* pg_rewind doesn't like it if there are unix sockets in PGDATA
* pg_rewind now supports --config-file option
2022-05-19 15:36:09 +02:00
Alexander Kukushkin
ad3d953410 K8s: reset watchers if PATCH fails with 409 (#2283)
High CPU load on Etcd nodes and K8s API servers created a very strange situation: a few clusters were running without a leader, and the pod that was ahead of the others was failing to take the leader lock because updates were failing with HTTP response code `409` (`resource_version` mismatch).

Effectively that means that TCP connections to K8s master nodes were alive (otherwise TCP keepalives would have resolved it), but no `UPDATE` events were arriving via these connections, resulting in a stale in-memory cache of the cluster.

The only good way to prevent this situation is to intercept 409 HTTP responses and terminate existing TCP connections used for watches.

Now a few words about the implementation. Unfortunately, watch threads spend most of their time waiting in the `read()` call and there is no good way to interrupt them, but `socket.shutdown()` seems to do the job. We already used this trick in the Etcd3 implementation.
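The shutdown trick can be demonstrated in isolation (a generic Python sketch, not Patroni's code):

```python
import socket
import threading

# A thread blocked in recv() cannot be interrupted directly, but calling
# shutdown() on the socket from another thread makes recv() return b''.
a, b = socket.socketpair()
received = []

def watcher():
    # stands in for a watch thread stuck in a blocking read
    received.append(a.recv(4096))

t = threading.Thread(target=watcher)
t.start()
a.shutdown(socket.SHUT_RDWR)  # "terminate" the connection under the reader
t.join(timeout=5)
```

After `shutdown()`, the blocked `recv()` immediately returns an empty bytestring, so the watch loop can detect EOF and reconnect with fresh state.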

This approach helps to mitigate the issue of not having a leader, but at the same time replicas might still end up with a stale cached cluster state and in the worst case will not stream from the leader. Non-streaming replicas are less dangerous and can be covered by monitoring and partially mitigated by correctly configured `archive_command` and `restore_command`.
2022-05-19 15:24:20 +02:00
Alexander Kukushkin
ef3401c17f Don't reset slots annotation if postgres isn't ready (#2306)
The current state of permanent logical replication slots on the primary is queried together with `pg_current_wal_lsn()`, and hence both "fail" simultaneously if Postgres isn't yet ready to accept connections; in this case we want to avoid updating the `/status` key altogether.

On K8s we don't use a dedicated object for the `/status` key, but the same object (Endpoint or ConfigMap) as for the leader. If the `last_lsn` isn't set we avoid patching the corresponding annotation, but the `slots` annotation was reset due to an oversight.
2022-05-19 15:06:59 +02:00
Alexander Kukushkin
496d14e6ca Better handling of failed pg_rewind attempt (#2304)
Close #2302
2022-05-19 14:52:26 +02:00
Alexander Kukushkin
6e8b2ce0a4 Don't try to run crash recovery if postgres is running (#2298)
It was a minor oversight in #2252 and didn't make it into a release.
2022-05-19 13:46:49 +02:00
Alexander Kukushkin
96b75fa7cb Special handling of check_recovery_conf for v12+ (#2292)
When starting as a replica it may take some time before Postgres starts accepting new connections, and meanwhile the leader could have transitioned to a different member, requiring `primary_conninfo` to be updated.

Before v12 Patroni regularly checks `recovery.conf` to verify that recovery parameters match expectations. Starting from v12 recovery parameters were converted to GUCs and Patroni gets their current values from the `pg_settings` view. The latter creates a problem when it takes Postgres more than a minute to start accepting new connections.

Since Patroni attempts to execute at least `pg_is_in_recovery()` every HA loop, and it raises an exception, `check_recovery_conf()` was effectively unreachable until recovery finished; this changed when #2082 was introduced.

As a result of #2082 we got the following behavior:
1. Up to v12 (not including) everything was working as expected
2. On v12 and v13 Patroni restarts Postgres after 1m of recovery
3. On v14+ `check_recovery_conf()` is not executed because the `replay_paused()` method raises an exception.

In order to properly handle changes of recovery parameters, or the leader transitioning to a different node on v12+, we will rely on the cached values of recovery parameters until Postgres becomes ready to execute queries.

Close https://github.com/zalando/patroni/issues/2289
2022-05-12 07:45:49 +02:00
Alexander Kukushkin
b901e62ad0 Enhanced checks of replica logical slots safety (#2285)
A logical slot on a replica is safe to use when the physical replication
slot on the primary:
1. has a nonzero/non-null `catalog_xmin`
2. has a `catalog_xmin` that is not newer (greater) than the `catalog_xmin` of any slot on the standby
3. has a `catalog_xmin` that is known to have overtaken the `catalog_xmin` of logical slots on the primary observed during step 1

If condition 1 doesn't hold, Patroni runs an additional check of whether `hot_standby_feedback` is actually in effect and shows a warning if it is not.
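Conditions 1 and 2 can be sketched as a pure function (hypothetical names; condition 3 needs previously observed state and is omitted):

```python
def logical_slot_safe(primary_catalog_xmin, standby_catalog_xmin):
    """primary_catalog_xmin: catalog_xmin of the physical slot on the primary;
    standby_catalog_xmin: catalog_xmin of the logical slot on the standby."""
    if not primary_catalog_xmin:                   # condition 1: nonzero/non-null
        return False
    # condition 2: the primary's value must not be newer than the standby's
    return primary_catalog_xmin <= standby_catalog_xmin
```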
2022-05-10 12:24:47 +02:00
haslersn
7a2d6dc3c0 Ensure that optime annotation is a string (#2291)
Fixes #2290
2022-05-10 09:20:39 +02:00
Haitao Li
aa0cd48060 k8s: Support refreshing service account tokens (#2287)
Since Kubernetes v1.21, with the projected service account token feature, service account tokens expire after 1 hour. Kubernetes clients are expected to re-read the token file to refresh the token.

This patch re-reads the token file every minute for in-cluster config.

Fixes #2286

Signed-off-by: Haitao Li <hli@atlassian.com>
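The periodic reload could be sketched like this (a minimal illustration under assumed names; `TokenLoader` and the interval constant are hypothetical):

```python
import time

REFRESH_INTERVAL = 60  # re-read the token file every minute
TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token'

class TokenLoader:
    """Caches the service account token and refreshes it periodically."""

    def __init__(self, path=TOKEN_PATH):
        self._path = path
        self._token = None
        self._loaded_at = 0.0

    def get(self):
        # Re-read the file once the cached copy is older than the interval.
        if time.time() - self._loaded_at >= REFRESH_INTERVAL:
            with open(self._path) as f:
                self._token = f.read().strip()
            self._loaded_at = time.time()
        return self._token
```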
2022-05-05 17:35:06 +02:00
Alexander Kukushkin
5f6197aaad Don't copy logical slot if there is mismatch with the config (#2274)
A couple of times we have seen in the wild that the database for the permanent logical slots was changed in the Patroni config.

It resulted in the following situation.
On the primary:
1. The slot must be dropped before creating it in a different DB.
2. Patroni fails to drop it because the slot is in use.

On a replica:
1. Patroni notices that the slot exists in the wrong DB and successfully drops it.
2. Patroni copies the existing slot from the primary by its name, with a Postgres restart.

And the loop repeats while the "wrong" slot exists on the primary.

Basically, replicas are continuously restarting, which badly affects availability.

In order to solve the problem, we will perform additional checks while copying replication slot files from the primary and discard them if `slot_type`, `database`, or `plugin` don't match our expectations.
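The described check could look roughly like this (a hypothetical helper, not Patroni's actual code):

```python
# Discard a copied slot file if its attributes diverge from the configuration.
def slot_matches_config(existing: dict, configured: dict) -> bool:
    return all(existing.get(k) == configured.get(k)
               for k in ('slot_type', 'database', 'plugin'))
```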
2022-04-14 12:10:37 +02:00
Alexander Kukushkin
aea0589404 Switch to boto3 (#2275)
Close https://github.com/zalando/patroni/issues/2237
2022-04-14 10:47:16 +02:00
zejeanmi
40b5db4b85 Add ppc64le and fix inverted IOC_READ/WRITE vars (#2271)
Close #2265
2022-04-14 10:46:01 +02:00
Wesley Mendes
e491edd1bf Fixes missing import of dateutil.parser (#2259)
Close #2258
2022-04-14 10:45:00 +02:00
James Stroud
0057f9018b Spell out DCS (#2228) 2022-03-24 13:58:10 +01:00
grembo
c4e208ec50 Allow setting TLSServerName on consul service checks (#2231)
See also https://www.consul.io/api-docs/agent/check#tlsservername
Useful in case checks are done by IP and the consul `node_name` is not an FQDN.
2022-03-24 13:57:17 +01:00
Gunnar "Nick" Bluth
7626b5fef8 Fix pg_rewind on typical Debian/Ubuntu systems (#2225)
On Debian/Ubuntu systems it is common to keep Postgres config files outside of the data directory.
This created a couple of problems for pg_rewind support in Patroni:
1. The `--config_file` argument must be supplied while figuring out the `restore_command` GUC value on Postgres v12+
2. With Postgres v13+ pg_rewind by itself can't find postgresql.conf in order to figure out `restore_command`, so we have to use Patroni as a fallback for fetching the missing WALs that are required for rewind.

This commit addresses both problems.
2022-03-24 13:56:16 +01:00
Alexander Kukushkin
81912c9cae Handle rewind when demoted node was shut down (#2252)
In case of DCS unavailability Patroni restarts Postgres in read-only mode.
This causes pg_control to be updated with `Database cluster state: in archive recovery` and could also set the `MinRecoveryPoint`.

The next time Patroni is started it will assume that Postgres was running as a replica, conclude that a rewind isn't required, and try to start Postgres. In this situation there is a chance that the start will be aborted with a FATAL error message like `requested timeline 2 does not contain minimum recovery point 0/501E8B8 on timeline 1`.
On the next heartbeat Patroni will again notice that Postgres isn't running, leading to another failed start attempt.

This loop is endless.

In order to mitigate the problem we do the following:
1. While figuring out whether a rewind is required we consider `in archive recovery` along with `shut down in recovery`.
2. If pg_rewind is required and the cluster state is `in archive recovery`, we also perform recovery in single-user mode.

Close https://github.com/zalando/patroni/issues/2242
2022-03-24 13:51:59 +01:00
Alexander Kukushkin
333d41d9f0 Release 2.1.3 (#2219)
* Implement missing unit-tests
* Bump version
* Update release notes
v2.1.3
2022-02-18 14:16:15 +01:00
Alexander Kukushkin
aa91557a80 Fix bug in divergence timeline check (#2221)
Patroni was falsely assuming that timelines had diverged.
For pg_rewind this didn't create any problem, but if pg_rewind is not allowed and `remove_data_directory_on_diverged_timelines` is set, it resulted in reinitializing the former leader.

Close https://github.com/zalando/patroni/issues/2220
2022-02-17 15:53:13 +01:00
Hrvoje Milković
075918d447 Fixed AttributeError no attribute 'leader' (#2217)
Close https://github.com/zalando/patroni/issues/2218
2022-02-16 10:20:15 +01:00
Michael Banck
c4535ae208 Avoid running CHECKPOINT on remote master if credentials are missing (#2195)
Close #2194
2022-02-14 15:21:51 +01:00
Bastien Wirtz
38d84b1d15 Make sure no substitution attempt is made when params is empty. (#2212)
Close #2209
2022-02-14 15:20:38 +01:00
Michael Banck
2d15e0dae6 Add target_session_attrs=read-write to standby_leader primary_conninfo (#2193)
This allows having multiple hosts in a standby_cluster and ensures that the standby leader follows the main cluster's new leader after a switchover.

Partially addresses #2189
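For illustration, a standby leader's `primary_conninfo` with a multi-host list might look like this (hostnames are hypothetical):

```
primary_conninfo = 'host=node1,node2 port=5432 user=replicator target_session_attrs=read-write'
```

With `target_session_attrs=read-write`, libpq tries the listed hosts until it finds one that accepts writes, so the standby leader keeps following whichever member holds the leader lock in the main cluster.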
2022-02-10 15:50:14 +01:00
Michael Banck
48d8c13e6b Write pgpass line per host if more than one is specified in connstr (#2192)
Partly addresses #2189
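A sketch of the per-host expansion (hypothetical helper; note that real pgpass entries additionally require escaping `:` and `\` inside fields):

```python
def pgpass_lines(hosts: str, port, dbname, user, password):
    """One pgpass line per host from a comma-separated host list."""
    return ['{0}:{1}:{2}:{3}:{4}'.format(h, port, dbname, user, password)
            for h in hosts.split(',')]
```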
2022-02-10 15:40:24 +01:00
Alexander Kukushkin
d3e3b4e16f Minor tuning of tests (#2201)
- Reduce verbosity for unit tests
- Refactor GH actions config and try again macos behave tests
2022-02-10 15:38:16 +01:00
Alexandre Pereira
afab392ead Add metrics (#2199)
This PR adds metrics for additional information:
  - whether a node or cluster is pending restart,
  - whether cluster management is paused.

This may be useful for Prometheus/Grafana monitoring.
Close #2198
2022-02-10 15:37:14 +01:00
Alexander Kukushkin
291754eeb0 Don't remove the leader lock while paused (#2187)
Close https://github.com/zalando/patroni/issues/2179
2022-02-10 15:36:25 +01:00
Alexander Kukushkin
cdc80a1d89 Restart etcd3 watcher if all etcd nodes don't respond (#2186)
Close https://github.com/zalando/patroni/issues/2180
2022-02-10 15:32:29 +01:00
Alexander Kukushkin
04c6f58b2b Make Kubernetes.cancel_initialization() method similar to other DCS (#2210)
I.e., delete unconditionally and return success
2022-02-10 15:29:29 +01:00
Ants Aasma
0980838cb3 Fix port in use error on certificate replacement (#2185)
When switching certificates there is a race condition with a concurrent API request: if one is active during the replacement, the replacement errors out with a port-in-use error and Patroni gets stuck in a state without an active API server.

The fix is to call `server_close` after `shutdown`, which waits for already-running requests to complete before returning.

Close #2184
2022-01-26 13:52:25 +01:00
Alexander Kukushkin
3e1076a574 Use replication credentials when checking leader status (#2165)
It could be that `remove_data_directory_on_diverged_timelines` is set, but no `rewind_credentials` are defined and superuser access between nodes is not allowed.

Close https://github.com/zalando/patroni/issues/2162
2022-01-11 16:23:13 +01:00
Alexander Kukushkin
cb3071adfb Annual cleanup (#2159)
- Simplify setup.py: remove unneeded features and get rid of deprecation warnings
- Compatibility with Python 3.10: handle the `threading.Event.isSet()` deprecation
- Make sure setup.py can run without `six`: move the Patroni class and main function to `__main__.py`. The `__init__.py` will have only a few functions used by the Patroni class and by setup.py
2022-01-06 10:20:31 +01:00
Alexander Kukushkin
bf354aeebd Compatibility with legacy psycopg2 (#2158)
For example, psycopg2 installed from Ubuntu 18.04 packages doesn't have `UndefinedFile` exception yet.
2022-01-06 10:14:50 +01:00
Alexander Kukushkin
01d40a4a13 Compatibility with latest psutil and setuptools (#2155)
The issues don't affect Patroni code, only unit tests
2022-01-05 09:53:33 +01:00
Alexander Kukushkin
3cc14cc059 Unquote integers in validator (#2154)
Close https://github.com/zalando/patroni/issues/2150
2022-01-04 10:47:02 +01:00
Alexander Kukushkin
a015e0e271 Fix bug with failover to cascading standby (#2138)
When figuring out which slots should be created on a cascading standby we forgot to take into account that the leader might be absent.

Close: https://github.com/zalando/patroni/issues/2137
2021-12-21 11:20:35 +01:00
Alexander Kukushkin
d2b681b07e Fix bug in the bootstrap standby-leader (#2144)
When starting postgres after bootstrap of the standby-leader, the `follow()` method used to always return `True`.
This behavior was changed in #2054 in order to avoid hammering the logs if postgres fails to start.

Since the method now returns `None` if postgres didn't start accepting connections after 60s, the change broke the standby-leader bootstrap code.

As the solution, we will assume that the clone was successful if the `follow()` method returns anything other than `False`.
2021-12-21 11:20:06 +01:00
Alexander Kukushkin
63586f0477 Add ctl.keyfile_password support (#2145)
It complements `restapi.keyfile_password` added in #1825
2021-12-21 11:19:39 +01:00
Alexander Kukushkin
4215565cb4 Rearrange tests (#2146)
- Remove codacy steps: they removed legacy organizations and there seems to be no easy way of installing the codacy app into the Zalando GH org.
- Don't run behave on macOS: recently the workers became way too slow
- Disable behave for the combination of Kubernetes and Python 2.7
- Remove Python 3.5 (it will be removed from GH workers in January) and add 3.10
- Run behave with 3.6 and 3.9 instead of 3.5 and 3.8
2021-12-21 09:36:22 +01:00
Alexander Kukushkin
dc9ff4cb8a Release 2.1.2 (#2136)
* Implement missing unit-tests
* Bump version
* Update release notes
v2.1.2
2021-12-03 15:49:57 +01:00
Alexander Kukushkin
d7dc3c2d96 Handle missing timelines in history file when deciding to rewind (#2120)
When restore_command is configured, Postgres tries to fetch/apply all possible WAL segments and also fetches history files in order to select the correct timeline. This can result in a situation where the new history file is missing some timelines.

Example:
- node1 demotes/crashes on timeline 1
- node2 promotes to timeline 2 and archives `00000002.history` and crashes
- node1 recovers as a replica, "replays" `00000002.history` and promotes to timeline 3

As a result, `00000003.history` will not have the line for timeline 2, because node1 never replayed any WAL segment from it.
The `pg_rewind` tool is supposed to correctly handle such a case when rewinding node2 from node1, but Patroni, when deciding whether a rewind should happen, searched for the exact timeline in the history file from the new primary.

The solution is to assume that a rewind is required if the current replica timeline is missing from the history file.

In addition, this PR makes sure that the primary isn't running in recovery before starting the rewind check.

Close https://github.com/zalando/patroni/issues/2118 and https://github.com/zalando/patroni/issues/2124
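The decision described above can be sketched as follows (a hypothetical helper; `history_lines` stands for parsed `(timeline, switch_lsn)` pairs from the new primary's `.history` file):

```python
def rewind_required(history_lines, replica_tl, replica_lsn):
    for tl, switch_lsn in history_lines:
        if tl == replica_tl:
            # Timeline found: rewind only if the replica advanced past the
            # point where the primary's history switched away from it.
            return replica_lsn > switch_lsn
    # Timeline missing from the history file: assume rewind is required.
    return True
```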
2021-12-02 11:35:30 +01:00
Michael Banck
90b3736fec Remove duplicate hosts from the etcd machine cache (#2127)
Close #2126
2021-12-02 11:35:11 +01:00