845 Commits

Author SHA1 Message Date
Alexander Kukushkin
096ee8f36f Read GUC's values when joining running Postgres (#2876)
If restarted in pause Patroni was discarding `synchronous_standby_names` from `postgresql.conf` because in the internal cache this values was set to `None`. As a result synchronous replication transitioned to a broken state, with no synchronous replicas according to the `synchronous_standby_names` and Patroni not selecting/setting the new synchronous replicas (another bug).

To solve the problem of broken initial state and to avoid similar issues with other GUC's we will read GUC's value if Patroni is joining running Postgres.
2023-09-26 11:34:19 +02:00
Alexander Kukushkin
91e2be092c Detect and solve inconsistency between /sync and actual sync nodes (#2877)
Patroni is changing `synchronous_standby_names` and the `/sync` key in a very specific order, first we add nodes to `synchronous_standby_names` and only after, when they are recognized as synchronous they are added to the `/sync` key. When removing nodes the order is different: they are first removed from the `/sync` key and only after that from the `synchronous_standby_names`.

As a result Patroni expects that either actual synchronous nodes will match with the nodes listed in the `/sync` key or that new candidates to synchronous nodes will not match with nodes listed in the `/sync` key. In case if `synchronous_standby_names` was removed from the `postgresql.conf`, manually, or due the the bug (#2876), the state becomes inconsistent because of the wrong order of updates.

To solve inconsistent state we introduce additional checks and will update the `/sync` key with actual names of synchronous nodes (usually empty set).
2023-09-26 11:17:12 +02:00
Alexander Kukushkin
4148e0b5b2 Take into account current role when deciding on removal of member ZNode (#2884)
Patroni doesn't watch on all changes of member keys in order to not create too much load on ZooKeeper, but only subscribes to changes (ZNodes added or deleted) in the `/member` directory. Therefore when some important fields in the value are updated we remove and recreate ZNode in order to notify the leader or other members.

The leader should remove the member key only when the `checkpoint_after_promote` value is changed and replicas when the `state` is changed to/from `running`.

We don't care about the `version` field, because Patroni version can't be changed without restart, what will case ZooKeeper `session_id` to change it anyway.

This fix hopefully will reduce failures of behave tests on GH Actions.
2023-09-26 11:16:52 +02:00
Alexander Kukushkin
5ceba81269 Bugfix for GUC's values with units (#2883)
Despite being validated by `IntValidator` some GUC's couldn't be casted directly to `int` because they include suffix. Example: `128MB`.

Close https://github.com/zalando/patroni/issues/2879
2023-09-26 11:16:47 +02:00
Alexander Kukushkin
dbbe065a27 Silence annoying warnings when checking for node uniqueness (#2878)
WARNING messages are produced by `urllib3` if Patroni is quickly restarted.
Instead we will check that the node is listen on a given port. This fact is actually enough to detect names clashes, while HTTP request could raise an exception is a few other cases, what might case false negatives.

Close https://github.com/zalando/patroni/issues/2881
2023-09-26 11:16:38 +02:00
Alexander Kukushkin
ce51eb02d0 Mock request() method when running tests (#2802)
1. Unit tests should not really try accessing any resources.
2. Not doing so results in significant execution time of unit tests on Windows

In addition to that perform a request with timeout 3s. Usually this is more than enough to figure out whether resource is accessible.

Followup on #2724
2023-09-20 14:26:12 +02:00
Matt Baker
bcafe91a55 Add docstrings to patroni.postgresql.slots.py (#2778)
Also, made some small code changes to satisfy formatting and pylint.
2023-09-20 12:30:57 +02:00
Alexander Kukushkin
6e82de8751 Don't rely on pg_stat_wal_receiver when deciding on pg_rewind (#2863)
As was reported by @ants on Slack it could happen that `received_tli` is ahead of replayed timeline, therefore we should stop using it when deciding on pg_rewind if postgres is running and use only `IDENTIFY_SYSTEM` via replication connection.
2023-09-18 13:01:08 +02:00
Polina Bungina
85db209c19 Always store CMDLINE_OPTIONS config values as int (#2861) 2023-09-18 12:59:55 +02:00
Israel
564dd7e7af Fix bug in patronictl query command (#2859)
Previous to this commit `patronictl query` was working only if `-r` argument was provided to the command. Otherwise it would face issues:

* If neither `-r` nor `-m` were provided:

```
 PGPASSWORD=zalando patronictl -c postgres0.yml query -U postgres -c "SHOW PORT"
2023-09-12 17:45:38	No connection to role=None is available
```

* If only `-m` was provided:

```
$ PGPASSWORD=zalando patronictl -c postgres0.yml query -U postgres -c "SHOW PORT" -m postgresql0
2023-09-12 17:46:15	No connection to member postgresql0 is available
```

This issue was a regression introduced by `4c3e0b9382820524239d2aa4d6b95379ef1291db` through PR #2687.

Through that PR we decided to move the common logic used to check mutually exclusiveness of `--role` and `--member` arguments to `get_any_member` function.

However, previous to that change `role` variable would assume the default value of `any` in `query` method, before `get_any_member` was called, which was not the case after the change.

This commit fixes that issue by adding a handler in `get_cursor` function to `role=None`. As `role` defaulting to `any` is handled in a sub-call to `get_any_member`, we are apparently safe in `get_cursor` to return the cursor if `role=None`.

Unit tests were updated accordingly.

References: PAT-204.
2023-09-18 12:59:55 +02:00
Alexander Kukushkin
95ed90183b Don't start stopped postgres in pause (#2848)
Due to a race condition Patroni was falsely assuming that the standby should be restarted because some recovery parameters (primary_conninfo or similar) were changed.

Close https://github.com/zalando/patroni/issues/2834
2023-09-18 12:59:55 +02:00
Alexander Kukushkin
33e02e14ca Override write_leader_optime method in K8s implementation (#2850)
It is being called when postgres is already shut down cleanly but there are no healthy replicas to take it over.

Close https://github.com/zalando/patroni/issues/2837
Close https://github.com/zalando/patroni/pull/2838
2023-09-18 12:59:55 +02:00
Polina Bungina
e38d7d4e9f Return system id to the ctl list title (#2840) 2023-09-18 12:59:55 +02:00
Alexander Kukushkin
24bf2f3fa0 Fix bug with kubernetes.standby_leader_label_value (#2832)
When running with the leader lock Patroni was just setting the `role` label to `master` and effectively `kubernetes.standby_leader_label_value` feature never worked.

Now it is fixed, but in order to not introduce breaking changes we just update default value of the `standby_leader_label_value` to the `master`.
2023-09-18 12:59:55 +02:00
Alexander Kukushkin
1849bd1a56 Don't return logical slots for standby cluster (#2816)
Cluster.get_replication_slots() didn't take into account that there can not be logical replication slots in a standby cluster replicas. It was only skipping logical slots for the standby_leader, but replicas were expecting that they will have to copy them over.

Also on replicas in a standby cluster these logical slots were falsely added to the `_replication_slots` dict.
2023-09-18 12:59:55 +02:00
Alexander Kukushkin
f659edd60f Explicitly enable synchronous mode (#2820)
Close https://github.com/zalando/patroni/issues/2819

Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
2023-09-18 12:59:36 +02:00
Alexander Kukushkin
84aac437c1 Release v3.1.0 (#2801)
- bump pyright and resolve reported issues
- bump Patroni version
- update release notes
2023-08-03 13:02:29 +02:00
Alexander Kukushkin
01d07f86cd Set permissions for files and directories created in PGDATA (#2781)
Postgres supports two types of permissions:
1. owner only
2. group readable

By default the first one is used because it provides better security. But, sometimes people want to run a backup tool with the user that is different from postgres. In this case the second option becomes very useful. Unfortunately it didn't work correctly because Patroni was creating files with owner access only permissions.

This PR changes the behavior and permissions on files and directories that are created by Patroni will be calculated based on permissions of PGDATA. I.e., they will get group readable access when it is necessary.

Close #1899
Close #1901
2023-08-02 13:15:43 +02:00
Alexander Kukushkin
94bfea1a81 Do not fail validation for a value that is fine (#2791)
In issue #2735 it was discussed that there should be some warning around PostgreSQL parameters that do not pass validation.

This commit ensures something is logged for parameters that fail validation and therefore fall back to default values.

Close #2735
Close #2740
2023-07-31 11:35:30 +02:00
Alexander Kukushkin
01976ec10b Don't allow stale primary to win the leader race (#2787)
Consider a following situation:
1. node1 is stressed so much that Patroni heart-beat can't run regularly and the leader lock expires.
2. node2 notice that there is no leader, gets the lock, promotes, and gets to a situation like it is described in 1.
3. Patroni on node1 finally wakes up, notice that Postgres is running as a primary, but without a leader lock and "happily" acquires the lock.

That is, node1 discarded promoting of node2, and the node2 after that it will not be possible to join the node2 back to the cluster, because pg_rewind is not possible when two nodes are on the same timeline.

To partially mitigate the problem we introduce an additional timeline check. If postgres is running as primary Patroni will consider it as a perfect candidate only if timeline isn't behind the last known cluster timeline recorder in the `/history` key. If postgres timeline is behind the cluster timeline postgres will be demoted to read-only. Further behavior would depend on `maximum_lag_on_failover` and `check_timeline` settings.

Since the `/history` key isn't updated instantly after promotion, there is still a short period of time when the issue could happen, but it seems that it is close to impossible to make it more reliable.

Close https://github.com/zalando/patroni/issues/2779
2023-07-31 11:22:18 +02:00
Alexander Kukushkin
8f3ed00886 Invalidate cache if txn failed due to revision mismatch (#2783)
It was reported in #2779 that the primary was constantly logging messages like `Synchronous replication key updated by someone else`.

It happened after Patroni was stuck due to resource starvation. Key updates are performed using create_revision/mod_revision field, which value is taken from the internal cached. Hence, it is a clear symptom of stale cache.

Similar issues in K8s implementation were addressed by invalidating the cache and restarting watcher connections every time when update failed due to resource_version mismatch, so we do the same for Etcd3.
2023-07-31 10:16:19 +02:00
Alexander Kukushkin
7e89583ec7 Please new flake8 (#2789)
it stopped liking lack of space character between `,` and `\`
```python
foo,\
    bar
```
2023-07-31 09:08:46 +02:00
Alexander Kukushkin
238aba3956 Fix patronictl list (#2775)
the `Cluster` name field was missing in tsv, json, and yaml formats

The bug was introduced in #2652
2023-07-26 12:33:17 +02:00
Alexander Kukushkin
ae2bbd28ae Fix in_recovery check (#2773)
The primary that is still alive wasn't properly recognized.
Regression was introduced in #2652
2023-07-25 11:50:40 +02:00
Waynerv
0e19e3e98e Make pod role label configurable (#2659)
Close #2495
2023-07-25 10:29:04 +02:00
Alexander Kukushkin
06db296612 Fixes in patroni.request (#2768)
1.  Take client certificates only from the `ctl` section. Motivation: sometimes there are server-only certificates that can't be used as client certificates. As a result neither Patroni not patronictl work correctly even if `--insecure` option is used.
2. Document that if `restapi.verify_client` is set to `required` then client certificates **must** be provided in the `ctl` section.
3.  Add support for `ctl.authentication` and prefer to use it over `restapi.authentication`.
4. Silence annoying InsecureRequestWarning when `patronictl -k` is used, so that behavior becomes is similar to `curl -k`.
2023-07-25 08:48:18 +02:00
Matt Baker
817f39ad6d Refactor get_dcs (#2747)
Now uses generators instead of for loops and implements importing modules once.
2023-07-25 08:01:35 +02:00
Matt Baker
c5a4befdc4 Refactor check_logical_slots_readiness split to reduce complexity (#2749)
Includes:
* renaming of `_unready_logical_slots` to better represent that it is a processing queue which is emptied on successful completion.
* ensuring that return type is consistent.
* made logic variable names explicit to help explain how the decision of whether a slot is "ready" is made.
2023-07-25 08:00:59 +02:00
Waynerv
84c574e1ec Run archive_command through shell (#2766)
Close #2764
2023-07-21 14:30:22 +02:00
Alexander Kukushkin
0c5bf3c4cd Validate more parameters in the config file (#2761)
- parameters for different DCS
- more bootstrap.dcs parameters
- ctl, restapi, and watchdog parameters
2023-07-19 12:42:14 +02:00
Alexander Kukushkin
d46ca88e6b Make it visible replication state on standbys (#2733)
To do that we use `pg_stat_get_wal_receiver()` function, which is available since 9.6. For older versions the `patronictl list` output and REST API responses remain as before.

In case if there is no wal receiver process we check if `restore_command` is set and show the state as `in archive recovery`.

Example of `patronictl list` output:
```bash
$ patronictl list
+ Cluster: batman -------------+---------+---------------------+----+-----------+
| Member      | Host           | Role    | State               | TL | Lag in MB |
+-------------+----------------+---------+---------------------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running             | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | in archive recovery | 12 |         0 |
+-------------+----------------+---------+---------------------+----+-----------+

$ patronictl list
+ Cluster: batman -------------+---------+-----------+----+-----------+
| Member      | Host           | Role    | State     | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running   | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 12 |         0 |
+-------------+----------------+---------+-----------+----+-----------+
```

Example of REST API response:
```bash
$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544480,
    "replayed_location": 335544480,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "in archive recovery",
  "dcs_last_seen": 1688642069,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}

$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544816,
    "replayed_location": 335544816,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "streaming",
  "dcs_last_seen": 1688642089,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}
```
2023-07-13 09:24:20 +02:00
Alexander Kukushkin
e4fe239a9d A few fixes in synchronous_mode (#2741)
- make sure that physical replication slots are created even before the promote happened (when async executor is busy with promote).
- execute `txid_current()` with `synchronous_commit=off` so it doesn't accidentally wait for absent synchronous standbys when `synchronous_mode_strict` is enable and `synchronous_standby_names=*`. These standbys can't connect because replication slots weren't there.
- `synchronous_standby_names` wasn't set to `*` after bootstrap with `synchronous_mode` and `synchronous_mode_strict`.
- add `-c statement_timeout=0` to `PGOPTIONS` when executing `post_bootstrap` script.

Close https://github.com/zalando/patroni/issues/2738
2023-07-12 09:43:40 +02:00
Alexander Kukushkin
6e96db173f Start postgres not in recovery in some cases (#2726)
If we know for sure that a few moments ago postgres was still running as a primary and we still have the leader lock and can successfully update it, in this case we can safely start postgres back not in recovery. That will allow to avoid bumping timeline without a reason and hopefully improve reliability because it will address issues similar to #2720.

In addition to that remove `if self.state_handler.is_starting()` check from the `recover()` method. This branch could never be reached because the `starting` state is handled earlier in the `_run_cycle()`. Besides that remove redundant `self._crash_recovery_executed`.

P.S. now we do not cover cases when Patroni was killed along with Postgres.
Lets consider that we just started Patroni, there is no leader, and `pg_controldata` reports `Database cluster state` as `shut down`. It feels logical to use `Latest checkpoint location` and `Latest checkpoint's TimeLineID` to do a usual leader race and start directly as a primary, but it could be totally wrong. The thing is that we run `postgres --single` if standby wasn't shut down cleanly before executing `pg_rewind`. As a result `Database cluster state` transition from `in archive recovery` to `shut down`, but if such a node becomes a leader the timeline must be increased.
2023-07-12 09:42:34 +02:00
Mark Pekala
412c51ddf1 Prevent splitbrain from duplicate names in configuration (#2724)
When starting check if node with the same is registered in DCS and try to query it's REST API.
If REST API is accessible exit with the error.

Close #1804
2023-07-11 07:43:57 +02:00
Israel
ed02826103 REST API would not reload SSL certificate upon receiving an SIGHUP (#2722)
Revert to using `ssl._ssl._test_decode_cert`

A change has been included as part of Patroni 3.0.3 release: use
public functions instead of `ssl._ssl._test_decode_cert` to get
serial number of certificates.

There was a slight bug in that implementation: it was only loading
the certificates through `load_verify_locations`, but was missing
to get the certificates through `get_ca_certs`. As a consequence
Patroni was not able anymore to reload REST API cert on SIGHUP.

An attempt to fix that issue was made through commit
`20f578f09f3aa604e5288710d4fd4e611152ed5f`. However, even with the
correct call of `get_ca_certs`, it was detected a corner case where
`load_verify_locations` would skip loading a certificate: if it was
issued with `CA:FALSE`. That essentially means the implementation is
still buggy in that situation. See [CPython](c283a0cff5/Modules/_ssl.c (L4618C14-L4619))
for the underlying problem.

In order to get back a functional implementation again we are reverting
the code to use the private function `ssl._ssl._test_decode_cert`.

We can later study a possible more elegant alternative for solving this,
if any.

---------

Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-07-04 18:53:24 +03:00
Alexander Kukushkin
2354f8f004 Fix a few concurrency bugs in Citus support (#2710)
- the `_in_fligh` attribute is accessed from multiple threads and must be protected with mutex when it is changed
- allow adding tasks for `_in_fligh.group` from the `sync_pg_dist_node()` method when timeout is reached. Not doing so might result is indefinite transaction if REST API request from worker node failed.
2023-06-12 07:52:46 +02:00
Polina Bungina
21e92fd166 Add env vars for custom bin names (#2706) 2023-06-01 14:06:11 +02:00
Israel
df18885f20 Extend Postgres GUCs validator (#2671)
* Use YAML files to validate Postgres GUCs through Patroni.

Patroni used to have a static list of Postgres GUCs validators in
`patroni.postgresql.validator`.

One problem with that approach, for example, is that it would not
allow GUCs from custom Postgres builds to be validated/accepted.

The idea that we had to work around that issue was to move the
validators from the source code to an external and extendable source.
With that Patroni will start reading the current validators from that
external source plus whatever custom validators are found.

From this commit onwards Patroni will read and parse all YAML files
that are found under the `patroni/postgresql/available_parameters`
directory to build its Postgres GUCs validation rules.

All the details about how this work can be found in the docstring
of the introduced function `_load_postgres_gucs_validators`.
2023-05-31 13:54:54 +02:00
Israel
d11328020d Add support for custom Postgres binary names (#2692)
When using a custom Postgres distribution it may be the case that the Postgres binaries are compiled with different names other than the ones used by the community Postgres distribution.

With that in mind we implemented a new set of settings for Patroni, so the user is able to override the default binary names with custom binary names through the new section postgresql.bin_name in the local configuration.

References: PAT-17.
2023-05-30 13:57:57 +02:00
Polina Bungina
37fffa618f Refactor daemon entrypoints (#2697)
- abstract_main only creates Config object using the passed configfile
  and instantiates the passed daemon class
- common args parser is extracted into a separate func that is called
  from daemons' main funcs (specific args can be added afterwards)

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-05-26 15:13:04 +02:00
Alexander Kukushkin
af8e5f0d0f Refactor update_leader interface (#2690)
pass reference to a last known leader object in order to avoid obtaining it from the `AbstractDCS.cluster` cache.

This change is useful for Consul, Etcd3 and Zookeeper implementations.
2023-05-25 14:21:05 +02:00
Alexander Kukushkin
1c7bf2f59e Fix a problem with etcd3.update_leader() (#2693)
It didn't took into account the fact that we can get a new lease after changing a TTL. In this case we have to update the exiting leader key with the new lease.
To solve it we introduce the on transaction 'failure' callback. The whole workflow looks like (schematically):
```python
txn(
    compare=(old_value == self._name)),
    success=put(self.leader_path, self._name, self._lease),
    failure=txn(
        compare=(create_revision == '0'),
        success=put(self.leader_path, self._name, self._lease)
    )
)
```

The problem was introduced in d98d6d0b02c9cc67464a7b1b31b1c5570c26e12d
2023-05-25 10:01:43 +02:00
Alexander Kukushkin
b4afc6830b Little fixes in etcd3 and kubernetes (#2689)
- Always pass etcd3 key revision as a string
- Make sure the leader key isn't unconditionally overwritten. It may happen that the leader heart-beat loop didn't run properly and the session has expired. In this case the leader may create a new session and a new leader key. But, there are chances that the other node already created a leader key and we don't want to overwrite it.
- Try to sync HA loops between nodes by adding 0.5 seconds to timeout on non-leader nodes
2023-05-24 10:54:26 +02:00
Polina Bungina
6c8a3b0d25 Remove bootstrap.pg_hba (#2684)
* Remove bootstrap.pg_hba
* Extend docs for postgresql.pg_hba/pg_ident
* Add postgresql.pg_hba/pg_ident to dynamic config docs

---------

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-05-24 09:01:56 +02:00
Polina Bungina
506b5bec48 Validate-config fixes (#2678)
- fix --validate-config not to error out if bin_dir is an empty string in the yaml config
- mention bin_dir optionality in the docs
- validate bin_dir even if it is not in the yaml config (add optional
  default value for Optional config params in validator)
- make rewind user optional
2023-05-15 13:40:22 +02:00
Alexander Kukushkin
66a0e44371 Enable pyright job for every commit (#2675)
And fix remaining issues that the job doesn't fail.
2023-05-15 11:38:40 +02:00
Israel
fdcf8b1997 Add docstrings and type hints to patroni/api.py (#2648)
References: PAT-77
2023-05-12 15:38:59 +02:00
Alexander Kukushkin
7941c86775 Refactor write_sync_state() (#2669)
Make it return the new `SyncState` object in order to avoid reading the new cluster state in the Ha.process_sync_replication().

Now it is a small optimization, but it will become very handy in the quorum commit feature.
2023-05-11 09:58:15 +02:00
Alexander Kukushkin
76b3b99de2 Enable pyright strict mode (#2652)
- added pyrightconfig.json with typeCheckingMode=strict
- added type hints to all files except api.py
- added type stubs for dns, etcd, consul, kazoo, pysyncobj and other modules
- added type stubs for psycopg2 and urllib3 with some little fixes
- fixes most of the issues reported by pyright
- remaining issues will be addressed later, along with enabling CI linting task
2023-05-09 09:38:00 +02:00
Le Duane
bebe6754fc Add before stop hook (#2642)
The two cases we have in mind are:
* In spite of following all best practices client-side, logical replication connections can sometimes hang the Postgres shutdown sequence. We'd like to sigterm any misbehaving logical replication connections which remain after x seconds. These will inevitably get killed anyway on master stop timeout.
* remove "role=master" label on current primary when not using k8s as DCS. Waiting until after Postgres fully stops can sometimes be too long for this.
* Pause pgbouncer connections before switchover

Close #2596
2023-04-27 13:07:32 +02:00