884 Commits

Author SHA1 Message Date
Alexander Kukushkin
c8e32775df Release v3.2.2 (#3007)
- update release notes
- bump Patroni version
- bump pyright version and fix reported issues
- improve compatibility with legacy psycopg2

Co-authored-by: Polina Bungina <bungina@gmail.com>
2024-01-17 08:35:35 +01:00
Polina Bungina
f2919f9c2f Fixes around pending_restart flag (#3003)
* Do not set the pending_restart flag if hot_standby is set to 'off' during a custom bootstrap. Even though the flag will actually be set in PG, this configuration parameter is irrelevant on the primary and there is no real need for a restart.
* Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They can be changed manually (e.g. via ALTER SYSTEM), which causes the pending_restart state in PG, but Patroni always passes these parameters to postmaster as command line options anyway, and there they can have only one value, 'on' (except on the primary when performing a custom bootstrap).
2024-01-16 10:44:30 +01:00
Alexander Kukushkin
2a64bfd459 Restore recovery GUCs when joining running standby (#2998)
Close https://github.com/zalando/patroni/issues/2993
2024-01-08 09:17:17 +01:00
Polina Bungina
3e9bceac11 Don't filter out contradictory nofailover tag (#2992)
* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to boolean, as is done when it is accessed in other places
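
A minimal sketch of the precedence rule described above, not Patroni's actual code. `parse_bool()` and `is_failover_allowed()` are hypothetical names, and the set of accepted truthy spellings is an assumption:

```python
from typing import Any, Dict


def parse_bool(value: Any) -> bool:
    """Cast common YAML/string spellings to bool (hypothetical helper)."""
    if isinstance(value, str):
        return value.strip().lower() in ('on', 'true', 'yes', '1')
    return bool(value)


def is_failover_allowed(tags: Dict[str, Any]) -> bool:
    """Decide whether a node may be promoted, based on its tags."""
    # nofailover always wins when it is provided, even if it
    # contradicts failover_priority.
    if tags.get('nofailover') is not None:
        return not parse_bool(tags['nofailover'])
    # Otherwise fall back to failover_priority; 0 means "never promote".
    return int(tags.get('failover_priority', 1)) > 0
```

With this rule, `{'nofailover': False, 'failover_priority': 0}` still allows failover, because `nofailover` takes precedence.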
2024-01-05 10:16:52 +01:00
zhjwpku
9cc1f8e763 Fix Citus bootstrap - CREATE DATABASE cannot be executed from a function (#2994)
This was introduced by #2990: the pod cannot be started and shows the
following logs:

```
2023-12-26 03:29:25.569 UTC [47] CONTEXT:  SQL statement "CREATE DATABASE "citus""
        PL/pgSQL function inline_code_block line 5 at SQL statement
2023-12-26 03:29:25.569 UTC [47] STATEMENT:  DO $$
        BEGIN
            PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus';
            IF NOT FOUND THEN
                CREATE DATABASE "citus";
            END IF;
        END;$$
2023-12-26 03:29:25,570 ERROR: post_bootstrap
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap
    self._postgresql.citus_handler.bootstrap()
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap
    cur.execute(sql.encode('utf-8'))
psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function
CONTEXT:  SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
```
---------

Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
2024-01-05 10:16:26 +01:00
Polina Bungina
15b57c5bdc Exclude leader from failover candidates in ctl (#2983)
Exclude actual leader (not the passed leader argument) from the
candidates list in the `patronictl failover` prompt.
Abort `patronictl failover` execution if candidate specified is
the same as the current cluster leader
2024-01-05 10:12:33 +01:00
Polina Bungina
f10e4805db Actually allow failover to an async candidate in sync mode (#2980) 2024-01-05 10:05:28 +01:00
Polina Bungina
3e0e91f905 Reload postgres config if a server param was reset (#2975)
Fix the case when a parameter value was changed and then reset back to
the initial value without restart - before this fix, the second change
was not reflected in the Postgres config.
This commit also includes the related unit test refactoring.
2024-01-05 09:56:01 +01:00
Alexander Kukushkin
722b4b72a8 Don't let replica restore initialize key when DCS was wiped (#2970)
This happened in the branch where Patroni was supposed to complain about converting a standalone PG cluster to be governed by Patroni and exit.
2024-01-05 09:55:10 +01:00
Alexander Kukushkin
42cd803619 Fix bug with custom bootstrap (#2948)
Patroni was falsely applying `--command` argument.

Close https://github.com/zalando/patroni/issues/2947
2023-11-30 09:01:47 +01:00
Alexander Kukushkin
bae72df5b1 Fix pg_rewind behavior with Postgres v16+ (#2944)
The error message format was changed in
4ac30ba4f2, which caused `pg_rewind` to be called by Patroni even when it was not necessary.
2023-11-30 09:01:41 +01:00
Alexander Kukushkin
f2a129f209 Fix Etcd v2 with Citus (#2943)
When deploying a new Citus cluster with Etcd v2 Patroni was failing to start with the following exception:
```python
2023-11-09 10:51:41,246 INFO: Selected new etcd server http://localhost:2379
Traceback (most recent call last):
  File "/home/akukushkin/git/patroni/./patroni.py", line 6, in <module>
    main()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 343, in main
    return patroni_main(args.configfile)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 237, in patroni_main
    abstract_main(Patroni, configfile)
  File "/home/akukushkin/git/patroni/patroni/daemon.py", line 172, in abstract_main
    controller = cls(config)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 66, in __init__
    self.ensure_unique_name()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 112, in ensure_unique_name
    cluster = self.dcs.get_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1654, in get_cluster
    cluster = self._get_citus_cluster() if self.is_citus_coordinator() else self.__get_patroni_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1638, in _get_citus_cluster
    cluster = groups.pop(CITUS_COORDINATOR_GROUP_ID, Cluster.empty())
AttributeError: 'Cluster' object has no attribute 'pop'
```

It has been broken since #2909.

In addition, fix the `_citus_cluster_loader()` interface by allowing it to return only a dict object.
2023-11-30 09:01:19 +01:00
Alexander Kukushkin
df0fd91614 Do a real http request when performing name uniqueness check (#2942)
When running in containers it is possible that the traffic is routed using `docker-proxy`, which listens on the port and accepts incoming connections, so an open port alone does not prove that another Patroni node is behind it.

This commit effectively sticks to the original solution from #2878
2023-11-30 09:01:11 +01:00
Alexander Kukushkin
43f23df974 Verify that replica nodes received checkpoint LSN on shutdown (#2939)
If archiving is enabled, the `Postgresql.latest_checkpoint_location()` method returns the LSN of the previous (SWITCH) record, which points to the beginning of the WAL file. This makes it possible to safely promote a replica that recovers WAL files from the archive and wasn't streaming when the primary was stopped (the primary doesn't archive this WAL file).

However, in certain cases using the LSN pointing to the SWITCH record caused an unnecessary pg_rewind if the replica didn't manage to replay the shutdown checkpoint record before it was promoted.

To mitigate the problem we need to check that the replica received/replayed exactly the shutdown checkpoint LSN. At the same time we will still write the LSN of the SWITCH record to the `/status` key when releasing the leader lock.
2023-11-30 09:01:05 +01:00
Alexander Kukushkin
3d527f5728 Improve formatting of generated config and validation of ints (#2928)
- order sections similar to sample configs
- add warnings and comments to `bootstrap.dcs` section.
- add `tags` and `log` sections.
- use discovered IPs in `postgresql.connect_address` and `postgresql.listen`
- set `wal_level` to `replica` for PostgreSQL 9.6+
- make unit tests pass with python 3.6
- improve config validator so it doesn't complain when some ints are strings in YAML file.
2023-10-25 14:23:57 +02:00
Mark Pekala
f5ee67fa1c Feature: failover priority (#2780)
The priority is configured with the `failover_priority` tag. Possible values range from `0` to infinity, where `0` means that the node will never become the leader, which is the same as the `nofailover` tag set to `true`. As a result, only one of the `failover_priority` or `nofailover` tags should be set in the configuration file.

The failover priority kicks in only when more than one node has the same receive/replay LSN and these nodes are ahead of the other nodes in the cluster. In this case the node with the higher value of `failover_priority` is preferred. If there is a node with a higher receive/replay LSN, it will become the new leader even if it has a lower value of `failover_priority` (except when the priority is set to 0).
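
The tie-breaking rule can be sketched as follows. This is an illustration only: the `Candidate` tuple and `pick_new_leader()` are hypothetical, LSNs are modelled as plain integers, and the decision is collapsed into a single function rather than modelling how nodes compare themselves with each other.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    wal_position: int       # receive/replay LSN as an integer
    failover_priority: int  # 0 means "never promote"


def pick_new_leader(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the node with the highest LSN; use priority only to break ties."""
    eligible = [c for c in candidates if c.failover_priority > 0]
    if not eligible:
        return None
    # Tuples compare element by element, so the LSN dominates and
    # failover_priority matters only when LSNs are equal.
    return max(eligible, key=lambda c: (c.wal_position, c.failover_priority))
```

For example, among nodes with `(LSN, priority)` of `(100, 1)`, `(100, 5)` and `(90, 9)`, the second one is chosen: it shares the highest LSN and has the higher priority, while the third is behind despite its priority.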

Close https://github.com/zalando/patroni/issues/2759
2023-10-24 12:22:48 +02:00
Alexander Kukushkin
d471f1156d Handle AuthOldRevision error (#2913)
The error is raised if Etcd is configured to use JWT auth tokens and when the user database in Etcd is updated, because the update invalidates all tokens.

If retries are requested, try to get a new token and repeat the request. Repeat it in a loop until the request is successfully executed or until `retry_timeout` is exhausted. This is the only way of solving the race condition, because yet another modification of the user database in Etcd might happen between authentication and executing the request.

If the request doesn't have to be retried immediately, set a flag that the next API request should perform the authentication first and let Patroni naturally repeat the request on the next heartbeat loop.
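
The retry branch could look roughly like the sketch below. It is not the Etcd client code from Patroni: the `AuthOldRevision` class here is a local stand-in, and `call_with_reauth()`, `request`, `authenticate` and `pause` are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


class AuthOldRevision(Exception):
    """Stand-in for the error raised when the auth token was invalidated."""


def call_with_reauth(request: Callable[[], T], authenticate: Callable[[], None],
                     retry_timeout: float, pause: float = 0.1) -> T:
    """Repeat request, re-authenticating after every AuthOldRevision."""
    deadline = time.monotonic() + retry_timeout
    while True:
        try:
            return request()
        except AuthOldRevision:
            # Give up once retry_timeout is exhausted.
            if time.monotonic() >= deadline:
                raise
            # The user database may change again right after this call,
            # which is why the whole sequence runs in a loop.
            authenticate()
            time.sleep(pause)
```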

Co-authored-by: Kenny Do <kedo@render.com>
Ref: https://github.com/zalando/patroni/pull/2911
2023-10-23 14:00:37 +02:00
Alexander Kukushkin
c5fffb3c97 Further work on permanent physical slots (#2891)
- Fixed issues with the has_permanent_slots() method. It didn't take into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write to the status key only LSNs for permanent slots (not just for slots that exist on the primary).
  - Include pg_current_wal_flush_lsn() to slots feedback, so that slots on standby nodes could be advanced
- Improved behave tests:
  - Verify that permanent slots are properly created on standby nodes
  - Verify that permanent slots are properly advanced, including DCS failsafe mode
  - Verify that only permanent slots are written to the `/status`
2023-10-23 08:24:28 +02:00
zhjwpku
260ab36f2e mock getaddrinfo in case test failure (#2918)
Close #2915
2023-10-17 19:53:19 +02:00
Alexander Kukushkin
fc67ba73f0 Allow to specify psycopg* in extras and switch to build (#2907)
* remove check_psycopg() call from the setup.py, when installing from wheel it doesn't work anyway.
* call check_psycopg() function before process_arguments(), because the last one is trying to import psycopg and fails with the stacktrace, while the first one shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras, that will install psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move check_psycopg() function to the __main__.py.
* introduce the new extra called `all`, it will allow to install all dependencies at once (except psycopg related).
* use the `build` module in order to create sdist bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
2023-10-17 14:46:15 +02:00
Alexander Kukushkin
aa3ebe0af8 Don't cache anything in Zookeeper implementation (#2909)
Cache creates a lot of problems and prevents implementing a feature of automatic retention of physical replication slots for members with configurable retention policy.

Just read the entire cluster from Zookeeper instead and use watchers only for the `/leader` and `/config` keys.
2023-10-17 08:56:31 +02:00
Alexander Kukushkin
d93db20baa Set citus.local_hostname (#2903)
There are cases when Citus wants to have a connection to the local postgres. By default it uses `localhost` for that, which is not always available. To solve this we set the `citus.local_hostname` GUC to a custom value, which is the same one Patroni uses to connect to Postgres.
2023-10-16 10:21:50 +02:00
Alexander Kukushkin
9b8c40a6e1 Start thread that will handle SIGCHLD for on_reload callback (#2898)
Close #2897
2023-10-10 09:54:24 +02:00
Alexander Kukushkin
e19a8730ea Take IP from the pod if kubernetes.pod_ip is missing (#2895)
It used to work before #2652

Besides that, fix a couple more problems:
- make sure `_patch_or_create()` method isn't instantiating the `k8s_client.V1ConfigMap` object instead of `k8s_client.V1Endpoints` for non leader endpoints. The only reason it worked is that the JSON serialization for both object types is the same and doesn't include the object type name.
- `attempt_to_acquire_leader()` should immediately put the IP address of the primary to the leader endpoint. It didn't happen because of an oversight in https://github.com/zalando/patroni/pull/1820.
2023-10-09 10:43:43 +02:00
Polina Bungina
efacc6c16b Ignore synchronous_mode setting in a standby cluster (#2896)
is_synchronous_mode() should always return False in standby clusters
2023-10-06 10:21:37 +02:00
Alexander Kukushkin
9283ebda64 Enforce loop_wait/retry_timeout/ttl rule (#2869)
* hard-code minimal possible values
* make adjustments if values are lower or if the rule is violated and show warnings
* update documentation
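
A rough sketch of such an adjustment, under stated assumptions: the rule is taken to be `loop_wait + 2 * retry_timeout <= ttl`, the lower bounds in `MINIMAL_VALUES` are illustrative, and the order in which values are shrunk is a choice made for this example. `adjust_timeouts()` is a hypothetical function, not Patroni's implementation.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)

# Assumed lower bounds, for illustration only.
MINIMAL_VALUES = {'loop_wait': 1, 'retry_timeout': 3, 'ttl': 20}


def adjust_timeouts(config: Dict[str, int]) -> Dict[str, int]:
    """Return a copy that satisfies loop_wait + 2 * retry_timeout <= ttl."""
    result = dict(config)

    # Step 1: raise values that are below the hard-coded minimum.
    for name, minimum in MINIMAL_VALUES.items():
        if result[name] < minimum:
            logger.warning('%s=%s is too low, using %s', name, result[name], minimum)
            result[name] = minimum

    # Step 2: if the rule is violated, shrink loop_wait first and then
    # retry_timeout (this order is an assumption of the sketch).
    if result['loop_wait'] + 2 * result['retry_timeout'] > result['ttl']:
        result['loop_wait'] = max(MINIMAL_VALUES['loop_wait'],
                                  result['ttl'] - 2 * result['retry_timeout'])
        result['retry_timeout'] = min(result['retry_timeout'],
                                      (result['ttl'] - result['loop_wait']) // 2)
        logger.warning('timeouts were adjusted to %s', result)
    return result
```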
2023-10-04 11:44:57 +02:00
Polina Bungina
220cacd95f Don't call socket functions from tests (#2886)
We used to call `socket` module's functions from the config_generator tests
to later compare with the output produced by --generate-config. That
however sometimes ends up with the whole test module failure if gethostname()
returned None.
Also includes a little code deduplication (NO_VALUE_MSG imported directly from the config_generator module)
and removes debug maxDiff option
2023-09-26 15:52:00 +02:00
Alexander Kukushkin
c855b0bff9 Detect and solve inconsistency between /sync and actual sync nodes (#2877)
Patroni changes `synchronous_standby_names` and the `/sync` key in a very specific order: first we add nodes to `synchronous_standby_names`, and only after they are recognized as synchronous are they added to the `/sync` key. When removing nodes the order is different: they are first removed from the `/sync` key and only after that from `synchronous_standby_names`.

As a result Patroni expects that either the actual synchronous nodes will match the nodes listed in the `/sync` key or that new candidates for synchronous nodes will not match the nodes listed in the `/sync` key. If `synchronous_standby_names` was removed from `postgresql.conf`, manually or due to the bug (#2876), the state becomes inconsistent because of the wrong order of updates.

To resolve the inconsistent state we introduce additional checks and update the `/sync` key with the actual names of synchronous nodes (usually an empty set).
2023-09-26 11:14:20 +02:00
Alexander Kukushkin
4c1c804cfd Read GUC's values when joining running Postgres (#2876)
If restarted in pause, Patroni was discarding `synchronous_standby_names` from `postgresql.conf` because in the internal cache this value was set to `None`. As a result synchronous replication transitioned to a broken state, with no synchronous replicas according to `synchronous_standby_names` and Patroni not selecting/setting new synchronous replicas (another bug).

To solve the problem of broken initial state and to avoid similar issues with other GUC's we will read GUC's value if Patroni is joining running Postgres.
2023-09-26 10:40:51 +02:00
Alexander Kukushkin
48514db84b Take into account current role when deciding on removal of member ZNode (#2884)
Patroni doesn't watch all changes of member keys in order to not create too much load on ZooKeeper, but only subscribes to changes (ZNodes added or deleted) in the `/member` directory. Therefore, when some important fields in the value are updated, we remove and recreate the ZNode in order to notify the leader or other members.

The leader should remove the member key only when the `checkpoint_after_promote` value is changed and replicas when the `state` is changed to/from `running`.

We don't care about the `version` field, because the Patroni version can't be changed without a restart, which will cause the ZooKeeper `session_id` to change anyway.
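
The decision described above can be sketched like this. `should_recreate_member_znode()` is a hypothetical helper, and member data is modelled as plain dicts:

```python
from typing import Any, Dict


def should_recreate_member_znode(is_leader: bool, old: Dict[str, Any],
                                 new: Dict[str, Any]) -> bool:
    """Tell whether the member ZNode must be removed and created again."""
    if is_leader:
        # The leader cares only about checkpoint_after_promote.
        return old.get('checkpoint_after_promote') != new.get('checkpoint_after_promote')
    # Replicas care only about transitions to or from the "running" state.
    return (old.get('state') == 'running') != (new.get('state') == 'running')
```

A change of `version` alone returns `False` for both roles, matching the reasoning about `session_id`.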

This fix hopefully will reduce failures of behave tests on GH Actions.
2023-09-26 09:12:31 +02:00
Alexander Kukushkin
2bd821a768 Bugfix for GUC's values with units (#2883)
Despite being validated by `IntValidator`, some GUC values couldn't be cast directly to `int` because they include a unit suffix. Example: `128MB`.
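
To illustrate the problem: `int('128MB')` raises `ValueError`, so the number has to be separated from its unit first. The helper below is a hypothetical sketch of that step, not the code used by Patroni:

```python
import re
from typing import Any, Optional, Tuple

# An optional sign, digits, then an optional alphabetic unit such as kB, MB or ms.
_VALUE_WITH_UNIT = re.compile(r'^\s*(-?\d+)\s*([a-zA-Z]*)\s*$')


def split_value_and_unit(value: Any) -> Tuple[int, Optional[str]]:
    """Split a GUC value like '128MB' into (128, 'MB')."""
    match = _VALUE_WITH_UNIT.match(str(value))
    if not match:
        raise ValueError('not an integer with an optional unit: {0!r}'.format(value))
    # A bare number has no unit; its meaning depends on the parameter.
    return int(match.group(1)), match.group(2) or None
```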

Close https://github.com/zalando/patroni/issues/2879
2023-09-26 08:31:55 +02:00
Alexander Kukushkin
fe16c3610e Silence annoying warnings when checking for node uniqueness (#2878)
WARNING messages are produced by `urllib3` if Patroni is quickly restarted.
Instead we will check that the node is listening on a given port. This fact is actually enough to detect name clashes, while an HTTP request could raise an exception in a few other cases, which might cause false negatives.
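
A minimal sketch of such a port check using only the standard library. `is_port_open()` is a hypothetical name and the one-second timeout is an arbitrary choice:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

The check says only that the port accepts connections, not what is listening on it.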

Close https://github.com/zalando/patroni/issues/2881
2023-09-26 08:29:35 +02:00
Alexander Kukushkin
bc15813de0 Permanent physical slots on standby nodes (#2852)
Create permanent physical replication slots on standby nodes and use `pg_replication_slot_advance()` function to move them forward.

The `restart_lsn` is advanced based on values stored in the `/status` key by the primary node.

When a slot is created on a replica it could be ahead of the same slot on the primary, and therefore there is some period of time when it doesn't protect WAL files from being recycled.
2023-09-20 16:50:37 +02:00
Polina Bungina
71863cedcb Always store CMDLINE_OPTIONS config values as int (#2861) 2023-09-14 18:34:45 +02:00
Israel
728abfcc37 Fix bug in patronictl query command (#2859)
Previous to this commit `patronictl query` was working only if `-r` argument was provided to the command. Otherwise it would face issues:

* If neither `-r` nor `-m` were provided:

```
 PGPASSWORD=zalando patronictl -c postgres0.yml query -U postgres -c "SHOW PORT"
2023-09-12 17:45:38	No connection to role=None is available
```

* If only `-m` was provided:

```
$ PGPASSWORD=zalando patronictl -c postgres0.yml query -U postgres -c "SHOW PORT" -m postgresql0
2023-09-12 17:46:15	No connection to member postgresql0 is available
```

This issue was a regression introduced by `4c3e0b9382820524239d2aa4d6b95379ef1291db` through PR #2687.

Through that PR we decided to move the common logic used to check the mutual exclusivity of the `--role` and `--member` arguments to the `get_any_member` function.

However, previous to that change `role` variable would assume the default value of `any` in `query` method, before `get_any_member` was called, which was not the case after the change.

This commit fixes that issue by adding a handler in `get_cursor` function to `role=None`. As `role` defaulting to `any` is handled in a sub-call to `get_any_member`, we are apparently safe in `get_cursor` to return the cursor if `role=None`.

Unit tests were updated accordingly.

References: PAT-204.
2023-09-14 15:27:23 +02:00
Alexander Kukushkin
238b8db91e Introduce Status class (#2853)
It represents the `/status` key in DCS and makes it easier to introduce new values stored in the `/status` key without need to refactor all DCS implementations.
2023-09-14 14:40:44 +02:00
Polina Bungina
b31a4d55c9 Ensure strict failover/switchover definition difference (#2784)
- Don't set leader in failover key from patronictl failover
- Show warning and execute switchover if leader option is provided for patronictl failover command
- Be more precise in the log messages
- Allow to failover to an async candidate in sync mode
- Check if candidate is the same as the leader specified in api
- Fix and extend some tests
- Add documentation
2023-09-12 08:51:17 +02:00
Alexander Kukushkin
19f20ec2eb Refactor replication slots handling (#2851)
1. make _get_members_slots() method return data in the same format as _get_permanent_slots() method
2. move conflicting name handling from get_replication_slots() to _get_members_slots() method
3. enrich structure returned by get_replication_slots() with the LSN of permanent logical slots reported by primary
4. use the added information in the SlotsHandler instead of fetching it from the Cluster.slots
5. bugfix: don't try to advance logical slot that doesn't match required configuration
2023-09-07 12:56:07 +02:00
Alexander Kukushkin
30f0f132e8 Don't start stopped postgres in pause (#2848)
Due to a race condition Patroni was falsely assuming that the standby should be restarted because some recovery parameters (primary_conninfo or similar) were changed.

Close https://github.com/zalando/patroni/issues/2834
2023-09-06 08:57:56 +02:00
Alexander Kukushkin
941e883dde Override write_leader_optime method in K8s implementation (#2850)
It is being called when postgres is already shut down cleanly but there are no healthy replicas to take it over.

Close https://github.com/zalando/patroni/issues/2837
Close https://github.com/zalando/patroni/pull/2838
2023-09-05 07:41:45 +02:00
Polina Bungina
89a162e000 Return system id to the ctl list title (#2840) 2023-09-05 07:27:34 +02:00
Alexander Kukushkin
0ab5b49757 Introduce a dedicated postgres connection for REST API (#2833)
Sharing a single connection between REST API and the main thread (doing heartbeats) was working mostly fine, except when Postgres becomes so slow that REST API queries start blocking the main loop.

If the dedicated REST API connection isn't available we use the heartbeat connection as a fallback.
2023-09-05 07:26:44 +02:00
Alexander Kukushkin
6b7f914da7 Fix bug with kubernetes.standby_leader_label_value (#2832)
When running with the leader lock Patroni was just setting the `role` label to `master`, so the `kubernetes.standby_leader_label_value` feature effectively never worked.

Now it is fixed, but in order to not introduce breaking changes we just update the default value of `standby_leader_label_value` to `master`.
2023-09-04 10:03:37 +02:00
Alexander Kukushkin
89d794facc Introduce connection pool (#2829)
Make it hold connection kwargs for local connections and make all `NamedConnection` objects use them automatically.

Also get rid of redundant `ConfigHandler.local_connect_kwargs`.

On top of that we will introduce a dedicated connection for the REST API thread.
2023-08-24 16:13:22 +02:00
Alexander Kukushkin
3333e78500 Factor out tags handling into a dedicated class (#2823)
The same (almost) logic was used in three different places:
1. `Patroni` class
2. `Member` class
3. `_MemberStatus` class

Now they all inherit from the newly introduced `Tags` class.
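
The pattern could look roughly like the sketch below. Only the class names come from the description above; the properties shown and the way each class stores its tags are assumptions made for illustration.

```python
import abc
from typing import Any, Dict


class Tags(abc.ABC):
    """Shared accessors on top of a raw tags dict."""

    @property
    @abc.abstractmethod
    def tags(self) -> Dict[str, Any]:
        """Raw tags; each subclass decides where they come from."""

    @property
    def nofailover(self) -> bool:
        return bool(self.tags.get('nofailover', False))

    @property
    def nosync(self) -> bool:
        return bool(self.tags.get('nosync', False))


class Member(Tags):
    """Simplified stand-in: tags come from the member data."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    @property
    def tags(self) -> Dict[str, Any]:
        return self._data.get('tags', {})


class Patroni(Tags):
    """Simplified stand-in: tags come from the local configuration."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self._config = config

    @property
    def tags(self) -> Dict[str, Any]:
        return self._config.get('tags', {})
```

Each subclass implements only `tags`, and the interpretation of individual tags lives in one place.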
2023-08-21 17:03:14 +02:00
Alexander Kukushkin
2be64e5131 Don't return logical slots for standby cluster (#2816)
Cluster.get_replication_slots() didn't take into account that there cannot be logical replication slots on replicas in a standby cluster. It was only skipping logical slots for the standby_leader, but replicas were expecting that they would have to copy them over.

Also on replicas in a standby cluster these logical slots were falsely added to the `_replication_slots` dict.
2023-08-18 13:36:32 +02:00
Alexander Kukushkin
366829e379 Refactor Connection class (#2815)
1. stop using the same cursor all the time, it creates problems when not carefully used from different threads.
2. introduce query() method in the Connection class and make it return a result set when it is possible.
3. refactor most of the code that is relying (directly or indirectly) on the Connection object to use the query() method as much as possible.

This refactoring reduces code complexity and prepares for the future introduction of a separate database connection for the REST API thread. The latter will improve reliability when the system is under significant stress, simple monitoring queries take seconds to execute, and the REST API starts blocking the main thread.
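
The idea behind points 1 and 2 can be sketched with any DB-API connection. The class below is a simplified illustration, not Patroni's `Connection` class, and the usage example relies on `sqlite3` only as a stand-in for a real Postgres driver:

```python
from typing import Any, List, Optional, Tuple


class Connection:
    """Thin wrapper that never shares a cursor between calls."""

    def __init__(self, raw_connection: Any) -> None:
        self._conn = raw_connection

    def query(self, sql: str, *params: Any) -> Optional[List[Tuple[Any, ...]]]:
        # A fresh cursor per call, so callers can't trip over each
        # other's partially consumed result sets.
        cursor = self._conn.cursor()
        try:
            cursor.execute(sql, params)
            # description is None for statements that return no rows.
            return cursor.fetchall() if cursor.description else None
        finally:
            cursor.close()
```

Usage with the stand-in driver (note that `sqlite3` uses `?` placeholders, while psycopg uses `%s`):

```python
import sqlite3

conn = Connection(sqlite3.connect(':memory:'))
conn.query('CREATE TABLE t (x INTEGER)')
conn.query('INSERT INTO t VALUES (?)', 1)
rows = conn.query('SELECT x FROM t')
```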
2023-08-17 15:42:11 +02:00
Alexander Kukushkin
704d36815a Explicitly enable synchronous mode (#2820)
Close https://github.com/zalando/patroni/issues/2819

Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
2023-08-17 12:33:15 +02:00
Alexander Kukushkin
6a75b1591b Use pg_current_wal_flush_lsn() starting from 9.6 (#2813)
Due to historical reasons (not available before 9.6) we used `pg_current_wal_lsn()`/`pg_current_xlog_location()` functions to get current WAL LSN on the primary. But, this LSN is not necessarily synced to disk, and could be lost if the primary node crashed.
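
A small sketch of choosing the function by server version. `current_wal_lsn_function()` is a hypothetical helper; the function names are the ones PostgreSQL provides in the respective versions, and `server_version` is the integer form such as `90600` or `160001`:

```python
def current_wal_lsn_function(server_version: int) -> str:
    """Return the SQL function that reports the current WAL position."""
    if server_version >= 100000:
        # PostgreSQL 10 renamed "xlog" to "wal" and "location" to "lsn".
        return 'pg_current_wal_flush_lsn'
    if server_version >= 90600:
        # The flush variant first appeared in 9.6 under the old naming.
        return 'pg_current_xlog_flush_location'
    # Before 9.6 only the write position is available.
    return 'pg_current_xlog_location'
```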
2023-08-15 09:01:37 +02:00
Alexander Kukushkin
9209a5a133 Refactor delete_leader interface (#2810)
Similar to https://github.com/zalando/patroni/pull/2690, but it mostly helps the Consul implementation.
2023-08-11 10:19:29 +02:00