Currently Patroni replaces config files only if it detects a change in the global configuration plus patroni.yaml; however, the configs on the filesystem could have been updated by humans, and we want to "restore" them.
We should ignore the former leader with higher priority when it reports the same LSN as the current node.
This bug could be a contributing factor to issues described in #3295
In addition to that, mock the socket.getaddrinfo() call in test_api.py to avoid hitting DNS servers.
1. When evaluating whether there are healthy nodes for a leader race before demoting, we need to take quorum requirements into account. Without that, the former leader may end up in recovery surrounded by asynchronous nodes.
2. QuorumStateResolver wasn't correctly handling the case when a replica node quickly joined and disconnected, which resulted in the following errors:
```
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 427, in _generate_transitions
yield from self.__remove_gone_nodes()
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 327, in __remove_gone_nodes
yield from self.sync_update(numsync, sync)
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 227, in sync_update
raise QuorumError(f'Sync {numsync} > N of ({sync})')
patroni.quorum.QuorumError: Sync 2 > N of ({'postgresql2'})
2025-02-14 10:18:07,058 INFO: Unexpected exception raised, please report it as a BUG
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 246, in __iter__
transitions = list(self._generate_transitions())
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 423, in _generate_transitions
yield from self.__handle_non_steady_cases()
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 281, in __handle_non_steady_cases
yield from self.quorum_update(len(voters) - self.numsync, voters)
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 184, in quorum_update
raise QuorumError(f'Quorum {quorum} < 0 of ({voters})')
patroni.quorum.QuorumError: Quorum -1 < 0 of ({'postgresql1'})
2025-02-18 15:50:48,243 INFO: Unexpected exception raised, please report it as a BUG
```
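The two exceptions above come from invariant checks along these lines (a simplified sketch following the names in the traceback; the real QuorumStateResolver does much more):

```python
class QuorumError(Exception):
    pass

def sync_update(numsync, sync):
    # The requested number of synchronous nodes may not exceed
    # the size of the synchronous-node set.
    if numsync > len(sync):
        raise QuorumError(f'Sync {numsync} > N of ({sync})')
    return numsync, sync

def quorum_update(quorum, voters):
    # The quorum requirement may never become negative.
    if quorum < 0:
        raise QuorumError(f'Quorum {quorum} < 0 of ({voters})')
    return quorum, voters
```

A replica joining and disconnecting quickly could drive the resolver into states violating these invariants, hence the "please report it as a BUG" log lines.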
Allow defining labels that will be assigned to a Postgres instance pod while it is in the 'initializing new cluster', 'running custom bootstrap script', 'starting after custom bootstrap', or 'creating replica' state.
The first one is available starting from PostgreSQL v13 and contains the
real write LSN. We prefer it over the value returned by
pg_last_wal_receive_lsn(), which is in fact the flush LSN.
The second one is available starting from PostgreSQL v9.6 and points to the
WAL flush position on the source host. In case of the primary it allows a
better calculation of the replay lag, because values stored in the DCS are
updated only every loop_wait seconds.
If a logical replication slot is created with the failover => true option,
the respective field is set to true in the `pg_replication_slots` view.
By avoiding interaction with such slots we make the logical failover slots
feature fully functional in PG17.
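Roughly, the slot-handling code can skip such slots with a filter like this (a hypothetical helper; the `failover` key mirrors the column of the same name in `pg_replication_slots`):

```python
def slots_to_manage(slots):
    # slots: rows from pg_replication_slots, represented as dicts.
    # PG17 synchronizes logical slots created with failover => true to
    # standbys itself, so Patroni must leave them alone.
    return [s for s in slots if not s.get('failover')]
```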
It doesn't accept multiple hosts with the [] character in the URL anymore.
To mitigate the problem we switch to the native wrappers of the
PQconninfoParse() function from libpq when possible, and fall back to our
own implementation only when psycopg2 is too old.
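The selection logic can be sketched like this (`_legacy_parse_dsn` is a greatly simplified stand-in for the old hand-rolled parser; `psycopg2.extensions.parse_dsn`, available since psycopg2 2.7, wraps libpq's PQconninfoParse()):

```python
def _legacy_parse_dsn(dsn):
    # Stand-in for the old implementation: only understands
    # space-separated key=value pairs.
    return dict(kv.split('=', 1) for kv in dsn.split())

def parse_conninfo(dsn):
    try:
        # Prefer libpq's own parser when psycopg2 is new enough.
        from psycopg2.extensions import parse_dsn
    except ImportError:
        parse_dsn = _legacy_parse_dsn
    return parse_dsn(dsn)
```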
The problem existed because the _build_retain_slots() method was falsely relying on members being present in DCS, while on failover the member key of the former leader expires at exactly the same time.
Consider the following situation: there is a permanent logical slot, and both the primary and a replica are temporarily down.
When Patroni is started on the former primary, it starts Postgres in standby mode, which leads to the removal of the physical replication slot for the replica because it has xmin set.
We should postpone the removal of such physical slots:
- on a replica, until there is a leader in the cluster
- on the primary, until Postgres is promoted
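The rule above can be expressed as a small predicate (a hypothetical helper, not Patroni's actual code):

```python
def can_remove_physical_slot(role: str, cluster_has_leader: bool) -> bool:
    # On the primary, removal is safe only once Postgres is promoted;
    # on a replica, only once the cluster has a leader again.
    if role == 'primary':
        return True
    return cluster_has_leader
```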
- fix unit tests (logging now uses time.time_ns() instead of time.time())
- update setup.py
- update tox.ini
- enable unix and behave tests with 3.13
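The unit-test breakage comes from the fact that in Python 3.13 `LogRecord` timestamps are derived from `time.time_ns()`, so tests that patch `time.time()` no longer influence them. A minimal illustration:

```python
import logging
import time

# LogRecord.created is populated at construction time; in Python 3.13 it
# is computed from time.time_ns() instead of time.time(), so mocks must
# patch time.time_ns() to affect it.
record = logging.LogRecord('demo', logging.INFO, __file__, 1, 'hello', None, None)
print(record.created)  # seconds since the epoch, as a float
print(record.msecs)    # millisecond fraction of the timestamp
```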
Close https://github.com/patroni/patroni/issues/3243
Test whether the config (file) parsed with yaml_load() contains a valid
Mapping object; otherwise Patroni throws an explicit exception. This also
makes the Patroni output more explicit when such an "invalid"
configuration is used.
```console
$ touch /tmp/patroni.yaml
$ patroni --validate-config /tmp/patroni.yaml
/tmp/patroni.yaml does not contain a dict
invalid config file /tmp/patroni.yaml
```
reportUnnecessaryIsInstance is explicitly ignored since we can't
determine what yaml_safeload can return from a YAML config (list,
dict, ...).
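The check amounts to something like this (a hypothetical helper mirroring the behaviour shown above):

```python
from collections.abc import Mapping

def ensure_mapping(parsed, path):
    # yaml.safe_load() may return None (empty file), a list, a scalar, ...
    # Patroni only accepts a mapping at the top level of the config file.
    if not isinstance(parsed, Mapping):
        raise SystemExit(f'{path} does not contain a dict')
    return parsed
```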
* Compatibility with python-json-logger>=3.1
After the refactoring the old API still works, but it produces warnings
and pyright fails.
Besides that, improve coverage of watchdog/base.py and ctl.py.
* Stick to Ubuntu 22.04
* Please pyright
Patroni could be doing a replica bootstrap, and we don't want pg_basebackup/wal-g/pgBackRest/barman or similar to keep running.
Besides that, remove the data directory on replica bootstrap failure if the configuration allows it.
Close #3224
When doing `patronictl restart <clustername> --pending`, the confirmation lists all members, regardless of whether their restart is actually pending:
```
> patronictl restart pgcluster --pending
+ Cluster: pgcluster (7436691039717365672) ----+----+-----------+-----------------+---------------------------------+
| Member | Host     | Role         | State     | TL | Lag in MB | Pending restart | Pending restart reason          |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
| win1   | 10.0.0.2 | Sync Standby | streaming |  8 |         0 | *               | hba_file: [hidden - too long]   |
|        |          |              |           |    |           |                 | ident_file: [hidden - too long] |
|        |          |              |           |    |           |                 | max_connections: 201->202       |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
| win2   | 10.0.0.3 | Leader       | running   |  8 |           | *               | hba_file: [hidden - too long]   |
|        |          |              |           |    |           |                 | ident_file: [hidden - too long] |
|        |          |              |           |    |           |                 | max_connections: 201->202       |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
| win3   | 10.0.0.4 | Replica      | streaming |  8 |         0 |                 |                                 |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
When should the restart take place (e.g. 2024-11-27T08:27) [now]:
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Are you sure you want to restart members win1, win2, win3? [y/N]:
```
When we proceed with the restart despite the scary message mentioning all members, not just the ones needing a restart, there will be an error message stating that the node that was not supposed to be restarted was indeed not restarted:
```
Are you sure you want to restart members win1, win2, win3? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member win1
Success: restart on member win2
Failed: restart for member win3, status code=503, (restart conditions are not satisfied)
```
The misleading confirmation message can also be seen when using the `--any` flag.
This PR fixes that.
However, we do not apply the filtering in the case of a scheduled pending restart, because the condition must be evaluated at the scheduled time.
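The fix boils down to filtering the member list before asking for confirmation (a simplified sketch; the member dicts are a hypothetical shape, not the actual API objects):

```python
def members_to_confirm(members, pending_only=False, scheduled=False):
    # For a scheduled restart the pending condition has to be evaluated
    # at the scheduled time, so no up-front filtering is applied.
    if pending_only and not scheduled:
        return [m['name'] for m in members if m.get('pending_restart')]
    return [m['name'] for m in members]
```

With the cluster from the example above, the confirmation would then only list win1 and win2.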
This allows setting whether a particular permanent replication slot should always be created ('cluster_type=any', the default), or only on a primary ('cluster_type=primary') or standby ('cluster_type=standby') cluster, respectively.
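In effect, whether a permanent slot applies can be decided like this (a hypothetical helper using the option values from the text):

```python
def slot_applies(slot_cluster_type: str, is_standby_cluster: bool) -> bool:
    # 'any' (the default) always applies; 'primary'/'standby' only apply
    # on the matching cluster type.
    role = 'standby' if is_standby_cluster else 'primary'
    return slot_cluster_type in ('any', role)
```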
When in automatic mode we probably don't need to warn the user about a failure to set up the watchdog. This is the common case, and the warning makes many users think that this feature is somehow necessary to run Patroni safely. For most users it is completely fine to run without it, and it makes sense to reduce their log spam.
1. Implemented compatibility.
2. Constrained the upper version in requirements.txt to avoid future failures.
3. Set up an additional pipeline to check against the latest ydiff.
Close #3209, #3212, #3218
Additionally, run the on_role_change callback in post_recover() for a primary
that failed to start after a crash, to increase the chances that the callback
is executed even if the subsequent start as a replica fails.
---------
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
python-consul has been unmaintained for a long time, and py-consul is the official replacement.
However, we still keep backward compatibility with python-consul.
Close: #3189
Declaring variables with `Union` and using the `isinstance()` hack doesn't work anymore. Therefore the code is updated to use `Any` for the variable and the `cast` function after figuring out the correct type, in order to avoid errors about `Unknown` types.
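For example (an illustrative snippet, not code from Patroni):

```python
from typing import Any, cast

def port_as_int(value: Any) -> int:
    # Instead of declaring value as Union[int, str] and relying on
    # isinstance() narrowing, declare it as Any and cast once the
    # concrete type is known.
    if isinstance(value, str):
        return int(cast(str, value))
    return cast(int, value)
```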