During a network split some Etcd nodes may become stale while still being available for read requests. This is not a problem for the old primary, because such Etcd nodes are not writable and the primary demotes when it fails to update the leader lock.
When the network split resolves and the Etcd node reconnects to the cluster, it may trigger a leader election (in Etcd), which typically results in some failed client requests. That state resolves quickly and client requests can be retried, but it takes some time for the disconnected node to catch up. During this time it returns stale data for read requests that don't require quorum.
It could happen that the current Patroni primary is impacted by failed client requests while the Etcd cluster is doing leader elections, and there is a chance that it switches to the stale Etcd node, discovers that someone else is the leader, and demotes.
To protect against this situation we memorize the last known "term" of the Etcd cluster and, when executing client requests, compare the "term" reported by the Etcd node with the memorized one. This allows us to detect stale Etcd nodes and temporarily ignore them by switching to some other available Etcd node.
An alternative approach to solve this problem would be to use quorum/serializable reads for read requests, but that would increase resource usage on Etcd nodes.
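A minimal sketch of the idea (class and method names are illustrative, not Patroni's actual implementation): remember the highest raft "term" seen so far and treat any Etcd node that reports a lower term as stale.

```python
class EtcdTermTracker:
    """Illustrative only: detect stale Etcd nodes by comparing raft terms."""

    def __init__(self) -> None:
        self._last_known_term = 0

    def node_looks_current(self, reported_term: int) -> bool:
        # A node reporting a term lower than one we already observed is lagging
        # behind the rest of the cluster and should be temporarily ignored.
        if reported_term < self._last_known_term:
            return False
        self._last_known_term = reported_term
        return True
```

On a negative result the client would simply retry the request against another available Etcd endpoint.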
Close #3314
Fix the following errors in the section "*PostgreSQL parameters controlled by Patroni*":
1. Add the missing parameter `wal_log_hints`.
2. In fact, these controlled parameters are written into `postgresql.conf`.
3. In fact, these controlled parameters are passed as a list of arguments to `postgres` (not to `pg_ctl start`).
4. Add a note that `wal_keep_segments` and `wal_keep_size` are not passed to `postgres`.
Since 01d07f86c, the permissions of postgresql.conf created in PGDATA have been
explicitly set. However, the umask of the Patroni process was adjusted as well,
and as a result Patroni would write postgresql.conf with 600 permissions if the
configuration files are outside PGDATA.
Fix this by deriving the mode for files created outside PGDATA from the original umask.
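A minimal sketch of the approach, with illustrative helper names (the actual Patroni code may differ): capture the umask the process was started with and derive from it the mode for files created outside PGDATA.

```python
import os

# Capture the umask Patroni was started with *before* it gets adjusted
# (illustrative; the real code keeps this value elsewhere).
ORIGINAL_UMASK = os.umask(0)
os.umask(ORIGINAL_UMASK)


def write_outside_pgdata(path: str, content: str) -> None:
    with open(path, 'w') as f:
        f.write(content)
    # Explicitly set the mode implied by the original umask so the tightened
    # process umask does not leave the file with 600 permissions.
    os.chmod(path, 0o666 & ~ORIGINAL_UMASK)
```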
Fixes: #3302
Currently Patroni replaces config files only if it detects a change in the global configuration + patroni.yaml; however, it could be that the configs on the filesystem were updated by humans, and we want to "restore" them.
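A rough sketch of the idea (the function name is illustrative): besides comparing the newly rendered configuration with the previously rendered one, also compare it with what is actually on disk.

```python
def config_file_needs_rewrite(path: str, desired_content: str) -> bool:
    # Rewrite not only when Patroni's desired content changed, but also when
    # someone edited (or removed) the file on the filesystem behind our back.
    try:
        with open(path) as f:
            return f.read() != desired_content
    except OSError:
        return True
```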
We should ignore the former leader with higher priority when it reports the same LSN as the current node.
This bug could be a contributing factor to issues described in #3295
In addition to that, mock the socket.getaddrinfo() call in test_api.py to avoid hitting DNS servers.
1. When evaluating whether there are healthy nodes for a leader race before demoting, we need to take quorum requirements into account. Without that, the former leader may end up in recovery surrounded by asynchronous nodes.
2. QuorumStateResolver wasn't correctly handling the case when a replica node quickly joined and disconnected, which resulted in the following errors:
```
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 427, in _generate_transitions
yield from self.__remove_gone_nodes()
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 327, in __remove_gone_nodes
yield from self.sync_update(numsync, sync)
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 227, in sync_update
raise QuorumError(f'Sync {numsync} > N of ({sync})')
patroni.quorum.QuorumError: Sync 2 > N of ({'postgresql2'})
2025-02-14 10:18:07,058 INFO: Unexpected exception raised, please report it as a BUG
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 246, in __iter__
transitions = list(self._generate_transitions())
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 423, in _generate_transitions
yield from self.__handle_non_steady_cases()
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 281, in __handle_non_steady_cases
yield from self.quorum_update(len(voters) - self.numsync, voters)
File "/home/akukushkin/git/patroni/patroni/quorum.py", line 184, in quorum_update
raise QuorumError(f'Quorum {quorum} < 0 of ({voters})')
patroni.quorum.QuorumError: Quorum -1 < 0 of ({'postgresql1'})
2025-02-18 15:50:48,243 INFO: Unexpected exception raised, please report it as a BUG
```
Allow defining labels that will be assigned to a Postgres instance pod when it is in the 'initializing new cluster', 'running custom bootstrap script', 'starting after custom bootstrap', or 'creating replica' state.
The first one is available starting from PostgreSQL v13 and contains the
real write LSN. We will prefer it over the value returned by
pg_last_wal_receive_lsn(), which is in fact the flush LSN.
The second one is available starting from PostgreSQL v9.6 and points to the
WAL flush position on the source host. In the case of the primary it allows a
better calculation of the replay lag, because the values stored in DCS are
updated only every loop_wait seconds.
If a logical replication slot is created with the failover => true option, the
respective field is set to true in the `pg_replication_slots` view.
By avoiding interaction with such slots we make the logical failover slots
feature fully functional in PG17.
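As a rough illustration (assuming the slot attributes were already fetched from `pg_replication_slots` into dictionaries; not the actual Patroni code):

```python
from typing import Any, Dict, List


def slots_managed_by_patroni(slots: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # In PostgreSQL 17 slots created with failover => true are synchronized to
    # standbys by the server itself, so Patroni should leave them alone.
    return [slot for slot in slots if not slot.get('failover')]
```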
It doesn't accept multiple hosts with the [] character in the URL anymore.
To mitigate the problem we switch to the native wrapper of the
PQconninfoParse() function from libpq when possible and use our own
implementation only when psycopg2 is too old.
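A hedged sketch of the dispatch (the fallback is just a placeholder; psycopg2 has exposed the libpq wrapper as `psycopg2.extensions.parse_dsn` since 2.7):

```python
def parse_dsn(dsn: str) -> dict:
    try:
        # psycopg2 >= 2.7 ships a native wrapper around libpq's PQconninfoParse()
        from psycopg2.extensions import parse_dsn as libpq_parse_dsn
        return libpq_parse_dsn(dsn)
    except ImportError:
        # psycopg2 is too old: this is where the own implementation would be used
        raise NotImplementedError('fall back to the hand-rolled DSN parser here')
```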
The problem existed because the _build_retain_slots() method was wrongly relying on members being present in DCS, while on failover the member key of the former leader expires at exactly the same time.
Consider a situation: there is a permanent logical slot and both the primary and the replica are temporarily down.
When Patroni is started on the former primary it starts Postgres in standby mode, which leads to the removal of the physical replication slot for the replica because it has xmin.
We should postpone the removal of such physical slots:
- on a replica until there is a leader in the cluster
- on the primary until Postgres is promoted
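A sketch of the resulting decision (illustrative only, not Patroni's actual code):

```python
def may_remove_physical_slot(is_primary: bool, is_promoted: bool,
                             cluster_has_leader: bool) -> bool:
    if is_primary:
        # on the (former) primary wait until Postgres is actually promoted
        return is_promoted
    # on a replica wait until the cluster has a leader again
    return cluster_has_leader
```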
- fix unit tests (logging now uses time.time_ns() instead of time.time())
- update setup.py
- update tox.ini
- enable unix and behave tests with Python 3.13
Close https://github.com/patroni/patroni/issues/3243
Test whether the config (file) parsed with yaml_load() contains a valid Mapping
object; otherwise Patroni throws an explicit exception. This also makes
the Patroni output more explicit when using that kind of "invalid"
configuration.
``` console
$ touch /tmp/patroni.yaml
$ patroni --validate-config /tmp/patroni.yaml
/tmp/patroni.yaml does not contain a dict
invalid config file /tmp/patroni.yaml
```
reportUnnecessaryIsInstance is explicitly ignored since we can't
determine what yaml_safeload can return for a YAML config (list,
dict, ...).
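A minimal sketch of the check (the exception name here is illustrative):

```python
import yaml


class ConfigError(Exception):
    pass


def load_config(path: str) -> dict:
    with open(path) as f:
        config = yaml.safe_load(f)
    # safe_load happily returns None, a list or a scalar for syntactically
    # valid YAML, but Patroni needs a mapping at the top level.
    if not isinstance(config, dict):
        raise ConfigError(f'{path} does not contain a dict')
    return config
```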
* Compatibility with python-json-logger>=3.1
After the refactoring the old API still works, but it produces warnings
and pyright fails; see the import sketch below.
Besides that, improve coverage of watchdog/base.py and ctl.py.
* Stick to ubuntu 22.04
* Please pyright
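A hedged sketch of the kind of compatibility shim this implies (module paths taken from python-json-logger's documentation; verify against the versions you target):

```python
try:
    # python-json-logger >= 3.1 provides JsonFormatter in a new module
    from pythonjsonlogger.json import JsonFormatter
except ImportError:
    # older releases only provide the legacy location
    from pythonjsonlogger.jsonlogger import JsonFormatter
```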
Patroni could be doing a replica bootstrap and we don't want pg_basebackup/wal-g/pgBackRest/barman or similar to keep running.
Besides that, remove the data directory on replica bootstrap failure if the configuration allows it.
Close #3224
When doing `patronictl restart <clustername> --pending`, the confirmation lists all members, regardless of whether their restart is really pending:
```
> patronictl restart pgcluster --pending
+ Cluster: pgcluster (7436691039717365672) ----+----+-----------+-----------------+---------------------------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart | Pending restart reason |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
| win1 | 10.0.0.2 | Sync Standby | streaming | 8 | 0 | * | hba_file: [hidden - too long] |
| | | | | | | | ident_file: [hidden - too long] |
| | | | | | | | max_connections: 201->202 |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
| win2 | 10.0.0.3 | Leader | running | 8 | | * | hba_file: [hidden - too long] |
| | | | | | | | ident_file: [hidden - too long] |
| | | | | | | | max_connections: 201->202 |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
| win3 | 10.0.0.4 | Replica | streaming | 8 | 0 | | |
+--------+----------+--------------+-----------+----+-----------+-----------------+---------------------------------+
When should the restart take place (e.g. 2024-11-27T08:27) [now]:
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Are you sure you want to restart members win1, win2, win3? [y/N]:
```
When we proceed with the restart despite the scary message mentioning all members, not just the ones needing a restart, there will be an error message stating that the node that did not need a restart was indeed not restarted:
```
Are you sure you want to restart members win1, win2, win3? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member win1
Success: restart on member win2
Failed: restart for member win3, status code=503, (restart conditions are not satisfied)
```
The misleading confirmation message can also be seen when using the `--any` flag.
This PR fixes that.
However, we do not apply the filtering in the case of a scheduled pending restart, because the condition must be evaluated at the scheduled time.
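A rough sketch of the filtering (the `pending_restart` member attribute is assumed from the data Patroni publishes; this is not necessarily the exact patronictl code):

```python
def members_to_confirm(members, pending: bool, scheduled: bool):
    # Only list members that actually report a pending restart, unless the
    # restart is scheduled: then the condition is evaluated at the scheduled time.
    if pending and not scheduled:
        return [m for m in members if m.data.get('pending_restart')]
    return list(members)
```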
This allows setting whether a particular permanent replication slot should always be created ('cluster_type=any', the default), or only on a primary ('cluster_type=primary') or a standby ('cluster_type=standby') cluster.
When in automatic mode we probably don't need to warn the user about a failure to set up the watchdog. This is the common case, and the warning makes many users think that this feature is somehow necessary to run Patroni safely. For most users it is completely fine to run without it, and it makes sense to reduce their log spam.
1. Implemented compatibility.
2. Constrained the upper version in requirements.txt to avoid future failures.
3. Set up an additional pipeline to check with the latest ydiff.
Close #3209, close #3212, close #3218