If configured, only IPs that match the rules will be allowed to call unsafe endpoints.
In addition to that, it is possible to automatically include the IPs of cluster members in the list.
If neither of the above is configured, the old behavior is retained.
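A minimal sketch of how such matching could work, using the standard `ipaddress` module (the function and parameter names here are illustrative, not Patroni's actual API):
```
import ipaddress

def address_allowed(remote_addr, allowlist, member_ips=()):
    # Allow the call if the client address matches any configured rule
    # (single IPs or CIDR networks) or, optionally, any cluster member IP.
    client = ipaddress.ip_address(remote_addr)
    rules = [ipaddress.ip_network(rule, strict=False) for rule in allowlist]
    rules += [ipaddress.ip_network(ip) for ip in member_ips]
    return any(client in rule for rule in rules)
```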
Partially address https://github.com/zalando/patroni/issues/1734
#1820 introduced a new key in DCS, named `/status`, which contains the leader optime and, optionally, the confirmed flush LSNs of logical slots.
For etcd3 and Raft we should use updates of this key for syncing the HA loop across nodes (previously `/optime/leader` was used).
- Resolve the node IP for every connection attempt
- Handle exceptions caused by connection failures due to failed name resolution
- Set PySyncObj DNS cache timeouts aligned with `loop_wait` and `ttl` (see the sketch below)
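A sketch of such an alignment, assuming PySyncObj's `SyncObjConf` exposes `dnsCacheTime`/`dnsFailCacheTime` settings (the exact mapping to `loop_wait` and `ttl` is an assumption here):
```
from pysyncobj import SyncObjConf

def raft_conf(loop_wait, ttl):
    # Re-resolve peer addresses roughly once per HA loop and do not cache
    # failed lookups longer than the leader key TTL (illustrative mapping).
    return SyncObjConf(dnsCacheTime=loop_wait, dnsFailCacheTime=ttl)
```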
In addition to that, postpone the leader race for freshly started Raft nodes. This helps in the situation where the leader node was alone and demoted Postgres, and after that a replica arrives and quickly takes the leader lock without really performing the leader race.
Close https://github.com/zalando/patroni/issues/1930, https://github.com/zalando/patroni/issues/1931
#1527 introduced a feature of updating `/optime/leader` with the location of the last checkpoint after Postgres was shut down cleanly.
If WAL archiving is enabled, Postgres always switches the WAL file before writing the shutdown checkpoint record. Normally this is not an issue, but for databases with little write activity it could lead to a situation where the visible replication lag becomes equal to the size of a single WAL file, even though the previous WAL file is mostly empty and contains only a few records.
Therefore it should be safe to report the LSN of the SWITCH record that precedes the shutdown checkpoint.
In order to do that, Patroni first gets the output of `pg_controldata` and, based on it, calls `pg_waldump` two times:
* The first call reads the checkpoint record (and verifies that this is really the shutdown checkpoint).
* The next call reads the previous record and, if it is the 'xlog switch' (9.3 and 9.4) or 'SWITCH' (9.5+) record, the LSN of the SWITCH record is written to `/optime/leader`.
In case of any mismatch, or failure to call `pg_waldump` or parse its output, the old behavior is retained, i.e. `Latest checkpoint location` from `pg_controldata` is used.
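A simplified sketch of the idea (not Patroni's exact implementation; it assumes `pg_controldata`/`pg_waldump` are on PATH, a pg_wal directory layout, and glosses over version differences):
```
import subprocess

def last_lsn_on_clean_shutdown(data_dir):
    # Locate the shutdown checkpoint via pg_controldata.
    out = subprocess.check_output(['pg_controldata', data_dir]).decode()
    fields = dict(line.split(':', 1) for line in out.splitlines() if ':' in line)
    checkpoint_lsn = fields['Latest checkpoint location'].strip()
    timeline = fields["Latest checkpoint's TimeLineID"].strip()

    def read_record(lsn):
        # Dump a single WAL record starting at the given LSN.
        return subprocess.check_output(
            ['pg_waldump', '-p', data_dir + '/pg_wal', '-t', timeline,
             '-s', lsn, '-n', '1']).decode()

    checkpoint_rec = read_record(checkpoint_lsn)
    if 'CHECKPOINT_SHUTDOWN' not in checkpoint_rec:
        return checkpoint_lsn  # not a clean shutdown record: keep the old behavior

    # The record output contains "prev X/XXXXXXXX"; read that record and, if it
    # is an xlog SWITCH, report its LSN instead of the checkpoint location.
    prev_lsn = checkpoint_rec.split('prev ')[1].split(',')[0]
    return prev_lsn if 'SWITCH' in read_record(prev_lsn).upper() else checkpoint_lsn
```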
Close https://github.com/zalando/patroni/issues/1860
If a `PyInstaller`-frozen application using `patroni` is placed in a location whose path contains a dot, the `dcs_modules()` function in `patroni.dcs` breaks, because `pkgutil.iter_importers()` treats the given path as a package name.
Ref: pyinstaller/pyinstaller#5944
This can be avoided altogether by not passing a path to `iter_importers()`, because `PyInstaller`'s `FrozenImporter` is a singleton and is registered as a top-level finder.
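The resulting enumeration could look roughly like this (a sketch; it assumes PyInstaller's `FrozenImporter` exposes the names of frozen modules via a `toc` attribute):
```
import pkgutil
import sys

def frozen_dcs_modules(prefix='patroni.dcs.'):
    # When frozen, iterate the top-level finders (no path argument!) and collect
    # module names from the finders that expose a table of contents.
    if not getattr(sys, 'frozen', False):
        return []
    names = set()
    for importer in pkgutil.iter_importers():
        names |= set(getattr(importer, 'toc', ()))
    return sorted(name for name in names if name.startswith(prefix))
```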
Old versions of `kazoo` immediately discarded all requests to Zookeeper if the connection was in the `SUSPENDED` state. This is absolutely fine, because Patroni handles retries on its own.
Starting from 2.7, kazoo started queueing requests instead of discarding them, and as a result the Patroni HA loop was getting stuck until the connection to Zookeeper was reestablished, so Postgres was never demoted.
In order to return to the old behavior we override the `KazooClient._call()` method.
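A sketch of such an override (it assumes kazoo's internal `_call(request, async_object)` signature and the `KeeperState`/`SessionExpiredError` symbols; internals may differ between kazoo versions):
```
from kazoo.client import KazooClient
from kazoo.exceptions import SessionExpiredError
from kazoo.protocol.states import KeeperState

class PatroniKazooClient(KazooClient):
    def _call(self, request, async_object):
        # Restore the pre-2.7 behavior: while the connection is suspended, fail
        # the request immediately instead of queueing it; Patroni retries on its own.
        if self._state == KeeperState.CONNECTING:
            async_object.set_exception(SessionExpiredError())
            return False
        return super(PatroniKazooClient, self)._call(request, async_object)
```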
In addition to that, we ensure that the `Postgresql.reset_cluster_info_state()` method is called even if DCS failed (the order of calls was changed in #1820).
Close https://github.com/zalando/patroni/issues/1981
When joining an already running Postgres, Patroni ensures that config files are set according to expectations.
With recovery parameters converted to GUCs in Postgres v12 this became a subtle problem, because when the `Postgresql` object is created it is not yet known where the given replica is supposed to stream from.
As a result, postgresql.conf was first written without recovery parameters, and on the next run of the HA loop Patroni noticed the inconsistency and updated the config one more time.
For Postgres v12 this is not a big issue, but for v13+ it resulted in an interruption of streaming replication.
PostgreSQL 14 changed the behavior of replicas when certain parameters (for example `max_connections`) are increased: https://github.com/postgres/postgres/commit/15251c0a.
Instead of immediately exiting, Postgres 14 pauses replication and waits for action from the operator.
Since `pg_is_wal_replay_paused()` returning `True` is the only indicator of such a change, Patroni on the replica will call `pg_wal_replay_resume()`, which either lets replication continue or shuts Postgres down (like previously).
So far Patroni has never called `pg_wal_replay_resume()` on its own; therefore, to remain backward compatible, it will call it only for PostgreSQL 14+.
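In effect it boils down to something like this (a sketch with a psycopg2-style cursor; the surrounding logic in Patroni is more involved):
```
def resume_wal_replay_if_needed(cursor, server_version):
    # Only PostgreSQL 14+ pauses replay on insufficient parameter values;
    # resuming either lets replication continue or shuts the replica down.
    if server_version >= 140000:
        cursor.execute('SELECT pg_is_wal_replay_paused()')
        if cursor.fetchone()[0]:
            cursor.execute('SELECT pg_wal_replay_resume()')
```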
1. When everything is normal, only one line is written for every run of the HA loop (see examples):
```
INFO: no action. I am (postgresql0) the leader with the lock
INFO: no action. I am a secondary (postgresql1) and following a leader (postgresql0)
```
2. The `does not have lock` message became a debug message.
3. The `Lock owner: postgresql0; I am postgresql1` line is shown only when things don't look normal.
Promoting the standby cluster requires updating load-balancer health checks, which is not very convenient and easy to forget.
In order to solve this, we change the behavior of the `/leader` health-check endpoint: it returns 200 regardless of whether PostgreSQL is running as the primary or as the standby_leader.
It could happen that the replica is for some reason missing the WAL file required by the replication slot.
The nature of this phenomenon is a bit unclear; it might be that the WAL was recycled shortly before we copied the slot file, but we still need a solution to this problem. If `pg_replication_slot_advance()` fails with the `UndefinedFile` exception (requested WAL segment pg_wal/... has already been removed), the logical slot on the replica must be recreated.
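A sketch of that error handling (assuming psycopg2; the `recreate` callback, which would copy the slot file from the primary again, is purely illustrative):
```
from psycopg2 import errors

def advance_logical_slot(cursor, name, confirmed_flush_lsn, recreate):
    try:
        cursor.execute('SELECT pg_replication_slot_advance(%s, %s)',
                       (name, confirmed_flush_lsn))
    except errors.UndefinedFile:
        # "requested WAL segment pg_wal/... has already been removed":
        # the copied slot points at WAL the replica no longer has.
        cursor.connection.rollback()
        cursor.execute('SELECT pg_drop_replication_slot(%s)', (name,))
        recreate(name)  # copy the slot file from the primary once more
```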
When unix_socket_directories is not known, Patroni was immediately falling back to a TCP connection via localhost.
The bug was introduced in https://github.com/zalando/patroni/pull/1865
Also run Raft behave tests with encryption enabled.
Using the new `pysyncobj` release allowed us to get rid of a lot of hacks that accessed private properties and methods of the parent class, and to reduce the size of `raft.py`.
Close https://github.com/zalando/patroni/issues/1746
This adds the default Postgres settings enforced by Patroni to the `postgres{n}.yml` files provided in the repo. The documentation does call out the defaults that Patroni will set, but they can be missed if you download postgres0.yml and use it as a starting point. Hopefully the extra commented-out configs serve as a visual cue to save the next person from the same mistake :)
Although all recovery parameters became GUCs in PostgreSQL 12, there are very good reasons to keep them separate in Patroni's internals.
While implementing PostgreSQL parameter validation in #1674, one little oversight occurred: the validation happens before the recovery parameters are removed from the list, which produces a false warning.
Close https://github.com/zalando/patroni/issues/1907
1. Commit 04b9fb9dd4 introduced additional conditions for updating the cached version of the leader optime. It was required for implementing health checks based on replication lag in https://github.com/zalando/patroni/pull/1599.
What was in fact forgotten is that the event should be cleared after the new optime value is fetched. Not doing so results in the HA loop running more frequently than required.
2. Don't watch for sync members.
The watch for sync member(s) was introduced in order to give a signal to the leader that one of the members set the `nosync` tag to true.
Since then a few more conditions have appeared that should be signaled. Therefore, instead of having the leader watch all members of the cluster, every cluster member checks whether the condition is met and, instead of updating its ZNode, performs delete+create (see the sketch after this list).
Since every member is already watching for new ZNodes being created inside $scope/members/, they automatically get notified about important changes, and watching for sync members therefore becomes redundant.
3. In addition to that, slightly increase the watch timeout; it will keep the HA loops in sync across all nodes in the cluster.
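A sketch of the delete+create trick with kazoo (simplified; the paths and helper names are illustrative):
```
def signal_important_change(client, member_path, data):
    # Recreate the member ZNode instead of updating it in place, so that the
    # children watch other members keep on $scope/members/ fires.
    if client.exists(member_path):
        client.delete(member_path)
    client.create(member_path, data, ephemeral=True)

def watch_members(client, members_path, callback):
    # Every member already watches for new ZNodes under $scope/members/.
    client.get_children(members_path, watch=callback)
```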
Close https://github.com/zalando/patroni/pull/1873
This PR fixes #1886.
We get the certificate serial number on server startup and store it in `api.__ssl_serial_number`.
On reload, we read the serial number from disk again and compare it to the one stored in `api.__ssl_serial_number`: if it differs, the API will be reloaded (even if the config file didn't change).
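One way to read the serial number from disk (a sketch using the `cryptography` package; not necessarily how Patroni does it):
```
from cryptography import x509

def certificate_serial_number(certfile):
    # Load the PEM certificate and return its serial number, to be compared
    # with the value remembered at startup.
    with open(certfile, 'rb') as f:
        return x509.load_pem_x509_certificate(f.read()).serial_number
```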
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with DCS, the new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility, `/optime/leader` will still be updated if there are members running an old Patroni version in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward (a sketch follows after this list).
3. Additional requirements:
In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots, and on cascading replicas that are used as a streaming source by replicas with logical slots.
4. When logical slots are copied to the replica, there is a timeframe during which it might not be safe to use them after promotion. Right now there is no protection from promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
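The replica-side maintenance from point 2 could be sketched like this (assuming a psycopg2-style cursor; `copy_slot_from_primary` stands for the slot-file copy described above and is purely illustrative):
```
def maintain_logical_slot_on_replica(cursor, name, primary_confirmed_flush_lsn,
                                     copy_slot_from_primary):
    # If the slot already exists on the replica, advance it to the position the
    # primary reported via the /status key; otherwise fall back to copying the
    # slot file from the primary (pg_read_binary_file()) and restarting.
    cursor.execute('SELECT 1 FROM pg_replication_slots WHERE slot_name = %s', (name,))
    if cursor.fetchone():
        cursor.execute('SELECT pg_replication_slot_advance(%s, %s)',
                       (name, primary_confirmed_flush_lsn))
    else:
        copy_slot_from_primary(name)
```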
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
The old strategy was to wait for 1 second and hope that we would get an update event from the WATCH connection.
Unfortunately, it didn't work well in practice. Instead, we now get the current value from the API by performing an explicit read request.
Close https://github.com/zalando/patroni/issues/1767
This commit makes it possible to configure the maximum lag (`maximum_lag_on_syncnode`) after which Patroni will "demote" the node from synchronous and replace it with another node.
The previous implementation always tried to stick to the same synchronous nodes (even if they were not the optimal ones).
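A sketch of the candidate filtering (hypothetical helper and member representation; the real implementation works on Patroni's cluster objects):
```
def eligible_sync_candidates(members, leader_lsn, maximum_lag_on_syncnode):
    # Members whose reported WAL position lags the leader by more than
    # maximum_lag_on_syncnode bytes are not considered for synchronous
    # replication; in this sketch a negative value disables the check.
    if maximum_lag_on_syncnode < 0:
        return list(members)
    return [m for m in members
            if leader_lsn - m['wal_position'] <= maximum_lag_on_syncnode]
```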
If an existing instance was configured with WAL residing outside of PGDATA, a 'reinit' would currently lose such symlinks. So add some information on that to draw attention to this corner case, and also add the --waldir option to the sample `postgresql.basebackup` configuration sections to increase visibility.
Discussion: https://github.com/zalando/patroni/issues/1817
It could happen that the cluster role is either not configured or doesn't provide enough permissions. In this case bypass_api_service is ignored, but a warning is logged, which is rather annoying when patronictl is used.
Since bypass_api_service is mostly useful for Patroni itself, we will simply ignore it when patronictl is used.
The Python SSL library allows a password to be passed to its `load_cert_chain` function when setting up an SSLContext [1].
This allows an encrypted private key file in PEM representation to be loaded into the certificate chain.
This commit adds the optional `keyfile_password` parameter to the REST API block of the configuration so that Patroni can load encrypted private keys when establishing its TLS socket.
This also adds the corresponding `PATRONI_RESTAPI_KEYFILE_PASSWORD` environment variable, which has the same effect.
[1] https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_cert_chain
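For reference, the underlying call looks roughly like this (the file names and password are placeholders):
```
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# keyfile_password / PATRONI_RESTAPI_KEYFILE_PASSWORD ends up as the `password`
# argument, letting the ssl module decrypt the PEM private key.
ctx.load_cert_chain(certfile='restapi.pem', keyfile='restapi.key', password='secret')
```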
It could happen that one of the etcd servers is not accessible on Patroni start.
In this case Patroni was trying to perform authentication and exiting, while it should exit only if etcd explicitly responded with the `AuthFailed` error.
Close https://github.com/zalando/patroni/issues/1805
The good old python-consul has not been maintained for a few years in a row, therefore someone forked it under a different name, but the package files are installed into the same location as the old one's.
The API of both modules is mostly compatible, therefore it wasn't hard to add support for both modules in Patroni.
Taking into account that python-consul is not a direct requirement for Patroni but an extra, the end user now has a choice of what to install.
Close https://github.com/zalando/patroni/issues/1810
1. Fix flaky behave tests with Zookeeper. First install/start the binaries (zookeeper/localkube) and only after that continue with installing requirements and running behave. Previously Zookeeper didn't have enough time to start and tests sometimes failed.
2. Fix flaky Raft tests. Despite observations of MacOS slowness, for some unknown reason the delete test with a very small timeout was not timing out but succeeding, causing unit tests to fail. The solution: do not rely on the actual timeout, but mock it.
1. If the superuser name is different from postgres, pg_rewind in the standby cluster was failing because the connection string didn't contain the database name.
2. Provide output if the single-user-mode recovery failed.
Close https://github.com/zalando/patroni/pull/1736
Python 3.8 changed the way exceptions raised from the Thread.run() method are handled.
This resulted in unit tests showing a couple of warnings. They are not important, so we just silence them.
When the `__del__()` method is executed, the Python interpreter has already unloaded some of the modules that are still used down the `http.clear()` call path.
The only thing we can do in this case is silence exceptions like ReferenceError, TypeError, and AttributeError.
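Roughly (a sketch; `http` here stands for a urllib3 `PoolManager`):
```
class HTTPManagerWrapper(object):
    def __init__(self, http):
        self.http = http  # assumed to be a urllib3 PoolManager

    def __del__(self):
        try:
            self.http.clear()  # may touch modules already unloaded at interpreter shutdown
        except (ReferenceError, TypeError, AttributeError):
            pass
```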
Close https://github.com/zalando/patroni/issues/1785