- monkey patch `jsonlogger.RESERVED_ATTRS` to hide the new `LogRecord` attribute (see the sketch after this list)
- "silence" warning about `datetime.datetime.utcnow()`
- run some tests with Python 3.12
- bump actions versions to silence complaints about the Node version
- fix PATH to Postgres binaries on macOS
* All Dockerfiles to use PG16 by default
* PGVERSION env in the test pipelines to 16.1-1 by default
* 11->14 in the dcs-pg mapping for test pipelines
* Code comment fixes
- the new macOS doesn't play well with old Go binaries (bump etcd)
- use brew to install the latest Postgres and expect (which provides `unbuffer`, to keep behave output colorful)
- upload logs from failed runs instead of grepping them to stdout
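The first bullet refers to python-json-logger's `RESERVED_ATTRS` tuple. A minimal sketch of such a patch, assuming the new attribute is Python 3.12's `taskName` (the bullet doesn't name it, so treat that as an assumption):

```python
from pythonjsonlogger import jsonlogger

# RESERVED_ATTRS is the tuple of LogRecord attributes that
# python-json-logger omits from the emitted JSON document;
# prepending the new attribute hides it from the log output.
# NOTE: 'taskName' (added to LogRecord in Python 3.12) is an assumption here.
if 'taskName' not in jsonlogger.RESERVED_ATTRS:
    jsonlogger.RESERVED_ATTRS = ('taskName',) + jsonlogger.RESERVED_ATTRS
```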
The only advantage of localkube was its light weight; everything else about it only created problems:
1. It has not been properly maintained for many years.
2. It effectively worked only on Linux, and even there it stopped working on modern versions due to changes in iptables.
Instead, we will use widely adopted tools like kind or k3s. "kind-kind" is the default K8s context (see `~/.kube/config`), but it can be overridden using the `PATRONI_KUBERNETES_CONTEXT` environment variable. When executed from GH Actions the context is set to `k3d-k3s-default`, because k3s is much faster to start.
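A minimal sketch of how the tests could resolve the context, with the default and the override described above (the function name is illustrative, not the actual test code):

```python
import os

def kubernetes_context() -> str:
    # PATRONI_KUBERNETES_CONTEXT overrides the default "kind-kind" context;
    # the GH Actions pipeline sets it to "k3d-k3s-default".
    return os.environ.get('PATRONI_KUBERNETES_CONTEXT', 'kind-kind')
```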
PostgreSQL 14 changed the behavior of replicas when certain parameters (for example, `max_connections`) are increased: https://github.com/postgres/postgres/commit/15251c0a.
Instead of immediately exiting, PostgreSQL 14 pauses replay and waits for action from the operator.
Since `pg_is_wal_replay_paused()` returning `true` is the only indicator of such a change, Patroni on the replica will call `pg_wal_replay_resume()`, which either resumes replication or shuts the replica down (as before).
So far Patroni has never called `pg_wal_replay_resume()` on its own; therefore, to remain backward compatible, it will do so only for PostgreSQL 14+.
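A hedged sketch of that logic (the connection handling and function name are illustrative, not Patroni's actual code):

```python
def ensure_replay_not_paused(conn, server_version: int) -> None:
    # Patroni never called pg_wal_replay_resume() on its own before,
    # so stay backward compatible for anything older than PostgreSQL 14
    # (server_version is the numeric form, e.g. 140000).
    if server_version < 140000:
        return
    with conn.cursor() as cur:
        cur.execute('SELECT pg_is_wal_replay_paused()')
        if cur.fetchone()[0]:
            # Resuming either continues replication or, if the new parameter
            # values can't be satisfied, shuts the replica down (as before).
            cur.execute('SELECT pg_wal_replay_resume()')
```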
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update the DCS with the current value of `confirmed_flush_lsn` for each of these slots.
To reduce the number of interactions with the DCS, a new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility, `/optime/leader` will still be updated if the cluster contains members running an old version of Patroni.
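For illustration, the `/status` value could look like this (the numbers are made up; LSNs are assumed to be serialized as integers, as in `/optime/leader`):

```python
import json

# "optime" is the primary's current WAL position; "slots" maps each
# permanent logical slot to its confirmed_flush_lsn.
status = {
    'optime': 100663296,
    'slots': {'my_logical_slot': 100663296},
}
print(json.dumps(status))  # written to the /status key in the DCS
```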
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
Once the logical slot exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward.
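A rough sketch of both steps, assuming the on-disk slot state lives in `pg_replslot/<slot_name>/state` (the standard PostgreSQL layout); this is illustrative, not Patroni's actual implementation:

```python
def copy_slot_state(primary_conn, slot_name: str) -> bytes:
    # Runs on a connection opened with rewind or superuser credentials;
    # the returned bytes are written into the replica's data directory
    # before the replica is restarted.
    with primary_conn.cursor() as cur:
        cur.execute("SELECT pg_read_binary_file('pg_replslot/' || %s || '/state')",
                    (slot_name,))
        return cur.fetchone()[0]

def advance_slot(replica_conn, slot_name: str, target_lsn: str) -> None:
    # Periodically move the existing slot up to the confirmed_flush_lsn
    # the primary published in the /status key.
    with replica_conn.cursor() as cur:
        cur.execute('SELECT pg_replication_slot_advance(%s, %s::pg_lsn)',
                    (slot_name, target_lsn))
```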
3. Additional requirements:
To ensure that the primary doesn't clean up tuples from `pg_catalog` that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots, and on cascading replicas that are used for streaming by replicas with logical slots.
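As a bare illustration of the underlying mechanism (Patroni manages the parameter through its own configuration system, not like this):

```python
def enable_hot_standby_feedback(conn) -> None:
    # hot_standby_feedback has "sighup" context, so a reload is enough;
    # ALTER SYSTEM must run outside a transaction block.
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("ALTER SYSTEM SET hot_standby_feedback = 'on'")
        cur.execute('SELECT pg_reload_conf()')
```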
4. When logical slots are copied from the primary to the replica, there is a time window during which it may not be safe to use them after promotion. Right now there is no protection against promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions, Patroni will refuse to create logical slots on the primary.
The old "permanent slots" feature, which created logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749