- Fixed issues with has_permanent_slots() method. It didn't took into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write to the status key only LSNs for permanent slots (not just for slots that exist on the primary).
- Include pg_current_wal_flush_lsn() to slots feedback, so that slots on standby nodes could be advanced
- Improved behave tests:
- Verify that permanent slots are properly created on standby nodes
- Verify that permanent slots are properly advanced, including DCS failsafe mode
- Verify that only permanent slots are written to the `/status`
Create permanent physical replication slots on standby nodes and use `pg_replication_slot_advance()` function to move them forward.
The `restart_lsn` is advanced based on values stored in the `/status` key by the primary node.
When slot is created on a replica it could be ahead the same slot on the primary and therefore there is some period of time when it doesn't protect WAL files from being recycled.
Effectively, this PR consists of a few changes:
1. The easy part:
In case of permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with DCS the new `/status` key was introduced. It will contain the json object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` will be updated if there are members with old Patroni in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls `pg_read_binary_file()` function.
When the logical slot already exists on the replica Patroni periodically calls `pg_replication_slot_advance()` function, which allows moving the slot forward.
3. Additional requirements:
In order to ensure that primary doesn't cleanup tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.
4. When logical slots are copied from to the replica there is a timeframe when it could be not safe to use them after promotion. Right now there is no protection from promoting such a replica. But, Patroni will show the warning with names of the slots that might be not safe to use.
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
When there is no config key in DCS Patroni shouldn't try accessing ignore_slots, otherwise an exception is raised.
In addition to that implement missing unit-tests and fix linting issues in behave tests.
There are sometimes good reasons to manage replication slots externally
to Patroni. For example, a consumer may wish to manage its own slots (so
that it can more easily track when a failover has a occurred and whether
it is ahead of or behind the WAL position on the new primary).
Additionally tooling like pglogical actually replicates slots to all
replicas so that the current position can be maintained on failover
targets (this also aids consumers by supplying primitives so that they
can verify data hasn't been lost or a split brain occurred relative to
the physical cluster).
To support these use cases this new feature allows configuring Patroni
to entirely ignore sets of slots specified by any subset of name,
database, slot type, and plugin.