6 Commits

Author SHA1 Message Date
Alexander Kukushkin
b901e62ad0 Enhanced checks of replica logical slots safety (#2285)
The logical slot on a replica is safe to use when the physical replica
slot on the primary:
1. has a nonzero/non-null `catalog_xmin`
2. has a `catalog_xmin` that is not newer (greater) than the  `catalog_xmin` of any slot on the standby
3. the `catalog_xmin` is known to overtake `catalog_xmin` of logical slots on the primary observed during `1`

In case if `1` doesn't take place, Patroni will run an additional check whether the `hot_standby_feedback` is actually in effect and shows the warning in case it is not.
2022-05-10 12:24:47 +02:00
Alexander Kukushkin
5f6197aaad Don't copy logical slot if there is mismatch with the config (#2274)
A couple of times we have seen in the wild that the database for the permanent logical slots was changed in the Patroni config.

It resulted in the below situation.
On the primary:
1. The slot must be dropped before creating it in a different DB.
2. Patroni fails to drop it because the slot is in use.

Replica:
1. Patroni notice that the slot exists in the wrong DB and successfully dropping it.
2. Patroni copying the existing slot from the primary by its name with Postgres restart.

And the loop repeats while the "wrong" slot exists on the primary.

Basically, replicas are continuously restarting, which badly affects availability.

In order to solve the problem, we will perform additional checks while copying replication slot files from the primary and discard them if `slot_type`, `database`, or `plugin` don't match our expectations.
2022-04-14 12:10:37 +02:00
Alexander Kukushkin
bf354aeebd Compatibility with legacy psycopg2 (#2158)
For example, psycopg2 installed from Ubuntu 18.04 packages doesn't have `UndefinedFile` exception yet.
2022-01-06 10:14:50 +01:00
Alexander Kukushkin
fce889cd04 Compatibility with psycopg 3.0 (#2088)
By default `psycopg2` is preferred. The `psycopg>=3.0` will be used only if `psycopg2` is not available or its version is too old.
2021-11-19 14:32:54 +01:00
Alexander Kukushkin
2d504a4f0a Copy the logical slot over if advance failed due to the missing WAL (#1946)
It could happen that the replica for some reason is missing the WAL file required by the replication slot.
The nature of this phenomenon is a bit unclear, it might be that the WAL was recycled short before we copied the slot file, but, we still need a solution to this problem. If the `pg_replication_slot_advance()` fails with the `UndefinedFile` exception (requested WAL segment pg_wal/... has already been removed), the logical slot on the replica must be recreated.
2021-06-02 16:57:13 +02:00
Alexander Kukushkin
c7173aadd7 Failover logical slots (#1820)
Effectively, this PR consists of a few changes:

1. The easy part:
  In case of permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
  In order to reduce the number of interactions with DCS the new `/status` key was introduced. It will contain the json object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` will be updated if there are members with old Patroni in the cluster.

2. The tricky part:
  On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls `pg_read_binary_file()`  function.
  When the logical slot already exists on the replica Patroni periodically calls `pg_replication_slot_advance()` function, which allows moving the slot forward.

3. Additional requirements:
  In order to ensure that primary doesn't cleanup tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.

4. When logical slots are copied from to the replica there is a timeframe when it could be not safe to use them after promotion. Right now there is no protection from promoting such a replica. But, Patroni will show the warning with names of the slots that might be not safe to use.

Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.

The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.

Close: https://github.com/zalando/patroni/issues/1749
2021-03-25 16:18:23 +01:00