9 Commits

Author SHA1 Message Date
Alexander Kukushkin
4215565cb4 Rearrange tests (#2146)
- Remove codacy steps: they removed legacy organizations, and there seems to be no easy way of installing the codacy app to the Zalando GH.
- Don't run behave on MacOS: the worker recently became way too slow
- Disable behave for combination of kubernetes and python 2.7
- Remove python 3.5 (it will be removed by GH from workers in January) and add 3.10
- Run behave with 3.6 and 3.9 instead of 3.5 and 3.8
2021-12-21 09:36:22 +01:00
Alexander Kukushkin
fce889cd04 Compatibility with psycopg 3.0 (#2088)
By default `psycopg2` is preferred. The `psycopg>=3.0` will be used only if `psycopg2` is not available or its version is too old.
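
The preference order described above can be sketched roughly as follows. This is only an illustration, not the actual Patroni code: `choose_driver`, `parse_version` and the `MIN_PSYCOPG2` threshold are hypothetical names and values (the message only says "too old" without giving a number).

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Assumed threshold, for illustration only; the commit message does not state it.
MIN_PSYCOPG2: Version = (2, 5, 4)


def parse_version(text: str) -> Version:
    """Turn a string such as '2.9.3 (dt dec pq3 ext lo64)' into (2, 9, 3)."""
    return tuple(int(n) for n in re.findall(r'\d+', text.split()[0])[:3])


def choose_driver(psycopg2_version: Optional[str],
                  psycopg_version: Optional[str]) -> str:
    """Pick a driver name given the installed versions (None = not installed)."""
    # psycopg2 is preferred whenever it is present and recent enough.
    if psycopg2_version and parse_version(psycopg2_version) >= MIN_PSYCOPG2:
        return 'psycopg2'
    # Otherwise fall back to psycopg 3.0 or newer.
    if psycopg_version and parse_version(psycopg_version) >= (3, 0):
        return 'psycopg'
    raise ImportError('neither a recent psycopg2 nor psycopg>=3.0 is available')
```

For example, `choose_driver('2.4.0', '3.0.1')` falls through to `'psycopg'` because the installed `psycopg2` is below the assumed minimum.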
2021-11-19 14:32:54 +01:00
Alexander Kukushkin
f3420e2db5 Compatibility with PostgreSQL 14 (#1926)
PostgreSQL 14 changed the behavior of replicas when certain parameters (like for example `max_connections`) are changed (increased): https://github.com/postgres/postgres/commit/15251c0a.
Instead of immediately exiting, Postgres 14 pauses replication and waits for actions from the operator.

Since `pg_is_wal_replay_paused()` returning `True` is the only indicator of such a change, Patroni on the replica will call `pg_wal_replay_resume()`, which will either continue replication or cause a shutdown (as in previous versions).

So far Patroni was never calling `pg_wal_replay_resume()` on its own, therefore, to remain backward compatible it will call it only for PostgreSQL 14+.
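
A minimal sketch of that check, assuming a DB-API style cursor and the integer `server_version_num` of the replica. The helper name `resume_replay_if_paused` and the `FakeCursor` stand-in are hypothetical and exist only for illustration; the two SQL functions are the ones named above.

```python
PG14 = 140000  # server_version_num of PostgreSQL 14.0


def resume_replay_if_paused(cursor, server_version):
    """Call pg_wal_replay_resume() on a paused PostgreSQL 14+ replica.

    Returns True if the resume call was issued.
    """
    if server_version < PG14:
        return False  # older versions: never resume on our own
    cursor.execute('SELECT pg_is_wal_replay_paused()')
    row = cursor.fetchone()
    if row and row[0]:
        cursor.execute('SELECT pg_wal_replay_resume()')
        return True
    return False


class FakeCursor:
    """Stand-in for a DB-API cursor, used only to exercise the sketch."""

    def __init__(self, paused):
        self.paused = paused
        self.queries = []

    def execute(self, sql):
        self.queries.append(sql)

    def fetchone(self):
        return (self.paused,)
```

With `FakeCursor(True)` and a version of `130005` no query is sent at all, which mirrors the backward-compatibility rule in the message.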
2021-06-25 13:41:45 +02:00
Alexander Kukushkin
99626a07f2 Fix issues with raft traffic encryption (#1919)
and run raft behave tests with encryption enabled.

Using the new `pysyncobj` release allowed us to get rid of a lot of hacks with accessing private properties and methods of the parent class and reduce the size of the `raft.py`.

Close https://github.com/zalando/patroni/issues/1746
2021-04-30 11:28:41 +02:00
Alexander Kukushkin
c7173aadd7 Failover logical slots (#1820)
Effectively, this PR consists of a few changes:

1. The easy part:
  If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
  In order to reduce the number of interactions with DCS, the new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility, `/optime/leader` will still be updated if there are members with an old Patroni version in the cluster.

2. The tricky part:
  On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
  When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which allows moving the slot forward.

3. Additional requirements:
  In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.

4. When logical slots are copied to the replica, there is a timeframe when it might not be safe to use them after promotion. Right now there is no protection from promoting such a replica. However, Patroni will show a warning with the names of the slots that might not be safe to use.
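
The `/status` update from item 1 could look roughly like the sketch below. The helper name `keys_to_write`, its arguments, and the plain-string encoding of `/optime/leader` are assumptions made for illustration; only the key names and the `optime`/`slots` fields come from the description above.

```python
import json


def keys_to_write(optime, slots, old_members_present):
    """Return a mapping of DCS key -> value for one status update.

    optime: last known WAL position of the leader (integer)
    slots: slot name -> confirmed_flush_lsn for permanent logical slots
    old_members_present: True if some members still run an old Patroni
    """
    updates = {
        '/status': json.dumps({'optime': optime, 'slots': slots}, sort_keys=True),
    }
    if old_members_present:
        # Assumed legacy format: the bare position as a string.
        updates['/optime/leader'] = str(optime)
    return updates
```

A single write to `/status` carries both values, so the extra `/optime/leader` write is needed only while old members are still in the cluster.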

Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
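
As a rough illustration of the replica-side decision, the sketch below compares slot positions published by the primary with the local ones. `plan_slot_actions` and `format_lsn` are hypothetical helpers, not Patroni functions, and positions are assumed to be plain integers.

```python
PG11 = 110000  # server_version_num of PostgreSQL 11.0


def format_lsn(lsn):
    """Render an integer WAL position in the usual 'X/Y' hexadecimal form."""
    return '{0:X}/{1:X}'.format(lsn >> 32, lsn & 0xFFFFFFFF)


def plan_slot_actions(server_version, primary_slots, replica_slots):
    """Decide what to do for each permanent logical slot on a replica.

    primary_slots: slot name -> confirmed_flush_lsn published by the primary
    replica_slots: slot name -> confirmed_flush_lsn of the local slot
    Returns slot name -> ('copy' | 'advance', target position).
    """
    if server_version < PG11:
        return {}  # pg_replication_slot_advance() is not available
    actions = {}
    for name, target in primary_slots.items():
        if name not in replica_slots:
            actions[name] = ('copy', target)  # slot file must be copied first
        elif replica_slots[name] < target:
            actions[name] = ('advance', target)
    return actions


# Query that an 'advance' action would translate into (parameters: name, LSN).
ADVANCE_SQL = 'SELECT pg_replication_slot_advance(%s, %s)'
```

An `'advance'` entry would then be executed with the slot name and `format_lsn(target)` as query parameters.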

The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.

Close: https://github.com/zalando/patroni/issues/1749
2021-03-25 16:18:23 +01:00
Alexander Kukushkin
b698df374f Fix build (#1843)
run apt-get update before installing packages
2021-02-16 09:34:36 +01:00
Alexander Kukushkin
4a8c4cfc53 Make tests more reliable (#1808)
1.  Fix flaky behave tests with zookeeper. First, install/start binaries (zookeeper/localkube) and only after that continue with installing requirements and running behave. Previously zookeeper didn't have enough time to start and tests were sometimes failing.
2.  Fix flaky raft tests. Despite the observed slowness of MacOS, for some unknown reason the delete test with a very small timeout was not timing out but succeeding, causing unit tests to fail. The solution: do not rely on the actual timeout, but mock it.
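
The idea in item 2 (mock the clock instead of relying on a tiny real timeout) can be sketched as follows. `delete_with_timeout` is a hypothetical stand-in for the code under test, not the real raft implementation.

```python
import time
from unittest.mock import patch


def delete_with_timeout(do_delete, timeout):
    """Retry do_delete() until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if do_delete():
            return True
    return False


def test_delete_times_out():
    # The mocked clock returns 0.0 when the deadline is computed and 10.0 on
    # the first loop check, so the timeout path is taken deterministically,
    # no matter how fast or slow the test worker is.
    with patch('time.monotonic', side_effect=[0.0, 10.0]):
        assert delete_with_timeout(lambda: True, timeout=0.001) is False
```

Without the mock, the same call with a fast worker could succeed before the deadline, which is the flakiness described above.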
2021-01-15 14:29:55 +01:00
Alexander Kukushkin
e3ef9ac306 Fix issues with zookeeper (#1792)
1. The `ttl` was incorrectly returned 1000 times higher than it should be
2. The `watch()` method must return True if the parent method returned True. Not doing so resulted in the incorrect calculation of sleep time.
3. Move mock of exhibitor api to the features/environment.py. It simplifies testing with behave.
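
Items 1 and 2 can be illustrated with the sketch below. It assumes that the factor of 1000 comes from a session timeout reported in milliseconds, which is one plausible reading rather than something the message states. The classes `BaseDCS` and `ZooKeeperLike` are hypothetical.

```python
class BaseDCS:
    """Hypothetical parent class; watch() reports whether something changed."""

    def watch(self, leader_index, timeout):
        return True  # pretend a change was noticed


class ZooKeeperLike(BaseDCS):
    def __init__(self, session_timeout_ms):
        # Assumption: the client library reports the timeout in milliseconds.
        self._session_timeout_ms = session_timeout_ms

    @property
    def ttl(self):
        # Convert to seconds; returning the raw value would be 1000x too high.
        return self._session_timeout_ms / 1000.0

    def watch(self, leader_index, timeout):
        changed = super().watch(leader_index, timeout)
        # Propagate the parent's result so the caller can compute its sleep
        # time correctly; dropping it would implicitly return None.
        return changed
```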
2020-12-14 15:12:57 +01:00
Alexander Kukushkin
1530ed0b9c Switch to GH actions (#1778)
it allows up to 20 parallel builds
2020-12-04 21:52:34 +01:00