* Update release notes
* Bump version
* Bump pyright version and solve reported issues
---------
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
- monkey patch `jsonlogger.RESERVED_ATTRS` to hide new attribute in `LogRecord`
- "silence" warning about `atetime.datetime.utcnow()`
- run some tests with python 3.12
- bump actions versions to silence complains about Node version
- fix PATH to Postgres binaries on MacOS
* Make sure tests are not making external calls
and pass url with scheme to urllib3 to avoid warnings
* Make sure unit tests not rely on filesystem state
* Bump pyright and "solve" reported "issues"
Most of them are related to partially unknown types of values from empty
dict or list. To solve it for the empty dict we use `EMPTY_DICT` object of
newly introduced `_FrozenDict` class.
* Improve unit-tests code coverage
* Add release notes for 3.3.0
* Bump version
* Fix pyinstaller spec file
* python 3.6 compatibility
---------
Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
* All dockerfiles to use PG16 by default
* PGVERSION env in the test pipelines to 16.1-1 by default
* 11->14 in the dcs-pg mapping for test pipelines
* Code comments fixes
* remove check_psycopg() call from the setup.py, when installing from wheel it doesn't work anyway.
* call check_psycopg() function before process_arguments(), because the last one is trying to import psycopg and fails with the stacktrace, while the first one shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras, that will install psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move check_psycopg() function to the __main__.py.
* introduce the new extra called `all`, it will allow to install all dependencies at once (except psycopg related).
* use the `build` module in order to create sdist bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
Expanding on the addition of docstrings in code, this adds python module API docs to sphinx documentation.
A developer can preview what this might look like by running this locally:
```
tox -m docs
```
The option `-W` is added to the tox env so that warning messages are considered errors.
Adds doc generation using the above method to the test GitHub workflow to catch documentation problems on PRs.
Some docstrings have been reformatted and fixed to satisfy errors generated with the above setup.
They somehow messed up with type hints what made pyright unhappy.
To solve it we explicitly pass the Group class to the group() decorator.
In addition to that bump pyright version.
* bump version
* update release notes
* removed 2.7, 3.4, 3.5, and 3.6 from supported versions in setup.py
* switched GH actions back to ubuntu-latest, removed tests with 2.7 and 3.6, and added 3.11
* some little fixes in Citus documentation and behave tests
If PR is open from the external GH repo secrets are not set due to security reasons. It makes codacy coverage report to fail.
Co-authored-by: Polina Bungina <bungina@gmail.com>
- the new MacOS doesn't play well with old go binaries (bump etcd)
- use brew to install Postgres and expect (unbuffer, to make behave output colorful) and use the latest version
- upload failed logs instead of grepping them to stdout
The only advantage of localkube was being a low weight. Anything else started creating only problems:
1. It is not properly maintained for many years.
2. It effectively worked only on Linux, but stopped on modern version due to changes in iptables.
Instead, we will use widely adoped tools like kind or k3s. The "kind-kind" is the default K8s context (see ~/.kube/config), but it could be overriden using `PATRONI_KUBERNETES_CONTEXT` environment variable. When executed from GH actions the context is set to k3d-k3s-default, because K3s is much faster to start.
* bump version
* update release notes
* run some behave tests on v15
* automate release process by building/pushing packages on tag creation and release publication
* Bump actions/checkout and actions/setup-python versions
* Install wheel to silence some warnings
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
- remove codacy steps: they removed legacy organizations and there seems to be no easy way of installing codacy app to the Zalando GH.
- Don't run behave on MacOS: recently worker became way to slow
- Disable behave for combination of kubernetes and python 2.7
- Remove python 3.5 (it will be removed by GH from workers in January) and add 3.10
- Run behave with 3.6 and 3.9 instead of 3.5 and 3.8
PostgreSQL 14 changed the behavior of replicas when certain parameters (like for example `max_connections`) are changed (increased): https://github.com/postgres/postgres/commit/15251c0a.
Instead of immediately exiting Postgres 14 pauses replication and waits for actions from the operator.
Since the `pg_is_wal_replay_paused()` returning `True` is the only indicator of such a change, Patroni on the replica will call the `pg_wal_replay_resume()`, which would cause either continue replication or shutdown (like previously).
So far Patroni was never calling `pg_wal_replay_resume()` on its own, therefore, to remain backward compatible it will call it only for PostgreSQL 14+.
and run raft behave tests with encryption enabled.
Using the new `pysyncobj` release allowed us to get rid of a lot of hacks with accessing private properties and methods of the parent class and reduce the size of the `raft.py`.
Close https://github.com/zalando/patroni/issues/1746
Despite all recovery parameters became GUCs in PostgreSQL 12, there are very good reasons to keep them separated in the Patroni internals.
While implementing PostgreSQL parameters validation in #1674 one little oversight occurred. The parameters validation happens before the recovery parameters are skipped from the list, which produces a false warning.
Close https://github.com/zalando/patroni/issues/1907
Effectively, this PR consists of a few changes:
1. The easy part:
In case of permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with DCS the new `/status` key was introduced. It will contain the json object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` will be updated if there are members with old Patroni in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls `pg_read_binary_file()` function.
When the logical slot already exists on the replica Patroni periodically calls `pg_replication_slot_advance()` function, which allows moving the slot forward.
3. Additional requirements:
In order to ensure that primary doesn't cleanup tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.
4. When logical slots are copied from to the replica there is a timeframe when it could be not safe to use them after promotion. Right now there is no protection from promoting such a replica. But, Patroni will show the warning with names of the slots that might be not safe to use.
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
1. Fix flaky behave tests with zookeeper. First, install/start binaries (zookeeper/localkube) and only after that continue with installing requirements and running behave. Previously zookeeper didn't had enough time to start and tests sometimes were failing.
2. Fix flaky raft tests. Despite observations of MacOS slowness, for some unknown reason the delete test with a very small timeout was not timing out, but succeeding, causing unit-tests to fail. The solution - do not rely on the actual timeout, but mock it.
1. The `ttl` was incorrectly returned 1000 times higher then it should
2. The `watch()` method must return True if the parent method returned True. Not doing so resulted in the incorrect calculation of sleep time.
3. Move mock of exhibitor api to the features/environment.py. It simplifies testing with behave.