Fix the following errors in section "*PostgreSQL parameters controlled by Patroni*":
1. Supplement the missing parameter `wal_log_hints`.
2. In fact, these controlled parameters are written into `postgresql.conf`.
3. In fact, these controlled parameters are passed as a list of arguments to the `postgres` (not `pg_ctl start`).
4. Add a note about that `wal_keep_segments` and `wal_keep_size` are not passed to the `postgres`.
Allow to define labels that will be assigned to a postgres instance pod when in 'initializing new cluster', 'running custom bootstrap script', 'starting after custom bootstrap', or 'creating replica' state
This allows to set whether a particular permanent replication slot should always be created ('cluster_type=any', the default), or just on a primary ('cluster_type=primary') or standby ('cluster_type=standby') cluster, respectively.
python-consul is unmaintained for a long time and py-consul is an official replacement.
However, we still keep backward compatibility with python-consul.
Close: #3189
This commit is a breaking change:
1. `role` in DCS is written as "primary" instead of "master".
2. `role` in REST API responses is also written as "primary".
3. REST API no longer accepts role=master in requests (for example switchover/failover/restart endpoints).
4. `/metrics` REST API endpoint will no longer report `patroni_master`.
5. `patronictl` no longer accepts `--master` argument.
6. `no_master` option in declarative configuration of custom replica creation methods is no longer treated as a special option, please use `no_leader` instead.
7. `patroni_wale_restore` doesn't accept `--no_master` anymore.
8. `patroni_barman` doesn't accept `--role=master` anymore.
9. callback scripts will be executed with role=primary instead of role=master
10. On Kubernetes Patroni by default will set role label to primary. In case if you want to keep old behavior and avoid downtime or lengthy complex migrations you can configure `kubernetes.leader_label_value` and `kubernetes.standby_leader_label_value` to `master`.
However, a few exceptions regarding master are still in place:
1. `GET /master` REST API endpoint will continue to work.
2. `master_start_timeout` and `master_stop_timeout` in global configuration are still accepted.
3. `master` tag is still preserved in Consul services in addition to `primary`.
Rationale for these exceptions: DBA doesn't always 100% control the infrastructure and can't adjust the configuration.
- don't register secondaries with `noloadbalance` tag.
- mention in the documentation that secondaries are also registered in `pg_dist_node`.
- update docker/kubernetes README files to include examples with secondaries being registered in `pg_dist_node`.
Current problem of Patroni that strikes many people is that it removes replication slot for member which key is expired from DCS. As a result, when the replica comes back from a scheduled maintenance WAL segments could be already absent, and it can't continue streaming without pulling files from archive.
With PostgreSQL 16 and newer we get another problem: logical slot on a standby node could be invalidated if physical replication slot on the primary was removed (and `pg_catalog` vacuumed).
The most problematic environment is Kubernetes, where slot is removed nearly instantly when member Pod is deleted.
So far, one of the recommended solutions was to configure permanent physical slots with names that match member names to avoid removal of replication slots. It works, but depending on environment might be non-trivial to implement (when for example members may change their names).
This PR implements support of `member_slots_ttl` global configuration parameter, that controls for how long member replication slots should be kept when the member key is absent. Default value is set to `30min`.
The feature is supported only starting from PostgreSQL 11 and newer, because we want to retain slots not only on the leader node, but on all nodes that could potentially become the new leader, and they should be moved forward using `pg_replication_slot_advance()` function.
One could disable feature and get back to the old behavior by setting `member_slots_ttl` to `0`.
To enable quorum commit:
```diff
$ patronictl.py edit-config
---
+++
@@ -5,3 +5,4 @@
use_pg_rewind: true
retry_timeout: 10
ttl: 30
+synchronous_mode: quorum
Apply these changes? [y/N]: y
Configuration changed
```
By default Patroni will use `ANY 1(list,of,stanbys)` in `synchronous_standby_names`. That is, only one node out of listed replicas will be used for quorum.
If you want to increase the number of quorum nodes it is possible to do it with:
```diff
$ patronictl edit-config
---
+++
@@ -6,3 +6,4 @@
retry_timeout: 10
synchronous_mode: quorum
ttl: 30
+synchronous_node_count: 2
Apply these changes? [y/N]: y
Configuration changed
```
Good old `synchronous_mode: on` is still supported.
Close https://github.com/patroni/patroni/issues/664
Close https://github.com/zalando/patroni/pull/672
There was one oversight of #2781 - to influence external tools that Patroni could execute, we set global `umask` value based on permissions of the $PGDATA directory. As a result, it also influenced permissions of log files created by Patroni.
To address the problem we implement two measures:
1. Make `log.mode` configurable.
2. If the value is not set - calculate permissions from the original value of the umask setting.
* Update release notes
* Bump version
* Bump pyright version and solve reported issues
---------
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
Besides that:
1. fix problem with is_physical_slot() methods, it was returning false positives for logical slots.
2. Fix a little issue with replicatefrom docs.
Close https://github.com/zalando/patroni/issues/3068
* Make sure tests are not making external calls
and pass url with scheme to urllib3 to avoid warnings
* Make sure unit tests not rely on filesystem state
* Bump pyright and "solve" reported "issues"
Most of them are related to partially unknown types of values from empty
dict or list. To solve it for the empty dict we use `EMPTY_DICT` object of
newly introduced `_FrozenDict` class.
* Improve unit-tests code coverage
* Add release notes for 3.3.0
* Bump version
* Fix pyinstaller spec file
* python 3.6 compatibility
---------
Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
Add support for ``nostream`` tag. If set to ``true`` the node will not use replication protocol to stream WAL. It will rely instead on archive recovery (if ``restore_command`` is configured) and ``pg_wal``/``pg_xlog`` polling. It also disables copying and synchronization of permanent logical replication slots on the node itself and all its cascading replicas. Setting this tag on primary node has no effect.
We currently have a script named `patroni_barman_recover` in Patroni, which is intended to be used as a custom bootstrap method, or as a custom replica creation method.
Now there is need of one more Barman related script in Patroni to handle switching of config models in Barman upon `on_role_change` events.
However, instead of creating another Patroni script, let's say `patroni_barman_config_switch`, and duplicating a lot of logic in the code, we decided to refactor the code so:
* Instead of two separate scripts (`patroni_barman_recover` and `patroni_barman_config_switch`), we have a single script (`patroni_barman`) with 2 sub-commands (`recover` and `config-switch`)
This is the overview of changes that have been performed:
* File `patroni.scripts.barman_recover` has been removed, and its logic has been split into a few files:
* `patroni.scripts.barman.cli`: handles the entrypoint of the new `patroni_barman` command, exposing the argument parser and calling the appropriate functions depending on the sub-command
* `patroni.scripts.barman.utils`: implements utilitary enums, functions and classes wich can be used by `cli` and by sub-commands implementation:
* retry mechanism
* logging set up
* communication with pg-backup-api
* `patroni.scripts.barman.recover`: implements the `recover` sub-command only
* File `patroni.tests.test_barman_recover` has been renamed as `patroni.tests.test_barman`
* File `patroni.scripts.barman.config_switch` was created to implement the `config-switch` sub-command only
* `setup.py` has been changed so it generates a `patroni_barman` application instead of `patroni_barman_recover`
* Docs and unit tests were updated accordingly
References: PAT-154.
A contrib script, which can be used as a custom bootstrap method, or as a custom create replica method.
The script communicates with the pg-backup-api on the Barman node so Patroni is able to restore a Barman backup remotely.
The `--help` option of the script, along with the script docstring, should provide some context on how to use fill its parameters.
Patroni docs were updated accordingly to share examples about how to configure the script as a custom bootstrap method, or as a custom create replica method.
References: PAT-216.