This commit is a breaking change:
1. `role` in DCS is written as "primary" instead of "master".
2. `role` in REST API responses is also written as "primary".
3. REST API no longer accepts role=master in requests (for example switchover/failover/restart endpoints).
4. `/metrics` REST API endpoint will no longer report `patroni_master`.
5. `patronictl` no longer accepts `--master` argument.
6. `no_master` option in declarative configuration of custom replica creation methods is no longer treated as a special option, please use `no_leader` instead.
7. `patroni_wale_restore` doesn't accept `--no_master` anymore.
8. `patroni_barman` doesn't accept `--role=master` anymore.
9. Callback scripts will be executed with `role=primary` instead of `role=master`.
10. On Kubernetes, Patroni will by default set the role label to `primary`. If you want to keep the old behavior and avoid downtime or lengthy, complex migrations, you can set `kubernetes.leader_label_value` and `kubernetes.standby_leader_label_value` to `master`, as shown in the example below.
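For illustration, a minimal sketch of the Kubernetes settings mentioned in item 10, using the values from the description above:
```yaml
kubernetes:
  leader_label_value: master          # keep the old role label value on the leader Pod
  standby_leader_label_value: master  # keep the old role label value on the standby leader Pod
```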
However, a few exceptions regarding master are still in place:
1. `GET /master` REST API endpoint will continue to work.
2. `master_start_timeout` and `master_stop_timeout` in global configuration are still accepted.
3. `master` tag is still preserved in Consul services in addition to `primary`.
Rationale for these exceptions: the DBA doesn't always fully control the infrastructure and may not be able to adjust the configuration.
There are two cases when libpq may search for "localhost":
1. When the host in the connection string is not specified and libpq uses the default socket directory path.
2. When the specified host matches the default socket directory path.
Since we don't know the value of the default socket directory path and effectively can't detect case 2, the best strategy to mitigate the problem is to add "localhost" whenever the detected "host" is a Unix socket directory (i.e., it starts with the '/' character).
Close #3134
A long-standing problem of Patroni that affects many people is that it removes the replication slot for a member whose key has expired from DCS. As a result, when the replica comes back from scheduled maintenance, the required WAL segments could already be absent and it can't continue streaming without pulling files from the archive.
With PostgreSQL 16 and newer we get another problem: a logical slot on a standby node could be invalidated if the physical replication slot on the primary was removed (and `pg_catalog` vacuumed).
The most problematic environment is Kubernetes, where the slot is removed almost instantly when the member Pod is deleted.
So far, one of the recommended solutions has been to configure permanent physical slots with names that match member names, to avoid removal of replication slots. It works, but depending on the environment it might be non-trivial to implement (for example, when members may change their names).
This PR implements support for the `member_slots_ttl` global configuration parameter, which controls how long member replication slots should be kept when the member key is absent. The default value is `30min`.
The feature is supported only on PostgreSQL 11 and newer, because we want to retain slots not only on the leader node but also on all nodes that could potentially become the new leader, and the slots must be moved forward using the `pg_replication_slot_advance()` function.
One can disable the feature and get back to the old behavior by setting `member_slots_ttl` to `0`.
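A minimal sketch of the corresponding dynamic (global) configuration, using only the values mentioned above:
```yaml
# dynamic configuration, e.g. edited with `patronictl edit-config`
member_slots_ttl: 30min   # the default: keep member slots for 30 minutes after the member key disappears
# member_slots_ttl: 0     # 0 disables the retention and restores the old behavior
```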
Because `postgres --describe-config` doesn't show GUCs defined with GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE, Patroni was always ignoring some GUCs that a user might want to configure with non-default values.
- remove postgres --describe-config validation.
- define minor versions for availability bounds of some back-patched GUCs
1. All nodes with role == 'replica' and state == 'running' are registered. If the state isn't 'running', the node is removed.
2. In case of failover/switchover we always update the primary first.
3. When switching to a registered secondary we call citus_update_node() three times: rename the primary to primary-demoted, put the primary name into the promoted secondary's row, and put the promoted secondary's name into the primary's row.
State transitions are produced by the transition() method. First of all, the method makes sure that the actual primary is registered in the metadata. If the primary for a given group didn't change, the method registers new secondaries and removes secondaries that are gone. It prefers to use the citus_update_node() UDF to replace secondaries that are gone with newly added ones.
The communication protocol between primary nodes remains the same and all old features work without any changes.
To enable quorum commit:
```diff
$ patronictl.py edit-config
---
+++
@@ -5,3 +5,4 @@
use_pg_rewind: true
retry_timeout: 10
ttl: 30
+synchronous_mode: quorum
Apply these changes? [y/N]: y
Configuration changed
```
By default Patroni will use `ANY 1(list,of,standbys)` in `synchronous_standby_names`. That is, only one node out of the listed replicas will be used for the quorum.
If you want to increase the number of quorum nodes, it is possible to do so with:
```diff
$ patronictl edit-config
---
+++
@@ -6,3 +6,4 @@
retry_timeout: 10
synchronous_mode: quorum
ttl: 30
+synchronous_node_count: 2
Apply these changes? [y/N]: y
Configuration changed
```
Good old `synchronous_mode: on` is still supported.
Close https://github.com/patroni/patroni/issues/664
Close https://github.com/zalando/patroni/pull/672
There was one oversight in #2781: to influence external tools that Patroni may execute, we set the global `umask` value based on the permissions of the $PGDATA directory. As a result, it also influenced the permissions of log files created by Patroni.
To address the problem we implement two measures:
1. Make `log.mode` configurable (see the example below).
2. If the value is not set, calculate the permissions from the original value of the umask setting.
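A minimal sketch of the new setting in the local Patroni configuration; the directory path and the octal mode value here are assumptions used purely for illustration:
```yaml
log:
  dir: /var/log/patroni   # assumed log directory, for illustration only
  mode: 0600              # explicit permissions for log files created by Patroni
```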
The last one is only available since psycopg 2.8, while the first one has been available since 2.0.8.
For backward compatibility, monkeypatch the connection object returned by psycopg3.
Close https://github.com/patroni/patroni/issues/3116
It could happen that there is "something" streaming from the current primary node with an `application_name` that matches the name of the current primary, for instance due to a faulty configuration. When processing `pg_stat_replication` we only checked that the `application_name` matches the name of one of the member nodes, but we forgot to exclude our own name.
As a result, there were the following side effects:
1. The current primary could be declared as a synchronous node.
2. As a result of [1] it wasn't possible to do a switchover.
3. During shutdown, the current primary was waiting for itself to be released from the list of synchronous nodes.
Close #3111
Pass the `Cluster` object instead of `Leader`.
It will help to implement a new feature, "Configurable retention of replication slots for cluster members".
Besides that, fix a couple of issues with docstrings.
Let's consider the following replication setup:
```
primary->standby1->standby2(replicatefrom: standby1)
```
In this case the `primary` will not create a physical replication slot for `standby2`, because it is streaming from `standby1`.
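For reference, a minimal sketch of how this topology is typically expressed on the `standby2` node via the `replicatefrom` tag:
```yaml
# local Patroni configuration of standby2
tags:
  replicatefrom: standby1   # stream WAL from standby1 instead of the current primary
```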
Things will look different if we have the following dynamic configuration:
```yaml
slots:
primary:
type: physical
standby1:
type: physical
standby2:
type: physical
```
In this case the `primary` will also have a `standby2` physical replication slot, which must periodically be advanced. So far this worked by taking the value of `xlog_location` from the `/members/standby2` key in DCS.
But when the DCS is down and failsafe mode is activated, the `standby2` physical slot on the `primary` will not be moved, because there was no way to get the latest value of `xlog_location`.
This PR addresses the problem by making replica nodes return their `xlog_location` as the `lsn` header in the response to the `POST /failsafe` REST API request. The current primary will use these values to advance replication slots for nodes with the `replicatefrom` tag.
The `Status` class was introduced in #2853, but we kept the old properties in the `Cluster` object in order to have fewer changes in the rest of the code.
This PR finishes the refactoring.
The following adjustments were made:
- Introduced the `Status.is_empty()` method, which is used in `Cluster.is_empty()` instead of checking actual values, to simplify the introduction of further fields to the `Status` object.
- Removed the `Cluster.last_lsn` property.
- Changed the `Cluster.slots` property to always return a dict and to perform sanity checks on values.
Besides that, this PR addresses a couple of problems:
- The `AbstractDCS.get_cluster()` method was accessing some properties without holding a lock on `_cluster_thread_lock`.
- The `Cluster.__permanent_slots` property was setting 'lsn' from all cluster members, while it should do that only for members with the `replicatefrom` tag.
The `SlotsAdvanceThread` is asynchronously calling
pg_replication_slot_advance() and providing feedback about logical
replication slots that must be reinitialized by copying from the
primary. That is, the parent thread will learn about slots to be copied
only when scheduling the next pg_replication_slot_advance() call.
As a result, it was possible that a logical slot was copied (with a PostgreSQL restart) more than once.
To improve this we implement the following measures:
1. do not schedule a slot sync if the slot is in the list to be copied
2. remove to-be-copied slots from the `self._scheduled` structure
3. clean the state of `SlotsAdvanceThread` when slot files are copied.
Since PG16, logical replication slots on a standby can be invalidated due to horizon. In this case, pg_replication_slot_advance() will fail with ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE. We should force a slot copy (i.e., recreation) of such slots.
Since `synchronous_mode` was introduced to Patroni, plain Postgres synchronous replication has not been working.
The issue occurs because `process_sync_replication` always resets the value of `synchronous_standby_names` in Postgres when `synchronous_mode` is disabled in Patroni.
This commit fixes that issue by setting `synchronous_standby_names` to the value configured by the user, if any, when `synchronous_mode` is disabled.
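For illustration, a minimal sketch of a configuration where this fix matters; `synchronous_mode` is disabled and `synchronous_standby_names` is managed purely by the user (the standby names are placeholders):
```yaml
# dynamic configuration
synchronous_mode: false
postgresql:
  parameters:
    synchronous_standby_names: 'ANY 1 (node2, node3)'  # user-provided value that Patroni now leaves intact
```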
Closes #3093
References: PAT-254.
* Update release notes
* Bump version
* Bump pyright version and solve reported issues
---------
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
The get_address() function was called when config_generator.py was loaded, because it was required to initialize the `_HOSTNAME` and `_IP` properties of `AbstractConfigGenerator`, and in some cases it was making unit tests very slow.
`synchronous_standby_names` and synchronous replication only work on a real primary node; in the case of cascading replication they are simply ignored by Postgres.
This fact was already addressed by `global_config.is_synchronous_mode`, but when the `/sync` key in DCS of a standby cluster is not empty, `patronictl list` and `GET /cluster` were falsely reporting some nodes as synchronous because this check was missing.
Close https://github.com/zalando/patroni/issues/3078
- updated list of GUCs
- updated regex for filtering backend processes by name
- `primary_conninfo` will contain `dbname` parameter
The last one is required for synchronization of logical replication slots by the slotsync worker and doesn't create problems on older versions.
Besides that:
1. Fix a problem with the is_physical_slot() method; it was returning false positives for logical slots.
2. Fix a small issue with the replicatefrom docs.
Close https://github.com/zalando/patroni/issues/3068
- monkey patch `jsonlogger.RESERVED_ATTRS` to hide new attribute in `LogRecord`
- "silence" warning about `atetime.datetime.utcnow()`
- run some tests with python 3.12
- bump actions versions to silence complaints about Node version
- fix PATH to Postgres binaries on MacOS
* Make sure tests are not making external calls
and pass url with scheme to urllib3 to avoid warnings
* Make sure unit tests do not rely on filesystem state
* Bump pyright and "solve" reported "issues"
Most of them are related to partially unknown types of values from an empty dict or list. To solve it for the empty dict we use the `EMPTY_DICT` object of the newly introduced `_FrozenDict` class.
* Improve unit-tests code coverage
* Add release notes for 3.3.0
* Bump version
* Fix pyinstaller spec file
* python 3.6 compatibility
---------
Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
When packaged into a pyz (zip file), resources are not directly available on the filesystem, and therefore we can't always rely on os.listdir() and open() to enumerate and read them.
We are going to use importlib.resources to solve this problem, except on Python 3.8 and older, where the files() function needed to enumerate resources is not available. For legacy Python versions (3.8 actually becomes EOL in October 2024) we are going to use os.listdir() as a fallback.
Close #3017
Fix the oversight of 193c73f
We need to set the global config from the local cache if cluster.config is not initialized.
If there is nothing written into the DCS (yet), we need the setup info for decision making (e.g., whether it is a standby cluster).
Add support for the ``nostream`` tag. If set to ``true``, the node will not use the replication protocol to stream WAL. Instead it will rely on archive recovery (if ``restore_command`` is configured) and ``pg_wal``/``pg_xlog`` polling. The tag also disables copying and synchronization of permanent logical replication slots on the node itself and on all its cascading replicas. Setting this tag on the primary node has no effect.
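For illustration, a minimal sketch of the tag in a node's local Patroni configuration:
```yaml
tags:
  nostream: true   # do not stream WAL; rely on restore_command / pg_wal polling instead
```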
We currently have a script named `patroni_barman_recover` in Patroni, which is intended to be used as a custom bootstrap method, or as a custom replica creation method.
Now there is a need for one more Barman-related script in Patroni to handle switching of config models in Barman upon `on_role_change` events.
However, instead of creating another Patroni script, let's say `patroni_barman_config_switch`, and duplicating a lot of logic in the code, we decided to refactor the code so that:
* Instead of two separate scripts (`patroni_barman_recover` and `patroni_barman_config_switch`), we have a single script (`patroni_barman`) with 2 sub-commands (`recover` and `config-switch`)
This is the overview of changes that have been performed:
* File `patroni.scripts.barman_recover` has been removed, and its logic has been split into a few files:
* `patroni.scripts.barman.cli`: handles the entrypoint of the new `patroni_barman` command, exposing the argument parser and calling the appropriate functions depending on the sub-command
* `patroni.scripts.barman.utils`: implements utility enums, functions and classes which can be used by `cli` and by the sub-command implementations:
* retry mechanism
* logging set up
* communication with pg-backup-api
* `patroni.scripts.barman.recover`: implements the `recover` sub-command only
* File `patroni.tests.test_barman_recover` has been renamed as `patroni.tests.test_barman`
* File `patroni.scripts.barman.config_switch` was created to implement the `config-switch` sub-command only
* `setup.py` has been changed so it generates a `patroni_barman` application instead of `patroni_barman_recover`
* Docs and unit tests were updated accordingly
References: PAT-154.
But do it only if we didn't authenticate right before executing a request. Previously, retries only happened when the caller was executed via `Retry.__call__()`, which is not the case for methods like `set_failover_value()` or `set_config_value()`. Also, it seems that existing watchers aren't affected; therefore we will not restart them after reauthentication.
In addition to that, fix issues with `Retry.ensure_deadline(0)`:
1. the return value was ignored
2. we don't have to set the `Retry.deadline` attr; it is not used anywhere
Close https://github.com/zalando/patroni/issues/3023
Obey the following 5 meanings of the term _cluster_ in Patroni:
1. PostgreSQL cluster: a cluster of PostgreSQL instances which have the same system identifier.
2. MPP cluster: a group of PostgreSQL clusters in which one acts as the coordinator and the others act as workers.
3. Coordinator cluster: a PostgreSQL cluster which acts in the role of 'coordinator' within an MPP cluster.
4. Worker cluster: a PostgreSQL cluster which acts in the role of 'worker' within an MPP cluster.
5. Patroni cluster: any cluster managed by Patroni can be called a Patroni cluster, but we usually use this term to refer to a single PostgreSQL cluster or an MPP cluster.
Provide info about the PG parameters that caused the "pending restart" flag to be set. Both `patronictl list` and the `/patroni` REST API endpoint now show the parameter names and the diff as the "pending restart reason".
When running `pg_basebackup` to bootstrap a replica, Patroni sanitizes
the custom user options that come from `postgresql.basebackup` configuration
section using the `process_user_options` method.
However, there is a bug in that method: it filters out disallowed options
that are in the format `- setting`, but not the ones in the format
`- setting: value` from `postgresql.basebackup`.
An example of that issue is the `dbname` setting. If you specify something
like this in the configuration file:
```yaml
postgresql:
basebackup:
- dbname: "host=RANDOM"
```
You end up with `--dbname` being specified twice for `pg_basebackup`, with
`--dbname='host=RANDOM'` taking precedence as it comes up later in the
command.
This commit fixes that issue by adding a `continue` statement when
a setting in the format `- setting: value` is not allowed, thus skipping it.
---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
1. RotatingFileHandler is a child of StreamHandler, therefore we can't rely on `not isinstance(handler, logging.StreamHandler)`.
2. If a legacy version of `python-json-logger` is installed (one that doesn't support rename_fields or static_fields), we want to do what is possible rather than fail with an exception.
Besides that:
1. improve code coverage
2. make unit tests pass without python-json-logger installed or if only some old version is installed.
* Do not set the pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though this flag will actually be set in PG, this configuration parameter is irrelevant on a primary and there is no actual need for a restart)
* Skip hot_standby and wal_log_hints when querying parameters pending a restart on config reload. They can actually be changed manually (e.g. via ALTER SYSTEM), which will cause the pending_restart state in PG, but Patroni always passes those params to the postmaster as command-line options anyway, and there they can only have one value: 'on' (except on a primary when performing a custom bootstrap)
* Ensure that nofailover will always be used if both nofailover and failover_priority tags are provided (see the example after this list)
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to a boolean, as is done when it is accessed in other places
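For illustration, a minimal sketch of a tags section where both tags are provided (the priority value is a placeholder); with this configuration nofailover wins and the node is excluded from failover:
```yaml
tags:
  nofailover: true       # always takes precedence
  failover_priority: 2   # ignored because nofailover is set
```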
This was introduced by #2990: the pod cannot be started and shows the following logs:
```
2023-12-26 03:29:25.569 UTC [47] CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
2023-12-26 03:29:25.569 UTC [47] STATEMENT: DO $$
BEGIN
PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus';
IF NOT FOUND THEN
CREATE DATABASE "citus";
END IF;
END;$$
2023-12-26 03:29:25,570 ERROR: post_bootstrap
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap
self._postgresql.citus_handler.bootstrap()
File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap
cur.execute(sql.encode('utf-8'))
psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function
CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
```
---------
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>