`patronictl edit-config` requires a pager to show the diff output back to the user. It used to be hard-coded to use either `less` or `more`.
When these tools were not available on the host, `patronictl` would hit an exception in the `ydiff` module and print the stack trace to the console.
This PR changes the `patronictl edit-config` command to behave as follows (a minimal sketch of the pager lookup follows the list):
- If the `PAGER` environment variable is set, attempt to find the corresponding executable.
- If `PAGER` is not set or points to an invalid executable, fall back to `less` or `more`, as before.
- If no executable is found at all, raise a `PatroniCtlException` with a user-friendly message.
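A minimal sketch of that lookup order (illustrative only; assumes `PatroniCtlException` from `patroni.ctl`, not the exact `patronictl` code):
```python
import os
import shutil

from patroni.ctl import PatroniCtlException


def find_pager():
    """Resolve a pager executable: $PAGER first, then less/more."""
    candidates = []
    env_pager = os.environ.get('PAGER')
    if env_pager:
        candidates.append(env_pager)
    candidates += ['less', 'more']

    for candidate in candidates:
        executable = shutil.which(candidate)
        if executable:
            return executable
    raise PatroniCtlException('No pager found: set PAGER or install less/more')
```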
Unit tests in `tests/test_ctl.py` were modified accordingly.
References: PAT-21
Close #2604
It was possible to have it empty if all cluster keys are missing in DCS. In this case the `Cluster` object was manually created with all values set to `None` or `[]` (including sync).
This already resulted in #2217, which in fact wasn't a correct fix.
In order to solve it and reduce code duplication we introduce the `Cluster.empty()` and `SyncState.empty()` methods, which create the corresponding empty objects, and use `Cluster.empty()` in all places where the empty `Cluster` object used to be created manually.
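A minimal sketch of the idea (simplified; the real `Cluster` and `SyncState` are named tuples with more fields):
```python
from typing import NamedTuple, Optional


class SyncState(NamedTuple):
    version: Optional[int]
    leader: Optional[str]
    sync_standby: Optional[str]

    @staticmethod
    def empty(version: Optional[int] = None) -> 'SyncState':
        # the single place that knows what an "empty" sync state looks like
        return SyncState(version, None, None)


# Cluster.empty() is built the same way: it assembles a Cluster from
# empty/None components, including SyncState.empty().
```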
If communication with etcd nodes fails, it is logical to start from scratch, from the nodes that are listed in the config. But it could happen that the config is in fact outdated and all nodes in the real cluster were replaced.
Previously we used to track whether the config file had changed, which turned out not to work in all possible cases.
The new strategy is a bit different: if communication with all nodes fails, we keep the last known topology and at the same time try to figure out the new one by merging the two lists together, the cached list and the list from the config file.
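A rough illustration of the merge (hypothetical helper, not the actual implementation):
```python
def merge_topology(cached_nodes, config_nodes):
    """Keep the cached topology but also retry the nodes listed in the config file."""
    merged = list(cached_nodes)
    for node in config_nodes:
        if node not in merged:
            merged.append(node)
    return merged


# merge_topology(['etcd4:2379', 'etcd5:2379'], ['etcd1:2379', 'etcd4:2379'])
# -> ['etcd4:2379', 'etcd5:2379', 'etcd1:2379']
```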
We made an incorrect assumption that `citus_set_coordinator_host()` would trigger a `pg_dist_node` sync. Instead, we should also use `citus_update_node()` and call `citus_set_coordinator_host()` only during bootstrap.
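A hedged sketch of the resulting rule (illustrative function, `cursor` being a libpq cursor on the coordinator):
```python
def register_coordinator(cursor, host, port, nodeid=None, bootstrap=False):
    if bootstrap or nodeid is None:
        # only during bootstrap, when the coordinator is not yet registered
        cursor.execute('SELECT citus_set_coordinator_host(%s, %s)', (host, port))
    else:
        # citus_update_node() makes the change visible to workers via pg_dist_node
        cursor.execute('SELECT citus_update_node(%s, %s, %s)', (nodeid, host, port))
```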
Adjust behave tests to verify that coordinator failover is visible on workers.
The configuration parameter is `kubernetes.retriable_http_codes`, or the `PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES` environment variable.
These status codes are added to the default list of 500, 503, 504.
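For illustration, the assumed effect on the set of retriable status codes (hypothetical helper, not the actual implementation):
```python
DEFAULT_RETRIABLE_HTTP_CODES = {500, 503, 504}


def retriable_http_codes(configured=None):
    # e.g. configured = [502], coming from kubernetes.retriable_http_codes
    # or from PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES
    return DEFAULT_RETRIABLE_HTTP_CODES | set(configured or ())
```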
Close https://github.com/zalando/patroni/issues/2536
It could happen that Patroni is started up before PGDATA was mounted. In this case Patroni can't determine the major Postgres version from the PG_VERSION file. Later, when PGDATA is mounted, Patroni would try to create recovery.conf even if the actual Postgres major version is newer than 12.
To mitigate the problem we double-check that `Postgresql._major_version` is set before writing the recovery configuration or starting Postgres up.
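Schematically, the guard looks like this (hypothetical function name):
```python
import logging

logger = logging.getLogger(__name__)


def can_write_recovery_conf(major_version: int) -> bool:
    """major_version is e.g. 120000 for v12, or 0 when PG_VERSION could not be read."""
    if not major_version:
        logger.warning('Postgres major version is unknown, PGDATA is probably not mounted yet')
        return False
    return True
```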
Close https://github.com/zalando/patroni/issues/2434
For a long time Patroni has enforced that only one callback script runs at a time. If a new callback is executed while the old one is still running, the old one is killed (including all its child processes).
Such behavior is fine for all callbacks except `on_reload`, because the latter may accidentally cancel important ones that, for example, update DNS or assign/remove a virtual IP.
To mitigate the problem we introduce a dedicated executor for `on_reload` callbacks, so that `on_reload` may only cancel another `on_reload`.
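A toy sketch of the idea (a simplified stand-in for Patroni's real callback executor):
```python
import subprocess
from threading import Lock


class SingleCallbackExecutor:
    """At most one callback script at a time; a new call kills the previous one."""

    def __init__(self):
        self._lock = Lock()
        self._process = None

    def call_nowait(self, cmd):
        with self._lock:
            if self._process and self._process.poll() is None:
                self._process.kill()
            self._process = subprocess.Popen(cmd)


# a separate executor for on_reload means a reload can no longer kill an
# in-flight on_start/on_stop/on_role_change callback
generic_executor = SingleCallbackExecutor()
reload_executor = SingleCallbackExecutor()


def execute_callback(cb_name, cmd):
    executor = reload_executor if cb_name == 'on_reload' else generic_executor
    executor.call_nowait(cmd)
```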
Ref: https://github.com/zalando/patroni/issues/2445
The patronictl code tries to initialize DCS twice, first for the current Citus group and then for the selected group. However, kubernetes.py was overwriting the namespace config. As a result, after the second initialization patronictl was trying to work with the `default` namespace instead of the configured one.
* bump version
* update release notes
* removed 2.7, 3.4, 3.5, and 3.6 from supported versions in setup.py
* switched GH actions back to ubuntu-latest, removed tests with 2.7 and 3.6, and added 3.11
* minor fixes in the Citus documentation and behave tests
Keep as much backward compatibility as possible.
The following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit tests use only `role = 'primary'` instead of `'master'` to verify that item 1 works.
6. patronictl still supports the old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the latter is not set (see the sketch after this list).
8. Updated the documentation and some examples.
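A minimal sketch of the translation from item 7 (illustrative, not the exact code):
```python
def translate_deprecated_timeouts(config: dict) -> dict:
    result = dict(config)
    for action in ('start', 'stop'):
        old_name = 'master_{0}_timeout'.format(action)
        new_name = 'primary_{0}_timeout'.format(action)
        # the deprecated master_* value is used only when primary_* is not set
        if new_name not in result and old_name in result:
            result[new_name] = result[old_name]
    return result


# translate_deprecated_timeouts({'master_start_timeout': 300})
# -> {'master_start_timeout': 300, 'primary_start_timeout': 300}
```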
Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and will keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
It doesn't like relative imports and does not recognise `http.server` imported with `six`.
The latter is explicitly added to the list of `hiddenimports()` and will break compatibility with Python 2.7, whose support will be dropped in the next Patroni release anyway.
Close https://github.com/zalando/patroni/issues/2535
The Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni clusters logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```
Where 0 is the Citus group of the coordinator and 1, 2, etc. are worker groups.
Such a hierarchy allows reading the entire Citus cluster with a single call to DCS (except for Zookeeper).
The get_cluster() method will read the entire Citus cluster on the coordinator, because it needs to discover workers. For a worker cluster it will read only the subtree of its own group.
Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters.
Since there are no hierarchical structures on K8s, we will use the Citus group as a suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader # the leader config map for the coordinator
batman-0-config # the config map holding initialize, config, and history "keys"
...
batman-1-leader # the leader config map for worker group 1
batman-1-config
...
```
Citus integration is enabled from patroni.yaml:
```yaml
citus:
  database: citus
  group: 0 # 0 is for coordinator, 1, 2, etc are for workers
```
If enabled, Patroni will create the database and the citus extension in it, and will insert into `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. the 'password', 'sslcert', and 'sslkey' of the superuser, if they are defined in the Patroni configuration file.
When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.
Besides that, Patroni takes over management of some Postgres GUCs (a short sketch follows the list):
- `shared_preload_libraries` - Patroni ensures that "citus" is placed first in the list
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and they are automatically added by Patroni to the `ignore_slots` configuration to avoid accidental removal.
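A simplified sketch of these adjustments (illustrative only; `parameters` stands for the dict of Postgres GUCs Patroni is about to apply):
```python
def adjust_citus_gucs(parameters: dict) -> dict:
    libraries = [v.strip() for v in parameters.get('shared_preload_libraries', '').split(',') if v.strip()]
    if 'citus' in libraries:
        libraries.remove('citus')
    parameters['shared_preload_libraries'] = ','.join(['citus'] + libraries)

    if not int(parameters.get('max_prepared_transactions', 0) or 0):
        parameters['max_prepared_transactions'] = int(parameters['max_connections']) * 2

    parameters['wal_level'] = 'logical'
    return parameters


# adjust_citus_gucs({'max_connections': 100, 'shared_preload_libraries': 'pg_stat_statements'})
# -> citus is prepended, max_prepared_transactions becomes 200, wal_level becomes 'logical'
```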
The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the citus_add_node() and citus_update_node() functions.
Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator, and Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back up, it performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, only with a short latency spike.
All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and allows rolling back an in-progress transaction if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
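A greatly simplified sketch of the coordinator-side handling (illustrative only; `conn` is a libpq connection, e.g. psycopg2, with autocommit disabled, and the real code additionally serializes and times out requests):
```python
def pause_worker_traffic(conn, nodeid, port):
    # psycopg2 opens a transaction implicitly; it stays open until commit()
    cur = conn.cursor()
    # pointing pg_dist_node to a non-existing host pauses client connections
    # that are routed to this worker
    cur.execute('SELECT citus_update_node(%s, %s, %s)', (nodeid, 'host-demoted', port))
    return cur


def resume_worker_traffic(conn, cur, nodeid, host, port):
    # called from the second POST /citus request, once the new leader is up
    # or the old one is running again
    cur.execute('SELECT citus_update_node(%s, %s, %s)', (nodeid, host, port))
    conn.commit()  # client connections are unblocked after the commit
```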
When the `synchronous_standby_names` GUC is changed, PostgreSQL almost immediately starts reporting the corresponding walsenders as synchronous, while in fact they may not have reached this state yet. To mitigate this problem we memorize the current flush LSN on the primary right after the change of `synchronous_standby_names` became visible and use it as an additional check for walsenders.
A walsender will be counted as truly "sync" only when its write/flush/replay LSN has reached the memorized LSN and its `application_name` is known to be a part of `synchronous_standby_names`.
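Roughly, the check looks like this (illustrative names; LSNs compared as integers):
```python
def is_truly_synchronous(walsender: dict, memorized_lsn: int, ssn_members: set) -> bool:
    """walsender: a row from pg_stat_replication; ssn_members: lower-cased names
    parsed from synchronous_standby_names."""
    if walsender['application_name'].lower() not in ssn_members:
        return False
    # don't trust sync_state until the standby has caught up to the LSN that was
    # memorized right after the change of synchronous_standby_names became visible
    lsn = walsender.get('flush_lsn') or walsender.get('write_lsn') or walsender.get('replay_lsn')
    return lsn is not None and lsn >= memorized_lsn
```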
The size of the PR is mostly related to refactoring: the code responsible for working with `synchronous_standby_names` and `pg_stat_replication` was moved to a dedicated file.
The `parse_sync_standby_names()` function was mostly copied from #672.
1. Fix the problem with logical slots not advancing when only the primary lost access to DCS.
2. Don't let Patroni join as a Raft voting member when running failsafe behave tests. This allows testing exactly the same conditions as for other DCS.
3. Speed up dcs_failsafe_mode behave tests by getting rid of long sleeps, slightly reshuffling the places where we start/stop the outage, and killing Patroni/Postgres to avoid a long shutdown due to leader key removal attempts.
When a replication slot is not registered with Patroni but is active, Patroni would log an error during each HA cycle in certain conditions (after a restart or role change). To avoid this, first check if the replication slot we are about to drop is still active and if so, only log a warning. Otherwise, log the slot we are dropping for informational purposes.
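Schematically (hypothetical helper, not the exact Patroni code):
```python
import logging

logger = logging.getLogger(__name__)


def drop_unknown_slot(cursor, slot: dict) -> None:
    """slot: a row from pg_replication_slots represented as a dict."""
    if slot['active']:
        logger.warning('Unable to drop replication slot %s because it is still active', slot['slot_name'])
        return
    logger.info('Dropping unused replication slot %s', slot['slot_name'])
    cursor.execute('SELECT pg_drop_replication_slot(%s)', (slot['slot_name'],))
```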
Close: #2499
If a PR is opened from an external GH repo, secrets are not set due to security reasons. This makes the codacy coverage report fail.
Co-authored-by: Polina Bungina <bungina@gmail.com>
If enabled, it will allow Patroni to cope with DCS outages.
In case of a DCS outage the leader tries to call all remaining members of the cluster via the REST API, and if all of them respond with success, the leader will not be demoted.
The failsafe_mode can be enabled by running
```sh
patronictl edit-config -s failsafe_mode=true
```
or by calling the `/config` REST API endpoint.
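For example, a hedged equivalent via the REST API (assuming it listens on localhost:8008):
```python
import requests

# PATCH /config merges the given JSON into the dynamic configuration
requests.patch('http://localhost:8008/config', json={'failsafe_mode': True})
```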
Co-authored-by: Polina Bungina <bungina@gmail.com>
They frequently fail because sometimes replicas are a bit slow to realize that they are synchronous. Instead of introducing more sleeps we will poll for the required HTTP status code with some timeout.
If the cluster is stable (no nodes are joining/leaving/lagging) we want to run at most one monitoring query per HA loop. So far it worked perfectly, except when synchronous_mode is enabled, where we run two additional queries:
1. SHOW synchronous_mode
2. SELECT ... FROM pg_stat_replication
In order to solve it, we will include these "queries" in the common monitoring query if synchronous_mode is enabled.
In addition to that, we make sure that `synchronous_standby_names` is reset on replicas that used to be a primary, and avoid using replicas which are not in the 'running' state.
P.S.: in the monitoring query we also extract the current value of `synchronous_standby_names`, because it will be useful for the quorum commit feature.
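Conceptually, something like this (illustrative SQL, not the exact query Patroni runs):
```python
BASE_MONITOR_QUERY = (
    "SELECT pg_catalog.pg_is_in_recovery(),"
    " CASE WHEN pg_catalog.pg_is_in_recovery() THEN pg_catalog.pg_last_wal_replay_lsn()"
    " ELSE pg_catalog.pg_current_wal_lsn() END"
)

SYNC_STATE_ADDON = (
    ", pg_catalog.current_setting('synchronous_standby_names'),"
    " (SELECT pg_catalog.json_agg(r) FROM (SELECT application_name, sync_state, replay_lsn"
    " FROM pg_catalog.pg_stat_replication WHERE state = 'streaming') r)"
)


def build_monitor_query(synchronous_mode: bool) -> str:
    # a single round trip per HA loop, even when synchronous_mode is enabled
    return BASE_MONITOR_QUERY + (SYNC_STATE_ADDON if synchronous_mode else '')
```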
Close https://github.com/zalando/patroni/issues/2469
- the new MacOS doesn't play well with old go binaries (bump etcd)
- use brew to install Postgres and expect (unbuffer, to make behave output colorful) and use the latest version
- upload failed logs instead of grepping them to stdout
Otherwise, the etcd (not etcd3) behave tests fail to connect:
```
Jan 02 09:56:18 HOOK-ERROR in before_all: AssertionError: etcd instance is not available for queries after 5 seconds
```
When doing the leader race we need to check that the former primary isn't alive anymore. For that we relied on non-inclusive terms. In order to simplify future work on getting rid of all non-inclusive words, we change the check to rely on a difference in the format of the wal/xlog field: there is only "location" for the primary, and "replayed_location" + "received_location" for standbys.
In addition to that we start supporting the "wal" field as well as the deprecated "xlog".
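A hedged sketch of the check, assuming `data` is the JSON returned by the member's REST API:
```python
def looks_like_running_primary(data: dict) -> bool:
    wal = data.get('wal') or data.get('xlog') or {}
    # a running primary reports a plain 'location', while standbys report
    # 'replayed_location' / 'received_location'
    return 'location' in wal
```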
Co-authored-by: Polina Bungina <bungina@gmail.com>