Commit Graph

50 Commits

Author SHA1 Message Date
Alexander Kukushkin
48514db84b Take into account current role when deciding on removal of member ZNode (#2884)
Patroni doesn't watch all changes of member keys, in order not to create too much load on ZooKeeper; it only subscribes to changes (ZNodes added or deleted) in the `/member` directory. Therefore, when some important fields in the value are updated, we remove and recreate the ZNode in order to notify the leader or other members.

The leader should remove the member key only when the `checkpoint_after_promote` value changes, and replicas only when the `state` changes to or from `running`.

We don't care about the `version` field, because the Patroni version can't be changed without a restart, which will cause the ZooKeeper `session_id` to change anyway.
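
The role-dependent decision described above could be sketched roughly as follows. This is a minimal illustration and not Patroni's actual code: the function name `needs_znode_recreate` and the plain-dict representation of member data are assumptions made for the example.

```python
def needs_znode_recreate(my_role, old, new):
    """Decide whether the member ZNode should be removed and recreated.

    Hypothetical helper: `old` is the member data currently stored in
    ZooKeeper and `new` is the data about to be written (both dicts).
    """
    if my_role in ('master', 'primary'):
        # The leader only reacts to a change of `checkpoint_after_promote`.
        return old.get('checkpoint_after_promote') != new.get('checkpoint_after_promote')
    # Replicas only react when the state flips to or from `running`.
    # The `version` field is deliberately ignored for every role.
    return (old.get('state') == 'running') != (new.get('state') == 'running')
```

With this sketch a replica going from `running` to `stopped` recreates its ZNode, while the same change on the leader does not.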

This fix will hopefully reduce failures of behave tests on GH Actions.
2023-09-26 09:12:31 +02:00
Feike Steenbergen
4725f12f9a Allow integer gucs without units in validation (#2734)
Previously, integer GUCs that have no unit, for example `max_connections`, would not pass validation if (and only if) they were specified as a string.

This causes problems if `max_connections` is configured in `patroni.yaml` as a string. For example, the following configuration would not result in the right `max_connections` setting:

    bootstrap:
      dcs:
        postgresql:
          parameters:
            log_checkpoints: "on"
            log_connections: "off"
            max_connections: "57"

Allowing a user to specify *all* parameters as a string was accepted before in Patroni and also seems very useful, as many of us will be using Ansible/Helm/Golang to build a Patroni configuration, in which creating a `map[string]string` is easier than having to deal with data types.
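
A minimal sketch of the intended behaviour, assuming a hypothetical helper `parse_int_guc` (this is not the validator Patroni ships): a unit-less integer setting is accepted whether it arrives as an integer or as a string.

```python
def parse_int_guc(value):
    """Return the integer value of a unit-less integer GUC, or None if invalid.

    Hypothetical helper for illustration: accepts both 57 and "57".
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; don't treat True/False as numbers.
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            return None
    return None
```

In this sketch `parse_int_guc("57")` and `parse_int_guc(57)` give the same result, so a `map[string]string` style configuration is handled the same way as a typed one.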

Attempts to address issue #2735

Regression was introduced in 76b3b99de2
2023-07-10 13:44:54 +02:00
Alexander Kukushkin
24af774adb Attempt to reduce behave flakiness on MacOS (#2645)
Sometimes MacOS workers are so slow that Postgres shutdown might take more than 30s-40s, which breaks a test with replica reinit in parallel with the primary restart, because basebackup() makes only two attempts and in a pause the replica remains running with an empty PGDATA.

In addition to that, increase timeouts in the ignore_slots test.
Close #2637
2023-04-13 12:21:08 +02:00
Alexander Kukushkin
4c3af2d1a0 Change master->primary/leader/member (#2541)
keep as much backward compatibility as possible.

Following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set.
8. updated the documentation and some examples
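
Items 1 and 7 of the list above could look roughly like this. It is only a sketch under stated assumptions: the helper names `is_primary` and `translate_timeouts` and the use of a plain dict for the configuration are hypothetical.

```python
def is_primary(role):
    # Item 1: both the old and the new role name are accepted.
    return role in ('master', 'primary')


def translate_timeouts(config):
    """Copy master_(start|stop)_timeout to primary_(start|stop)_timeout
    when the new name is not set. Returns a new dict (hypothetical helper).
    """
    result = dict(config)
    for name in ('start', 'stop'):
        old_key = 'master_{0}_timeout'.format(name)
        new_key = 'primary_{0}_timeout'.format(name)
        if old_key in result and result.get(new_key) is None:
            result[new_key] = result[old_key]
    return result
```

An explicitly configured `primary_stop_timeout` wins over `master_stop_timeout` in this sketch, which matches the "if the last one is not set" condition.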

Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
2023-01-27 07:40:24 +01:00
Alexander Kukushkin
4872ac51e0 Citus integration (#2504)
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```

Here 0 is the Citus group for the coordinator and 1, 2, etc. are worker groups.

Such a hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).

The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group.
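
The key layout and the read scope described above can be illustrated with a small sketch. The helper names `group_prefix` and `read_prefix` and the `/service` default namespace argument are assumptions for the example, not Patroni functions.

```python
def group_prefix(scope, group, namespace='/service'):
    """Return the DCS prefix of one Citus group, e.g. /service/batman/0/."""
    return '{0}/{1}/{2}/'.format(namespace.rstrip('/'), scope, group)


def read_prefix(scope, group, namespace='/service'):
    """Return the prefix a node would read from DCS (hypothetical helper).

    The coordinator (group 0) reads the whole Citus cluster because it has
    to discover workers; a worker reads only the subtree of its own group.
    """
    if group == 0:
        return '{0}/{1}/'.format(namespace.rstrip('/'), scope)
    return group_prefix(scope, group, namespace)
```

For example, `group_prefix('batman', 1) + 'members/m1'` produces the `/service/batman/1/members/m1` key shown in the listing.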

Besides that we introduce a new method, get_citus_coordinator(). It will be used only by worker clusters.

Since there are no hierarchical structures on K8s we will use the citus group suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader  # the leader config map for the coordinator
batman-0-config  # the config map holding initialize, config, and history "keys"
...
batman-1-leader  # the leader config map for worker group 1
batman-1-config
...
```

Citus integration is enabled from patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```

If enabled, Patroni will create the database and the citus extension in it, and INSERT INTO `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', 'sslkey' for the superuser if they are defined in the Patroni configuration file.

When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.

Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that "citus" is in the first place
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid accidental removal.
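
The three adjustments listed above can be sketched as a pure function over a parameter dict. The function name `adjust_citus_gucs` is hypothetical, and falling back to `max_connections = 100` (the PostgreSQL default) when it is not configured is an assumption of this sketch.

```python
def adjust_citus_gucs(parameters):
    """Return a copy of `parameters` adjusted as described in the list above.

    Hypothetical helper; `parameters` maps GUC names to values.
    """
    result = dict(parameters)

    # Put "citus" first in shared_preload_libraries, keeping other entries.
    raw = str(result.get('shared_preload_libraries') or '')
    libs = [lib.strip() for lib in raw.split(',')]
    libs = [lib for lib in libs if lib and lib != 'citus']
    result['shared_preload_libraries'] = ','.join(['citus'] + libs)

    # If max_prepared_transactions is not set or 0, use max_connections * 2.
    if int(result.get('max_prepared_transactions') or 0) == 0:
        # Assumption: 100 is used when max_connections is not configured.
        max_connections = int(result.get('max_connections') or 100)
        result['max_prepared_transactions'] = max_connections * 2

    # wal_level is always forced to logical.
    result['wal_level'] = 'logical'
    return result
```

String values such as `"57"` are accepted for `max_connections`, and an explicitly configured non-zero `max_prepared_transactions` is left untouched.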

The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using
citus_add_node() and citus_update_node() functions.

Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator, and Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back, it performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, with only a short latency spike.

All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This allows more control, including the ability to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
2023-01-24 16:14:58 +01:00
Alexander Kukushkin
580530b30f Behave tests on Windows (#2432)
Windows doesn't support `SIGTERM`, but our behave tests in the majority of cases rely on Patroni's graceful shutdown.
In order to emulate the behaviour we introduced the new REST API endpoint `POST /sigterm`. The endpoint works only on Windows and only when the `BEHAVE_DEBUG` environment variable is set.
Besides that, some minor adjustments were made in the behave tests, mainly related to backslash-slash handling.
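
The guard on the `POST /sigterm` endpoint could be expressed as below. This is an assumed shape of the check, not the code from the commit; the function name `sigterm_endpoint_enabled` is hypothetical.

```python
import os


def sigterm_endpoint_enabled(os_name, environ):
    """Hypothetical check: Windows only, and only with BEHAVE_DEBUG set.

    `os_name` is expected to be `os.name` ('nt' on Windows) and `environ`
    a mapping such as `os.environ`.
    """
    return os_name == 'nt' and bool(environ.get('BEHAVE_DEBUG'))


# Typical call with the real process environment:
enabled = sigterm_endpoint_enabled(os.name, os.environ)
```

Passing the OS name and environment as arguments keeps the check easy to unit-test on any platform.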

In addition to that improve test coverage on Windows by properly mocking access to filesystem and avoiding calling
 `subprocess.call()`. Specifically, symlink creation on Windows requires Admin privileges and there is no `true.exe`.
2022-10-21 12:24:24 +02:00
Alexander Kukushkin
ead798d9ac Speed up behave tests by always using loop_wait=2 (#2361)
run time is reduced from ~5m30s to ~5m
2022-07-18 15:23:55 +02:00
Alexander Kukushkin
4215565cb4 Rearrange tests (#2146)
- remove codacy steps: they removed legacy organizations and there seems to be no easy way of installing codacy app to the Zalando GH.
- Don't run behave on MacOS: recently the workers became way too slow
- Disable behave for combination of kubernetes and python 2.7
- Remove python 3.5 (it will be removed by GH from workers in January) and add 3.10
- Run behave with 3.6 and 3.9 instead of 3.5 and 3.8
2021-12-21 09:36:22 +01:00
Alexander Kukushkin
d24051c31c Optimize case when we don't have permanent logical slots (#2121)
The unnecessary call of SlotsHandler.process_permanent_slots() results in one additional query to `pg_replication_slots` view every HA loop.
2021-11-30 14:20:55 +01:00
Alexander Kukushkin
8a8409999d Change the behavior in pause (#1687)
1. Don't call bootstrap if PGDATA is missing/empty, because it might be on purpose, with someone/something working on it.
2. Consider postgres running as a leader in pause not healthy if pg_control sysid doesn't match with the /initialize key (empty initialize key will allow the "race" and the leader will "restore" initialize key).
3. Don't exit on sysid mismatch in pause, only log a warning.
4. Cover corner cases when Patroni started in pause with empty PGDATA and it was restored by somebody else
5. Empty string is a valid `recovery_target`.
2020-09-18 08:25:00 +02:00
Alexander Kukushkin
e95e54b94e Handle correctly health-checks for standby cluster (#1553)
Close https://github.com/zalando/patroni/issues/1388
2020-06-05 10:37:02 +02:00
Alexander Kukushkin
a5ff38a034 Improve behave tests (#1313)
Hopefully, make them less flaky
2019-12-02 10:33:44 +01:00
Alexander Kukushkin
367d787ff9 Implement /history and /cluster endpoints (#1191)
The /history endpoint shows the content of the `history` key in DCS.
The /cluster endpoint shows all cluster members and some service info like pending and scheduled restarts or switchovers.

In addition to that implement `patronictl history`

Close #586
Close #675
Close #1133
2019-10-22 17:19:02 +02:00
Alexander Kukushkin
3d29cb7e50 Perform pg_ctl reload regardless of config changes (#1204)
It is possible that some config files are not controlled by Patroni. When somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that postgres will also be reloaded, but that didn't happen when there were no changes in the postgresql section of the Patroni config.

For example one might replace ssl_cert_file and ssl_key_file on the filesystem and starting from PostgreSQL 10 it just requires a reload, but Patroni wasn't doing it.

In addition to that fix the issue with handling of `wal_buffers`. The default value depends on `shared_buffers` and `wal_segment_size` and therefore Patroni was exposing pending_restart when the new value in the config was explicitly set to -1 (default).
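
To illustrate why `-1` needs special handling: PostgreSQL documents that `wal_buffers = -1` selects 1/32 of `shared_buffers`, but not less than 64kB and not more than one WAL segment. The sketch below works in bytes; the function names are hypothetical and this is not how Patroni implements the comparison.

```python
def auto_wal_buffers(shared_buffers, wal_segment_size=16 * 1024 * 1024):
    """Size in bytes PostgreSQL picks for wal_buffers = -1 (per its docs)."""
    min_size = 64 * 1024
    return max(min_size, min(shared_buffers // 32, wal_segment_size))


def wal_buffers_changed(current, configured, shared_buffers,
                        wal_segment_size=16 * 1024 * 1024):
    """Hypothetical comparison that resolves -1 before comparing, so that an
    explicit -1 in the config is not reported as a pending restart."""
    if configured == -1:
        configured = auto_wal_buffers(shared_buffers, wal_segment_size)
    return current != configured
```

With `shared_buffers` of 128MB the auto-tuned value is 4MB, so a running value of 4MB and a configured value of `-1` compare as unchanged.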

Close https://github.com/zalando/patroni/issues/1198
2019-10-10 14:49:30 +02:00
wilfriedroset
2384d9e735 Add API route /health (#1079)
close #119
2019-06-11 15:22:52 +02:00
Alexander Kukushkin
1a0876e5ca Refactor acceptance tests to improve stability (#884)
Hope it will crash less often when executed on travis against k8s
2018-11-30 12:40:56 +01:00
Alexander Kukushkin
87e9aab04c Improve tests (#778)
* Implement missing unit-tests
* Add acceptance tests for ISSUE #776
* Update list of classifiers, keywords and authors
2018-08-29 11:29:37 +02:00
Alexander Kukushkin
18786464a1 Rename failover to switchover and make new failover work without leader (#588)
In addition to that implement /switchover endpoint as an alias to /failover endpoint and implement more checks like:
* candidate must be provided for a failover
* switchover can't be scheduled in a pause state
* and so on
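
The two checks named in the list could be sketched like this. The function name `validate_request`, its arguments, and the error strings are all hypothetical; the real endpoint performs more checks than shown here.

```python
def validate_request(action, candidate=None, scheduled_at=None, paused=False):
    """Return an error message, or None if the request passes the two checks.

    Hypothetical helper covering only the checks listed above.
    """
    if action == 'failover' and not candidate:
        # A failover must name the candidate to promote.
        return 'failover requires a candidate'
    if action == 'switchover' and scheduled_at is not None and paused:
        # A switchover can't be scheduled while the cluster is paused.
        return 'switchover cannot be scheduled in a pause state'
    return None
```

An immediate switchover in pause is not rejected by this sketch, because the source only mentions the scheduled case.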

Fixes https://github.com/zalando/patroni/issues/585
Fixes https://github.com/zalando/patroni/issues/520
2018-01-05 15:17:56 +01:00
Alexander Kukushkin
25aa49b240 Run one manual failover test via rest API instead of patronictl
and bump Patroni version
2017-07-31 11:18:01 +02:00
Alexander Kukushkin
322aa45e09 BUGFIX: patronictl edit-config didn't work with zookeeper (#492)
When updating the config key we should use `ClusterConfig.index` instead of
`ClusterConfig.modify_index`. The second one should be used by Patroni
internally to check that the key was really changed, because when a key is
deleted and recreated its version always starts from the same value: 0

In addition to that, use patronictl instead of http PATCH in some of
the acceptance tests to change the cluster config.

Fixes https://github.com/zalando/patroni/issues/491
2017-07-31 11:07:00 +02:00
Alexander Kukushkin
39f5f7982c Scheduled failovers in 1 second don't work reliably with loop_wait=2 2017-01-13 11:25:07 +01:00
Alexander Kukushkin
1f829a4b34 Switch to trusty and run acceptance tests with postgres 9.6 2017-01-13 09:32:38 +01:00
Alexander Kukushkin
1e573aec8f Do session/renew call to Consul when update_leader is called (#336) 2016-10-10 10:05:55 +02:00
Alexander Kukushkin
33ff372ef6 Always try to rewind on manual failover 2016-09-01 11:08:26 +02:00
Alexander Kukushkin
1dcdd6eaa0 Acceptance tests for pause mode 2016-08-30 16:50:07 +02:00
Alexander Kukushkin
366ed9cc52 fix pep8 formatting and implement missing tests 2016-08-29 15:39:24 +02:00
Murat Kabilov
a47a2bceff Manage scheduled restarts using patronictl (#248)
Manage scheduled restarts using patronictl
2016-08-09 12:54:48 +02:00
Oleksii Kliukin
ffd27b5705 Rename with_pending_restart to restart_pending. 2016-07-13 11:07:37 +02:00
Oleksii Kliukin
bf95b75489 Use the parameter that really sets the pending_restart flag. 2016-07-11 18:20:15 +02:00
Oleksii Kliukin
c91eda8d78 Merge branch 'master' into feature/scheduled_restarts 2016-07-11 12:56:24 +02:00
Oleksii Kliukin
29845dd383 Restart the node according to the schedule.
The scheduled restart data structures are now independent of those
used by the normal restarts. This would be fixed in subsequent
commits.
Add the behave tests, that cover the POST /restart (but not DELETE).
2016-06-23 10:43:54 +02:00
Alexander Kukushkin
fcde17583c Acceptance tests for patronictl
Call patronictl.py when it's possible instead of doing REST API calls.
2016-06-16 15:06:18 +02:00
Alexander Kukushkin
24822bd9ac Returning 304 for POST, PATCH, PUT is not a good idea 2016-06-06 10:50:42 +02:00
Alexander Kukushkin
ebb9e252d8 Rename restart_pending to pending_restart for compatibility 2016-06-02 09:31:30 +02:00
Alexander Kukushkin
1c30948ef9 Implement PUT /config and enhance some checks 2016-06-01 17:06:31 +02:00
Alexander Kukushkin
f7912991a8 Reshuffle acceptance tests one more time 2016-05-30 12:37:14 +02:00
Alexander Kukushkin
e085c866dc Reshuffle acceptance tests
Move dynamic config tests from basic_replication to patroni_api
2016-05-30 11:30:41 +02:00
Alexander Kukushkin
073ef3784f Implement PATCH /config 2016-05-27 16:29:33 +02:00
Alexander Kukushkin
d57310bbc0 Fix one more corner-case
It could take up to 10 seconds to create a replication slot.
In addition to that, when a replica fails to connect to the master via
streaming replication it doesn't retry immediately, but with some
timeout (5 seconds). 10 + 5 == 15, which causes replication check
scenarios to fail.
2016-04-13 14:09:45 +02:00
Alexander Kukushkin
b4e86f0809 Make it possible to schedule failover in less than 10 seconds
But only when API request was posted to the leader
2016-04-13 13:32:39 +02:00
Alexander Kukushkin
15d30a2d35 Try to stabilize acceptance tests 2016-04-13 13:32:39 +02:00
Alexander Kukushkin
24a2ea6cef Refactor acceptance tests to make them work against ZooKeeper
and make it easier to implement controllers for new DCS, i.e. consul
2016-04-10 10:37:43 +02:00
Alexander Kukushkin
e6af18f0bb Former leader was not able to reattach to cluster without pg_rewind
It was shut down correctly and I expected such a 'join' to work, but it did
not, because the new leader didn't have enough time to catch up with the
master before promote.
2016-03-24 14:45:21 +01:00
Alexander Kukushkin
54055c1ff8 Rename ambiguous Failover.member to candidate
But! 'member' is still accepted by the REST API, and the name 'member' is also
used to store/read this value to/from DCS (for backward compatibility)
2016-03-18 15:59:47 +01:00
Alexander Kukushkin
42d798a3de acceptance tests on travis 2016-03-10 17:19:10 +01:00
Oleksii Kliukin
3f1c34f557 Add tests for the scheduled failover.
The actual amount of time to establish the master and the replication
after the scheduled failover seems sufficient (15 seconds with the
failover in 10 seconds), but occasionally leads to test failures.
This is unlikely to be a test issue and should be investigated inside
Patroni.
2016-03-02 19:39:12 +01:00
Oleksii Kliukin
069440be15 Improve the "replication work" sentence definition.
Add an ability to specify the origin and the destination for
the replication works clause. Use this ability in the API
promotion test to ensure the replication from the former
replica to the former master.
2016-03-02 15:43:44 +01:00
Oleksii Kliukin
24ebcc72f6 Add more tests for the restart and promotion. 2016-03-01 22:07:18 +01:00
Oleksii Kliukin
0d44e3eb7c Add simple API tests for 2 nodes, to be extended. 2016-02-26 18:00:11 +01:00
Oleksii Kliukin
4e9ebf48a8 Add API tests for a stand-alone node. Bugfixes.
Add tests for patroni API.
Fix test failures when an already running etcd is used.
2016-02-26 17:37:37 +01:00