mirror of
https://github.com/outbackdingo/patroni.git
synced 2026-01-27 10:20:10 +00:00
This commit is a breaking change:

1. `role` in DCS is written as "primary" instead of "master".
2. `role` in REST API responses is also written as "primary".
3. REST API no longer accepts role=master in requests (for example switchover/failover/restart endpoints).
4. `/metrics` REST API endpoint will no longer report `patroni_master`.
5. `patronictl` no longer accepts the `--master` argument.
6. The `no_master` option in declarative configuration of custom replica creation methods is no longer treated as a special option; please use `no_leader` instead.
7. `patroni_wale_restore` doesn't accept `--no_master` anymore.
8. `patroni_barman` doesn't accept `--role=master` anymore.
9. Callback scripts will be executed with role=primary instead of role=master.
10. On Kubernetes, Patroni will by default set the role label to primary. If you want to keep the old behavior and avoid downtime or lengthy, complex migrations, you can configure `kubernetes.leader_label_value` and `kubernetes.standby_leader_label_value` to `master`.

However, a few exceptions regarding master are still in place:

1. `GET /master` REST API endpoint will continue to work.
2. `master_start_timeout` and `master_stop_timeout` in global configuration are still accepted.
3. The `master` tag is still preserved in Consul services in addition to `primary`.

Rationale for these exceptions: a DBA doesn't always fully control the infrastructure and can't always adjust the configuration.
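As a minimal sketch of the Kubernetes opt-out described in item 10, the two options named there might be set as follows. This assumes they live under the `kubernetes` section of a standard Patroni YAML configuration file; everything else in the file is omitted.

```yaml
# Sketch only: keeps the pre-change role label value on Kubernetes.
# The option names come from the commit message; placement in the
# top-level `kubernetes` section is an assumption.
kubernetes:
  leader_label_value: master
  standby_leader_label_value: master
```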
81 lines
5.0 KiB
Gherkin
Feature: citus
  We should check that coordinator discovers and registers workers and clients don't have errors when worker cluster switches over

  Scenario: check that worker cluster is registered in the coordinator
    Given I start postgres0 in citus group 0
    And I start postgres2 in citus group 1
    Then postgres0 is a leader in a group 0 after 10 seconds
    And postgres2 is a leader in a group 1 after 10 seconds
    When I start postgres1 in citus group 0
    And I start postgres3 in citus group 1
    Then replication works from postgres0 to postgres1 after 15 seconds
    Then replication works from postgres2 to postgres3 after 15 seconds
    And postgres0 is registered in the postgres0 as the primary in group 0 after 5 seconds
    And postgres1 is registered in the postgres0 as the secondary in group 0 after 5 seconds
    And postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And postgres3 is registered in the postgres0 as the secondary in group 1 after 5 seconds

  Scenario: coordinator failover updates pg_dist_node
    Given I run patronictl.py failover batman --group 0 --candidate postgres1 --force
    Then postgres1 role is the primary after 10 seconds
    And "members/postgres0" key in a group 0 in DCS has state=running after 15 seconds
    And replication works from postgres1 to postgres0 after 15 seconds
    And postgres1 is registered in the postgres2 as the primary in group 0 after 5 seconds
    And postgres0 is registered in the postgres2 as the secondary in group 0 after 15 seconds
    And "sync" key in a group 0 in DCS has sync_standby=postgres0 after 15 seconds
    When I run patronictl.py switchover batman --group 0 --candidate postgres0 --force
    Then postgres0 role is the primary after 10 seconds
    And replication works from postgres0 to postgres1 after 15 seconds
    And postgres0 is registered in the postgres2 as the primary in group 0 after 5 seconds
    And postgres1 is registered in the postgres2 as the secondary in group 0 after 15 seconds
    And "sync" key in a group 0 in DCS has sync_standby=postgres1 after 15 seconds

  Scenario: worker switchover doesn't break client queries on the coordinator
    Given I create a distributed table on postgres0
    And I start a thread inserting data on postgres0
    When I run patronictl.py switchover batman --group 1 --force
    Then I receive a response returncode 0
    And postgres3 role is the primary after 10 seconds
    And "members/postgres2" key in a group 1 in DCS has state=running after 15 seconds
    And replication works from postgres3 to postgres2 after 15 seconds
    And postgres3 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And postgres2 is registered in the postgres0 as the secondary in group 1 after 15 seconds
    And "sync" key in a group 1 in DCS has sync_standby=postgres2 after 15 seconds
    And a thread is still alive
    When I run patronictl.py switchover batman --group 1 --force
    Then I receive a response returncode 0
    And postgres2 role is the primary after 10 seconds
    And replication works from postgres2 to postgres3 after 15 seconds
    And postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And postgres3 is registered in the postgres0 as the secondary in group 1 after 15 seconds
    And "sync" key in a group 1 in DCS has sync_standby=postgres3 after 15 seconds
    And a thread is still alive
    When I stop a thread
    Then a distributed table on postgres0 has expected rows

  Scenario: worker primary restart doesn't break client queries on the coordinator
    Given I cleanup a distributed table on postgres0
    And I start a thread inserting data on postgres0
    When I run patronictl.py restart batman postgres2 --group 1 --force
    Then I receive a response returncode 0
    And postgres2 role is the primary after 10 seconds
    And replication works from postgres2 to postgres3 after 15 seconds
    And postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And postgres3 is registered in the postgres0 as the secondary in group 1 after 15 seconds
    And a thread is still alive
    When I stop a thread
    Then a distributed table on postgres0 has expected rows

  Scenario: check that in-flight transaction is rolled back after timeout when other workers need to change pg_dist_node
    Given I start postgres4 in citus group 2
    Then postgres4 is a leader in a group 2 after 10 seconds
    And "members/postgres4" key in a group 2 in DCS has role=primary after 3 seconds
    When I run patronictl.py edit-config batman --group 2 -s ttl=20 --force
    Then I receive a response returncode 0
    And I receive a response output "+ttl: 20"
    Then postgres4 is registered in the postgres2 as the primary in group 2 after 5 seconds
    When I shut down postgres4
    Then there is a transaction in progress on postgres0 changing pg_dist_node after 5 seconds
    When I run patronictl.py restart batman postgres2 --group 1 --force
    Then a transaction finishes in 20 seconds