patroni/features/basic_replication.feature
Alexander Kukushkin 4c3af2d1a0 Change master->primary/leader/member (#2541)
keep as much backward compatibility as possible.

The following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit tests use only `role = 'primary'` instead of `'master'` to verify that item 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the latter is not set.
8. The documentation and some examples are updated.
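
Items 1 and 7 can be illustrated with a minimal sketch. The helper names `is_primary` and `translate_timeouts` are hypothetical and are not Patroni's actual code. The sketch also assumes that a missing key or a `None` value counts as "not set", and that the old `master_*` key stays in the configuration after translation.

```python
# Hypothetical helpers for illustration only; not taken from Patroni.
PRIMARY_ROLES = ('master', 'primary')


def is_primary(role):
    """Item 1: accept both the legacy and the new role name."""
    return role in PRIMARY_ROLES


def translate_timeouts(config):
    """Item 7: copy master_(start|stop)_timeout to primary_(start|stop)_timeout.

    Assumption: a missing key or None means "not set"; an explicitly set
    primary_* value always wins. The old key is left in place.
    """
    result = dict(config)  # do not mutate the caller's dict
    for action in ('start', 'stop'):
        old_key = 'master_{0}_timeout'.format(action)
        new_key = 'primary_{0}_timeout'.format(action)
        if old_key in result and result.get(new_key) is None:
            result[new_key] = result[old_key]
    return result
```

Under these assumptions, `translate_timeouts({'master_start_timeout': 0})` yields a configuration that contains both `master_start_timeout` and `primary_start_timeout` set to `0`.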

Future plan: in the next major release, switch the role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and will keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
2023-01-27 07:40:24 +01:00

85 lines
4.4 KiB
Gherkin

Feature: basic replication
We should check that the basic bootstrapping, replication and failover work.
Scenario: check replication of a single table
Given I start postgres0
Then postgres0 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "synchronous_mode": true}
Then I receive a response code 200
When I start postgres1
And I configure and start postgres2 with a tag replicatefrom postgres0
And "sync" key in DCS has leader=postgres0 after 20 seconds
And I add the table foo to postgres0
Then table foo is present on postgres1 after 20 seconds
Then table foo is present on postgres2 after 20 seconds
Scenario: check restart of sync replica
Given I shut down postgres2
Then "sync" key in DCS has sync_standby=postgres1 after 5 seconds
When I start postgres2
And I shut down postgres1
Then "sync" key in DCS has sync_standby=postgres2 after 10 seconds
When I start postgres1
Then "members/postgres1" key in DCS has state=running after 10 seconds
And Status code on GET http://127.0.0.1:8010/sync is 200 after 3 seconds
And Status code on GET http://127.0.0.1:8009/async is 200 after 3 seconds
Scenario: check stuck sync replica
Given I issue a PATCH request to http://127.0.0.1:8008/config with {"pause": true, "maximum_lag_on_syncnode": 15000000, "postgresql": {"parameters": {"synchronous_commit": "remote_apply"}}}
Then I receive a response code 200
And I create table on postgres0
And table mytest is present on postgres1 after 2 seconds
And table mytest is present on postgres2 after 2 seconds
When I pause wal replay on postgres2
And I load data on postgres0
Then "sync" key in DCS has sync_standby=postgres1 after 15 seconds
And I resume wal replay on postgres2
And Status code on GET http://127.0.0.1:8009/sync is 200 after 3 seconds
And Status code on GET http://127.0.0.1:8010/async is 200 after 3 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"pause": null, "maximum_lag_on_syncnode": -1, "postgresql": {"parameters": {"synchronous_commit": "on"}}}
Then I receive a response code 200
And I drop table on postgres0
Scenario: check multi sync replication
Given I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 2}
Then I receive a response code 200
Then "sync" key in DCS has sync_standby=postgres1,postgres2 after 10 seconds
And Status code on GET http://127.0.0.1:8010/sync is 200 after 3 seconds
And Status code on GET http://127.0.0.1:8009/sync is 200 after 3 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 1}
Then I receive a response code 200
And I shut down postgres1
Then "sync" key in DCS has sync_standby=postgres2 after 10 seconds
When I start postgres1
Then "members/postgres1" key in DCS has state=running after 10 seconds
And Status code on GET http://127.0.0.1:8010/sync is 200 after 3 seconds
And Status code on GET http://127.0.0.1:8009/async is 200 after 3 seconds
Scenario: check the basic failover in synchronous mode
Given I run patronictl.py pause batman
Then I receive a response returncode 0
When I sleep for 2 seconds
And I shut down postgres0
And I run patronictl.py resume batman
Then I receive a response returncode 0
And postgres2 role is the primary after 24 seconds
And Response on GET http://127.0.0.1:8010/history contains recovery after 10 seconds
When I issue a PATCH request to http://127.0.0.1:8010/config with {"synchronous_mode": null, "master_start_timeout": 0}
Then I receive a response code 200
When I add the table bar to postgres2
Then table bar is present on postgres1 after 20 seconds
And Response on GET http://127.0.0.1:8010/config contains master_start_timeout after 10 seconds
Scenario: check immediate failover when master_start_timeout=0
Given I kill postmaster on postgres2
Then postgres1 is a leader after 10 seconds
And postgres1 role is the primary after 10 seconds
Scenario: check rejoin of the former primary with pg_rewind
Given I add the table splitbrain to postgres0
And I start postgres0
Then postgres0 role is the secondary after 20 seconds
When I add the table buz to postgres1
Then table buz is present on postgres0 after 20 seconds
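
Most steps in the scenarios above end with "after N seconds", which implies that the step keeps re-checking a condition until a deadline passes. The sketch below shows one way such a polling loop could look. The function `wait_for` and its parameters are hypothetical and are not taken from Patroni's step implementations.

```python
import time


def wait_for(check, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns a truthy value or timeout expires.

    Hypothetical helper: `clock` and `sleep` are injectable so the loop can be
    exercised without real waiting. The condition is always evaluated at least
    once, even when timeout is 0.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A step such as "table foo is present on postgres1 after 20 seconds" could then pass a function that queries the replica as `check` and fail the step when `wait_for` returns `False`.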