patroni/features/basic_replication.feature
Alexander Kukushkin f3420e2db5 Compatibility with PostgreSQL 14 (#1926)
PostgreSQL 14 changed how replicas behave when certain parameters (for example, `max_connections`) are increased: https://github.com/postgres/postgres/commit/15251c0a.
Instead of exiting immediately, PostgreSQL 14 pauses replication and waits for the operator to act.

Since `pg_is_wal_replay_paused()` returning `True` is the only indicator of such a change, Patroni on the replica will call `pg_wal_replay_resume()`, which causes the replica either to continue replication or to shut down (as it did previously).

Until now Patroni never called `pg_wal_replay_resume()` on its own; therefore, to remain backward compatible, it will call it only on PostgreSQL 14+.
2021-06-25 13:41:45 +02:00
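
The decision described in the commit message can be sketched roughly as follows. This is a minimal illustration, not Patroni's actual code: the function name `maybe_resume_wal_replay`, the `query` callable, and the `PG14` constant are hypothetical names introduced here, and only the two SQL functions come from the commit message. It assumes `query` runs a statement and returns the single scalar result.

```python
from typing import Callable

# Assumption: versions are compared using the server_version_num format,
# where PostgreSQL 14.0 is 140000.
PG14 = 140000


def maybe_resume_wal_replay(server_version_num: int,
                            is_replica: bool,
                            query: Callable[[str], object]) -> bool:
    """Resume WAL replay on a paused PostgreSQL 14+ replica.

    `query` is a hypothetical helper that executes one SQL statement and
    returns its scalar result. Returns True if pg_wal_replay_resume()
    was called, False otherwise.
    """
    # Older versions keep the previous behavior: never resume automatically.
    if not is_replica or server_version_num < PG14:
        return False
    # A paused replay is the only indicator that a parameter change is pending.
    if query("SELECT pg_catalog.pg_is_wal_replay_paused()") is not True:
        return False
    # Resuming lets Postgres either continue replication or shut down.
    query("SELECT pg_catalog.pg_wal_replay_resume()")
    return True


if __name__ == "__main__":
    executed = []

    def fake_query(sql: str) -> object:
        # Stand-in for a real database cursor: always reports "paused".
        executed.append(sql)
        return True if "pg_is_wal_replay_paused" in sql else None

    print(maybe_resume_wal_replay(140000, True, fake_query))
    print(maybe_resume_wal_replay(130004, True, fake_query))
```

Keeping the version check first means that on older servers no query is issued at all, which matches the backward-compatibility intent stated above.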


Feature: basic replication
We should check that basic bootstrapping, replication and failover work.
Scenario: check replication of a single table
Given I start postgres0
Then postgres0 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8008/config with {"ttl": 20, "loop_wait": 2, "synchronous_mode": true}
Then I receive a response code 200
When I start postgres1
And I configure and start postgres2 with a tag replicatefrom postgres0
And "sync" key in DCS has leader=postgres0 after 20 seconds
And I add the table foo to postgres0
Then table foo is present on postgres1 after 20 seconds
Then table foo is present on postgres2 after 20 seconds
Scenario: check restart of sync replica
Given I shut down postgres2
Then "sync" key in DCS has sync_standby=postgres1 after 5 seconds
When I start postgres2
And I shut down postgres1
Then "sync" key in DCS has sync_standby=postgres2 after 10 seconds
When I start postgres1
And "members/postgres1" key in DCS has state=running after 10 seconds
And I sleep for 2 seconds
When I issue a GET request to http://127.0.0.1:8010/sync
Then I receive a response code 200
When I issue a GET request to http://127.0.0.1:8009/async
Then I receive a response code 200
Scenario: check stuck sync replica
Given I issue a PATCH request to http://127.0.0.1:8008/config with {"pause": true, "maximum_lag_on_syncnode": 15000000, "postgresql": {"parameters": {"synchronous_commit": "remote_apply"}}}
Then I receive a response code 200
And I create table on postgres0
And table mytest is present on postgres1 after 2 seconds
And table mytest is present on postgres2 after 2 seconds
When I pause wal replay on postgres2
And I load data on postgres0
Then "sync" key in DCS has sync_standby=postgres1 after 15 seconds
And I resume wal replay on postgres2
And I sleep for 2 seconds
And I issue a GET request to http://127.0.0.1:8009/sync
Then I receive a response code 200
When I issue a GET request to http://127.0.0.1:8010/async
Then I receive a response code 200
When I issue a PATCH request to http://127.0.0.1:8008/config with {"pause": null, "maximum_lag_on_syncnode": -1, "postgresql": {"parameters": {"synchronous_commit": "on"}}}
Then I receive a response code 200
And I drop table on postgres0
Scenario: check multi sync replication
Given I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 2}
Then I receive a response code 200
And I sleep for 10 seconds
Then "sync" key in DCS has sync_standby=postgres1,postgres2 after 5 seconds
When I issue a GET request to http://127.0.0.1:8010/sync
Then I receive a response code 200
When I issue a GET request to http://127.0.0.1:8009/sync
Then I receive a response code 200
When I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 1}
Then I receive a response code 200
And I shut down postgres1
And I sleep for 10 seconds
Then "sync" key in DCS has sync_standby=postgres2 after 10 seconds
When I start postgres1
And "members/postgres1" key in DCS has state=running after 10 seconds
When I issue a GET request to http://127.0.0.1:8010/sync
Then I receive a response code 200
When I issue a GET request to http://127.0.0.1:8009/async
Then I receive a response code 200
Scenario: check the basic failover in synchronous mode
Given I run patronictl.py pause batman
Then I receive a response returncode 0
When I sleep for 2 seconds
And I shut down postgres0
And I run patronictl.py resume batman
Then I receive a response returncode 0
And postgres2 role is the primary after 24 seconds
And Response on GET http://127.0.0.1:8010/history contains recovery after 10 seconds
When I issue a PATCH request to http://127.0.0.1:8010/config with {"synchronous_mode": null, "master_start_timeout": 0}
Then I receive a response code 200
When I add the table bar to postgres2
Then table bar is present on postgres1 after 20 seconds
And Response on GET http://127.0.0.1:8010/config contains master_start_timeout after 10 seconds
Scenario: check immediate failover when master_start_timeout=0
Given I kill postmaster on postgres2
Then postgres1 is a leader after 10 seconds
And postgres1 role is the primary after 10 seconds
Scenario: check rejoin of the former master with pg_rewind
Given I add the table splitbrain to postgres0
And I start postgres0
Then postgres0 role is the secondary after 20 seconds
When I add the table buz to postgres1
Then table buz is present on postgres0 after 20 seconds