patroni/features/citus.feature
Alexander Kukushkin c96e35c807 Enable Citus behave tests for Postgres v16 (#2914) and reduce flakiness
2023-10-16 16:05:27 +02:00

Feature: citus
  We should check that the coordinator discovers and registers workers, and that clients don't get errors when the worker cluster switches over

  Scenario: check that worker cluster is registered in the coordinator
    Given I start postgres0 in citus group 0
    And I start postgres2 in citus group 1
    Then postgres0 is a leader in a group 0 after 10 seconds
    And postgres2 is a leader in a group 1 after 10 seconds
    When I start postgres1 in citus group 0
    And I start postgres3 in citus group 1
    Then replication works from postgres0 to postgres1 after 15 seconds
    Then replication works from postgres2 to postgres3 after 15 seconds
    And postgres0 is registered in the postgres0 as the primary in group 0 after 5 seconds
    And postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds

  Scenario: coordinator failover updates pg_dist_node
    Given I run patronictl.py failover batman --group 0 --candidate postgres1 --force
    Then postgres1 role is the primary after 10 seconds
    And "members/postgres0" key in a group 0 in DCS has state=running after 15 seconds
    And replication works from postgres1 to postgres0 after 15 seconds
    And postgres1 is registered in the postgres2 as the primary in group 0 after 5 seconds
    And "sync" key in a group 0 in DCS has sync_standby=postgres0 after 15 seconds
    When I run patronictl.py switchover batman --group 0 --candidate postgres0 --force
    Then postgres0 role is the primary after 10 seconds
    And replication works from postgres0 to postgres1 after 15 seconds
    And postgres0 is registered in the postgres2 as the primary in group 0 after 5 seconds
    And "sync" key in a group 0 in DCS has sync_standby=postgres1 after 15 seconds

  Scenario: worker switchover doesn't break client queries on the coordinator
    Given I create a distributed table on postgres0
    And I start a thread inserting data on postgres0
    When I run patronictl.py switchover batman --group 1 --force
    Then I receive a response returncode 0
    And postgres3 role is the primary after 10 seconds
    And "members/postgres2" key in a group 1 in DCS has state=running after 15 seconds
    And replication works from postgres3 to postgres2 after 15 seconds
    And postgres3 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And "sync" key in a group 1 in DCS has sync_standby=postgres2 after 15 seconds
    And a thread is still alive
    When I run patronictl.py switchover batman --group 1 --force
    Then I receive a response returncode 0
    And postgres2 role is the primary after 10 seconds
    And replication works from postgres2 to postgres3 after 15 seconds
    And postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And "sync" key in a group 1 in DCS has sync_standby=postgres3 after 15 seconds
    And a thread is still alive
    When I stop a thread
    Then a distributed table on postgres0 has expected rows

  Scenario: worker primary restart doesn't break client queries on the coordinator
    Given I cleanup a distributed table on postgres0
    And I start a thread inserting data on postgres0
    When I run patronictl.py restart batman postgres2 --group 1 --force
    Then I receive a response returncode 0
    And postgres2 role is the primary after 10 seconds
    And replication works from postgres2 to postgres3 after 15 seconds
    And postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds
    And a thread is still alive
    When I stop a thread
    Then a distributed table on postgres0 has expected rows

  Scenario: check that in-flight transaction is rolled back after timeout when other workers need to change pg_dist_node
    Given I start postgres4 in citus group 2
    Then postgres4 is a leader in a group 2 after 10 seconds
    And "members/postgres4" key in a group 2 in DCS has role=master after 3 seconds
    When I run patronictl.py edit-config batman --group 2 -s ttl=20 --force
    Then I receive a response returncode 0
    And I receive a response output "+ttl: 20"
    Then postgres4 is registered in the postgres2 as the primary in group 2 after 5 seconds
    When I shut down postgres4
    Then there is a transaction in progress on postgres0 changing pg_dist_node after 5 seconds
    When I run patronictl.py restart batman postgres2 --group 1 --force
    Then a transaction finishes in 20 seconds
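Many of the steps above end in "after N seconds", for example "postgres2 is registered in the postgres0 as the primary in group 1 after 5 seconds". The following is a minimal Python sketch of how such a step could be evaluated, assuming it simply re-checks a condition until a deadline passes. The helper names wait_for, registered_primary and is_registered_as_primary, and the four-column row shape, are hypothetical and are not Patroni's actual behave step implementations. In a real step, fetch_rows would presumably run a query such as SELECT groupid, nodename, nodeport, noderole FROM pg_dist_node on the node named in the step; here a fake catalog stands in for the database.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical row shape mirroring a few columns of Citus's pg_dist_node
# catalog: (groupid, nodename, nodeport, noderole).
Row = Tuple[int, str, int, str]


def wait_for(condition: Callable[[], bool], timeout: float, interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    The condition is always checked at least once. Returns True on success
    and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def registered_primary(rows: Iterable[Row], group: int) -> Optional[Tuple[str, int]]:
    """Return (nodename, nodeport) of the first primary row for `group`, or None."""
    for groupid, nodename, nodeport, noderole in rows:
        if groupid == group and noderole == 'primary':
            return nodename, nodeport
    return None


def is_registered_as_primary(fetch_rows: Callable[[], Iterable[Row]], host: str, port: int,
                             group: int, timeout: float, interval: float = 0.1) -> bool:
    """Wait until host:port shows up as the primary of `group` in the fetched rows."""
    return wait_for(lambda: registered_primary(fetch_rows(), group) == (host, port),
                    timeout, interval)


if __name__ == '__main__':
    # Fake catalog (assumption, no database involved): group 1's primary
    # moves from port 5433 to port 5434 on the third poll.
    polls = {'n': 0}

    def fake_pg_dist_node():
        polls['n'] += 1
        worker_port = 5433 if polls['n'] < 3 else 5434
        return [(0, '127.0.0.1', 5432, 'primary'), (1, '127.0.0.1', worker_port, 'primary')]

    print(is_registered_as_primary(fake_pg_dist_node, '127.0.0.1', 5434,
                                   group=1, timeout=1, interval=0.01))
```

Polling against a deadline instead of sleeping for a fixed time lets a check pass as soon as the condition holds while still bounding how long a failing check can hang, which is one common way to keep timing-sensitive tests like these less flaky.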