Improve /(a)sync checks in behave tests (#2521)

They frequently fail because replicas are sometimes slow to realize that they are synchronous. Instead of introducing more sleeps, we poll for the required HTTP status code until a timeout expires.
Alexander Kukushkin
2023-01-12 08:23:59 +01:00
committed by GitHub
parent 650344fca8
commit 5bbb5dceeb
2 changed files with 26 additions and 24 deletions
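The approach in the commit message is to poll for a condition with a timeout instead of sleeping for a fixed period. It can be sketched as a generic helper. `wait_until` is a hypothetical name and is not part of Patroni or behave. This version uses a monotonic deadline, so time spent inside the check also counts against the timeout:

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 1.0) -> bool:
    """Hypothetical helper: poll predicate() until it is true or `timeout` seconds pass.

    Returns True on the first successful check, False once the deadline has passed.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep between attempts, but never past the deadline.
        time.sleep(min(interval, remaining))
```

Consider a step like "And I sleep for 2 seconds" followed by a single GET. With this helper it becomes one check that succeeds as soon as the replica reports the expected role. It fails only after the full timeout has elapsed.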


@@ -21,12 +21,9 @@ Feature: basic replication
 And I shut down postgres1
 Then "sync" key in DCS has sync_standby=postgres2 after 10 seconds
 When I start postgres1
-And "members/postgres1" key in DCS has state=running after 10 seconds
-And I sleep for 2 seconds
-When I issue a GET request to http://127.0.0.1:8010/sync
-Then I receive a response code 200
-When I issue a GET request to http://127.0.0.1:8009/async
-Then I receive a response code 200
+Then "members/postgres1" key in DCS has state=running after 10 seconds
+And Status code on GET http://127.0.0.1:8010/sync is 200 after 3 seconds
+And Status code on GET http://127.0.0.1:8009/async is 200 after 3 seconds

 Scenario: check stuck sync replica
 Given I issue a PATCH request to http://127.0.0.1:8008/config with {"pause": true, "maximum_lag_on_syncnode": 15000000, "postgresql": {"parameters": {"synchronous_commit": "remote_apply"}}}
@@ -38,11 +35,8 @@ Feature: basic replication
 And I load data on postgres0
 Then "sync" key in DCS has sync_standby=postgres1 after 15 seconds
 And I resume wal replay on postgres2
-And I sleep for 2 seconds
-And I issue a GET request to http://127.0.0.1:8009/sync
-Then I receive a response code 200
-When I issue a GET request to http://127.0.0.1:8010/async
-Then I receive a response code 200
+And Status code on GET http://127.0.0.1:8009/sync is 200 after 3 seconds
+And Status code on GET http://127.0.0.1:8010/async is 200 after 3 seconds
 When I issue a PATCH request to http://127.0.0.1:8008/config with {"pause": null, "maximum_lag_on_syncnode": -1, "postgresql": {"parameters": {"synchronous_commit": "on"}}}
 Then I receive a response code 200
 And I drop table on postgres0
@@ -50,23 +44,17 @@ Feature: basic replication
 Scenario: check multi sync replication
 Given I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 2}
 Then I receive a response code 200
-And I sleep for 10 seconds
-Then "sync" key in DCS has sync_standby=postgres1,postgres2 after 5 seconds
-When I issue a GET request to http://127.0.0.1:8010/sync
-Then I receive a response code 200
-When I issue a GET request to http://127.0.0.1:8009/sync
-Then I receive a response code 200
+Then "sync" key in DCS has sync_standby=postgres1,postgres2 after 10 seconds
+And Status code on GET http://127.0.0.1:8010/sync is 200 after 3 seconds
+And Status code on GET http://127.0.0.1:8009/sync is 200 after 3 seconds
 When I issue a PATCH request to http://127.0.0.1:8008/config with {"synchronous_node_count": 1}
 Then I receive a response code 200
 And I shut down postgres1
-And I sleep for 10 seconds
 Then "sync" key in DCS has sync_standby=postgres2 after 10 seconds
 When I start postgres1
-And "members/postgres1" key in DCS has state=running after 10 seconds
-When I issue a GET request to http://127.0.0.1:8010/sync
-Then I receive a response code 200
-When I issue a GET request to http://127.0.0.1:8009/async
-Then I receive a response code 200
+Then "members/postgres1" key in DCS has state=running after 10 seconds
+And Status code on GET http://127.0.0.1:8010/sync is 200 after 3 seconds
+And Status code on GET http://127.0.0.1:8009/async is 200 after 3 seconds

 Scenario: check the basic failover in synchronous mode
 Given I run patronictl.py pause batman


@@ -131,7 +131,21 @@ def add_tag_to_config(context, tag, value, pg_name):
     context.pctl.add_tag_to_config(pg_name, tag, value)
 
 
-@then('Response on GET {url} contains {value} after {timeout:d} seconds')
+@then('Status code on GET {url:url} is {code:d} after {timeout:d} seconds')
+def check_http_code(context, url, code, timeout):
+    if context.certfile:
+        url = url.replace('http://', 'https://')
+    timeout *= context.timeout_multiplier
+    for _ in range(int(timeout)):
+        r = context.request_executor.request('GET', url)
+        if int(code) == int(r.status):
+            break
+        time.sleep(1)
+    else:
+        assert False, "HTTP Status Code is not {0} after {1} seconds".format(code, timeout)
+
+
+@then('Response on GET {url:url} contains {value} after {timeout:d} seconds')
 def check_http_response(context, url, value, timeout, negate=False):
     if context.certfile:
         url = url.replace('http://', 'https://')
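The new `Status code on GET ... is ... after ... seconds` check can also be reproduced outside behave using only the standard library. This sketch assumes nothing about Patroni's `request_executor`, and `http_status` and `wait_for_http_code` are hypothetical names. One detail matters for `/sync` and `/async`. `urllib.request.urlopen` raises `HTTPError` for any non-2xx response, so when a member is not (yet) in the expected role, its status has to be read from the exception's `.code`. The step in this commit runs `int(timeout)` one-second iterations. This sketch differs in two ways: it polls against a monotonic deadline, and it treats connection errors as "no answer yet":

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, request_timeout: float = 2.0) -> Optional[int]:
    """Hypothetical helper: status code of GET url, or None if there is no answer."""
    try:
        with urllib.request.urlopen(url, timeout=request_timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # urlopen raises for non-2xx codes (e.g. /sync on a non-sync member);
        # the status code is still available on the exception.
        return e.code
    except OSError:
        # URLError (connection refused, DNS) and socket timeouts are OSError subclasses.
        return None


def wait_for_http_code(url: str, code: int, timeout: float,
                       multiplier: float = 1.0, interval: float = 1.0,
                       fetch: Callable[[str], Optional[int]] = http_status) -> None:
    """Hypothetical helper: poll GET url until it returns `code`.

    `multiplier` plays the role of context.timeout_multiplier in the behave step.
    Raises AssertionError if the code is not seen before the deadline.
    """
    timeout *= multiplier
    deadline = time.monotonic() + timeout
    while True:
        last = fetch(url)
        if last == code:
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise AssertionError("HTTP status code is not {0} after {1} seconds (last: {2})"
                                 .format(code, timeout, last))
        # Sleep between attempts, but never past the deadline.
        time.sleep(min(interval, remaining))
```

The `fetch` parameter exists so the polling logic can be exercised without a running cluster.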