Files
patroni/features/standby_cluster.feature
Alexander Kukushkin d46ca88e6b Make it visible replication state on standbys (#2733)
To do that we use `pg_stat_get_wal_receiver()` function, which is available since 9.6. For older versions the `patronictl list` output and REST API responses remain as before.

In case if there is no wal receiver process we check if `restore_command` is set and show the state as `in archive recovery`.

Example of `patronictl list` output:
```bash
$ patronictl list
+ Cluster: batman -------------+---------+---------------------+----+-----------+
| Member      | Host           | Role    | State               | TL | Lag in MB |
+-------------+----------------+---------+---------------------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running             | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | in archive recovery | 12 |         0 |
+-------------+----------------+---------+---------------------+----+-----------+

$ patronictl list
+ Cluster: batman -------------+---------+-----------+----+-----------+
| Member      | Host           | Role    | State     | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running   | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 12 |         0 |
+-------------+----------------+---------+-----------+----+-----------+
```

Example of REST API response:
```bash
$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544480,
    "replayed_location": 335544480,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "in archive recovery",
  "dcs_last_seen": 1688642069,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}

$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544816,
    "replayed_location": 335544816,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "streaming",
  "dcs_last_seen": 1688642089,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}
```
2023-07-13 09:24:20 +02:00

71 lines
3.8 KiB
Gherkin

Feature: standby cluster
Scenario: prepare the cluster with logical slots
Given I start postgres1
Then postgres1 is a leader after 10 seconds
And there is a non empty initialize key in DCS after 15 seconds
When I issue a PATCH request to http://127.0.0.1:8009/config with {"slots": {"pm_1": {"type": "physical"}}, "postgresql": {"parameters": {"wal_level": "logical"}}}
Then I receive a response code 200
And Response on GET http://127.0.0.1:8009/config contains slots after 10 seconds
And I sleep for 3 seconds
When I issue a PATCH request to http://127.0.0.1:8009/config with {"slots": {"test_logical": {"type": "logical", "database": "postgres", "plugin": "test_decoding"}}}
Then I receive a response code 200
And I do a backup of postgres1
When I start postgres0
Then "members/postgres0" key in DCS has state=running after 10 seconds
And replication works from postgres1 to postgres0 after 15 seconds
When I issue a GET request to http://127.0.0.1:8008/patroni
Then I receive a response code 200
And I receive a response replication_state streaming
And "members/postgres0" key in DCS has replication_state=streaming after 10 seconds
@slot-advance
Scenario: check permanent logical slots are synced to the replica
Given I run patronictl.py restart batman postgres1 --force
Then Logical slot test_logical is in sync between postgres0 and postgres1 after 10 seconds
When I add the table replicate_me to postgres1
And I get all changes from logical slot test_logical on postgres1
Then Logical slot test_logical is in sync between postgres0 and postgres1 after 10 seconds
Scenario: Detach exiting node from the cluster
When I shut down postgres1
Then postgres0 is a leader after 10 seconds
And "members/postgres0" key in DCS has role=master after 3 seconds
When I issue a GET request to http://127.0.0.1:8008/
Then I receive a response code 200
Scenario: check replication of a single table in a standby cluster
Given I start postgres1 in a standby cluster batman1 as a clone of postgres0
Then postgres1 is a leader of batman1 after 10 seconds
When I add the table foo to postgres0
Then table foo is present on postgres1 after 20 seconds
When I issue a GET request to http://127.0.0.1:8009/patroni
Then I receive a response code 200
And I receive a response replication_state streaming
And I sleep for 3 seconds
When I issue a GET request to http://127.0.0.1:8009/primary
Then I receive a response code 503
When I issue a GET request to http://127.0.0.1:8009/standby_leader
Then I receive a response code 200
And I receive a response role standby_leader
And there is a postgres1_cb.log with "on_role_change standby_leader batman1" in postgres1 data directory
When I start postgres2 in a cluster batman1
Then postgres2 role is the replica after 24 seconds
And table foo is present on postgres2 after 20 seconds
When I issue a GET request to http://127.0.0.1:8010/patroni
Then I receive a response code 200
And I receive a response replication_state streaming
And postgres1 does not have a logical replication slot named test_logical
Scenario: check failover
When I kill postgres1
And I kill postmaster on postgres1
Then postgres2 is replicating from postgres0 after 32 seconds
When I issue a GET request to http://127.0.0.1:8010/primary
Then I receive a response code 503
And I sleep for 3 seconds
When I issue a GET request to http://127.0.0.1:8010/standby_leader
Then I receive a response code 200
And I receive a response role standby_leader
And replication works from postgres0 to postgres2 after 15 seconds
And there is a postgres2_cb.log with "on_start replica batman1\non_role_change standby_leader batman1" in postgres2 data directory