mirror of
https://github.com/optim-enterprises-bv/patroni.git
synced 2025-12-31 10:51:02 +00:00
A current problem with Patroni that affects many people is that it removes the replication slot of a member whose key has expired from the DCS. As a result, when a replica comes back from scheduled maintenance, the WAL segments it needs may already be gone, and it cannot continue streaming without pulling files from the archive.

With PostgreSQL 16 and newer there is another problem: a logical slot on a standby node can be invalidated if the physical replication slot on the primary was removed (and `pg_catalog` vacuumed).

The most problematic environment is Kubernetes, where the slot is removed almost instantly when the member Pod is deleted.

So far, one of the recommended solutions was to configure permanent physical slots with names that match member names, so that the replication slots are not removed. This works but, depending on the environment, can be non-trivial to implement (for example, when members may change their names).

This PR implements support for the `member_slots_ttl` global configuration parameter, which controls how long member replication slots are kept when the member key is absent. The default value is `30min`.

The feature is supported only on PostgreSQL 11 and newer, because slots must be retained not only on the leader node but on all nodes that could potentially become the new leader, and those slots have to be moved forward with the `pg_replication_slot_advance()` function.

The feature can be disabled, restoring the old behavior, by setting `member_slots_ttl` to `0`.
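The retention rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not Patroni's actual implementation: the helper names `ttl_seconds` and `slots_to_keep`, the simplified TTL parser, and the `last_seen` bookkeeping are all hypothetical and exist only to show the idea that a slot outlives its member key for up to `member_slots_ttl`.

```python
from typing import Dict, Iterable, Set, Union

# Hypothetical, simplified unit table; the real parser may accept more forms.
_UNITS = {"s": 1, "min": 60, "h": 3600, "d": 86400}


def ttl_seconds(value: Union[int, str]) -> int:
    """Convert a TTL such as "30min" or 0 into seconds (illustrative only)."""
    if isinstance(value, int):
        return value
    text = str(value).strip()
    # Try longer suffixes first so that "min" is matched before any shorter unit.
    for unit in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(text[:-len(unit)].strip()) * _UNITS[unit]
    return int(text)


def slots_to_keep(present: Iterable[str], last_seen: Dict[str, float],
                  now: float, ttl: Union[int, str]) -> Set[str]:
    """Return member names whose replication slots should exist.

    present   -- members whose key is currently in the DCS
    last_seen -- assumed bookkeeping: when each absent member was last seen
    ttl       -- value of member_slots_ttl; 0 restores the old behavior
    """
    keep = set(present)
    limit = ttl_seconds(ttl)
    if limit <= 0:
        # Feature disabled: only members with a live key keep their slot.
        return keep
    for name, seen in last_seen.items():
        if now - seen <= limit:
            keep.add(name)
    return keep
```

With the default of `30min`, a member that disappeared ten minutes ago would keep its slot under this sketch, while one that has been gone for more than half an hour would not.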
27 lines
1.5 KiB
Gherkin
Feature: nostream node

Scenario: check nostream node is recovering from archive
    When I start postgres0
    And I configure and start postgres1 with a tag nostream true
    Then "members/postgres1" key in DCS has replication_state=in archive recovery after 10 seconds
    And replication works from postgres0 to postgres1 after 30 seconds

@slot-advance
Scenario: check permanent logical replication slots are not copied
    When I issue a PATCH request to http://127.0.0.1:8008/config with {"postgresql": {"parameters": {"wal_level": "logical"}}, "slots":{"test_logical":{"type":"logical","database":"postgres","plugin":"test_decoding"}}}
    Then I receive a response code 200
    When I run patronictl.py restart batman postgres0 --force
    Then postgres0 has a logical replication slot named test_logical with the test_decoding plugin after 10 seconds
    When I configure and start postgres2 with a tag replicatefrom postgres1
    Then "members/postgres2" key in DCS has replication_state=streaming after 10 seconds
    And postgres1 does not have a replication slot named test_logical
    And postgres2 does not have a replication slot named test_logical

@slot-advance
Scenario: check that slots are written to the /status key
    Given "status" key in DCS has postgres0 in slots
    And "status" key in DCS has postgres2 in slots
    And "status" key in DCS has test_logical in slots
    And "status" key in DCS has test_logical in slots
    And "status" key in DCS does not have postgres1 in slots
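
The checks in the last scenario amount to looking up a slot name in the `slots` section of the `/status` key. A rough sketch of such a check is shown below. It assumes the key holds a JSON object with a `slots` mapping of slot name to LSN. The helper name `slot_in_status` and the sample payload are hypothetical and are not taken from the step definitions in the repository.

```python
import json


def slot_in_status(status_value: str, slot_name: str) -> bool:
    """Return True if slot_name appears under "slots" in the status JSON."""
    data = json.loads(status_value)
    slots = data.get("slots") if isinstance(data, dict) else None
    return isinstance(slots, dict) and slot_name in slots


# Hypothetical payload mirroring what the scenario expects: slots for
# postgres0, postgres2 and test_logical, but none for the nostream postgres1.
sample = json.dumps({
    "optime": 67108960,
    "slots": {"postgres0": 67108960, "postgres2": 67108960, "test_logical": 67108960},
})
```

Under these assumptions, `slot_in_status(sample, "postgres0")` holds while `slot_in_status(sample, "postgres1")` does not, which matches the expectation that the nostream node gets no slot.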