mirror of
https://github.com/optim-enterprises-bv/patroni.git
synced 2026-01-10 17:41:26 +00:00
A common problem with Patroni is that it removes the replication slot of a member whose key has expired from DCS. As a result, when the replica comes back from scheduled maintenance, the WAL segments it needs may already be gone, and it cannot continue streaming without pulling files from the archive.

With PostgreSQL 16 and newer there is another problem: a logical slot on a standby node can be invalidated if the physical replication slot on the primary was removed (and `pg_catalog` vacuumed).

The most problematic environment is Kubernetes, where the slot is removed almost instantly when the member Pod is deleted.

So far, one of the recommended solutions was to configure permanent physical slots with names that match member names, which prevents removal of the replication slots. It works, but depending on the environment it might be non-trivial to implement (for example, when members may change their names).

This PR implements support for the `member_slots_ttl` global configuration parameter, which controls how long member replication slots are kept when the member key is absent. The default value is `30min`.

The feature is supported only on PostgreSQL 11 and newer, because we want to retain slots not only on the leader node but on all nodes that could potentially become the new leader, and they must be moved forward using the `pg_replication_slot_advance()` function.

The feature can be disabled, restoring the old behavior, by setting `member_slots_ttl` to `0`.
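The retention rule described above can be illustrated with a small sketch. This is not Patroni's implementation: the function name `slots_to_retain`, its arguments, and the idea of tracking a per-member "last seen" timestamp are assumptions made only for illustration. The sketch keeps slots for members that are currently present, plus slots for absent members last seen less than `member_slots_ttl` seconds ago; a TTL of `0` falls back to keeping only the slots of present members.

```python
from typing import Dict, Set


def slots_to_retain(last_seen: Dict[str, float],
                    active_members: Set[str],
                    now: float,
                    member_slots_ttl: float = 1800.0) -> Set[str]:
    """Return the member slot names that should be kept (hypothetical helper).

    last_seen        -- member name -> timestamp (seconds) when its key was last present
    active_members   -- members whose keys are currently present
    member_slots_ttl -- retention time in seconds; 1800 corresponds to the 30min default
    """
    # Slots of members that are currently present are always kept.
    retained = set(active_members)
    if member_slots_ttl <= 0:
        # TTL of 0 disables retention: slots of absent members are dropped at once.
        return retained
    for name, seen_at in last_seen.items():
        # Keep the slot of an absent member only while its TTL has not elapsed.
        if name not in active_members and now - seen_at < member_slots_ttl:
            retained.add(name)
    return retained
```

For example, with `postgres0` present, `postgres1` last seen 1000 seconds ago and `postgres2` last seen 1900 seconds ago, the default TTL keeps the slots for `postgres0` and `postgres1` and drops the one for `postgres2`.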
84 lines
5.3 KiB
Gherkin
Feature: permanent slots
  Scenario: check that physical permanent slots are created
    Given I start postgres0
    Then postgres0 is a leader after 10 seconds
    And there is a non empty initialize key in DCS after 15 seconds
    When I issue a PATCH request to http://127.0.0.1:8008/config with {"slots":{"test_physical":0,"postgres3":0},"postgresql":{"parameters":{"wal_level":"logical"}}}
    Then I receive a response code 200
    And Response on GET http://127.0.0.1:8008/config contains slots after 10 seconds
    When I start postgres1
    And I start postgres2
    And I configure and start postgres3 with a tag replicatefrom postgres2
    Then postgres0 has a physical replication slot named test_physical after 10 seconds
    And postgres0 has a physical replication slot named postgres1 after 10 seconds
    And postgres0 has a physical replication slot named postgres2 after 10 seconds
    And postgres2 has a physical replication slot named postgres3 after 10 seconds

  @slot-advance
  Scenario: check that logical permanent slots are created
    Given I run patronictl.py restart batman postgres0 --force
    And I issue a PATCH request to http://127.0.0.1:8008/config with {"slots":{"test_logical":{"type":"logical","database":"postgres","plugin":"test_decoding"}}}
    Then postgres0 has a logical replication slot named test_logical with the test_decoding plugin after 10 seconds

  @slot-advance
  Scenario: check that permanent slots are created on replicas
    Given postgres1 has a logical replication slot named test_logical with the test_decoding plugin after 10 seconds
    Then Logical slot test_logical is in sync between postgres0 and postgres1 after 10 seconds
    And Logical slot test_logical is in sync between postgres0 and postgres2 after 10 seconds
    And Logical slot test_logical is in sync between postgres0 and postgres3 after 10 seconds
    And postgres1 has a physical replication slot named test_physical after 2 seconds
    And postgres2 has a physical replication slot named test_physical after 2 seconds
    And postgres3 has a physical replication slot named test_physical after 2 seconds

  @slot-advance
  Scenario: check permanent physical slots that match with member names
    Given postgres0 has a physical replication slot named postgres3 after 2 seconds
    And postgres1 has a physical replication slot named postgres0 after 2 seconds
    And postgres1 has a physical replication slot named postgres2 after 2 seconds
    And postgres1 has a physical replication slot named postgres3 after 2 seconds
    And postgres2 has a physical replication slot named postgres0 after 2 seconds
    And postgres2 has a physical replication slot named postgres3 after 2 seconds
    And postgres2 has a physical replication slot named postgres1 after 2 seconds
    And postgres3 has a physical replication slot named postgres0 after 2 seconds
    And postgres3 has a physical replication slot named postgres1 after 2 seconds
    And postgres3 has a physical replication slot named postgres2 after 2 seconds

  @slot-advance
  Scenario: check that permanent slots are advanced on replicas
    Given I add the table replicate_me to postgres0
    When I get all changes from logical slot test_logical on postgres0
    And I get all changes from physical slot test_physical on postgres0
    Then Logical slot test_logical is in sync between postgres0 and postgres1 after 10 seconds
    And Physical slot test_physical is in sync between postgres0 and postgres1 after 10 seconds
    And Logical slot test_logical is in sync between postgres0 and postgres2 after 10 seconds
    And Physical slot test_physical is in sync between postgres0 and postgres2 after 10 seconds
    And Logical slot test_logical is in sync between postgres0 and postgres3 after 10 seconds
    And Physical slot test_physical is in sync between postgres0 and postgres3 after 10 seconds
    And Physical slot postgres1 is in sync between postgres0 and postgres2 after 10 seconds
    And Physical slot postgres1 is in sync between postgres0 and postgres3 after 10 seconds
    And Physical slot postgres3 is in sync between postgres2 and postgres0 after 20 seconds
    And Physical slot postgres3 is in sync between postgres2 and postgres1 after 10 seconds

  @slot-advance
  Scenario: check that permanent slots and member slots are written to the /status key
    Given "status" key in DCS has test_physical in slots
    And "status" key in DCS has postgres0 in slots
    And "status" key in DCS has postgres1 in slots
    And "status" key in DCS has postgres2 in slots
    And "status" key in DCS has postgres3 in slots

  @slot-advance
  Scenario: check that only non-permanent member slots are written to the retain_slots in /status key
    Given "status" key in DCS has postgres0 in retain_slots
    And "status" key in DCS has postgres1 in retain_slots
    And "status" key in DCS has postgres2 in retain_slots
    And "status" key in DCS does not have postgres3 in retain_slots

  Scenario: check permanent physical replication slot after failover
|
|
Given I shut down postgres3
|
|
And I shut down postgres2
|
|
And I shut down postgres0
|
|
Then postgres1 has a physical replication slot named test_physical after 10 seconds
|
|
And postgres1 has a physical replication slot named postgres0 after 10 seconds
|
|
And postgres1 has a physical replication slot named postgres3 after 10 seconds
|