patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-28 02:20:04 +00:00

Files

Alexander Kukushkin fa88d80c4f Apply master_start_timeout when executing crash recovery (#1720)

It is not very common, but the master Postgres can "crash" for various reasons, such as OOM or running out of disk space. Because the crashed node may hold data that has not been replicated yet, Patroni by default prefers to start Postgres on the leader node rather than fail over.

To be on the safe side, Patroni always starts Postgres in recovery, regardless of whether the current node owns the leader lock. If Postgres wasn't shut down cleanly, starting in recovery might fail, so in some cases Patroni works around this by executing crash recovery, which means starting Postgres in single-user mode.

A few times we have ended up in the following situation:
1. The master Postgres crashed because it ran out of disk space
2. Patroni starts crash recovery in single-user mode
3. While doing crash recovery, Patroni keeps updating the leader lock

Patroni gets stuck on step 3, and manual intervention is required to recover the cluster.

Patroni already has the `master_start_timeout` option, which controls how long Postgres is allowed to stay in the `starting` state. Once that time has passed, Patroni may decide to release the leader lock if healthy replicas are available to take over.

This PR makes the `master_start_timeout` option also work for crash recovery.
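The decision described above can be sketched roughly as follows. This is an illustrative sketch only, not Patroni's actual code: the `NodeStatus` class, the `crash_recovery` state name, and the `should_release_leader_lock` helper are all hypothetical names introduced here, and the real implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of the node that currently holds the leader lock."""
    state: str               # e.g. "starting", "crash_recovery" (assumed name), "running"
    seconds_in_state: float  # how long Postgres has been in that state
    healthy_replicas: int    # replicas that could take over the leader lock


def should_release_leader_lock(node: NodeStatus, master_start_timeout: float) -> bool:
    """Return True when the node should give up the leader lock.

    The timeout applies while Postgres is still starting up and, as this PR
    proposes, also while crash recovery is running.
    """
    # A node that is already running normally is never subject to the timeout.
    if node.state not in ("starting", "crash_recovery"):
        return False
    # Still within the allowed window: keep the lock and keep waiting.
    if node.seconds_in_state <= master_start_timeout:
        return False
    # Past the timeout: release only if some healthy replica can take over.
    return node.healthy_replicas > 0


# Illustrative values: crash recovery has run longer than the timeout
# and one healthy replica is available, so the lock should be released.
stuck = NodeStatus(state="crash_recovery", seconds_in_state=301.0, healthy_replicas=1)
print(should_release_leader_lock(stuck, master_start_timeout=300.0))
```

Keeping the check a pure function of the observed state makes the two conditions from the description (timeout exceeded, healthy replica available) easy to test in isolation.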

2020-09-30 08:04:27 +02:00

__init__.py

Advanced validation of PostgreSQL parameters (#1674)

2020-09-01 16:26:57 +02:00

test_api.py

Release 2.0.0 (#1680)

2020-09-02 15:35:04 +02:00

test_async_executor.py

Improve patronictl reinit (#576)

2018-01-04 10:31:44 +01:00

test_aws.py

Get rid from requests module (#1296)