* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in the _validate_failover_tags(): nofailover value should be casted to boolean like it is done when accessed in other places
If enabled it will allow Patroni to cope with DCS outages.
In case of a DCS outage the leader tries to call all remaining members in the cluster via API and if all of them respond with success the leader will not be demoted.
The failsafe_mode could be enabled by running
```sh
patronictl edit-config -s failsafe_mode=true
```
or by calling the `/config` REST API endpoint.
Co-authored-by: Polina Bungina <bungina@gmail.com>
They are frequently failing because sometimes replicas are a bit slow realizing that they are synchronous. Instead of instroducing more sleeps we will poll for required http status code with some timeout.
If openssl binary is available use it to generate a self-signed certificate. Use it to protect Patroni REST API
(`verify_client: required`).
In case if Postgres is compiled with SSL support enable it in the configuration and configure pg_hba.conf to check client certificates (`verify-ca`) in addition to passwords. Also configure superuser/replication/rewind users to use client certificates and verify server certificate (`verify-ca`)
There is a known [vector of attact](https://pganalyze.com/blog/5mins-postgres-security-patch-releases-pgspot-pghostile) by creating functions and/or operators in a public scheme with the same name and signature as corresponding objects in `pg_catalog`.
Since Patroni is heavily relying on superuser connections we want to mitigate it by enforcing `search_path=pg_catalog` for all connections created by Patroni (except replication connections). It is achieved by introducing a new function, that wraps psycopg.connect() and appends ` -c search_path=pg_catalog` to `options` parameter.
In addition to that, we set connection.autocommit to True before returning it.
Windows doesn't support `SIGTERM`, but our behave tests in majority of cases relying on Patroni graceful shutdown.
In order to emulate the behaviour we introduced the new REST API endpoint `POST /sigterm`. The endpoint works only on Windows and when `BEHAVE_DEBUG` environment variable is set.
Besides that some minor adjustments in behave tests were done. Mainly related to backslash-slash handling.
In addition to that improve test coverage on Windows by properly mocking access to filesystem and avoiding calling
`subprocess.call()`. Specifically, symlink creation on Windows requires Admin privileges and there is no `true.exe`.
* Use ConfigMaps or Endpoins for leader elections and to keep cluster state
* Label pods with a postgres role
* change behavior of pip install. From now on it will not install all dependencies, you have to specify explicitly DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`
* Replace pytz.UTC with dateutil.tz.tzutc, it helps to reduce memory by more than 4Mb...
* fix check of python version: 0x0300000 => 0x3000000
* Update leader key before restart and demote
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.
To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over.
We can't enable multiple standbys as synchronous, allowing PostreSQL to pick one because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.
On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.
When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.
Added acceptance tests and documentation.
Implementation by @ants with extensive review by @CyberDem0n.
Fix return value in the should_run_scheduled_action and the comments.
Correct the json composition in the scheduled_restart test.
Fix the delete in case there is no scheduled restart.
Fix the usage of format in the logger output.
Fix the indentation in the evaluate_scheduled_restart.
Fix the condition related to the body_is_optional in the do_POST_restart.
Fix a few typos in the error messages.
Fix the _read_json_content
Make the scheduled restart unit-tests a bit less ugly
The scheduled restart data structures are now independent of those
used by the normal restarts. This would be fixed in subsequent
commits.
Add the behave tests, that cover the POST /restart (but not DELETE).