The KUBERNETES_ environment variables are not required by PostgreSQL, yet exposing them to the postmaster also exposes them to backends and to regular database users (via pl/perl, for example).
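A minimal sketch of the idea: strip KUBERNETES_* variables from the environment before handing it to the postmaster. The helper name is illustrative, not Patroni's actual API:

```python
import os

def sanitized_env():
    """Hypothetical helper: return a copy of the environment with all
    KUBERNETES_* variables removed, so backends (and pl/perl users)
    cannot read them through the postmaster."""
    return {k: v for k, v in os.environ.items()
            if not k.startswith('KUBERNETES_')}
```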
The Consul client uses urllib3 with verify=True by default. Even when
SSL verification is disabled with verify=False, we can still see
CERTIFICATE_VERIFY_FAILED exceptions. With urllib3 1.19.1-1 on
Debian Stretch, the "cert_reqs" argument must be explicitly set
to ssl.CERT_NONE to effectively disable SSL verification.
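A stdlib sketch of what "effectively disabling verification" means at the SSL layer; urllib3's cert_reqs is ultimately applied to an SSLContext like this (this is not Patroni's actual code):

```python
import ssl

# Build a client context that neither requests nor validates peer
# certificates -- the effect of passing cert_reqs=ssl.CERT_NONE.
ctx = ssl.create_default_context()
ctx.check_hostname = False       # must be turned off before verify_mode
ctx.verify_mode = ssl.CERT_NONE  # skip certificate verification entirely
```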
It is a common problem that the primary recycles WALs while one of the replicas is down for a long time. So far there were only two solutions, and neither is perfect:
1. Increase `wal_keep_segments`, but it is hard to guess a good value.
2. Use continuous archiving and PITR, but it is not always possible.
This PR introduces a way to solve the problem for static clusters, with a fixed number of nodes whose names never change. You just need to list the names of all nodes in `slots`, and the primary will not remove a slot while the corresponding node is down (not registered in the DCS).
Of course, the primary will not create a permanent slot matching its own name.
Usage example: let's assume you have a cluster with nodes named *abc1*, *abc2*, and *abc3*.
You have to run `patronictl edit-config` and put the following snippet into the configuration:
```yaml
slots:
  abc1:
    type: physical
  abc2:
    type: physical
  abc3:
    type: physical
```
If the node *abc2* is the primary, it will always create slots for *abc1* and *abc3*, even if they are not running, but it will not create a slot named *abc2*.
Other nodes behave the same way.
Close #280
OpenShift enforces `securityContext.fsGroup` for block devices and sets group sticky bits on volumeMounts.
This leads to Patroni pods failing to start after the first restart:
> 2020-01-13 14:46:13.695 UTC [143] FATAL: data directory "/home/postgres/pgdata/pgroot/data" has invalid permissions
> 2020-01-13 14:46:13.695 UTC [143] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
An initContainer which fixes the OpenShift tampering solves the issue. I stole the solution from the stable postgres helm chart:
https://github.com/helm/charts/pull/14540/files
Tested on OpenShift v3.11
Note: this error does not occur when using shared filesystems (like NFS).
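A sketch of such an initContainer, modeled on the helm chart linked above (the image, volume name, and path are illustrative, not the exact values used by the chart):

```yaml
initContainers:
  - name: fix-data-dir-permissions
    image: busybox
    # Revert OpenShift's group permissions so Postgres accepts the data dir
    command: ["sh", "-c", "chmod 0700 /home/postgres/pgdata/pgroot/data"]
    volumeMounts:
      - name: pgdata
        mountPath: /home/postgres/pgdata
```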
It was used only to add the local timezone to the datetime specified in patronictl for a scheduled switchover or restart.
`dateutil.tz.tzlocal()` does the same job equally well.
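A minimal sketch of that job with `dateutil` (requires the python-dateutil package; the timestamp value is just an example):

```python
from datetime import datetime
from dateutil import tz

# Make a naive datetime timezone-aware using the local timezone,
# as patronictl needs for scheduled switchover/restart timestamps.
naive = datetime(2020, 1, 13, 14, 46, 13)
aware = naive.replace(tzinfo=tz.tzlocal())
```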
During shutdown Patroni tries to update its status in the DCS.
If the DCS is inaccessible, an exception might be raised, and the lack of exception handling prevents the logger thread from stopping.
Fixes https://github.com/zalando/patroni/issues/1344
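The fix amounts to making the status update best-effort. A hypothetical sketch (`touch_member` and the function name are illustrative, not Patroni's exact API):

```python
import logging

log = logging.getLogger(__name__)

def update_status_on_shutdown(dcs):
    """Best-effort status update: never let a DCS error escape during
    shutdown, so the logger thread can still be stopped afterwards."""
    try:
        dcs.touch_member()
    except Exception:
        log.exception('Failed to update status in DCS during shutdown')
```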
It might be that they are defined by an extension, and therefore the unit is not necessarily a string.
It could also be that changing the value requires a restart (for example `pg_stat_statements.max`).
It serves two purposes:
1. We don't want to accidentally break the thread.
2. During shutdown hostnames become unresolvable and nasty `socket.gaierror` exceptions are raised.
Close: https://github.com/zalando/patroni/issues/1353
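A sketch of the guarded resolution, with an illustrative helper name (not Patroni's actual code):

```python
import logging
import socket

log = logging.getLogger(__name__)

def resolve(hostname):
    """Resolve a hostname from a background thread without letting
    socket.gaierror (raised when names become unresolvable, e.g.
    during shutdown) kill the thread."""
    try:
        return socket.getaddrinfo(hostname, None)
    except socket.gaierror as e:
        log.warning('failed to resolve %s: %s', hostname, e)
        return []
```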
The standby_leader has been doing it since the feature was introduced. Not doing the same on replicas might prevent them from catching up with the standby leader due to WALs being recycled.
In addition, the same strategy is applied to `archive_cleanup_command`.
Upon the start of Patroni and Postgres, make sure that `unix_socket_directories` and `stats_temp_directory` exist, or try to create them. Patroni will exit if it fails to create them.
Close https://github.com/zalando/patroni/issues/863
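A sketch of the startup check described above (the function name is illustrative; real Patroni code differs):

```python
import logging
import os
import sys

log = logging.getLogger(__name__)

def ensure_directories(*paths):
    """Create each required directory (e.g. unix_socket_directories,
    stats_temp_directory) if it is missing; exit if creation fails."""
    for path in paths:
        try:
            os.makedirs(path, mode=0o700, exist_ok=True)
        except OSError as e:
            log.error('Could not create directory %s: %s', path, e)
            sys.exit(1)
```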
That required refactoring the `Config` and `Patroni` classes. Now one has to explicitly create an instance of `Config` before creating `Patroni`.
The `Config` instance can optionally call the validate function.
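A toy sketch of the new construction order (stand-in classes, not the real `patroni.config.Config` or `patroni.Patroni`):

```python
class Config:
    """Toy stand-in for patroni.config.Config."""
    def __init__(self, config_file):
        self.config_file = config_file

    def validate(self):
        # optional validation hook; the real validation is far richer
        return bool(self.config_file)

class Patroni:
    """Toy stand-in: receives an already-constructed Config."""
    def __init__(self, config):
        self.config = config

config = Config('postgres0.yml')  # must be created explicitly first
patroni = Patroni(config)
```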