- It modifies the Dockerfile and entrypoint slightly to allow for OpenShift SCCs to operate correctly
- It adds 2 template examples that can be easily modified by changing parameters
Fixes #572
Recently version 2.6.0 was released, which changes the way the create_connection method is called. Previously it was passed two arguments; in the new version all argument names are specified explicitly.
Permanent replication slots are preserved on failover/switchover, that is, Patroni on the new primary will create the configured replication slots right after promotion.
Slots can be configured with the help of `patronictl edit-config`.
The initial configuration can also be done in `bootstrap.dcs`:
```yaml
slots:
  permanent_physical_1:
    type: physical
  permanent_logical_1:
    type: logical
    database: foo
    plugin: pgoutput
```
It is the responsibility of the operator to make sure that there are no name clashes between the replication slots automatically created by Patroni for members and the permanent replication slots.
Closes https://github.com/zalando/patroni/issues/656
* Always run `pg_rewind` against the remote master
* Always use the remote master as the source when "recovering" a stopped standby leader
* Use remote master as the source when "recovering" the node in the unhealthy cluster
* Use the local dbname as the fallback when doing `pg_rewind` from the remote master
* `no_replication_slot` is the allowed key in the `RemoteMember` object
* Make it possible to "bootstrap" the new `standby_cluster` with an existing (and valid) data directory. There is one prerequisite though: there must be no `patroni.dynamic.json` file in it!
* pgBackRest support
pgBackRest can restore into an existing $PGDATA folder, which allows for a speedy restore because files that have not changed since the last backup are skipped. To support this feature the new parameter `keep_data` has been introduced: when `keep_data` is set to `True`, the cleanup of $PGDATA will be skipped.
Patroni passes some extra parameters to custom replica methods when calling them, which causes an error due to pgBackRest's strict parameter checking. The new parameter `no_params` can be set to `True` to skip passing those parameters; see the configuration sketch after the issue link below.
Fixes https://github.com/zalando/patroni/issues/625
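For illustration, a replica creation method using pgBackRest together with the new parameters might look like the sketch below (the command line, stanza name, and paths are hypothetical, not taken from this PR):
```yaml
postgresql:
  create_replica_methods:
    - pgbackrest
    - basebackup
  pgbackrest:
    # delta restore into the existing $PGDATA; unchanged files are skipped
    command: /usr/bin/pgbackrest --stanza=demo --delta restore
    keep_data: True   # do not clean up $PGDATA before running the method
    no_params: True   # do not append the extra Patroni parameters to the command
```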
* Use `shutil.move` instead of `os.replace`, which is available only from 3.3
* Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* rename `assertEquals` -> `assertEqual`, due to deprecation warning
Implementation of the "standby cluster" described in #657. A standby cluster consists
of a "standby leader", which replicates from a "remote master" (that is not a
part of the current Patroni cluster and can be anywhere), and cascading replicas,
which replicate from the corresponding standby leader. The "standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
the DCS; in case it disappears there will be an election of a new "standby
leader".
One can define such a cluster using the "standby_cluster" section in the Patroni
config file. This section provides parameters for the standby cluster that will be
applied only once during bootstrap and can be changed afterwards only through the DCS.
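A minimal configuration sketch (the host address, port, and slot name are placeholders, not values from this PR):
```yaml
bootstrap:
  dcs:
    standby_cluster:
      host: 10.0.0.1              # address of the remote master
      port: 5432
      primary_slot_name: patroni  # optional replication slot on the remote master
      create_replica_methods:
        - basebackup
```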
In synchronous_mode_strict we put '*' into synchronous_standby_names, which makes one connection 'sync' and the other connections 'potential'.
The code picking the correct sync standby didn't consider 'potential' connections as good candidates.
Fixes: https://github.com/zalando/patroni/issues/789
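For context, a sketch of enabling the strict mode in the dynamic configuration (e.g. applied via `patronictl edit-config`):
```yaml
synchronous_mode: true
synchronous_mode_strict: true  # keep '*' in synchronous_standby_names even without a healthy sync candidate
```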
Add a field to the API to figure out whether a master exists from Patroni's
point of view. It can be useful when you have an alert based on Auto Scaling
Groups, and the ASG decides to shut down the current master and spin up a
new instance, but the current master's shutdown is stuck. In this situation
the current master is no longer a part of the ASG, but Patroni and Postgres
are still alive on the instance, which means a new replica will not be
promoted yet; this will lead to a false alert saying that your cluster
doesn't have any master node.
This adds INFO log messages that clearly state whether configuration values were seen as changed by Patroni after SIGHUP/reload and warrant reloading (or whether nothing was changed and no reloading is necessary).
This ended up being a lot simpler than I had imagined once I found postgresql.py:reload_config().
I added a log line in config.py:reload_local_configuration(), since that function will short-circuit the process early if the local config wasn't changed. But the final determination of whether or not values have changed and need reloading is in postgresql.py:reload_config().
If Patroni gets partitioned, it starts receiving stale information from the DCS.
We can't use this information to determine that we have the leader key.
Instead, we record in the Ha object the actual result of the acquire/update lock call and report ourselves as the leader only if it was successful.
P.S.: despite responding with 200 on `GET /master`, Postgres was still running read-only.
In really rare cases it was causing the following behavior:
```
2018-07-31 10:35:30,302 INFO: starting as a secondary
2018-07-31 10:35:30,309 INFO: Lock owner: postgresql0; I am postgresql1
2018-07-31 10:35:30,310 INFO: Demoting master during restarting after failure
2018-07-31 10:35:30,381 INFO: postmaster pid=17709
2018-07-31 10:35:30,386 INFO: lost leader lock during restarting after failure
2018-07-31 10:35:30,388 ERROR: Exception during CHECKPOINT
```
* `async` is a keyword in Python 3.7
```
Setting up patroni (1.4.4-1) ...
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 610
    'offline': dict(stop='fast', checkpoint=False, release=False, offline=True, async=False),
                                                                                ^
SyntaxError: invalid syntax
```
Fix #750 by replacing the dict member "async" with "async_req".
* requirements.txt: Update for new kubernetes version compatible with 3.7
'patronictl remove' deletes the cluster configuration (stored either in configmaps or endpoints) and cannot be run from the postgres pod without 'delete' permission on those objects being granted to the pod service account.
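A hedged sketch of granting that permission to the pod's service account (the role name and verbs beyond 'delete' are illustrative):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: patroni-remove
rules:
  # allow `patronictl remove` to delete the cluster config objects
  - apiGroups: [""]
    resources: ["configmaps", "endpoints"]
    verbs: ["get", "list", "delete"]
```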
Add an EnvironmentFile directive to read in a configuration file with environment variables. The "-" prefix means it can proceed if the file doesn't exist.
This would allow users to keep sensitive information like the SUPERUSER/REPLICATION passwords in a separate environment file rather than in the YAML config file that might be deployed from source control.
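A minimal sketch of how this could look in the unit file (the paths are illustrative, not the ones shipped with the package):
```
# patroni.service (excerpt)
[Service]
# the leading "-" lets the unit start even if the file is missing
EnvironmentFile=-/etc/patroni/patroni.env
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml
```
with `/etc/patroni/patroni.env` containing variables such as `PATRONI_SUPERUSER_PASSWORD` and `PATRONI_REPLICATION_PASSWORD`.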
Patroni relies on params to determine the timeout and the number of retries when executing API requests to Consul. Starting from v1.1.0 python-consul changed its internal API and started
using a `list` instead of a `dict` to pass query parameters. This change broke the "watch" functionality.
Fixes https://github.com/zalando/patroni/issues/742 and
https://github.com/zalando/patroni/issues/734
Currently the informational message logged is beyond confusing. This
improves the logging so there is some indication of what this message is
about and that it is somewhat normal. Changes by @ants
Fix the discrepancy in the values of max_wal_senders and max_replication_slots between the sample postgres.yml files and the hard-coded defaults in Patroni, bumping the former to 10.
Contributed by @dtseiler
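For illustration, the relevant snippet of a sample configuration after this change would read (the placement under `bootstrap.dcs.postgresql.parameters` follows the usual layout of the sample files; treat it as a sketch):
```yaml
bootstrap:
  dcs:
    postgresql:
      parameters:
        max_wal_senders: 10
        max_replication_slots: 10
```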