The latest timeline is calculated from the `/history` key in DCS. If there is no such key, or it contains garbage, the node is considered healthy.
Closes https://github.com/zalando/patroni/issues/890
It allows changing logging settings at runtime by updating the configuration and doing a reload or sending `SIGHUP` to the Patroni process.
Important! Environment variable names related to logging were renamed and the documentation was updated accordingly. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other logging-related variables that were introduced only recently (between releases) will stop working. This should be fine, since the new version has not been released yet, and it is therefore very unlikely that anybody except the authors of the corresponding PRs is using them.
Example of the log section in the config file:
```yaml
log:
  dir: /where/to/write/patroni/logs  # if not specified, write logs to stderr
  file_size: 50000000                # 50MB
  file_num: 10                       # keep a history of 10 files
  dateformat: '%Y-%m-%d %H:%M:%S'
  loggers:                           # increase log verbosity for etcd.client and urllib3
    etcd.client: DEBUG
    urllib3: DEBUG
```
It gives users the option to send Patroni application logs to a file instead of stderr. There are three environment variables that can be set to enable and configure file logging.
1. `PATRONI_FILE_LOG_DIR`: Path to a directory that is writable by the executing user. Setting this variable is what activates file logging.
2. `PATRONI_FILE_LOG_NUM`: This is a rolling file logger. This variable dictates how many log files are retained.
3. `PATRONI_FILE_LOG_SIZE`: This variable dictates the size at which the logs will roll.
If `PATRONI_FILE_LOG_DIR` is not set, Patroni will log to stderr (the default behavior does not change).
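For illustration, these variables could be set in a container environment; a minimal sketch as a docker-compose service (the service name, image, and paths are assumptions):
```yaml
services:
  patroni:
    image: patroni:latest                      # hypothetical image name
    environment:
      PATRONI_FILE_LOG_DIR: /var/log/patroni   # enables file logging
      PATRONI_FILE_LOG_NUM: '10'               # keep 10 rotated files
      PATRONI_FILE_LOG_SIZE: '50000000'        # roll over at ~50MB
```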
Closes https://github.com/zalando/patroni/issues/902
Permanent replication slots are preserved on failover/switchover, that is, Patroni on the new primary will create the configured replication slots right after promote.
Slots could be configured with the help of `patronictl edit-config`.
The initial configuration can also be done in `bootstrap.dcs`:
```yaml
slots:
  permanent_physical_1:
    type: physical
  permanent_logical_1:
    type: logical
    database: foo
    plugin: pgoutput
```
It is the responsibility of the operator to make sure that there are no name clashes between the replication slots automatically created by Patroni for members and the permanent replication slots.
Closes https://github.com/zalando/patroni/issues/656
* pgBackRest support
pgBackRest can restore into an existing $PGDATA folder, which allows a speedy restore because files that have not changed since the last backup are skipped. To support this feature, a new parameter `keep_data` has been introduced: when `keep_data: True` is set, the cleanup of $PGDATA is skipped.
Patroni passes some extra parameters to custom replica methods when calling them, which causes an error due to pgBackRest's strict parameter checking. A new parameter `no_params: True` can be set to skip passing these parameters.
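A minimal sketch of such a configuration (the stanza name and command path are assumptions):
```yaml
postgresql:
  create_replica_methods:
    - pgbackrest
    - basebackup
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=demo --delta restore  # stanza name is an assumption
    keep_data: True    # restore into the existing $PGDATA, skip cleanup
    no_params: True    # do not pass Patroni's extra parameters to the command
```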
Fixes https://github.com/zalando/patroni/issues/625
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DSC, in case if disappears there will be an election of a new "standby
leader".
One can define such a cluster using the section "standby_cluster" in the Patroni
config file. This section provides the parameters for the standby cluster, which are
applied only once during bootstrap and can afterwards be changed only through DCS.
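A minimal sketch of such a configuration (host, port, and slot name are placeholders):
```yaml
bootstrap:
  dcs:
    standby_cluster:
      host: 10.0.0.1              # address of the remote master (placeholder)
      port: 5432
      primary_slot_name: patroni  # replication slot on the remote master (assumption)
```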
pgBackRest's restore command generates the appropriate recovery.conf based
on the parameters you provide to pgBackRest. When pgBackRest's restore command is called
via Patroni's custom bootstrap, Patroni deletes that recovery.conf. Specifying the recovery.conf
information in the patroni.yml is less than ideal: it prevents leveraging pgBackRest's
work to ensure recovery.conf files are properly generated. It can also lead to transient
config data in the patroni.yml under certain restore cases, such as a PITR restore
of Cluster B to Cluster A, where the restore_command in A needs to reference B.
The parameter is optional. The default behavior is to delete the recovery.conf.
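For illustration, assuming the option is the `keep_existing_recovery_conf` flag of a custom bootstrap method (the method name, command, and stanza are assumptions):
```yaml
bootstrap:
  method: pgbackrest
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=demo --delta restore  # stanza name is an assumption
    keep_existing_recovery_conf: True  # keep the recovery.conf generated by pgBackRest
```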
Fixes https://github.com/zalando/patroni/issues/637
* `PATRONI_LOGLEVEL` - sets the general logging level
* `PATRONI_REQUESTS_LOGLEVEL` - sets the logging level for all HTTP requests, e.g. Kubernetes API calls
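A sketch of setting these variables in a container environment (values are illustrative):
```yaml
# e.g. inside a docker-compose service definition (hypothetical)
environment:
  PATRONI_LOGLEVEL: INFO
  PATRONI_REQUESTS_LOGLEVEL: WARNING   # quiet down per-request logging of Kubernetes API calls
```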
This list will be used for initial discovery of etcd cluster members.
If for some reason the list of discovered hosts is exhausted at runtime, Patroni will fall back to the initial list.
In addition, IPv6 compatibility is improved by using a dedicated function for splitting host and port.
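A sketch of such an initial host list in the etcd section (addresses are placeholders):
```yaml
etcd:
  hosts:
    - 10.0.0.1:2379
    - 10.0.0.2:2379
    - '[2001:db8::1]:2379'   # IPv6 host:port splitting is also handled
```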
Fixes https://github.com/zalando/patroni/issues/523
* Show information about scheduled failover and maintenance mode when showing the list of cluster members. Fixes https://github.com/zalando/patroni/issues/557
* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match postgres 10 defaults.
* Fix name of contributor in release notes.
If the list of checks is not specified, Consul will use "serfHealth" in addition to the TTL-based check created by Patroni.
There are some cases when people want to sacrifice fast detection of network partitioning in favor of the ability to tolerate network lag.
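A sketch of explicitly specifying an empty list of checks, so that only the TTL-based check is used (the host address is a placeholder):
```yaml
consul:
  host: 127.0.0.1:8500   # placeholder
  checks: []             # disable serfHealth; rely only on the TTL-based check
```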
Fixes https://github.com/zalando/patroni/issues/522
* Only activate watchdog while master and not paused
We don't really need the protections while we are not master. This way
we only need to tickle the watchdog when we are updating the leader key or
while a demotion is happening.
As implemented, we might fail to shut down the watchdog if
someone demotes postgres and removes the leader key behind Patroni's back.
There are probably other similar cases. Basically, if the administrator
is being actively stupid, they might get unexpected restarts. That seems
fine.
* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of shutting Patroni down.
The watchdog timeout is changed during the next keepalive when the TTL is changed. The watchdog driver and the requirement can also be switched online.
When the watchdog mode is `required` and the watchdog setup does not work, the effect is similar to nofailover. `watchdog_failed` is added to the status API to signify this. It is True only when the watchdog does not work **AND** it is required.
* Reset implementation when config changed while active.
* Add watchdog safety margin configuration
Defaults to 5 seconds. Basically, this is the maximum amount of time
that can pass between the calls to `dcs.update_leader()` and
`watchdog.keepalive()`, which are called right after each other. This should
be safe for pretty much any sane scenario and allows the default
settings to not trigger the watchdog when DCS is not responding.
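A sketch of the resulting watchdog section in the config file (the device path is a typical Linux default):
```yaml
watchdog:
  mode: required        # off, automatic or required
  device: /dev/watchdog
  safety_margin: 5      # seconds between dcs.update_leader() and watchdog.keepalive()
```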
* Cancel bootstrap if watchdog activation fails
The system would have demoted itself anyway on the next HA loop. Doing it
during bootstrap at least gives some other node a chance to try bootstrapping,
in the hope that it is configured correctly.
If all nodes are unable to activate the watchdog, they will continue to try until the
disk is filled with moved data directories. Perhaps not ideal behavior, but as
the situation is unlikely to resolve itself without administrator
intervention, it doesn't seem too bad.