We are using only one function from there, `find_executable()`, and it is better to implement a similar function in Patroni than to add the `distutils` module to requirements.txt
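A minimal sketch of such a helper, assuming a plain PATH lookup similar to `distutils.spawn.find_executable` (the exact behaviour of the Patroni version may differ):

```python
import os
import sys


def find_executable(executable, path=None):
    # Search PATH (or an explicitly given path string) for the executable
    # and return its full path, or None if it cannot be found.
    _, ext = os.path.splitext(executable)
    if sys.platform == 'win32' and ext != '.exe':
        executable += '.exe'

    if os.path.isfile(executable):
        return executable

    if path is None:
        path = os.environ.get('PATH', os.defpath)

    for d in path.split(os.pathsep):
        candidate = os.path.join(d, executable)
        if os.path.isfile(candidate):
            return candidate
```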
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package
This is just careful refactoring, without any functional changes.
* Use `shutil.move` instead of `os.replace`, which is only available from Python 3.3 onwards
* Introduce standby-leader health-check and consul service
* Improve unit tests; some lines were not covered
* Rename `assertEquals` -> `assertEqual` due to a deprecation warning
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DSC, in case if disappears there will be an election of a new "standby
leader".
One can define such a cluster using the "standby_cluster" section in the
Patroni config file. This section provides parameters for the standby cluster;
they are applied only once during bootstrap and can afterwards be changed only
through the DCS.
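A hedged illustration of what such a section might look like once the config is loaded; the parameter names (`host`, `port`, `primary_slot_name`) are used here only as examples and may not match the full list of supported options:

```python
import yaml

# Illustrative only: the standby_cluster keys shown below are assumptions,
# not the authoritative list of options.
config = yaml.safe_load("""
scope: standby-demo
standby_cluster:
  host: remote-master.example.com    # address of the remote master
  port: 5432
  primary_slot_name: standby_demo    # optional replication slot on the remote master
""")

print(config['standby_cluster']['host'])
```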
* Use scope from config file when listing members
* Add version command to patronictl
* Only delete the leader key on shutdown when we hold the lock, to avoid exceptions when the leader key does not exist
* Add a timestamp option to list command.
* YAML format for patronictl output
* Fix API request to get version
Make it possible to cancel a running task if you want to reinitialize a replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel the already running task if an attempt to trigger reinitialize has failed
2. use the `--force` argument with `patronictl reinit`
* Show information about scheduled failover and maintenance mode when showing list of cluster members. Fixes https://github.com/zalando/patroni/issues/557
* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match with postgres 10 defaults.
* Fix name of contributor in release notes.
The current UI to change cluster configuration is somewhat unfriendly: it involves a curl command, knowing the REST API endpoint, knowing the specific syntax to call it with, and writing a JSON document. I added two commands in this branch to make this a bit easier, `show-config` and `edit-config` (names are merely placeholders, any opinions on better ones?).
* `patronictl show-config clustername` fetches the config from DCS, formats it as YAML and outputs it.
* `patronictl edit-config clustername` fetches the config, formats it as YAML, invokes $EDITOR on it, then shows the user the diff and, after confirmation, applies the changed config to DCS, guarding against concurrent modifications (a rough sketch of this flow follows the list below).
* `patronictl edit-config clustername --set synchronous_mode=true --set postgresql.use_slots=true` will set the specific key-value pairs.
There are also some UI capabilities I'm less sure of, but I included them here as I have already implemented them.
* If output is a tty then the diffs are colored. I'm not sure if this feature is cool enough to pull the weight of adding a dependency on cdiff. Or maybe someone knows of another, more task-focused diff coloring library?
* `patronictl edit-config clustername --pg work_mem=100MB` - Shorthand for `--set postgresql.parameters.work_mem=100MB`
* `patronictl edit-config clustername --apply changes.yaml` - apply changes from a yaml file.
* `patronictl edit-config clustername --replace new-config.yaml` - replace config with new version.
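A rough sketch of the edit/diff/apply flow behind `edit-config`, assuming hypothetical `fetch_config`/`apply_config` helpers and a DCS that can reject a write when the config was changed concurrently:

```python
import difflib

import click   # patronictl already uses click
import yaml


def edit_config(fetch_config, apply_config):
    # fetch_config() -> (config_dict, version); apply_config(yaml_text, version)
    # fails on concurrent modification. Both helpers are hypothetical here.
    before, version = fetch_config()
    before_text = yaml.safe_dump(before, default_flow_style=False)

    # Invoke $EDITOR; click.edit() returns None if the editor exits without saving.
    after_text = click.edit(before_text) or before_text

    diff = difflib.unified_diff(before_text.splitlines(True),
                                after_text.splitlines(True),
                                'before', 'after')
    click.echo(''.join(diff))

    if after_text != before_text and click.confirm('Apply these changes?'):
        # Compare-and-set against the version we originally read,
        # so concurrent edits are not silently overwritten.
        apply_config(after_text, version)
```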
Previously pg_ctl waited for a timeout and then happily trod on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.
This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop it and release the leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting out the master start timeout we still handle manual failover requests.
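A simplified sketch of that decision logic (the function and attribute names below are illustrative, not the actual Ha code):

```python
def handle_starting_master(start_elapsed, master_start_timeout,
                           failover_candidate_available):
    """Illustrative only: decide what to do while the master is 'starting'.

    start_elapsed -- seconds since the PostgreSQL start was initiated.
    """
    if master_start_timeout == 0 and failover_candidate_available:
        # Do not wait at all: fail over as soon as a candidate exists.
        return 'release_leader_key_and_failover'

    if start_elapsed > master_start_timeout and failover_candidate_available:
        # Waited long enough: stop postgres and release the leader lock.
        return 'stop_postgres_and_release_leader_key'

    # Keep waiting; manual failover requests are still handled here.
    return 'keep_waiting'
```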
* Introduce a timeout parameter to restart.
When the restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
Previously replicas were always watching the leader key (even if postgres
was not running there). It was not a big issue, but it was not possible to
interrupt such a watch when postgres started up or stopped successfully. It
also delayed the update_member call, so we had stale information in the DCS
for up to `loop_wait` seconds. This commit changes that behavior. If the
async_executor is busy starting, stopping, or restarting postgres, we will
not watch the leader key but instead wait for an event from the
async_executor for up to `loop_wait` seconds. The async executor fires such
an event only if the function it was calling returned something that
evaluates to boolean True.
Such functionality is really needed to change the way we decide whether
pg_rewind is necessary. That decision requires a running local postgres,
so it is really important for us to get such a notification as soon as
possible.
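A condensed sketch of that waiting behaviour, assuming a hypothetical AsyncExecutor built around a `threading.Event`:

```python
import threading


class AsyncExecutor(object):
    """Illustrative only: runs one function in the background and fires an
    event when that function returns a truthy value, so the main loop can
    wake up before loop_wait expires."""

    def __init__(self):
        self.event = threading.Event()

    def run_async(self, func, args=()):
        self.event.clear()
        threading.Thread(target=self._wrapper, args=(func, args)).start()

    def _wrapper(self, func, args):
        if func(*args):        # e.g. postgres started or stopped successfully
            self.event.set()   # wake up the main loop immediately


def wait_for_background_task(executor, loop_wait):
    # While postgres is starting/stopping we do not watch the leader key;
    # instead we wait for the executor's event, at most loop_wait seconds.
    return executor.event.wait(loop_wait)
```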
* Replace pytz.UTC with dateutil.tz.tzutc; it helps to reduce memory usage by more than 4 MB
* Fix check of python version: 0x0300000 => 0x3000000
* Update leader key before restart and demote
Adds a new configuration variable synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user-visible transactions.
To enforce the synchronous failover guarantee Patroni stores the current synchronous replication state in the DCS, using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.
We can't enable multiple standbys as synchronous, letting PostgreSQL pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions.
On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.
When the synchronous standby goes away or disconnects, a new one is picked and Patroni switches the master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.
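A schematic of the strict ordering mentioned above (the method names on `postgresql` and `dcs` are placeholders, not the real API):

```python
def publish_synchronous_state(postgresql, dcs, standby_name):
    # Illustrative ordering only; the real method names differ.
    # Step 1: make the master actually wait for the chosen standby ...
    postgresql.set_synchronous_standby_names(standby_name)

    # Step 2: ... and only afterwards advertise it in the DCS.  A standby
    # that finds itself advertised here knows it really was synchronous
    # when the master failed, so promoting it cannot lose transactions.
    dcs.write_sync_state(leader=postgresql.name, sync_standby=standby_name)
```

Doing it in the opposite order could advertise a standby that the master was not yet waiting for, which would break the zero-lost-transactions guarantee.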
Added acceptance tests and documentation.
Implementation by @ants with extensive review by @CyberDem0n.
When Patroni was "joining" an already running postgres it was not calling
callbacks, which in some cases caused issues (a callback could be used to
change routing/load-balancer configuration or to assign/remove a floating (service) IP).
In addition to that, we should `start` postgres instead of `restart`-ing
it when doing recovery, because in this case the 'on_start' callback should
be called instead of 'on_restart'.
In the original code we were parsing/deparsing url-style connection
strings back and forth. That was not really resource-hungry, but rather
annoying. It was also not really obvious how to switch all local
connections to unix sockets (which would be preferable).
This commit isolates the different use-cases of working with connection
strings and minimizes the amount of code parsing and deparsing them. It also
introduces one new helper method in the `Member` object: `conn_kwargs`.
This method can accept a dict with credentials (username and password) as a
parameter. It returns a dict that can be used by `psycopg2.connect` or for
building connection urls for pg_rewind, pg_basebackup or some other replica
creation methods. Params for the local connection are built in the
`_local_connect_kwargs` method and can easily be switched to a unix socket later.
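A minimal sketch of what `conn_kwargs` might look like (the `Member` fields and defaults below are simplified assumptions):

```python
class Member(object):
    # Illustrative only: the real Member object is built from DCS data.
    def __init__(self, name, host, port=5432):
        self.name = name
        self.host = host
        self.port = port

    def conn_kwargs(self, auth=None):
        """Return a dict suitable for psycopg2.connect(**kwargs) or for
        building a connection string for pg_rewind/pg_basebackup."""
        ret = {'host': self.host, 'port': self.port, 'database': 'postgres'}
        if auth:  # e.g. {'username': 'replicator', 'password': 'secret'}
            ret['user'] = auth.get('username')
            ret['password'] = auth.get('password')
        return ret
```

Usage would then be along the lines of `psycopg2.connect(**member.conn_kwargs({'username': 'replicator', 'password': 'secret'}))`.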
There is no point in trying to update the topology until the original request
has been performed. Also, for us it is more important to execute the original
request than to keep the topology of the etcd cluster in sync.
In addition to that, implement the same retry-timeout logic in the
`machines` property that is already used in the `api_execute` method.
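A rough sketch of the retry-timeout idea (this is not the actual python-etcd or Patroni code):

```python
import random
import time


def execute_with_retry(request, nodes, retry_timeout):
    """Illustrative only: keep retrying the original request against the
    known etcd nodes until it succeeds or retry_timeout expires; refreshing
    the cluster topology can wait until after the request has been served."""
    deadline = time.time() + retry_timeout
    while True:
        try:
            return request(random.choice(nodes))
        except Exception:
            if time.time() > deadline:
                raise
            time.sleep(1)
```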
- We don't want to export the RestApi object, since it initializes the
  socket and listens on it.
- Change get_dcs so that an explicit scope passed to it takes
  priority over the one in the configuration file.
Originally Exhibitor was supported inside the ZooKeeper class, and the
configuration for Exhibitor was also taken from the `zookeeper` section of
the yaml config file. In fact, Exhibitor just extends ZooKeeper; this is now
reflected in the code, and Exhibitor also got its own section in the
config.yaml file. This will make it easier to configure Exhibitor hosts and
port via environment variables once PR#211 is merged.
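A hedged example of what a separate section might look like after this change; the key names (`hosts`, `port`, `poll_interval`) are illustrative assumptions:

```python
import yaml

# Illustrative only: the exhibitor keys shown here are assumptions,
# not the authoritative list of options.
config = yaml.safe_load("""
exhibitor:
  hosts: [exhibitor1.example.com, exhibitor2.example.com]
  port: 8181            # Exhibitor REST API port used to discover ZooKeeper nodes
  poll_interval: 300    # how often to re-read the ZooKeeper topology, in seconds
""")

print(config['exhibitor']['hosts'])
```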