* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package
This is just a careful refactoring, without functional changes.
If `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in `etcd.hosts` and avoid doing topology discovery. Such a mode might be useful when you know that you connect to the etcd cluster via a set of proxies, or when the etcd cluster has a static topology.
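For illustration, the relevant configuration fragment could look roughly like the one below (host names and ports are made up; the YAML is parsed with PyYAML only to keep the example runnable):
```python
import yaml  # PyYAML, which Patroni already uses for its configuration files

# Illustrative only: etcd reached through two proxies with a static topology.
config = yaml.safe_load("""
etcd:
  use_proxies: true
  hosts:
    - etcd-proxy1.example.com:2379
    - etcd-proxy2.example.com:2379
""")

assert config['etcd']['use_proxies'] is True
```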
* Use `shutil.move` instead of `os.replace`, which is only available since Python 3.3
* Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* Rename `assertEquals` -> `assertEqual` due to a deprecation warning
We already have a lot of logic in place to prevent failover in such a case and restore all keys, but an accidental removal of the `/config` key was effectively switching off pause mode for one cycle of the HA loop.
It is very easy to get the current timeline on the master by executing:
```sql
SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int
```
Unfortunately the same method doesn't work when postgres is in recovery. Therefore we will use a replication connection for that on the replicas. In order to avoid opening and closing a replication connection on every HA loop, we will cache the result if its value matches the timeline of the master.
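A minimal sketch of the replica-side lookup, assuming psycopg2's replication-connection support; the `replica_timeline` helper and the DSN handling are illustrative, not Patroni's actual code:
```python
import psycopg2
import psycopg2.extras


def replica_timeline(dsn):
    # IDENTIFY_SYSTEM over a replication connection returns
    # (systemid, timeline, xlogpos, dbname); the second field is what we need.
    conn = psycopg2.connect(dsn, connection_factory=psycopg2.extras.PhysicalReplicationConnection)
    try:
        with conn.cursor() as cur:
            cur.execute('IDENTIFY_SYSTEM')
            return cur.fetchone()[1]
    finally:
        conn.close()
```
The cached value only needs to be re-checked while it still differs from the master's timeline, so the replication connection is not reopened on every HA loop.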
Also this PR introduces a new key in the DCS: `/history`. It will contain a JSON-serialized object with the timeline history in a format similar to the usual history files. The differences are (see the example below):
* The second column is the absolute WAL position in bytes instead of an LSN
* Optionally there may be a fourth column with a timestamp (the mtime of the history file)
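For illustration only (the positions, reasons and timestamps below are made up), the key could contain something like:
```python
import json

# Each entry: [timeline, absolute WAL position in bytes, reason, optional timestamp]
history = [
    [1, 25395856, 'no recovery target specified', '2018-03-07T11:42:56+00:00'],
    [2, 52127376, 'no recovery target specified'],
]
print(json.dumps(history))
```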
Make it possible to cancel a running task if you want to reinitialize a replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel an already running task if an attempt to trigger reinitialize has failed
2. if you are using the `--force` argument with `patronictl reinit`
This list will be used for initial discovery of etcd cluster members.
If for some reason this list of hosts becomes exhausted at run time, Patroni will fall back to the initial list.
In addition to that, improve IPv6 compatibility by using a dedicated function for splitting host and port.
Fixes https://github.com/zalando/patroni/issues/523
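A minimal sketch of such a splitting helper (illustrative only, not Patroni's actual implementation):
```python
def split_host_port(value, default_port):
    """Split 'host', 'host:port', '[ipv6]' or '[ipv6]:port' into (host, port)."""
    if value.startswith('['):
        # Bracketed IPv6, possibly followed by a port: '[::1]:2379' or '[::1]'
        host, _, rest = value[1:].partition(']')
        return host, int(rest[1:]) if rest.startswith(':') else default_port
    if value.count(':') == 1:
        host, port = value.split(':')
        return host, int(port)
    # Bare hostname/IPv4 without a port, or a bare (unbracketed) IPv6 address
    return value, default_port


assert split_host_port('[2001:db8::2]:2379', 2379) == ('2001:db8::2', 2379)
assert split_host_port('etcd1.example.com', 2379) == ('etcd1.example.com', 2379)
```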
* Use ConfigMaps or Endpoints for leader elections and to keep cluster state
* Label pods with a postgres role
* Change the behavior of `pip install`. From now on it will not install all dependencies; you have to explicitly specify the DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`
When all etcd servers refuse connections during a watch, the call will fail with an exception and will be immediately retried. This creates a huge amount of log spam, potentially creating additional issues on top of losing the DCS. This patch takes note when etcd failures are repeating and, starting from the second failure, sleeps for a second before retrying. It additionally omits the stack trace after the first failure in a streak of failures.
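A rough sketch of the intended behaviour; the function name and structure are illustrative, not the actual patch:
```python
import logging
import time

logger = logging.getLogger(__name__)


def watch_with_backoff(do_watch):
    failures = 0
    while True:
        try:
            return do_watch()
        except Exception:
            failures += 1
            if failures == 1:
                # First failure in a streak: log the full stack trace once.
                logger.exception('watch failed')
            else:
                # Repeating failures: no stack trace, and back off for a second
                # so a dead DCS does not flood the logs.
                logger.error('watch failed again (%d consecutive failures)', failures)
                time.sleep(1)
```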
Replacing hostnames with IP addresses was causing certificate
verification to fail. Instead of doing that, we will rather monkey
patch the urllib3 functionality which does name resolution. It should
work without problems even for https connections.
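A minimal sketch of this kind of monkey patch, assuming a hypothetical `_dns_cache` mapping maintained elsewhere; the real implementation may differ:
```python
import urllib3.util.connection

_dns_cache = {}  # hypothetical: hostname -> already resolved IP address

_orig_create_connection = urllib3.util.connection.create_connection


def _patched_create_connection(address, *args, **kwargs):
    host, port = address
    # Substitute the cached IP at the socket level only; the hostname used for
    # TLS certificate verification higher up the stack stays unchanged.
    return _orig_create_connection((_dns_cache.get(host, host), port), *args, **kwargs)


urllib3.util.connection.create_connection = _patched_create_connection
```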
Previously pg_ctl waited for a timeout and then happily carried on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not, and caused a race condition that resulted in either a failover, or a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.
This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero, we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop it and release the leader lock if failover is possible. Once the failover succeeds or fails (no leader and no one to take the role), we continue with normal processing. While we are waiting for the master start timeout we handle manual failover requests.
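A rough sketch of how master_start_timeout might drive that decision; all names here are illustrative, not Patroni's actual internals:
```python
def handle_starting_master(postgres, cluster, master_start_timeout, time_in_state):
    # Hypothetical helpers: `postgres` wraps the local instance,
    # `cluster` knows whether a healthy failover candidate exists.
    if postgres.is_running_as_master():
        return 'running'  # back to normal processing
    if cluster.has_failover_candidate() and (
            master_start_timeout == 0 or time_in_state > master_start_timeout):
        postgres.stop()
        return 'release leader lock and fail over'
    return 'starting'  # keep waiting; manual failover requests are still handled
```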
* Introduce timeout parameter to restart.
When the restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
* Add https and auth support for etcd
Also implement support for the PATRONI_ETCD_URL and PATRONI_ETCD_SRV
environment variables
* Implement etcd.proxy, etcd.cacert, etcd.cert and etcd.key support
Now it should be possible to set up a fully encrypted connection to
etcd with authorization (see the example below).
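Illustrative only — one way these settings might be combined; the exact key names and values should be checked against the Patroni documentation, and the credentials and paths below are made up:
```python
import yaml

config = yaml.safe_load("""
etcd:
  protocol: https
  hosts:
    - etcd1.example.com:2379
    - etcd2.example.com:2379
  username: patroni          # assumption: basic-auth credentials
  password: secret
  cacert: /etc/ssl/etcd/ca.pem
  cert: /etc/ssl/etcd/client.pem
  key: /etc/ssl/etcd/client-key.pem
""")
```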
Previously replicas were always watching the leader key (even if
postgres was not running there). It was not a big issue, but it was
not possible to interrupt such a watch when postgres had started up
or stopped successfully. It was also delaying the update_member call,
so we had somewhat stale information in the DCS for up to `loop_wait`
seconds. This commit changes that behavior. If the async_executor is
busy starting, stopping or restarting postgres, we will not watch the
leader key but instead wait for an event from the async_executor for
up to `loop_wait` seconds. The async executor will fire such an event
only if the function it was calling returned something that evaluates
to boolean True.
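A minimal sketch of that mechanism using `threading.Event`; the class and attribute names are illustrative, not the actual implementation:
```python
import threading


class AsyncExecutor(object):
    def __init__(self):
        self.event = threading.Event()

    def run_async(self, func, *args):
        self.event.clear()
        threading.Thread(target=self._wrapper, args=(func,) + args).start()

    def _wrapper(self, func, *args):
        result = None
        try:
            result = func(*args)
        finally:
            # Wake the main loop early only when the finished task reports
            # success (anything that evaluates to boolean True).
            if result:
                self.event.set()


# In the HA loop: while the executor is busy we do not watch the leader key,
# we just wait for the "finished" event for up to loop_wait seconds:
#     executor.event.wait(loop_wait)
```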
Such functionality is really needed to change the way we decide
whether pg_rewind is necessary. That decision requires a local
postgres to be running, and for us it is really important to get such
a notification as soon as possible.
Adds a new configuration variable synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of the master failure. This effectively means zero lost user-visible transactions.
To enforce the synchronous failover guarantee, Patroni stores the current synchronous replication state in the DCS using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.
We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions.
On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.
When the synchronous standby goes away or disconnects, a new one is picked and the master switches over to it. If no synchronous standby exists, Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously the master is allowed to acquire the leader lock.
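A rough sketch of the ordering constraint described above; all function and attribute names are illustrative, not Patroni's actual code:
```python
def switch_synchronous_standby(postgres, dcs, standby_name):
    # 1. Make the standby synchronous on the master first ...
    postgres.set_synchronous_standby_names(standby_name)
    # 2. ... and only then publish it in the DCS, so a standby can never see
    #    itself advertised as synchronous before it actually is.
    dcs.write_sync_state(leader=postgres.name, sync_standby=standby_name)


def may_acquire_leader_lock(dcs, my_name):
    sync = dcs.get_sync_state()
    # Either we were the advertised synchronous standby, or we are the former
    # leader and no synchronous standby is currently advertised.
    return my_name == sync.get('sync_standby') or (
        not sync.get('sync_standby') and my_name == sync.get('leader'))
```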
Added acceptance tests and documentation.
Implementation by @ants with extensive review by @CyberDem0n.
This error is sent by etcd when Patroni is watching the leader key,
which is never updated after its creation, while the etcd cluster
receives a lot of updates, which cleans the history of events.
Instead of watching on modifiedIndex + 1 we will watch on
X-Etcd-Index, which is probably still available...
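A minimal sketch of that fallback, assuming python-etcd; error handling beyond these two exceptions is omitted and the function name is illustrative:
```python
import etcd


def watch_key(client, key, wait_index, timeout):
    try:
        return client.watch(key, index=wait_index, timeout=timeout)
    except etcd.EtcdWatchTimedOut:
        return None
    except etcd.EtcdEventIndexCleared:
        # Our index was already purged from the event history: re-read the key
        # and continue watching from the cluster-wide X-Etcd-Index (exposed as
        # EtcdResult.etcd_index) instead of modifiedIndex + 1.
        result = client.read(key)
        return client.watch(key, index=result.etcd_index + 1, timeout=timeout)
```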
There is no point in trying to update the topology until the original
request has been performed. Also, for us it is more important to
execute the original request than to keep the topology of the etcd
cluster in sync.
In addition to that, implement the same retry-timeout logic in the
`machines` property that is already used in the `api_execute` method.
The Client class takes care of retrying when the connection to an
etcd node fails. It calculates the number of retries and the timeout
depending on the etcd cluster size.
The Etcd class should not retry when an EtcdConnectionFailed exception
is raised (this case is already handled in the Client).
Besides that, adjust the retry timeouts in the Client class.
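Illustrative only — a retry loop in the spirit of what is described; the real Client derives its numbers differently, and the names below are made up:
```python
import time


def execute_on_any_machine(machines, request, total_timeout):
    deadline = time.time() + total_timeout
    # Give every known etcd member an equal share of the overall deadline so
    # that a single dead node cannot eat the whole retry budget.
    per_node = total_timeout / max(len(machines), 1)
    last_exc = None
    for node in machines:
        remaining = deadline - time.time()
        if remaining <= 0:
            break
        try:
            return request(node, timeout=min(per_node, remaining))
        except ConnectionError as exc:  # stand-in for EtcdConnectionFailed
            last_exc = exc              # move on to the next node
    raise last_exc or TimeoutError('no etcd node answered in time')
```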
where it is not necessary (test_ha, test_ctl, etc...)
It will simplify further refactoring and make it possible to install
implementations of AbstractDCS independently of each other.
Somehow, when you import only urllib3, it is not possible to work with
the urllib3.exceptions.HTTPError exception (it looks like it is
imported from some other place).
`from urllib3.exceptions import HTTPError` solves the problem.
In addition to that, rename the confusing `Etcd.client` and
`ZooKeeper.client` attributes to `_client`. This attribute is
available from AbstractDCS and people had the wrong impression that
it provides the same interface for different DCS implementations,
which is obviously not the case: for Etcd it has type etcd.Client,
while for ZooKeeper it is a KazooClient.
Although this release was very buggy, it has some really nice features:
* an EtcdWatchTimedOut exception is raised when a `watch` call times out
* it supports SRV autodiscovery
Since we have already implemented our own SRV discovery, this feature
is not really interesting for us, but it solves the problem of having
two requirements files for different Python versions, because
python-etcd will install dnspython or dnspython3 as a dependency.
In order to fix https://github.com/jplana/python-etcd/issues/152 and
https://github.com/jplana/python-etcd/pull/154 I had to override the
`api_execute` method.
Previously, Patroni would die after receiving any exception from etcd
other than RetryFailedError or etcd.EtcdException. We have observed an
AttributeError raised by etcd on some occasions. With this change, we
demote ourselves, but do not terminate, on such exceptions.
If the namespace is not specified in the config file, /service/ will
be used.
It is also possible to use just '/' as a namespace. That means we
would have the following structure:
/scope1
/scope2
...
Such a situation could happen if we replaced all etcd nodes except the
one which was used by Patroni. After replacing the last node, Patroni
will try to execute the request on all other nodes from machines_cache,
but none of them are available. The machines cache would become empty
and Patroni would stick to the last node that was available in the
machines_cache and would never try to refresh the machines_cache, from
DNS for example.
Currently the machines cache is refreshed only when a request to the
etcd cluster has failed, but probably it should be done periodically,
for example every minute...
1. run touch_member from the main loop
2. move the code which takes care of long-running tasks into a separate class
3. change the format of data stored in the DCS: use JSON instead of a plain connection URL (see the example below)
4. change the Member class: from now on it deserializes everything into the data property
5. rework the API: from now on it takes into account the state of the current node in the DCS
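Illustrative only — roughly what a member's value might look like once it is stored as JSON instead of a bare connection URL; the exact set of fields and their values here are assumptions:
```python
import json

member_data = {
    'conn_url': 'postgres://10.0.1.12:5432/postgres',
    'api_url': 'http://10.0.1.12:8008/patroni',
    'state': 'running',
    'role': 'replica',
}
print(json.dumps(member_data))
```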