- Simplify setup.py: remove unneeded features and get rid of deprecation warnings
- Compatibility with Python 3.10: handle the `threading.Event.isSet()` deprecation (see the sketch after this list)
- Make sure setup.py can run without `six`: move the Patroni class and the main function to `__main__.py`. The `__init__.py` keeps only a few functions used by the Patroni class and by setup.py
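A minimal sketch of the compatibility concern mentioned above: the snake_case `is_set()` alias has existed for a long time, so calling it directly avoids the Python 3.10 deprecation warning.

```python
import threading

event = threading.Event()

# Event.isSet() is deprecated in Python 3.10; is_set() has been available
# since Python 2.6, so it works on every supported version
if not event.is_set():
    event.set()
```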
The only python-etcd3 client that works directly via gRPC still supports only a single endpoint, which is not great for high availability.
Since Patroni is already using a heavily hacked version of python-etcd with smart retries and auto-discovery out of the box, I decided to enhance the existing code with limited support of the v3 protocol via the gRPC-gateway.
Unfortunately, watches via the gRPC-gateway require us to open and keep a second connection to etcd.
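For illustration, a leader-lock style transaction can be expressed against the JSON gRPC-gateway roughly as below. This is only a sketch: the endpoint prefix, key name and value are assumptions (the gateway path changed between etcd releases), but keys and values do have to be base64-encoded in the JSON API.

```python
import base64
import json

import requests


def b64(value):
    return base64.b64encode(value.encode('utf-8')).decode('ascii')


# illustrative compare-and-set: create the leader key only if it does not exist yet
txn = {
    'compare': [{'key': b64('/service/batman/leader'),
                 'target': 'CREATE', 'create_revision': '0'}],
    'success': [{'request_put': {'key': b64('/service/batman/leader'),
                                 'value': b64('postgres0')}}],
}
response = requests.post('http://127.0.0.1:2379/v3/kv/txn', data=json.dumps(txn))
print(response.json().get('succeeded', False))
```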
Known limitations:
* The minimum supported version is 3.0.4: on earlier versions transactions don't work due to bugs in grpc-gateway, and without transactions we can't do atomic operations such as leader locks.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using the TLS Common Name. This is because the gRPC proxy terminates TLS from its client, so all clients share the proxy's certificate: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
* A new node can join the cluster dynamically and become part of the consensus (see the sketch after this list)
* It is also possible to join only the Patroni cluster (without adding the node to the Raft consensus); just comment out or remove `raft.self_addr` for that
* When a node joins the cluster, it uses the values from `raft.partner_addrs` only for initial discovery.
* It is possible to run Patroni and Postgres on two nodes plus one node with `patroni_raft_controller` (without Patroni and Postgres). In such a setup one can temporarily lose one node without affecting the primary.
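A rough sketch of the dynamic-membership idea with pysyncobj, the library the Raft support is built on. The addresses are illustrative, and the `dynamicMembershipChange`/`addNodeToCluster` usage is an assumption about its API rather than Patroni's actual code.

```python
from pysyncobj import SyncObj, SyncObjConf

# illustrative addresses; dynamic membership changes must be enabled
# for nodes to be added to the running consensus
conf = SyncObjConf(dynamicMembershipChange=True)
node = SyncObj('127.0.0.1:5010', ['127.0.0.1:5011', '127.0.0.1:5012'], conf=conf)

# a new member can later be added to (or removed from) the consensus at runtime
node.addNodeToCluster('127.0.0.1:5013')
```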
During shutdown Patroni tries to update its status in the DCS.
If the DCS is inaccessible, an exception might be raised, and the lack of exception handling prevents the logger thread from stopping.
Fixes https://github.com/zalando/patroni/issues/1344
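A minimal sketch of the fix, with hypothetical attribute and method names; the point is only that the DCS update is wrapped and the logger is always shut down.

```python
import logging

logger = logging.getLogger(__name__)


def shutdown(patroni):
    # hypothetical sketch: a DCS failure must not keep the logger thread alive
    try:
        patroni.dcs.touch_member({'state': 'stopped'})  # illustrative payload
    except Exception:
        logger.exception('Exception while updating member status in the DCS')
    finally:
        patroni.logger.shutdown()  # always stop the logger thread
```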
That required a refactoring of the `Config` and `Patroni` classes. Now one has to explicitly create an instance of `Config` before creating `Patroni`.
Optionally, the configuration can be run through the validate function.
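Illustrative usage under the new split; the import paths and signatures are assumptions, not the exact API.

```python
# a minimal sketch, assuming these import paths
from patroni.config import Config
from patroni import Patroni

config = Config('postgres0.yml')   # the config can optionally be validated here
patroni = Patroni(config)          # Patroni no longer builds its own Config instance
patroni.run()
```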
* Implement proper tests for `multiprocessing.set_start_method()`
* Exclude some watchdog code from coverage (it is used only for behave tests)
* Properly use `os.path.join` for Windows compatibility
* Import DCS modules in `features/environment.py` on demand; this allows running behave tests against the chosen DCS without installing all dependencies (see the sketch after this list)
* Remove some unused behave code
* Fix some minor issues in the dcs.kubernetes module
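A sketch of the on-demand import mentioned above; `load_dcs_module` is a hypothetical helper name.

```python
import importlib


def load_dcs_module(name):
    # import only the DCS implementation chosen for this behave run
    # (e.g. 'etcd', 'consul', 'zookeeper', 'kubernetes'), so the client
    # libraries of the other DCS don't have to be installed
    return importlib.import_module('patroni.dcs.' + name)
```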
Some config files are not controlled by Patroni. When somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that postgres will also be reloaded, but this didn't happen when there were no changes in the postgresql section of the Patroni config.
For example, one might replace ssl_cert_file and ssl_key_file on the filesystem; starting from PostgreSQL 10 this only requires a reload, but Patroni wasn't doing it.
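A sketch of the changed reload behaviour; the helper names are assumptions.

```python
def reload_config(patroni, new_local_config):
    # hypothetical sketch: reload postgres on SIGHUP or a REST API reload even if
    # the postgresql section did not change, because files like ssl_cert_file may
    # have been replaced outside of Patroni's control
    postgresql_changed = new_local_config.get('postgresql') != patroni.config.get('postgresql')
    patroni.config.reload(new_local_config)   # assumed helper
    if not postgresql_changed:
        patroni.postgresql.reload()           # pg_ctl reload / SELECT pg_reload_conf()
```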
In addition, fix an issue with the handling of `wal_buffers`: its default value depends on `shared_buffers` and `wal_segment_size`, and therefore Patroni was exposing pending_restart even when the new value in the config was explicitly set to -1 (the default).
Close https://github.com/zalando/patroni/issues/1198
The PatroniLogger object is instantiated in the Patroni constructor, and down the road there might be a fatal error causing the Patroni process to exit, but a live logger thread prevents a normal shutdown.
To mitigate the issue without losing the ability to use the logging infrastructure, we will switch to the QueueLogger only when the thread was explicitly started from the Patroni.run() method.
Continuation of https://github.com/zalando/patroni/pull/1178
Since it is based on a Thread with daemon set to True, the shutdown of the logger was very likely to happen too early, which caused some lines not to appear at the destination.
Close https://github.com/zalando/patroni/issues/1173
A few times we observed that the Patroni HA loop was blocked for a few minutes due to not being able to write logs to stderr. This is a very rare condition, which we have hit so far only on k8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into an in-memory queue and later asynchronously flushed to stderr or a file from a separate thread.
The maximum queue size is configurable and the default value is 1000. This should be enough to keep more than one hour of log messages with the default settings, as long as the Patroni cluster operates normally (without big issues).
If we hit the maximum size of the queue, further records are discarded until the queue size is reduced. The number of discarded messages is reported in the log later.
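A minimal sketch of the approach, assuming a hand-rolled handler plus a writer thread; the real implementation differs.

```python
import logging
import sys
from queue import Full, Queue
from threading import Thread


class QueueingHandler(logging.Handler):
    """Minimal sketch of the idea, not the real PatroniLogger implementation."""

    def __init__(self, max_queue_size=1000):
        super(QueueingHandler, self).__init__()
        self.queue = Queue(maxsize=max_queue_size)
        self.records_lost = 0          # could later be exposed via the REST API

    def emit(self, record):
        try:
            self.queue.put_nowait(record)   # never blocks the caller
        except Full:
            self.records_lost += 1          # discard when the queue is full


def _writer(handler):
    target = logging.StreamHandler(sys.stderr)
    while True:
        target.handle(handler.queue.get())  # slow I/O happens only in this thread


handler = QueueingHandler()
logging.getLogger().addHandler(handler)
Thread(target=_writer, args=(handler,), daemon=True).start()
```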
In addition, the number of non-flushed and discarded messages (if there are any) will be reported via the Patroni REST API as:
```json
"logger_queue_size": X,
"logger_records_lost": Y`
```
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package
This is just a careful refactoring, without behavior changes.
This functionality works similarly to `pg_hba`:
If `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf; however, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed from outside and will not update the file.
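The decision boils down to something like the following sketch; `patroni_manages_pg_ident` is a hypothetical helper, not Patroni's actual code.

```python
def patroni_manages_pg_ident(config):
    # hypothetical sketch of the rule described above
    postgresql = config.get('postgresql', {})
    if 'ident_file' in postgresql.get('parameters', {}):
        return False                             # pg_ident.conf is managed from outside
    return bool(postgresql.get('pg_ident'))      # write it only when pg_ident is defined
```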
The recently released psycopg2 was split into two different packages, psycopg2 and psycopg2-binary, which could be installed at the same time into the same place on the filesystem. In order to reduce the dependency-hell problem, we let the user choose how to install psycopg2. There are a few options available, and they are reflected in the documentation.
This PR also changes the following behavior:
* `pip install patroni` will fail if psycopg2 is not installed
* Patroni will check psycopg2 on start and fail if it can't be found or is outdated (see the sketch below).
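A sketch of such a startup check; the exact minimum version used by Patroni is an assumption here.

```python
import sys


def ensure_psycopg2(min_version=(2, 5, 4)):
    # hypothetical sketch of the startup check; the real minimum version may differ
    try:
        import psycopg2
    except ImportError:
        sys.exit('FATAL: Patroni requires psycopg2 or psycopg2-binary to be installed')
    # psycopg2.__version__ looks like '2.8.6 (dt dec pq3 ext lib=10.12)'
    version = tuple(int(x) for x in psycopg2.__version__.split(' ')[0].split('.'))
    if version < min_version:
        sys.exit('FATAL: the installed psycopg2 is too old, please upgrade it')
```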
Closes https://github.com/zalando/patroni/issues/1021
First of all, this patch changes the behavior of the `on_start`/`on_restart` callbacks: they will be called only when postgres is started or restarted without a role change. If the member is promoted or demoted, only the `on_role_change` callback will be executed.
Before that, `on_role_change` was never called for a standby leader; only `on_start`/`on_restart` were, and with a wrong role argument.
In addition to that, the REST API will return standby_leader role for the leader of the standby cluster.
Closes https://github.com/zalando/patroni/issues/988
In addition, the postmaster pid is transferred to the Patroni process with the help of multiprocessing.Pipe instead of stdin/stdout pipes (see the sketch below).
Closes https://github.com/zalando/patroni/issues/992
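A self-contained sketch of the Pipe-based handover; the pid value and function names are illustrative.

```python
import multiprocessing


def _child(conn):
    # in Patroni the child starts the postmaster; the pid here is illustrative
    conn.send(4242)
    conn.close()


if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=_child, args=(child_conn,))
    proc.start()
    postmaster_pid = parent_conn.recv()   # no stdin/stdout juggling required
    proc.join()
    print(postmaster_pid)
```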
* Use `shutil.move` instead of `os.replace`, which is available only from Python 3.3
* Introduce standby-leader health-check and Consul service
* Improve unit tests; some lines were not covered
* Rename `assertEquals` -> `assertEqual` due to a deprecation warning
If Patroni gets partitioned, it starts receiving stale information from the DCS.
We can't use this information to determine that we have the leader key.
Instead, we record in the Ha object the actual result of the acquire/update lock operation and report ourselves as the leader only if it was successful.
P.S. Despite responding with 200 on `GET /master`, postgres was actually still running read-only.
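A rough sketch of the idea; this is not the real patroni.ha.Ha code, and the DCS method names are assumptions.

```python
class Ha(object):
    """Trust only our own successful writes, never a possibly stale read from the DCS."""

    def __init__(self, dcs):
        self.dcs = dcs
        self._have_lock = False   # result of the last acquire/update attempt

    def acquire_lock(self):
        self._have_lock = self.dcs.attempt_to_acquire_leader()
        return self._have_lock

    def update_lock(self):
        self._have_lock = self.dcs.update_leader()
        return self._have_lock

    def has_lock(self):
        return self._have_lock
```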
Introduces a PostmasterProcess object that identifies a running process via pid and start time.
When the pid file is parsed and the correct process is identified, this object is passed around.
When the process goes away, we try to find a new one in case somebody restarted postgres behind our back.
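A simplified sketch of such an object on top of psutil; the real class does more checks, and the method name here is made up.

```python
import psutil


class PostmasterProcess(psutil.Process):
    """Simplified sketch: a process is 'the' postmaster only if pid AND start time match."""

    def __init__(self, pid, expected_start_time=None):
        super(PostmasterProcess, self).__init__(pid)
        # postmaster.pid contains the start time, so a reused pid can be detected
        self._expected_start_time = expected_start_time or self.create_time()

    def is_the_same_postmaster(self):
        # exact float comparison is a simplification for the sketch
        return self.is_running() and self.create_time() == self._expected_start_time
```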
It could happen that the ttl provided in the Patroni configuration is smaller than the minimum supported by Consul. In such a case the Consul agent fails to create a new session and responds with 500 Internal Server Error, with an HTTP body containing something like: "Invalid Session TTL '3000000000', must be between [10s=24h0m0s]". Without a session Patroni is not able to create member and leader keys in the Consul KV store, which means the cluster becomes completely unhealthy.
As a workaround we handle such an exception, adjust the ttl to the minimum possible value and retry the session creation.
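A sketch of the workaround with python-consul; parsing the minimum out of the error text is an assumption, not necessarily how Patroni does it.

```python
import re

import consul


def create_session(client, ttl):
    # hypothetical sketch: python-consul raises on the 500 response
    try:
        return client.session.create(ttl=ttl, behavior='delete')
    except Exception as exc:
        # the error text contains the allowed range, e.g. "must be between [10s=24h0m0s]"
        match = re.search(r'must be between \[(\d+)s=', str(exc))
        if not match:
            raise
        return client.session.create(ttl=int(match.group(1)), behavior='delete')


client = consul.Consul()
session_id = create_session(client, ttl=3)  # 3s is below the minimum, triggering the retry
```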
In addition, make it possible to define a custom log format via the environment variable `PATRONI_LOGFORMAT`.
The task of restoring a cluster from a backup or cloning an existing cluster into a new one had been floating around for some time. It was kind of possible to achieve with a lot of manual actions, but very error prone. So I came up with the idea of making the way we bootstrap a new cluster configurable.
In short: we want to run a custom script instead of running initdb.
Previously pg_ctl waited for a timeout and then happily carried on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.
This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero, we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop it and release the leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role), we continue with normal processing. While we are waiting for the master start timeout, we handle manual failover requests.
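A sketch of that decision flow; every helper name below is made up for illustration.

```python
def handle_starting_master(ha, master_start_timeout, seconds_since_start):
    # hypothetical sketch of the new 'starting' state
    if ha.postgresql_accepting_connections():
        return 'started'
    if master_start_timeout == 0 and ha.is_failover_possible():
        return ha.release_leader_key()        # fail over as soon as there is a candidate
    if seconds_since_start > master_start_timeout and ha.is_failover_possible():
        ha.stop_postgresql()
        return ha.release_leader_key()        # give up the lock, let a replica take over
    return 'starting'                         # keep waiting; manual failover is still handled
```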
* Introduce timeout parameter to restart.
When a restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
If Patroni is started in a Docker container with pid=1, it will execute itself with the same arguments. The original process will take care of init-process duties, i.e. handling SIGCHLD and reaping dead orphan processes. It will also forward SIGINT, SIGHUP, SIGTERM and some other signals to the real Patroni process.
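A rough, self-contained sketch of the idea; this is not the actual Patroni code.

```python
import os
import signal
import sys

if os.getpid() == 1:
    child_pid = os.fork()
    if child_pid == 0:
        # the "real" Patroni: re-execute ourselves with the same arguments
        os.execv(sys.executable, [sys.executable] + sys.argv)
    # forward the common termination signals to the real Patroni process
    for sig in (signal.SIGINT, signal.SIGHUP, signal.SIGTERM):
        signal.signal(sig, lambda signo, frame: os.kill(child_pid, signo))
    while True:
        pid, status = os.wait()   # reap dead orphan processes (init duties)
        if pid == child_pid:
            sys.exit(os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1)
```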
Adds a new configuration variable, synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user-visible transactions.
To enforce the synchronous failover guarantee, Patroni stores the current synchronous replication state in the DCS using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.
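The ordering can be sketched as follows; the helper names are assumptions, not the real API.

```python
def enable_synchronous_replication(postgresql, dcs, candidate):
    # 1. make the candidate synchronous on the master first
    postgresql.set_synchronous_standby_names(candidate)
    postgresql.reload()
    # 2. only then publish the new sync state to the DCS; a standby that finds
    #    itself listed there knows it is allowed to take over
    dcs.write_sync_state(leader=postgresql.name, sync_standby=candidate)
```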
We can't list multiple standbys as synchronous and let PostgreSQL pick one, because we can't know which one was actually synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions.
On graceful shutdown standbys will disable themselves by setting a nosync tag and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.
When the synchronous standby goes away or disconnects, a new one is picked and Patroni switches the master over to the new one. If no synchronous standby exists, Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously the master is allowed to acquire the leader lock.
Added acceptance tests and documentation.
Implementation by @ants with extensive review by @CyberDem0n.
There is no point in trying to update the topology before the original request has been performed. Also, for us it is more important to execute the original request than to keep the topology of the etcd cluster in sync.
In addition, implement the same retry-timeout logic in the `machines` property that is already used in the `api_execute` method.
* Make the different kazoo timeouts dependent on loop_wait
ping timeout ~ 1/2 * loop_wait
connect_timeout ~ 1/2 * loop_wait
Originally these values were calculated from the negotiated session timeout and didn't work very well, because it took significant time to figure out that the connection was dead and to reconnect (up to the session timeout), leaving us no time to retry.
* Address the code review
It was causing Patroni to fail to stop after receiving SIGTERM.
Acceptance tests were killing it with SIGKILL, which was causing further tests to fail because postgres was still running:
```
2016-06-16 14:36:24,444 INFO: no action. i am the leader with the lock
2016-06-16 14:36:25,448 INFO: Lock owner: postgres0; I am postgres0
2016-06-16 14:36:25,452 ERROR: Failed to update /service/batman/optime/leader
Traceback (most recent call last):
  File "/home/akukushkin/git/patroni/patroni/dcs/zookeeper.py", line 208, in write_leader_optime
    self._client.retry(self._client.set, path, last_operation)
  File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/client.py", line 273, in _retry
    return self._retry.copy()(*args, **kwargs)
  File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/retry.py", line 123, in __call__
    return func(*args, **kwargs)
  File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/client.py", line 1219, in set
    return self.set_async(path, value, version).get()
  File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/handlers/utils.py", line 74, in get
    self._condition.wait(timeout)
  File "/usr/lib/python2.7/threading.py", line 340, in wait
    waiter.acquire()
  File "/home/akukushkin/git/patroni/patroni/utils.py", line 219, in sigterm_handler
    sys.exit()
SystemExit
2016-06-16 14:36:25,453 INFO: no action. i am the leader with the lock
2016-06-16 14:36:26,443 INFO: Lock owner: postgres0; I am postgres0
2016-06-16 14:36:26,444 INFO: no action. i am the leader with the lock
```
where it is not necessary (test_ha, test_ctl, etc...)
It will simplify further refactoring and make it possible to install implementations of AbstractDCS independently of each other.