Commit Graph

92 Commits

Author SHA1 Message Date
Alexander Kukushkin
4a24b79b73 IPv6 support (#1122)
fixes https://github.com/zalando/patroni/issues/1121
2019-08-02 11:34:29 +02:00
Alexander Kukushkin
37f03790cc Implement two-step logging (#1080)
A few times we observed that the Patroni HA loop was blocked for several minutes because it was unable to write logs to stderr. This is a very rare condition, which so far we have hit only on k8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into an in-memory queue and later asynchronously flushed to stderr or to a file from a separate thread.

The maximum queue size is configurable; the default value is 1000. This should be enough to keep more than one hour of log messages with default settings, as long as the Patroni cluster operates normally (without big issues).

If the queue reaches its maximum size, further log messages are discarded until the queue size drops again. The number of discarded messages is reported in the log later.

In addition to that, the number of not-yet-flushed and discarded messages (if there are any) will be reported via the Patroni REST API as:
```json
"logger_queue_size": X,
"logger_records_lost": Y`
```
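The mechanism maps naturally onto the stdlib's queue-based logging helpers; a minimal sketch of the pattern (not Patroni's actual code), with the queue bound of 1000 mirroring the default mentioned above:
```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.Queue(maxsize=1000)  # bounded, like the default above

# Application threads only enqueue records (a fast, non-blocking operation) ...
root = logging.getLogger()
root.addHandler(QueueHandler(log_queue))
root.setLevel(logging.INFO)

# ... while a separate thread drains the queue to stderr.
listener = QueueListener(log_queue, logging.StreamHandler())
listener.start()

logging.info('the HA loop is no longer blocked by slow stderr writes')
listener.stop()
```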
2019-06-13 14:18:49 +02:00
wilfriedroset
2384d9e735 Add API route /health (#1079)
close #119
2019-06-11 15:22:52 +02:00
Alexander Kukushkin
a4bd6a9b4b Refactor postgresql class (#1060)
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package

This is just careful refactoring, with no functional changes.
2019-05-21 16:02:47 +02:00
Julien Riou
663026c34c Use SSLContext to wrap REST API socket (#1039)
Using `ssl.wrap_socket` is deprecated, and it still allowed soon-to-be-deprecated protocols such as TLS 1.1.
Now `ssl.create_default_context()` is used to produce a secure `SSLContext` that wraps the REST API server's socket.
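A sketch of the approach (certificate paths are placeholders, not Patroni's configuration):
```python
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer

# create_default_context() yields secure settings and disables old protocols,
# unlike the deprecated ssl.wrap_socket().
context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
context.load_cert_chain(certfile='/etc/ssl/restapi.crt',  # placeholder
                        keyfile='/etc/ssl/restapi.key')   # placeholder

server = HTTPServer(('', 8008), BaseHTTPRequestHandler)
server.socket = context.wrap_socket(server.socket, server_side=True)
server.serve_forever()
```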
2019-04-23 11:23:22 +02:00
Alexander Kukushkin
7c0c9599fc Remove psycopg2 from requirements (#1023)
Recently psycopg2 was split into two different packages, psycopg2 and psycopg2-binary, which could be installed at the same time into the same place on the filesystem. In order to reduce the dependency-hell problem, we let the user choose how to install psycopg2. A few options are available, and they are reflected in the documentation.

This PR also changes the following behavior:
* `pip install patroni` will fail if psycopg2 is not installed
* Patroni will check psycopg2 upon start and fail if it can't be found or is outdated.

Closes https://github.com/zalando/patroni/issues/1021
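The startup check could look roughly like this (a sketch; the minimum version is an assumption for illustration):
```python
import sys

def check_psycopg2(min_version=(2, 5, 4)):  # assumed minimum version
    try:
        import psycopg2
    except ImportError:
        sys.exit('Patroni requires psycopg2, psycopg2-binary, or psycopg2cffi')
    # psycopg2.__version__ looks like '2.8.3 (dt dec pq3 ext lo64)'
    version = tuple(int(x) for x in psycopg2.__version__.split(' ')[0].split('.'))
    if version < min_version:
        sys.exit('psycopg2 is too old: ' + psycopg2.__version__)
```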
2019-04-15 14:30:16 +02:00
Alexander Kukushkin
2c128520cf Python34 compatibility (#933)
and some other minor fixes.

Closes https://github.com/zalando/patroni/issues/932
2019-01-16 14:40:05 +01:00
Alexander Kukushkin
381a5b80d2 Release 1.5.4 (#931)
* Bump version
* Update release notes
* Make it possible to configure registration of Service in Consul via env variables
2019-01-15 12:14:19 +01:00
Alexander Kukushkin
71dae6a905 Optionally consider node not healthy if it is not on the latest timeline (#892)
The latest timeline is calculated from the `/history` key in DCS. If there is no such key, or if it contains garbage, we consider the node healthy.
Closes https://github.com/zalando/patroni/issues/890
2019-01-15 11:16:30 +01:00
Alexander Kukushkin
f70edefc65 A few bugfixes in the "standby cluster" workflow (#823)
* Always run `pg_rewind` against the remote master
* Always use the remote master as the source when "recovering" a stopped standby leader
* Use the remote master as the source when "recovering" a node in an unhealthy cluster
* Use the local dbname as the fallback when doing `pg_rewind` from the remote master
* `no_replication_slot` is an allowed key in the `RemoteMember` object
* Make it possible to "bootstrap" a new `standby_cluster` with an existing (and valid) data directory. There is one prerequisite, though: there must be no `patroni.dynamic.json` file in it!
2018-10-09 13:30:48 +02:00
Alexander Kukushkin
76d1b4cfd8 Minor fixes (#808)
* Use `shutil.move` instead of `os.replace`, which is only available from Python 3.3
* Introduce a standby-leader health-check and Consul service
* Improve unit tests; some lines were not covered
* Rename `assertEquals` -> `assertEqual`, due to a deprecation warning
2018-09-19 16:32:33 +02:00
Alexander Kukushkin
90cf930036 Refactor REST API health-checks (#779)
Make it more readable and easier to understand.
Mostly it is needed to implement https://github.com/zalando/patroni/issues/772
2018-08-29 11:35:22 +02:00
Alexander Kukushkin
0c1ae6fbeb Respond 200 to master health-check only if update_lock was successful (#713)
If Patroni gets partitioned, it starts receiving stale information from the DCS.
We can't use this information to determine that we hold the leader key.
Instead, we record the actual outcome of acquiring/updating the lock in the Ha object and report this node as the leader only if that operation was successful.

P.S. Despite responding with 200 on `GET /master`, postgres was still running read-only.
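In sketch form, the idea looks like this (class and method names are illustrative, not Patroni's actual internals):
```python
class Ha(object):
    def __init__(self, dcs):
        self.dcs = dcs
        self.is_leader = False  # outcome of the last leader-key operation

    def update_lock(self):
        # Record whether the DCS really accepted the update, instead of
        # trusting possibly stale cluster information read back from it.
        self.is_leader = self.dcs.update_leader()
        return self.is_leader


def master_health_status(ha):
    # Respond 200 only if the last acquire/update of the lock succeeded.
    return 200 if ha.is_leader else 503
```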
2018-08-03 17:00:01 +02:00
Henning Jacobs
2537147810 #694 handle configuration error (#695)
It is possible to change a lot of parameters at runtime (including `restapi.listen`) by updating the Patroni config file and sending SIGHUP to the Patroni process.

If something was misconfigured, a weird exception was thrown and the `restapi` thread broke.

This PR makes the error message friendlier and avoids breaking the `restapi` thread.
2018-06-12 14:08:38 +02:00
Alexander Kukushkin
84f29caf92 Fix race condition in poll_failover_result (#658)
It didn't directly affect either failover or switchover, but in some rare cases success was reported too early, when the former leader released the lock: `Failed over to "None" instead of "desired-node"`

In addition to that, this commit improves logs and status messages by differentiating between failover and switchover.
2018-04-16 17:45:05 +02:00
Alexander Kukushkin
5668367181 Implement /sync and /async endpoints (#578)
They respond with HTTP status code 200 only when the node is running as a synchronous or asynchronous replica, respectively.

Fixes https://github.com/zalando/patroni/issues/189
Fixes https://github.com/zalando/patroni/issues/415
2018-01-05 15:28:40 +01:00
Alexander Kukushkin
03c2a85d23 Expose current timeline in DCS and via API (#591)
It is very easy to get the current timeline on the master by executing
```sql
SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int
```

Unfortunately, the same method doesn't work when postgres is in recovery. Therefore, on replicas we use a replication connection for that. To avoid opening and closing a replication connection on every HA loop, we cache the result if its value matches the timeline of the master.
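On a replica the timeline can be read over a replication connection; a sketch with psycopg2 (connection parameters are placeholders):
```python
import psycopg2
from psycopg2.extras import PhysicalReplicationConnection

conn = psycopg2.connect('host=127.0.0.1 user=replicator',
                        connection_factory=PhysicalReplicationConnection)
cur = conn.cursor()
cur.execute('IDENTIFY_SYSTEM')  # a walsender command; works during recovery
systemid, timeline, xlogpos, dbname = cur.fetchone()
conn.close()
```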

Also, this PR introduces a new key in DCS: `/history`. It will contain a JSON-serialized object with the timeline history, in a format similar to the usual history files. The differences are:
* The second column is the absolute WAL position in bytes, instead of an LSN
* Optionally there may be a fourth column: a timestamp (the mtime of the history file)
2018-01-05 15:25:56 +01:00
Alexander Kukushkin
18786464a1 Rename failover to switchover and make new failover work without leader (#588)
In addition to that, implement the /switchover endpoint as an alias for the /failover endpoint and implement more checks, such as:
* a candidate must be provided for a failover
* a switchover can't be scheduled in a pause state
* and so on

Fixes https://github.com/zalando/patroni/issues/585
Fixes https://github.com/zalando/patroni/issues/520
2018-01-05 15:17:56 +01:00
Alexander Kukushkin
3a96ffa718 Expose pause state of every member to DCS and via REST (#592)
and implement `patronictl pause|resume --wait` on top of that

Fixes https://github.com/zalando/patroni/issues/349
2018-01-05 15:16:45 +01:00
Alexander Kukushkin
0e01bb33bb Improve patronictl reinit (#576)
Make it possible to cancel a running task if you want to reinitialize a replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel an already running task if an attempt to trigger reinitialize has failed
2. use the `--force` argument with `patronictl reinit`
2018-01-04 10:31:44 +01:00
Alexander Kukushkin
23152a7fc4 synchronous_standby_names must be quoted with quote_ident (#505)
In addition to that, implement additional checks around manual failover and recovery when synchronous_mode is enabled.

* The comparison must be case-insensitive
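A simplified sketch of the quoting rule (PostgreSQL's quote_ident is the authoritative behavior; keywords are ignored here):
```python
import re

def quote_ident(value):
    # Plain lower-case identifiers need no quoting; anything else gets
    # embedded quotes doubled and the whole name wrapped in double quotes.
    if re.match(r'^[a-z_][a-z_0-9\$]*$', value):
        return value
    return '"{0}"'.format(value.replace('"', '""'))

# quote_ident('node1')  -> node1
# quote_ident('Node-1') -> "Node-1"  (safe inside synchronous_standby_names)
```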
2017-08-24 07:55:02 +02:00
Alexander Kukushkin
77aea03df9 Different bugfixes around pause state, mostly related to watchdog (#507)
* Do not send keepalives if watchdog is not active
* Avoid activating watchdog in a pause mode
* Set correct postgres state in pause mode
* Don't try to run queries from API if postgres is stopped
2017-08-24 07:53:32 +02:00
Alexander Kukushkin
f8b3703d6e Bugfix: failover via API didn't work due to change in _MemberStatus (#489)
Originally, fetch_nodes_statuses was returning a tuple; later it was wrapped into the namedtuple _MemberStatus, and recently _MemberStatus was extended with the watchdog_failed field. However, api.py was still relying on the plain tuple and checking failover limitations on its own instead of calling the `failover_limitation` method.
2017-07-28 15:38:55 +02:00
Ants Aasma
70d718a058 Simplify watchdog code (#452)
* Only activate the watchdog while master and not paused

We don't really need the protections while we are not master. This way we only need to tickle the watchdog when we are updating the leader key or while a demotion is happening.

As implemented, we might fail to notice that the watchdog should be shut down if someone demotes postgres and removes the leader key behind Patroni's back. There are probably other similar cases. Basically, if the administrator is being actively stupid, they might get unexpected restarts. That seems fine.

* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni.

Changes the watchdog timeout during the next keepalive when the ttl is changed. The watchdog driver and requirement can also be switched online.

When watchdog mode is `required` and watchdog setup does not work, the effect is similar to nofailover. Add watchdog_failed to the status API to signify this. It is True only when the watchdog does not work **AND** it is required.

* Reset implementation when config changed while active.

* Add watchdog safety margin configuration

Defaults to 5 seconds. Basically, this is the maximum amount of time that can pass between the calls to `dcs.update_leader()` and `watchdog.keepalive()`, which are called right after each other. This should be safe for pretty much any sane scenario, and it allows the default settings to avoid triggering the watchdog when the DCS is not responding.

* Cancel bootstrap if watchdog activation fails

The system would have demoted itself anyway on the next HA loop. Doing it during bootstrap at least gives some other node a chance to try bootstrapping, in the hope that it is configured correctly.

If all nodes are unable to activate, they will keep trying until the disk fills up with moved data directories. Perhaps not ideal behavior, but since the situation is unlikely to resolve itself without administrator intervention, it doesn't seem too bad.
2017-07-27 12:16:11 +02:00
Alexander Kukushkin
cd84dc82b6 Implement postgresql-10 support (#444)
Mainly it handles the rename of xlog to wal.
In the API and inside the DCS it is still named xlog (for compatibility).

* Address feedback
2017-05-19 17:04:53 +02:00
Alexander Kukushkin
44a7142a9d Synchronous mode strict (#435)
If synchronous_mode_strict == true, then '*' will be written to synchronous_standby_names when the last replication host dies.
2017-04-27 14:32:15 +02:00
Oleksii Kliukin
d39f895082 Fix unit tests for Python 3.6 (#431)
Python 3.6 complains: `AttributeError: 'MockRequest' object has no attribute 'sendall'`
2017-04-18 12:44:42 +02:00
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously, pg_ctl waited for a timeout and then happily trod on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero, we fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we stop it and release the leader lock, if failover is possible. Once the failover succeeds or fails (no leader and no one to take the role), we continue with normal processing. While we are waiting for the master start timeout, we handle manual failover requests.

* Introduce timeout parameter to restart.

When the restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
2016-12-08 14:44:27 +01:00
Alexander Kukushkin
038b5aed72 Improve leader watch functionality (#356)
Previously, replicas were always watching the leader key (even if postgres was not running there). It was not a big issue, but it made it impossible to interrupt such a watch when postgres started up or stopped successfully. It also delayed the update_member call, so we had somewhat stale information in DCS for up to `loop_wait` seconds. This commit changes that behavior: if the async_executor is busy starting, stopping, or restarting postgres, we do not watch the leader key but instead wait for an event from the async_executor for up to `loop_wait` seconds. The async executor fires such an event only if the function it was calling returned something that evaluates to boolean True.

This functionality is really needed to change the way we decide whether pg_rewind is necessary. That decision requires a running local postgres, so it is really important for us to get such a notification as soon as possible.
2016-11-22 16:22:30 +01:00
Alexander Kukushkin
37b020e7a3 Various bugfixes and improvements: (#346)
* Replace pytz.UTC with dateutil.tz.tzutc; it helps to reduce memory usage by more than 4 MB...

* Fix the check of the Python version: 0x0300000 => 0x3000000

* Update leader key before restart and demote
2016-11-04 18:42:56 +02:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable, synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of the master failure. This effectively means zero lost user-visible transactions.

To enforce the synchronous failover guarantee, Patroni stores the current synchronous replication state in the DCS using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.
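In sketch form, the ordering looks like this (function names are hypothetical):
```python
def enable_sync_replication(postgresql, dcs, standby_name):
    # 1. Make the standby synchronous on the master first ...
    postgresql.set_synchronous_standby(standby_name)
    # 2. ... wait until it is confirmed to be streaming synchronously ...
    postgresql.wait_for_sync_state(standby_name)
    # 3. ... and only then publish the fact to the DCS. A standby named here
    #    is guaranteed to have been synchronous before the master failed.
    dcs.write_sync_state(leader=postgresql.name, sync_standby=standby_name)
```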

We can't enable multiple standbys as synchronous and let PostgreSQL pick one, because we can't know which one was actually set as synchronous on the master when it failed. This means that on standby failure, commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown, standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects, a new one is picked and Patroni switches synchronous replication over to it. If no synchronous standby exists, Patroni disables synchronous replication (synchronous_standby_names = ''), but not synchronous_mode. In this case, only the node that was previously the master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
Alexander Kukushkin
6dc1d9c88e Trigger reinitialize from the API
and make it possible to reinitialize in a pause state
2016-08-29 15:38:58 +02:00
Alexander Kukushkin
9fdd021e08 Fix unit-tests for api 2016-08-29 10:25:46 +02:00
Murat Kabilov
3d1fe3fa49 Introduce is_paused method in the Cluster 2016-08-29 09:29:49 +02:00
Murat Kabilov
89ef5da5ae Add tests for api; add checks for ctl and api for the paused state case 2016-08-29 08:36:35 +02:00
Alexander Kukushkin
96da6340a9 Calculate future restart time dynamically (#268)
`do_POST_restart` was randomly showing less than 100% coverage after 2016-08-20 due to hardcoded timestamps.
2016-08-24 09:46:56 +02:00
Alexander Kukushkin
fa7aa71092 Always call on_start callback when starting Patroni (#262)
When Patroni was "joining" an already running postgres, it was not calling callbacks, which in some cases caused issues (a callback could be used to change routing/load-balancing or to assign/remove a floating (service) IP).

In addition to that, we should `start` postgres instead of `restart`-ing it when doing recovery, because in this case the 'on_start' callback should be called instead of 'on_restart'.
2016-08-18 09:35:13 +02:00
Alexander Kukushkin
5fe74bec3b Make different kazoo timeouts depend on loop_wait (#243)
* Make the different kazoo timeouts dependent on loop_wait

ping timeout ~ 1/2 * loop_wait
connect_timeout ~ 1/2 * loop_wait

Originally these values were calculated from the negotiated session timeout, and that didn't work very well: it took significant time to figure out that the connection was dead and to reconnect (up to the session timeout), leaving us no time to retry.

* Address the code review
2016-08-10 10:15:09 +02:00
Oleksii Kliukin
13b4306f40 Remove one more occurrence of the time bomb 2016-07-14 16:53:02 +02:00
Oleksii Kliukin
3181c4e59f Code review, asynchronous restarts.
- Make the restart initiated by the schedule asynchronous
- Fix the placeholders in logs.
- Fix the regexp to detect the PostgreSQL version.
2016-07-12 20:25:01 +02:00
Oleksii Kliukin
8834f929aa Improve the unit tests/coverage. 2016-07-05 10:07:29 +02:00
Oleksii Kliukin
d2832ee43b Address the code review.
Fix the return value in should_run_scheduled_action and the comments.
Correct the json composition in the scheduled_restart test.
Fix the delete in case there is no scheduled restart.
Fix the usage of format in the logger output.
Fix the indentation in the evaluate_scheduled_restart.
Fix the condition related to the body_is_optional in the do_POST_restart.
Fix a few typos in the error messages.
Fix the _read_json_content
Make the scheduled restart unit-tests a bit less ugly
2016-06-28 16:54:20 +02:00
Oleksii Kliukin
568eb730bc Clear the scheduled restart after the normal one.
Make sure the scheduled restart flag is cleared when
postmaster_start_time has changed since the restart was scheduled.

Additionally, separate the logic for checking the restart conditions
into its own function, in order to support conditions for normal
restarts as well.
2016-06-24 17:39:04 +02:00
Oleksii Kliukin
29845dd383 Restart the node according to the schedule.
The scheduled restart data structures are now independent of those
used by normal restarts. This will be fixed in subsequent commits.
Add behave tests that cover POST /restart (but not DELETE).
2016-06-23 10:43:54 +02:00
Oleksii Kliukin
318ca6be38 Implement scheduling and deleting a restart.
The scheduled restart API extends the already existing restart
endpoint by processing the parameters in the request body.

Only one scheduled restart at a time is supported. The DELETE method
on the /restart endpoint is used to remove an existing scheduled restart.
2016-06-20 15:16:22 +02:00
Alexander Kukushkin
9ecff0f64d Bugfixes
* GET /config was returning the latest "correct" version of the dynamic
  configuration.
* PATCH /config was breaking when trying to patch a non-dict with a dict
2016-06-10 12:35:04 +02:00
Alexander Kukushkin
ebb9e252d8 Rename restart_pending to pending_restart for compatibility 2016-06-02 09:31:30 +02:00
Alexander Kukushkin
1c30948ef9 Implement PUT /config and enhance some checks 2016-06-01 17:06:31 +02:00
Alexander Kukushkin
e10873dd9c RestApiHandler._patch_config returns True if configuration was changed 2016-05-31 15:49:55 +02:00
Alexander Kukushkin
1cd42d4e47 Get rid of some stupid logic with options=True/False
And some other tricks with overriding the handle_one_request and finish
methods of the parent class, which were necessary only to make OPTIONS
requests from haproxy work with python2; in fact it was still not
working with python3. Instead of doing all this magic, we should simply
give haproxy what it wants: an HTTP response code and nothing
more.
2016-05-31 14:42:00 +02:00