120 Commits

Author SHA1 Message Date
Alexander Kukushkin
f8b3703d6e Bugfix: failover via API didn't work due to change in _MemberStatus (#489)
Originally fetch_nodes_statuses returned a plain tuple; later it was
wrapped into the namedtuple _MemberStatus, and recently _MemberStatus was
extended with the watchdog_failed field, but api.py was still relying on
the old tuple layout and checking failover limitations on its own instead of
calling the `failover_limitation` method.
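The fix is essentially to let api.py reuse the same helper the HA loop uses. A minimal sketch of the idea (the field list and messages are illustrative, not the exact ones from ha.py):

```python
from collections import namedtuple

# Illustrative shape of the status record; the real field list in ha.py differs.
class _MemberStatus(namedtuple('_MemberStatus', 'member reachable in_recovery tags watchdog_failed')):

    def failover_limitation(self):
        """Return the reason this member cannot become the leader, or None."""
        if not self.reachable:
            return 'not reachable'
        if self.tags.get('nofailover', False):
            return 'not allowed to promote'
        if self.watchdog_failed:
            return 'not watchdog capable'
        return None

# api.py should call status.failover_limitation() instead of unpacking the tuple by index.
```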
2017-07-28 15:38:55 +02:00
Ants Aasma
70d718a058 Simplify watchdog code (#452)
* Only activate watchdog while master and not paused

We don't really need the protections while we are not master. This way
we only need to tickle the watchdog when we are updating the leader key or
while a demotion is happening.

As implemented we might fail to notice that we should shut down the watchdog
if someone demotes postgres and removes the leader key behind Patroni's back.
There are probably other similar cases. Basically, if the administrator
is being actively stupid they might get unexpected restarts. That seems
fine.

* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni.

Changes watchdog timeout during the next keepalive when ttl is changed. Watchdog driver and requirement can also be switched online.

When watchdog mode is `required` and watchdog setup does not work then the effect is similar to nofailover. Add watchdog_failed to status API to signify this. This is True only when watchdog does not work **AND** it is required.

* Reset implementation when config changed while active.

* Add watchdog safety margin configuration

Defaults to 5 seconds. Basically this is the maximum amount of time
that can pass between the calls to `dcs.update_leader()` and
`watchdog.keepalive()`, which are called right after each other. Should
be safe for pretty much any sane scenario and allows the default
settings to not trigger the watchdog when the DCS is not responding.
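In other words, the watchdog timer is armed so that it fires a little before the leader key would expire. A rough sketch of that arithmetic, assuming a made-up helper name and the default margin of 5 seconds:

```python
# Illustrative arithmetic only; not the actual Patroni watchdog implementation.
def effective_watchdog_timeout(ttl, loop_wait, safety_margin=5):
    # Arm the timer so it triggers `safety_margin` seconds before the leader
    # key (ttl) would expire, covering the gap between update_leader() and keepalive().
    timeout = ttl - safety_margin
    if timeout <= loop_wait:
        raise ValueError('ttl and loop_wait leave no room for the safety margin')
    return timeout
```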

* Cancel bootstrap if watchdog activation fails

The system would have demoted itself anyway on the next HA loop. Doing it
during bootstrap at least gives some other node a chance to try bootstrapping,
in the hope that it is configured correctly.

If all nodes are unable to activate the watchdog they will continue to try
until the disk is filled with moved datadirs. Perhaps not ideal behavior, but as
the situation is unlikely to resolve itself without administrator
intervention it doesn't seem too bad.
2017-07-27 12:16:11 +02:00
Alexander Kukushkin
cd84dc82b6 Implement postgresql-10 support (#444)
Mainly it handles the rename of xlog to wal.
In the API and inside the DCS it is still named xlog (for compatibility).

* Address feedback
2017-05-19 17:04:53 +02:00
Alexander Kukushkin
44a7142a9d Synchronous mode strict (#435)
If synchronous_mode_strict==true then '*' will be written to synchronous_standby_names when the last replication host dies.
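The effect on synchronous_standby_names can be pictured like this (a sketch, not the actual Patroni code):

```python
# Illustrative decision only; the real logic lives in Patroni's sync-state handling.
def pick_synchronous_standby_names(healthy_standbys, synchronous_mode_strict):
    if healthy_standbys:
        return healthy_standbys[0]  # keep one healthy standby synchronous
    # No healthy standby left: '*' keeps commits waiting for a standby
    # instead of silently dropping the synchronous guarantee.
    return '*' if synchronous_mode_strict else ''
```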
2017-04-27 14:32:15 +02:00
Oleksii Kliukin
d39f895082 Fix unit tests for Python 3.6 (#431)
Python 3.6 complains: AttributeError: 'MockRequest' object has no attribute 'sendall'
2017-04-18 12:44:42 +02:00
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily trod on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.
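A sketch of the decision made while in the new "starting" state, under the rules described above (names are illustrative, not the actual run_cycle code):

```python
import time

# Illustrative only; the real logic lives in Patroni's HA run_cycle.
def handle_starting_master(start_time, master_start_timeout, failover_candidate_available):
    if master_start_timeout == 0:
        # Fail over as soon as there is somebody able to take the leader lock.
        return 'release leader lock' if failover_candidate_available else 'keep starting'
    if time.time() - start_time > master_start_timeout and failover_candidate_available:
        return 'stop postgres and release leader lock'
    # Keep waiting for postgres to finish starting; manual failover
    # requests are still handled in the meantime.
    return 'keep starting'
```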

* Introduce timeout parameter to restart.

When a restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
2016-12-08 14:44:27 +01:00
Alexander Kukushkin
038b5aed72 Improve leader watch functionality (#356)
Previously replicas were always watching the leader key (even if
postgres was not running there). It was not a big issue, but it
was not possible to interrupt such a watch when postgres
started up or stopped successfully. It was also delaying the update_member
call, so we had stale information in DCS for up to `loop_wait`
seconds. This commit changes that behavior. If the async_executor is
busy starting, stopping or restarting postgres we will not watch the
leader key but wait for an event from the async_executor for up to `loop_wait`
seconds. The async executor will fire such an event only if the
function it was calling returned something that evaluates to
boolean True.
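The signalling between the HA loop and the executor can be sketched roughly like this (simplified; not the actual AsyncExecutor class):

```python
import threading

# Simplified sketch; the real AsyncExecutor does more bookkeeping.
class AsyncExecutor:
    def __init__(self):
        self.event = threading.Event()

    def run(self, func, *args):
        self.event.clear()
        result = func(*args)
        # Wake up the HA loop only if the action reported success
        # (i.e. returned something that evaluates to True).
        if result:
            self.event.set()
        return result

# In the HA loop: while postgres is starting/stopping, instead of watching the
# leader key, wait for the executor to signal completion, but no longer than loop_wait:
#     async_executor.event.wait(timeout=loop_wait)
```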

Such functionality is really needed to change the way we decide
whether pg_rewind is necessary. That decision requires a running local
postgres, so it is really important to get such a
notification as soon as possible.
2016-11-22 16:22:30 +01:00
Alexander Kukushkin
37b020e7a3 Various bugfixes and improvements: (#346)
* Replace pytz.UTC with dateutil.tz.tzutc; it helps to reduce memory usage by more than 4 MB...

* fix check of python version: 0x0300000 => 0x3000000

* Update leader key before restart and demote
2016-11-04 18:42:56 +02:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user-visible transactions.

To enforce the synchronous failover guarantee Patroni stores the current synchronous replication state in the DCS, using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.
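That ordering guarantee can be sketched as follows (the method names are illustrative, not the real DCS or Postgresql API in Patroni):

```python
# Illustrative ordering only; method names are made up for the sketch.
def set_synchronous_standby(postgresql, dcs, standby_name):
    # 1. Make the standby synchronous on the master first ...
    postgresql.set_synchronous_standby_names(standby_name)
    postgresql.reload()
    # 2. ... and only then publish it, so the DCS never names a standby
    #    that was not yet synchronously replicating.
    dcs.write_sync_state(leader=postgresql.name, sync_standby=standby_name)


def allowed_to_promote(dcs, my_name):
    sync = dcs.get_sync_state()
    # A standby may take the leader lock only if the DCS says it was synchronous.
    return sync is None or sync.sync_standby == my_name
```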

We can't list multiple standbys as synchronous and let PostgreSQL pick one, because we can't know which one was actually synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects, a new one is picked and Patroni switches the master over to it. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously the master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
Alexander Kukushkin
6dc1d9c88e Trigger reinitialize from api
and make it possible to reinitialize in the paused state
2016-08-29 15:38:58 +02:00
Alexander Kukushkin
9fdd021e08 Fix unit-tests for api 2016-08-29 10:25:46 +02:00
Murat Kabilov
3d1fe3fa49 Introduce is_paused method in the Cluster 2016-08-29 09:29:49 +02:00
Murat Kabilov
89ef5da5ae Add tests for api; add checks for ctl and api for the paused state case 2016-08-29 08:36:35 +02:00
Alexander Kukushkin
96da6340a9 Calculate future restart time dynamically (#268)
`do_POST_restart` was randomly showing less than 100% coverage after 2016-08-20 due to hardcoded timestamps.
2016-08-24 09:46:56 +02:00
Alexander Kukushkin
fa7aa71092 Always call on_start callback when starting Patroni (#262)
When Patroni was "joining" an already running postgres it was not calling
callbacks, which in some cases caused issues (a callback could be used to
change routing/load balancing or to assign/remove a floating (service) IP).

In addition to that, we should `start` postgres instead of `restart`-ing
it when doing recovery, because in this case the 'on_start' callback should
be called instead of 'on_restart'.
2016-08-18 09:35:13 +02:00
Alexander Kukushkin
5fe74bec3b Make different kazoo timeouts depend on loop_wait (#243)
* Make different kazoo timeouts dependent on loop_wait

ping timeout ~ 1/2 * loop_wait
connect_timeout ~ 1/2 * loop_wait

Originally these values were calculated from the negotiated session timeout
and didn't work very well, because it could take significant time (up to the
session timeout) to figure out that the connection was dead and reconnect,
leaving us no time to retry.
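The proportions mentioned above in a tiny sketch (the helper name is made up; the actual values are applied inside Patroni's ZooKeeper handler):

```python
# Illustrative arithmetic only.
def kazoo_timeouts(loop_wait):
    ping_timeout = loop_wait / 2.0     # detect a dead connection within half a loop
    connect_timeout = loop_wait / 2.0  # leave the other half for a reconnect attempt
    return ping_timeout, connect_timeout
```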

* Address the code review
2016-08-10 10:15:09 +02:00
Oleksii Kliukin
13b4306f40 Remove one more occurrence of the time bomb 2016-07-14 16:53:02 +02:00
Oleksii Kliukin
3181c4e59f Code review, asynchronous restarts.
- Make the restart initiated by the schedule asynchronous
- Fix the placeholders in logs.
- Fix the regexp to detect the PostgreSQL version.
2016-07-12 20:25:01 +02:00
Oleksii Kliukin
8834f929aa Improve the unit tests/coverage. 2016-07-05 10:07:29 +02:00
Oleksii Kliukin
d2832ee43b Address the code review.
Fix return value in the should_run_scheduled_action and the comments.
Correct the json composition in the scheduled_restart test.
Fix the delete in case there is no scheduled restart.
Fix the usage of format in the logger output.
Fix the indentation in the evaluate_scheduled_restart.
Fix the condition related to the body_is_optional in the do_POST_restart.
Fix a few typos in the error messages.
Fix the _read_json_content.
Make the scheduled restart unit tests a bit less ugly.
2016-06-28 16:54:20 +02:00
Oleksii Kliukin
568eb730bc Clear the scheduled restart after the normal one.
Make sure the scheduled restart flag is cleared when the
postmaster_start_time has changed since the restart was scheduled.

Additionally, separate the logic of checking the restart conditions
into its own function in order to support conditions for normal
restarts as well.
2016-06-24 17:39:04 +02:00
Oleksii Kliukin
29845dd383 Restart the node according to the schedule.
The scheduled restart data structures are now independent of those
used by normal restarts. This will be fixed in subsequent
commits.
Add behave tests that cover POST /restart (but not DELETE).
2016-06-23 10:43:54 +02:00
Oleksii Kliukin
318ca6be38 Implement scheduling and deleting a restart.
The scheduled restart API extends the already existing restart
endpoint by processing the parameters in the request body.

Only one scheduled restart at a time is supported. The DELETE method
on the /restart endpoint is used to remove an existing scheduled restart.
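A hypothetical usage example of the endpoints described above (the request body key is an assumption, not taken verbatim from this commit):

```python
import requests

API = 'http://127.0.0.1:8008'

# Schedule a restart at a given point in time (ISO 8601 timestamp with timezone);
# the 'schedule' key is assumed here for illustration.
requests.post(API + '/restart', json={'schedule': '2016-06-20T16:00:00+02:00'})

# Only one scheduled restart is kept at a time; DELETE removes it again.
requests.delete(API + '/restart')
```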
2016-06-20 15:16:22 +02:00
Alexander Kukushkin
9ecff0f64d Bugfixes
* GET /config was returning the latest "correct" version of the dynamic
  configuration.
* PATCH /config was breaking when trying to patch a non-dict value with a dict.
2016-06-10 12:35:04 +02:00
Alexander Kukushkin
ebb9e252d8 Rename restart_pending to pending_restart for compatibility 2016-06-02 09:31:30 +02:00
Alexander Kukushkin
1c30948ef9 Implement PUT /config and enhance some checks 2016-06-01 17:06:31 +02:00
Alexander Kukushkin
e10873dd9c RestApiHandler._patch_config returns True if configuration was changed 2016-05-31 15:49:55 +02:00
Alexander Kukushkin
1cd42d4e47 Get rid of some stupid logic with options=True/False
and some other tricks with overriding the handle_one_request and finish
methods of the parent class, which were necessary only to make the OPTIONS
request from haproxy work with python2 but in fact still did not
work with python3. Instead of doing all that magic we should simply
give haproxy what it wants: an HTTP response code and nothing
more.
2016-05-31 14:42:00 +02:00
Alexander Kukushkin
8b5d6e83e7 fix some bugs revealed by acceptance tests 2016-05-27 17:38:19 +02:00
Alexander Kukushkin
073ef3784f Implement PATCH /config 2016-05-27 16:29:33 +02:00
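A short illustration of how such a PATCH request might look from a client (the payload is just an example, not from this commit):

```python
import requests

# Patch a single dynamic configuration value over the REST API.
requests.patch('http://127.0.0.1:8008/config',
               json={'postgresql': {'parameters': {'max_connections': 200}}})
```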
Alexander Kukushkin
6700cd0aa6 Implement reload of config.yml with REST API call
and acceptance tests for that
2016-05-26 17:09:40 +02:00
Alexander Kukushkin
7827951c8c Dynamic configuration 2016-05-25 14:17:05 +02:00
Alexander Kukushkin
d422e16aad Implement reload of config.yaml on SIGHUP
If some changes require a restart of postgres, Patroni will expose the
`restart_pending` flag in DCS and via the REST API
2016-05-13 13:31:21 +02:00
Alexander Kukushkin
defc987328 Encode request body only once in a MockRequest
to avoid using bytestrings all over the file
2016-05-09 09:33:54 +02:00
Alexander Kukushkin
499061918d Implement noloadbalance support
Mostly this tag is necessary to give a hint to a load-balancer
auto-configuration tool that the node should not be included in the
LB configuration.
In addition to that, Patroni should not return status_code=200
for a health check if the tag is present and its value is not `False`.
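A sketch of that health-check rule (not the actual handler code):

```python
# Illustrative only; the real check lives in Patroni's REST API handler.
def health_check_status(postgres_is_running, tags):
    # Any value other than False counts as "keep me out of the load balancer".
    if tags.get('noloadbalance', False):
        return 503
    return 200 if postgres_is_running else 503
```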
2016-04-22 09:46:34 +02:00
Feike Steenbergen
f317b9b9a6 Include database system identifier in cluster info 2016-04-18 10:44:35 +02:00
Alexander Kukushkin
b4e86f0809 Make it possible to schedule a failover in less than 10 seconds
but only when the API request was posted to the leader
2016-04-13 13:32:39 +02:00
Alexander Kukushkin
3a7d2c3874 Remove unused code from unit tests 2016-03-21 20:48:17 +01:00
Alexander Kukushkin
9fec8a41e4 Return a different status if failed over to a node other than the candidate 2016-03-19 13:15:05 +01:00
Oleksii Kliukin
aa844b63d0 Avoid an unhandled exception in the API thread.
When receiving a failover request with no data or
non-JSON data, emit a message to the client instead
of crashing.
2016-03-04 19:21:14 +01:00
Alexander Kukushkin
dbb3e8308b Merge branch 'master' of github.com:zalando/patroni into codequality 2016-02-22 19:20:23 +01:00
Alexander Kukushkin
4038d94c5a Fix more codacy issues 2016-02-17 14:59:17 +01:00
Alexander Kukushkin
a210cfd1ab Fix more codacy issues 2016-02-17 14:51:59 +01:00
Alexander Kukushkin
1b9e77fe83 pep8 formatting 2016-02-17 12:34:04 +01:00
Alexander Kukushkin
a875e93f2e Merge branch 'master' of github.com:zalando/patroni into feature/scheduled_failover_squashed 2016-02-17 12:14:10 +01:00
Alexander Kukushkin
df9b8fed2e Improve quality of code by resolving issues found by quantifiedcode and codacy 2016-02-12 12:23:49 +01:00
Feike Steenbergen
1e2fdac891 Scheduled Failover tests
Add tests for the scheduled failover feature, also add more and better tests for patronictl.
2016-02-10 14:19:41 +01:00
Feike Steenbergen
bce96df177 Add attributes to Mocked classes 2016-01-29 13:29:51 +01:00
Alexander Kukushkin
57f19fb149 Merge pull request #80 from zalando/feature/nofailover
Feature/nofailover
2015-11-16 10:21:56 +01:00
Oleksii Kliukin
28934350ef Handle haproxy requests. Improve failover status code.
By default, haproxy sends an OPTIONS request, which we didn't
handle until now. In addition, all haproxy requests that don't
examine the request body close the connection as soon as the status
code is obtained. Such behavior breaks BaseHTTPRequestHandler,
namely handle_one_request, which doesn't check for a connection reset
by peer and throws this error on a higher level; but since we don't
call this function directly, there is no place in the code to catch
it, therefore we have to patch this function in the base class.
In addition, patch the StreamRequestHandler finish() function in
order to handle the connection reset error.

Re-read the cluster from DCS right after the failover to supply
the correct new values to the API thread. Fix a typo.
2015-11-12 17:38:22 +01:00