735 Commits

Author SHA1 Message Date
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily trudged on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop it and release the leader lock if failover is possible. Once failover succeeds or fails (no leader and nobody to take the role) we continue with normal processing. While waiting for the master start timeout we still handle manual failover requests.

* Introduce timeout parameter to restart.

When the restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
2016-12-08 14:44:27 +01:00
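A minimal sketch, assuming the new parameter sits alongside the other DCS-level settings, of how master_start_timeout from the commit above might be expressed; shown as a Python dict mirroring the YAML configuration, with placeholder values rather than Patroni's documented defaults.

```python
# Illustrative only: a dict mirroring the DCS-level configuration section.
dcs_config = {
    "ttl": 30,
    "loop_wait": 10,
    # Seconds to wait for the master to start before failing over;
    # 0 means fail over as soon as a candidate is available.
    "master_start_timeout": 300,
}
```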
Alexander Kukushkin
ec78777778 Implement simple asynchronous DNS-resolve cache (#360) 2016-12-07 13:16:26 +01:00
Oleksii Kliukin
b38d98a6a3 Fix the WAL-E restore (#359)
* Fix broken WAL directory symlinks after WAL-E restore.

* Add unit-tests for wale_restore.

* Reduce the amount of MagicMock to the one (for psycopg2.connect)

* Make the WAL-E restore process more robust.

Allow retries only on WAL-E failures.
Sleep after each attempt

* Update the tests.

* Change WAL-E behavior when master is absent, tests.

- Challenge the use of WAL-E even when the 'no_master' flag is set. This flag does
  not in fact indicate that the master is absent. To check for the master's absence
  the script looks at whether the connection string is non-empty.

- Retry on a failure to fetch the current xlog position from the master. The reason
  this has to be separate from the retries in the main loop is that we don't just
  retry the connection attempt, but also make a decision once it either succeeds or
  all attempts are exhausted.

- Remove wrong usages of PropertyMock from the tests.

* Avoid redundant output of the exception message in logger.exception

* Address issues uncovered by flake8
2016-12-06 17:01:40 +01:00
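A generic sketch of the retry-with-sleep pattern described in the commit above; the function name, attempt count and sleep interval are illustrative, not Patroni's actual wale_restore API.

```python
import time

def fetch_backup_with_retries(run_wale_fetch, attempts=3, sleep_seconds=5):
    """Retry only when the WAL-E call itself fails; sleep between attempts."""
    for _ in range(attempts):
        if run_wale_fetch():  # hypothetical callable wrapping `wal-e backup-fetch`
            return True
        time.sleep(sleep_seconds)  # back off before the next try
    return False
```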
Alexander Kukushkin
b299b12f58 Various configuration parameters for etcd (#358)
* Add https and auth support for etcd

Also implement support of PATRONI_ETCD_URL and PATRONI_ETCD_SRV
environment variables

* Implement etcd.proxy, etcd.cacert, etcd.cert and etcd.key support

Now it should be possible to set up a fully encrypted connection to etcd
with authorization.
2016-12-06 16:40:21 +01:00
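An illustrative Python dict mirroring a YAML `etcd` section with the kinds of parameters the commit above introduces; the host name, credentials and file paths are placeholders, and the exact key names should be checked against the Patroni documentation.

```python
# Illustrative only: a dict mirroring a YAML "etcd" section.
etcd_config = {
    "etcd": {
        "protocol": "https",
        "host": "etcd.example.com:2379",
        "username": "patroni",               # basic auth, if enabled on etcd
        "password": "secret",
        "cacert": "/etc/ssl/etcd/ca.pem",
        "cert": "/etc/ssl/etcd/client.pem",
        "key": "/etc/ssl/etcd/client-key.pem",
        "proxy": "http://proxy.example.com:3128",
    }
}
```

As the commit notes, equivalent settings can also come from environment variables such as PATRONI_ETCD_URL and PATRONI_ETCD_SRV.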
Alexander Kukushkin
c6417b2558 Add postgres-9.6 support (#357)
Starting from 9.6 we need wal_level = 'replica', which is an alias for 'hot_standby'. It was working before without problems, but if somebody changed wal_level to 'replica', Patroni would expose the pending_restart flag, although a restart in this case is not necessary.

* bump versions of consul and etcd to the latest for travis integration-tests
2016-11-25 12:35:01 +01:00
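A hedged illustration (not Patroni's actual code) of the alias problem described above: 'hot_standby' and 'replica' name the same wal_level in 9.6, so a literal string comparison would wrongly flag a pending restart.

```python
# Treat wal_level aliases as equal when deciding whether a restart is needed.
WAL_LEVEL_ALIASES = {"hot_standby": "replica"}  # 9.6 renamed the level

def wal_level_changed(current_value, desired_value):
    normalize = lambda value: WAL_LEVEL_ALIASES.get(value, value)
    return normalize(current_value) != normalize(desired_value)

print(wal_level_changed("hot_standby", "replica"))  # False: no restart needed
```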
Alexander Kukushkin
28b00dea16 Solve issue of handling SIGCHLD when running in Docker (#355)
If Patroni is started in a Docker container with pid=1, it will re-execute itself
with the same arguments. The original process will take care of init-process
duties, i.e. handle SIGCHLD and reap dead orphan processes. It will also
forward SIGINT, SIGHUP, SIGTERM and some other signals to the real Patroni
process.
2016-11-22 16:22:47 +01:00
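A hedged sketch of the PID 1 trick described above, not the actual implementation: when started as init, fork the real process, forward termination signals to it, and reap orphans in the parent.

```python
import os
import signal
import sys

def run_as_init():
    """Sketch: act as a minimal init when running as PID 1."""
    if os.getpid() != 1:
        return  # nothing special to do

    pid = os.fork()
    if pid == 0:
        # Child: become the real worker process with the same arguments.
        os.execv(sys.executable, [sys.executable] + sys.argv)

    # Parent: forward termination signals to the child.
    for sig in (signal.SIGINT, signal.SIGHUP, signal.SIGTERM):
        signal.signal(sig, lambda signo, frame: os.kill(pid, signo))

    # Reap any dead children (orphans included); exit when the child exits.
    while True:
        child, status = os.waitpid(-1, 0)
        if child == pid:
            sys.exit(os.WEXITSTATUS(status))
```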
Alexander Kukushkin
038b5aed72 Improve leader watch functionality (#356)
Previously replicas were always watching the leader key (even if postgres was
not running there). It was not a big issue, but it was not possible to
interrupt such a watch when postgres started up or stopped successfully. It
was also delaying the update_member call, so we had somewhat stale information
in the DCS for up to `loop_wait` seconds. This commit changes that behavior.
If the async_executor is busy starting, stopping or restarting postgres, we
will not watch the leader key but instead wait for an event from the
async_executor for up to `loop_wait` seconds. The async executor will fire
such an event only if the function it was calling returned something that
evaluates to boolean True (see the sketch after this entry).

Such functionality is really needed to change the way we decide whether
pg_rewind is necessary. That decision requires a local postgres to be running,
so it is really important for us to get such a notification as soon as
possible.
2016-11-22 16:22:30 +01:00
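A minimal sketch (names are illustrative, not Patroni's API) of the event-based wait described above: while postgres is being started, stopped or restarted, the HA loop waits on an event instead of watching the leader key, and the background task sets that event only when it returns a truthy result.

```python
import threading

async_done = threading.Event()

def run_async(func, *args):
    """Run func in the background; wake the HA loop only on a truthy result."""
    def worker():
        if func(*args):
            async_done.set()
    threading.Thread(target=worker, daemon=True).start()

def wait_for_next_cycle(loop_wait=10):
    """Wait at most loop_wait seconds, returning early if the task signalled."""
    async_done.wait(timeout=loop_wait)
    async_done.clear()
```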
Alexander Kukushkin
66543f41a3 BUGFIX: Try all cluster members when doing pause/resume (#351)
Previously it was not possible to pause/resume an unhealthy cluster.
2016-11-04 18:43:19 +02:00
Alexander Kukushkin
37b020e7a3 Various bugfixes and improvements: (#346)
* Replace pytz.UTC with dateutil.tz.tzutc; it helps to reduce memory usage by more than 4 MB...

* fix check of python version: 0x0300000 => 0x3000000

* Update leader key before restart and demote
2016-11-04 18:42:56 +02:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user-visible transactions.

To enforce the synchronous failover guarantee, Patroni stores the current synchronous replication state in the DCS using strict ordering: first enable synchronous replication, then publish the information (see the sketch after this entry). A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.

We can't list multiple standbys as synchronous and let PostgreSQL pick one, because we can't know which one was actually synchronous on the master when it failed. This means that on standby failure, commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or a way to let PostgreSQL pick one without the possibility of lost transactions.

On graceful shutdown, standbys disable themselves by setting a nosync tag and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects, a new one is picked and the master switches synchronous replication over to it. If no synchronous standby candidate exists, Patroni disables synchronous replication (synchronous_standby_names = ''), but not synchronous_mode. In this case, only the node that was previously the master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
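A hedged sketch of the strict ordering mentioned above; the postgres/dcs objects and their method names are illustrative, not Patroni's actual API.

```python
def enable_synchronous_standby(postgres, dcs, standby_name):
    # 1. Make the standby synchronous on the master first...
    postgres.set_parameter("synchronous_standby_names", standby_name)
    postgres.reload()
    # 2. ...and only then publish the fact to the DCS, so a standby that reads
    #    this entry can trust it really was synchronous when the master failed.
    dcs.write_sync_state(sync_standby=standby_name)
```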
Alexander Kukushkin
1e573aec8f Do session/renew call to Consul when update_leader is called (#336) 2016-10-10 10:05:55 +02:00
Alejandro Martínez
48a6af6994 Add post_init configuration parameter on bootstrap (#296)
* Add bootstrap post_init configuration parameter
* Add documentation

By @zenitraM
2016-09-28 15:42:23 +02:00
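An illustrative snippet (a Python dict mirroring the YAML) of where the post_init parameter from the commit above would live; the script path is a placeholder.

```python
# Illustrative only: a dict mirroring the YAML "bootstrap" section.
bootstrap_config = {
    "bootstrap": {
        # Hypothetical script, invoked once after the new cluster is initialized.
        "post_init": "/usr/local/bin/setup_cluster.sh",
    }
}
```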
Alexander Kukushkin
67c4b6b105 Make --dcs and --config-file options global (#322) 2016-09-27 16:25:24 +02:00
Alexander Kukushkin
e38dfaf1ba Call touch_member at the end of HA loop (#321)
To make sure that we have an up-to-date member state in the DCS after the HA
loop has changed something.
2016-09-27 16:25:11 +02:00
Alexander Kukushkin
298357c099 Implement retry and timeout strategy for consul (#305)
...the same way as for etcd.

Change the HTTPClient implementation from using `requests.session` to
`urllib3.PoolManager`, because the reference implementation from python-consul
didn't really work with timeouts and was blocking the HA loop...
2016-09-27 16:24:30 +02:00
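A rough sketch of the switch described above: a urllib3.PoolManager with explicit connect/read timeouts so a slow Consul agent cannot block the HA loop. The endpoint and timeout values are placeholders, not Patroni's defaults.

```python
import urllib3

# Explicit timeouts so a hung request cannot stall the caller indefinitely.
http = urllib3.PoolManager(
    timeout=urllib3.Timeout(connect=3.0, read=10.0),
    retries=False,  # retry policy is handled by the caller, not by urllib3
)

response = http.request("GET", "http://127.0.0.1:8500/v1/status/leader")
print(response.status, response.data)
```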
Alexander Kukushkin
5265e71fc2 Don't write leader optime into DCS if it didn't changed (#319) 2016-09-21 14:22:22 +02:00
Alexander Kukushkin
10c7fa41f3 Exclude unhealthy nodes when choosing where to clone from (#313)
The node MUST have the tag clonefrom: true and be in the 'running' state, and
we should not try to clone from ourselves.
2016-09-21 09:42:48 +02:00
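An illustrative member `tags` section (a Python dict mirroring the YAML) marking a node as a clone source, as required by the commit above.

```python
# Illustrative only: member tags advertising this node as a clone source.
tags = {
    "clonefrom": True,  # allow new replicas to be created from this member
}
```

Per the commit, a member carrying this tag is only considered if it is also in the 'running' state, and a node never clones from itself.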
Alexander Kukushkin
7ca55359de Demote immediately if failed to update leader lock (#316)
If an etcd node is partitioned from the rest of the cluster, it is still
possible to read from it (though it returns stale information), but it is not
possible to write to it.
Previously Patroni was trying to fetch the new cluster view from the DCS in
order to figure out whether it is still the leader or not, and etcd kept
returning stale info in which the node still owned the leader key, but with a
negative TTL.
This weird bug clearly shows how dangerous premature optimization is.
2016-09-20 15:45:21 +02:00
Alexander Kukushkin
453e68637a Don't try to remove leader key when running ctl on the leader node (#302) 2016-09-19 13:33:24 +02:00
Alexander Kukushkin
0b1bfeca5b Make sure that we are running and testing latest versions of everything (#303) 2016-09-19 13:32:53 +02:00
Alexander Kukushkin
5c8399e4fa Make sure data directory is empty before trying to restore backup (#307)
We make a number of attempts when trying to initialize a replica using
different methods. Any of these attempts may create and put something into the
data directory, which causes subsequent attempts to fail.
In addition to that, improve logging when creating a replica.
2016-09-19 13:32:27 +02:00
Alexander Kukushkin
540ee2b3c7 Bugfix/fast recover (#300)
* reap children before and after running HA loop

When Patroni is running in a Docker container with pid=1, it is also
responsible for reaping all dead processes. We can't call os.waitpid
immediately after receiving SIGCHLD because it breaks the subprocess module:
it simply stops receiving exit codes of the processes it executes, because
those processes have already been reaped. That's why we just register the
fact of receiving SIGCHLD and reap children only after execution of the
HA loop (see the sketch after this entry).
If the postmaster died for some reason, Patroni was able to detect this fact
only on the next iteration of the HA loop, because the zombie process was
still there and it was possible to send signal 0 to it.
To avoid such a situation we should also reap all dead processes before
executing the HA loop.

* Don't rely on _cursor_holder when closing connection

It could happen that the connection had been opened but not the cursor...

* Don't "retry" when fetching current xlog location and it fails

On every iteration of HA loop we are updaing member key in DCS and among
other data there is current xlog location stored in the value.
If the postgres has died for some reason it is not possible to fetch
xlog position and we are just wasting retry_timeout/2 = 5 seconds there.
If this information will be missing from DCS during period of one HA
loop nothing should break. Patroni is not relying on this information
anyway. When it is doing manual or automatic failover it aways
communicates with other nodes directly to get the most fresh
infomation.

* Don't try to update leader optime when postgres is not 100% healthy

The `update_lock` method not only updates the leader lock but also writes the
most recent xlog position into the optime/leader key. If we know that postgres
may not be 100% healthy because it is in the process of a restart or recovery,
we should not try to fetch the current xlog position and update
'optime/leader'. Previously we were using the `AsyncExecutor.busy` property to
avoid such an action, but I think we should be more explicit and do the update
only if we know that postgres is 100% healthy.
2016-09-14 15:13:01 +02:00
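A hedged sketch of the SIGCHLD handling described in the first bullet above (not Patroni's actual code): the signal handler only records the fact, and children are reaped before and after the HA cycle.

```python
import os
import signal

sigchld_received = False

def _sigchld_handler(signo, frame):
    # Only record the fact; calling os.waitpid() here would steal exit codes
    # from the subprocess module.
    global sigchld_received
    sigchld_received = True

def reap_children():
    """Collect all exited children without blocking."""
    global sigchld_received
    sigchld_received = False
    try:
        while os.waitpid(-1, os.WNOHANG)[0] > 0:
            pass
    except OSError:
        pass  # no children left to reap

signal.signal(signal.SIGCHLD, _sigchld_handler)

# HA loop skeleton (illustrative): reap before the cycle so a dead postmaster
# is noticed immediately, and again afterwards if SIGCHLD arrived meanwhile.
# reap_children()
# run_cycle()
# if sigchld_received:
#     reap_children()
```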
Alexander Kukushkin
c2b91d0195 Merge branch 'master' of github.com:zalando/patroni into feature/disable-automatic-failover 2016-09-05 16:03:55 +02:00
Alexander Kukushkin
53bcc5c9bb Merge pull request #290 from zalando/feature/pyinstaller
Binary build with PyInstaller
2016-09-05 14:52:13 +02:00
Feike Steenbergen
6bdaa7fb88 Merge pull request #288 from zalando/bugfix/python3_wale_restore
Decode output from wal-e list backup
2016-09-05 14:45:14 +02:00
Alexander Kukushkin
2086c90a4a Try to get rid of hardcoded names when building the binary 2016-09-05 14:11:53 +02:00
Alexander Kukushkin
57a0ac9086 pep8 format of test_wale_restore.py 2016-09-05 12:15:28 +02:00
Feike Steenbergen
5ba1294d60 Fix tests for wal-e restore 2016-09-02 17:04:37 +02:00
Oleksii Kliukin
3f7fa4b41f Avoid retries when syncing replication slots. (#282)
* Avoid retries when syncing replication slots.

Do not retry postgres queries that fetch, create and drop slots at the end of
the HA cycle. The complete run_cycle routine executes while holding the
async_executor lock. This lock is also used when scheduling operations like
reinit or restart in different threads. It looks like CPython's threading locks
have fairness issues when multiple threads try to acquire the same lock and one
of them executes long-running actions while holding it: the others have little
chance of acquiring the lock in order. To get around this issue, the long
action (i.e. retrying the query) is removed.

Investigation by Ants Aasma and Alexander Kukushkin.
2016-09-02 17:00:37 +02:00
Alexander Kukushkin
19c80df442 Try to mitigate EtcdEventIndexCleared exception (#287)
This error is sent by etcd when Patroni is doing a "watch" on the leader key,
which is never updated after creation, while the etcd cluster receives a lot
of updates, which cleans the history of events.

Instead of watching on modifiedIndex + 1 we will watch on X-Etcd-Index,
which is probably still available...
2016-09-02 13:44:47 +02:00
Alexander Kukushkin
db9b62b7ed Merge branch 'master' of github.com:zalando/patroni into feature/disable-automatic-failover 2016-09-01 11:09:09 +02:00
Alexander Kukushkin
33ff372ef6 Always try to rewind on manual failover 2016-09-01 11:08:26 +02:00
Oleksii Kliukin
46f1c5b690 Merge pull request #269 from zalando/feature/replica-info
Return replication information on the api
2016-08-31 13:58:19 +02:00
Alexander Kukushkin
4d72eef164 Execute API restart outside of lock
Otherwise it was blocking the HA loop...
2016-08-31 12:38:02 +02:00
Alexander Kukushkin
c0fae1b2e9 Merge branch 'feature/disable-automatic-failover' of github.com:zalando/patroni into feature/disable-automatic-failover 2016-08-30 17:03:37 +02:00
Alexander Kukushkin
8028877be0 Remove failover key only after becoming master 2016-08-30 16:49:28 +02:00
Oleksii Kliukin
11359a26a9 Improve incomplete failover in a paused mode.
Instead of emptying the stale failover key as a master and bailing
out, continue with the healthiest-node evaluation. This should make
the actual master acquire the leader key faster. Emit a warning
message as well and add unit tests.
2016-08-30 12:00:51 +02:00
Ants Aasma
fa6bd51ad1 Appease Quantifiedcode about stylistic issues 2016-08-30 00:40:19 +03:00
Ants Aasma
e428c8d0fa Replace invalid characters in member names for replication slot names
PostgreSQL replication slot names only allow characters from [a-z0-9_].
Invalid characters cause replication slot creation and standby startup to fail.
This change substitutes the invalid characters with underscores or unicode
codepoints. In case multiple member names map to identical replication slot
names, the master log will contain a corresponding error message.

Motivated by wanting to use hostnames as member names; hostnames often
contain periods and dashes.
2016-08-30 00:21:33 +03:00
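A hedged illustration (not Patroni's exact code) of the substitution described above: characters outside [a-z0-9_] are replaced, here with underscores for periods and dashes and with a unicode-codepoint marker for anything else.

```python
import re

def slot_name_from_member_name(member_name):
    """Map a member name to a valid replication slot name (illustration)."""
    def replace(match):
        c = match.group(0)
        return '_' if c in '-.' else 'u{:04d}'.format(ord(c))
    return re.sub(r'[^a-z0-9_]', replace, member_name.lower())

print(slot_name_from_member_name("node-1.example.com"))  # node_1_example_com
```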
Alexander Kukushkin
366ed9cc52 fix pep8 formatting and implement missing tests 2016-08-29 15:39:24 +02:00
Alexander Kukushkin
6dc1d9c88e Trigger reinitialize from api
and make it possible to reinitialize in a pause state
2016-08-29 15:38:58 +02:00
Murat Kabilov
799d4c9bb8 Disable command renamed to pause 2016-08-29 14:30:19 +02:00
Murat Kabilov
22e4af3fb1 Fix failover in the paused state 2016-08-29 12:04:30 +02:00
Alexander Kukushkin
9fdd021e08 Fix unit-tests for api 2016-08-29 10:25:46 +02:00
Murat Kabilov
3d1fe3fa49 Introduce is_paused method in the Cluster 2016-08-29 09:29:49 +02:00
Murat Kabilov
89ef5da5ae Add tests for api; add checks for ctl and api for the paused state case 2016-08-29 08:36:35 +02:00
Alexander Kukushkin
1635f5269e Merge branch 'master' of github.com:zalando/patroni into feature/disable-automatic-failover 2016-08-26 11:09:43 +02:00
Alexander Kukushkin
ac49835a3c Possibility to disable automatic failover cluster-wide
Any node of the cluster will maintain its member key as long as Patroni is
running there.

The master node will also maintain the leader key as long as postgres is
running as a master. If postgres is not running, or it is running in recovery,
Patroni will release the leader lock.

Bootstrap of a new cluster will work (it is possible to specify
paused: true in `bootstrap.dcs`). Replicas will also be able to join
the cluster if the leader lock exists.

If postgres is not running on the node, Patroni will not try to bring it
up. Pause mode also disables reinitialize and all kinds of scheduled actions,
i.e. scheduled restart and scheduled failover.

If the DCS stops being reachable, Patroni will not "demote" the master while
automatic failover is disabled.

Patroni will not stop postgres on exit.
2016-08-26 10:51:43 +02:00
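An illustrative snippet (a Python dict mirroring the YAML) of bootstrapping a cluster already paused, using the key name as it appears in the commit message above; the exact key and surrounding values should be checked against the documentation.

```python
# Illustrative only: initial DCS settings written at bootstrap time.
bootstrap_config = {
    "bootstrap": {
        "dcs": {
            # Key name as given in the commit message above; the cluster
            # starts with automatic failover disabled.
            "paused": True,
            "ttl": 30,
            "loop_wait": 10,
        }
    }
}
```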
Alexander Kukushkin
74166e996c Fix tests and formatting 2016-08-25 10:09:32 +02:00
Murat Kabilov
4e61ef06a8 Add coverage in requirements
Add some tests for patroni ctl
2016-08-24 18:08:23 +02:00