735 Commits

Author SHA1 Message Date
V Aitvaras
ad7a1b8a16 Make it possible to provide datacenter configuration for Consul (#558)
```yaml
consul:
  url: http://consul.host:8500
  token: long-token-here
  dc: dev1-d1
```
2017-11-06 16:44:30 +01:00
Alexander Kukushkin
4daaf2beb0 Perform crash recovery in a single user mode if postgres died as master (#554)
But do it only if pg_rewind is enabled or there is no master at the moment.
Such "crash recovery" procedure was advised by Heikki Linnakangas
2017-11-03 16:22:39 +01:00
Alexander Kukushkin
8d926cbc86 Always send token in X-Consul-Token http header (#555)
Fixes https://github.com/zalando/patroni/issues/552
2017-11-03 16:22:07 +01:00
Alexander Kukushkin
823a4d6b8e Adjust session ttl if supplied value is smaller than minimum possible (#556)
It could happen that the ttl provided in the Patroni configuration is smaller
than the minimum supported by Consul. In such a case the Consul agent fails to
create a new session and responds with 500 Internal Server Error, and the
http body contains something like: "Invalid Session TTL '3000000000',
must be between [10s=24h0m0s]". Without a session Patroni is not able to
create member and leader keys in the Consul KV store, which means that
the cluster becomes completely unhealthy.

As a workaround we will handle such an exception, adjust the ttl to the minimum
possible and retry session creation.
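
The workaround described above can be sketched roughly as follows. This is only an illustration, not Patroni's actual code: the helper names `parse_go_duration` and `adjust_ttl` are hypothetical, and the sketch assumes the bounds in the error body use only `h`, `m` and `s` units, as in the message quoted above.

```python
import re


def parse_go_duration(text):
    """Convert a duration such as '10s' or '24h0m0s' to seconds."""
    units = {'h': 3600, 'm': 60, 's': 1}
    return sum(int(value) * units[unit]
               for value, unit in re.findall(r'(\d+)([hms])', text))


def adjust_ttl(ttl, error_body):
    """Raise ttl (in seconds) to the minimum named in the error body.

    The ttl is returned unchanged if the body does not match the
    expected "must be between [min=max]" pattern.
    """
    match = re.search(r'must be between \[(\S+?)=(\S+?)\]', error_body)
    if not match:
        return ttl
    return max(ttl, parse_go_duration(match.group(1)))
```

With the error body quoted above, a configured ttl of 3 seconds would be raised to 10 before session creation is retried.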

In addition to that make it possible to define custom log format via environment variable `PATRONI_LOGFORMAT`
2017-11-03 16:21:53 +01:00
Alexander Kukushkin
8e3511ca6b Different minor fixes (#551)
* Use unix line endings
* Make flake8 happy
2017-11-02 16:24:17 +01:00
Alexander Kukushkin
34db670331 Improve test coverage 2017-10-12 15:03:13 +02:00
Alexander Kukushkin
94c52991e0 Set role to uninitialized if data directory was removed in runtime
Fixes https://github.com/zalando/patroni/issues/542
2017-10-12 15:03:13 +02:00
Alexander Kukushkin
cfdda23e27 Fix pg_rewind behaviour (#524)
When Patroni calculates whether it should run pg_rewind or not, it relies on pg_controldata output or gets the necessary information from a replication connection.
In some cases (for example, when postgres running as a master was killed), we can't use pg_controldata output immediately and have to try starting postgres first. Such a start could fail with the following error:
```
LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
LOG,00000,"database system was interrupted; last known up at 2017-09-16 22:35:22 UTC",,,,,,,,,""
LOG,00000,"restored log file ""00000006.history"" from archive",,,,,,,,,""
LOG,00000,"entering standby mode",,,,,,,,,"" 2017-09-18 08:00:39.433 UTC,,,57,,59bf7d26.39,4,,2017-09-18 08:00:38 UTC,,0,LOG,00000,"restored log file ""00000006.history"" from archive",,,,,,,,,""
FATAL,XX000,"requested timeline 6 is not a child of this server's history","Latest checkpoint is at 29/1A000178 on timeline 5, but in the history of the requested timeline, the server forked off from that timeline at 29/1A000140.",,,,,,,,""
LOG,00000,"startup process (PID 57) exited with exit code 1",,,,,,,,,""
LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
LOG,00000,"database system is shut down",,,,,,,,,""
```
In this case controldata will still have `Database cluster state: in production`
All further attempts to start postgres will fail. Such a situation can be fixed only by starting postgres outside of recovery. For safety we will do it in single-user mode.

The second problem is: if postgres was running as master, but later we started it and stopped it, then pg_controldata will report:
```
Database cluster state:               shut down in recovery
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
```

And this info can't be used for calculations. In this case we should use
`Latest checkpoint location` and `Latest checkpoint's TimeLineID`
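
A rough sketch of the fallback described above, assuming pg_controldata output has already been captured as text. The function names are hypothetical, and using the minimum recovery location whenever it is non-zero is an assumption of this sketch rather than something the commit message states.

```python
def parse_controldata(output):
    """Turn 'Key:   value' lines of pg_controldata output into a dict."""
    data = {}
    for line in output.splitlines():
        if ':' in line:
            key, value = line.split(':', 1)
            data[key.strip()] = value.strip()
    return data


def local_position(data):
    """Return (timeline, lsn) to use for the pg_rewind calculation."""
    min_location = data.get('Minimum recovery ending location', '0/0')
    if (data.get('Database cluster state') == 'shut down in recovery'
            and min_location != '0/0'):
        return int(data["Min recovery ending loc's timeline"]), min_location
    # 0/0 carries no information, so fall back to the latest checkpoint
    return (int(data["Latest checkpoint's TimeLineID"]),
            data['Latest checkpoint location'])
```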
2017-09-29 14:21:19 +02:00
Ants Aasma
32b0768631 Fix watchdog on Python 3 (#531)
A misunderstanding of the ioctl() call interface. If mutable=False then fcntl.ioctl() actually returns the arg buffer back.
This accidentally worked on Python2 because int and str comparison did not return an error.
Error reporting is actually done by raising IOError on Python2 and OSError on Python3.

* Properly handle errors in set_timeout(), have them result in only a warning if watchdog support is not required.

* Improve watchdog device driver name display on Python3

* Eliminate race condition in watchdog feature tests.
  The pinged/closed states were not getting reset properly if the checks ran too quickly.
  Add explicit reset points in feature test so the check is unambiguous.
2017-09-29 10:27:10 +02:00
Alexander Kukushkin
8a584f7a61 Set pgpass explicitly to /tmp/pgpass0 when running unit-tests (#518)
If $HOME is set to a non-existing directory (which would e.g. be the case on an official Debian package autobuilder) some tests were failing
2017-09-12 16:07:20 +02:00
Alexander Kukushkin
3919b322f4 Release 1.3.4 (#515)
Fix documentation and update release notes
2017-09-08 10:56:09 +02:00
Alexander Kukushkin
5ef01cfdfa Advanced configuration for Consul (#506)
* possibility to specify client certs and cacert
* possibility to specify token
* compatibility with python-consul-0.7.1
2017-08-24 07:56:12 +02:00
Alexander Kukushkin
23152a7fc4 synchronous_standby_names must be quoted with quote_ident (#505)
in addition to that implement additional checks around manual failover and recover when synchronous_mode is enabled

* Comparison must be case insensitive
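
The quoting and the case-insensitive comparison could look roughly like this. It is a simplified sketch with hypothetical helper names: PostgreSQL's own `quote_ident` also quotes reserved keywords, which is not handled here.

```python
import re

# Assumption: lowercase letters, digits and underscores are safe unquoted.
_SAFE_IDENT = re.compile(r'^[a-z_][a-z0-9_]*$')


def quote_ident(name):
    """Quote a member name for use in synchronous_standby_names."""
    if _SAFE_IDENT.match(name):
        return name
    # Double any embedded quotes and wrap the whole name in quotes
    return '"' + name.replace('"', '""') + '"'


def same_member(name_a, name_b):
    """Compare member names case-insensitively."""
    return name_a.lower() == name_b.lower()
```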
2017-08-24 07:55:02 +02:00
Alexander Kukushkin
77aea03df9 Different bugfixes around pause state, mostly related to watchdog (#507)
* Do not send keepalives if watchdog is not active
* Avoid activating watchdog in a pause mode
* Set correct postgres state in pause mode
* Don't try to run queries from API if postgres is stopped
2017-08-24 07:53:32 +02:00
Oleksii Kliukin
9f9acb6a55 Fix a watchdog unit test on OS X. 2017-07-28 16:45:29 +02:00
Alexander Kukushkin
f8b3703d6e Bugfix: failover via API didn't work due to change in _MemberStatus (#489)
Originally fetch_nodes_statuses was returning a tuple, later it was
wrapped into namedtuple _MemberStatus and recently _MemberStatus was
extended with the watchdog_failed field, but api.py was still relying on
a usual tuple and checking failover limitations on its own instead of
calling the `failover_limitation` method.
2017-07-28 15:38:55 +02:00
Alexander Kukushkin
6300ec4dbf Implement missing tests for watchdog (#487)
and fix one bug
2017-07-27 12:41:46 +02:00
Ants Aasma
70d718a058 Simplify watchdog code (#452)
* Only activate watchdog while master and not paused

We don't really need the protections while we are not master. This way
we only need to tickle the watchdog when we are updating leader key or
while demotion is happening.

As implemented we might fail to notice that the watchdog should be shut
down if someone demotes postgres and removes the leader key behind
Patroni's back. There are probably other similar cases. Basically if the
administrator is being actively stupid they might get unexpected restarts.
That seems fine.

* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni.

Changes watchdog timeout during the next keepalive when ttl is changed. Watchdog driver and requirement can also be switched online.

When watchdog mode is `required` and watchdog setup does not work then the effect is similar to nofailover. Add watchdog_failed to status API to signify this. This is True only when watchdog does not work **AND** it is required.

* Reset implementation when config changed while active.

* Add watchdog safety margin configuration

Defaults to 5 seconds. Basically this is the maximum amount of time
that can pass between the calls to `dcs.update_leader()` and
`watchdog.keepalive()`, which are called right after each other. Should
be safe for pretty much any sane scenario and allows the default
settings to not trigger watchdog when DCS is not responding.
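
As a worked example of the arithmetic, assuming the watchdog timeout is derived as `ttl - safety_margin` (the commit message does not spell out the formula, and `watchdog_timeout` is a hypothetical helper):

```python
def watchdog_timeout(ttl, safety_margin=5):
    """Seconds after the last keepalive at which the watchdog fires."""
    timeout = ttl - safety_margin
    if timeout <= 0:
        raise ValueError('safety_margin must be smaller than ttl')
    return timeout
```

Under this assumption a ttl of 30 seconds and the default margin give a watchdog timeout of 25 seconds, so the watchdog fires 5 seconds before the leader key can expire.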

* Cancel bootstrap if watchdog activation fails

The system would have demoted itself anyway the next HA loop. Doing it
in bootstrap gives at least some other node chance to try bootstrapping
in the hope that it is configured correctly.

If all nodes are unable to activate they will continue to try until the
disk is filled with moved datadirs. Perhaps not ideal behavior, but as
the situation is unlikely to resolve itself without administrator
intervention it doesn't seem too bad.
2017-07-27 12:16:11 +02:00
Alexander Kukushkin
e2feac87bc Block callbacks during bootstrap (#483)
It wasn't a big issue when on_start was called during normal bootstrap
with initdb, because usually such a process is very fast. But the situation
changes when we run a custom bootstrap, because a long time might pass
between the cluster becoming connectable and the end of recovery and promote.

Actually the situation was even worse than that: on_start was called with
the `replica` argument and later on_role_change was never called,
because promote wasn't performed by Patroni.

As a solution for this problem we will block any callbacks during
bootstrap and explicitly call on_start after leader lock was taken.
2017-07-24 14:19:19 +02:00
Alexander Kukushkin
cb360f089c Restart postgres after custom bootstrap if hba_file is defined in configuration (#482)
In addition to that always use absolute paths to config files.

Fixes https://github.com/zalando/patroni/issues/481
2017-07-22 09:46:05 +02:00
Alexander Kukushkin
d5b3d94377 Custom bootstrap (#454)
The task of restoring a cluster from backup or cloning an existing cluster into a new one has been floating around for some time. It was kind of possible to achieve by doing a lot of manual actions, but that was very error prone. So I came up with the idea of making the way we bootstrap a new cluster configurable.

In short - we want to run a custom script instead of running initdb.
2017-07-18 15:12:58 +02:00
Alexander Kukushkin
acc6d7c2c2 Watchdog unit-tests, bugfixes and questions (#449)
Implement missing unit-tests for watchdog and drop unused code
2017-07-11 10:00:30 +02:00
jouir
4ca94a5dab Add config_dir option for configuration files location (#466)
On Debian, the configuration files (postgresql.conf, pg_hba.conf, etc) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could then override the existing configuration files in the place where they usually live.

The default is to store configuration files in the data directory. This setting is targeting custom installations like debian and any others moving configuration files out of the data directory.

Fixes #465
2017-07-04 16:14:17 +02:00
Alexander Kukushkin
10f2321334 Don't fail if one of DCS implementation can't be loaded (#463)
It might be that it's not required by configuration.
2017-07-03 12:07:19 +02:00
Alexander Kukushkin
b576e69362 Manage pg_hba.conf via patroni config or dynamic_configuration (#458)
So far Patroni was populating pg_hba.conf only when running bootstrap code and after that it was not very handy to manage its content, because it was necessary to log in to every node, change pg_hba.conf manually and run pg_ctl reload.

This commit intends to fix it and give Patroni control over pg_hba.conf. It is possible to define pg_hba.conf content via `postgresql.pg_hba` in the patroni configuration file or in the `DCS/config` (dynamic configuration).

If the `hba_file` is defined in the `postgresql.parameters`, Patroni will ignore `postgresql.pg_hba`.
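
The precedence rule above can be sketched as follows. The function name is hypothetical and the dictionary simply mirrors the `postgresql` section of the configuration.

```python
def pg_hba_lines(postgresql_config):
    """Return the pg_hba.conf lines Patroni should write, or None.

    None means Patroni leaves pg_hba.conf alone, because `hba_file`
    in `postgresql.parameters` points at an externally managed file.
    """
    parameters = postgresql_config.get('parameters') or {}
    if parameters.get('hba_file'):
        return None
    return postgresql_config.get('pg_hba')
```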
2017-06-23 12:38:25 +02:00
Alexander Kukushkin
681b6b507b Support unix sockets when connecting to a local postgres cluster (#457)
For backward compatibility this feature is not enabled by default. To enable it you have to set `postgresql.use_unix_socket: true`.
If the feature is enabled and `unix_socket_directories` is defined and non-empty, Patroni will use the first suitable value from it to connect to the local postgres cluster.
If `unix_socket_directories` is not defined, Patroni will assume that the default value should be used; it will not pass `host` in command line arguments and will omit it from the connection url.
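
A minimal sketch of this selection logic. The function name and the `tcp_host` argument are hypothetical, and treating "suitable" as "an absolute path" is an assumption of this sketch, as is falling back to TCP when no entry qualifies.

```python
def local_connect_host(postgresql_config, tcp_host):
    """Pick the `host` value for connections to the local postgres.

    Returns None when `host` should be omitted so that libpq uses its
    default socket directory.
    """
    if not postgresql_config.get('use_unix_socket'):
        return tcp_host
    parameters = postgresql_config.get('parameters') or {}
    directories = parameters.get('unix_socket_directories')
    if directories is None:
        return None
    for directory in directories.split(','):
        directory = directory.strip()
        if directory.startswith('/'):  # assumption: absolute path is suitable
            return directory
    return tcp_host
```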

Solves: https://github.com/zalando/patroni/issues/61

In addition to the above, this commit fixes a couple of bugs:
* manual failover with pg_rewind in a pause state was broken
* psycopg2 (or libpq, I am not really sure what exactly) doesn't mark the cursor's connection as closed when we use a unix socket and an `OperationalError` occurs. We will close such a connection on our own.
2017-06-22 11:47:57 +02:00
Alexander Kukushkin
3fee62c39b BUGFIX: retry on boto exceptions never worked (#450)
because `boto.exception` is not an exception, but a python module.

+ increase retry timeout to 5 minutes
+ refactor unit-tests to cover the case with retries.
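
To illustrate the class of bug, here is a generic retry helper (hypothetical, not Patroni's implementation) that only works when it is given exception classes. Passing a module instead, as the old code effectively did, never matches on Python 2 and raises `TypeError` on Python 3 once an exception reaches the `except` clause.

```python
import time


def retry(func, exceptions, attempts=3, delay=0.0):
    """Call func() until it succeeds or attempts are exhausted.

    `exceptions` must be an exception class or a tuple of classes.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```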
2017-06-16 10:27:03 +02:00
Alexander Kukushkin
5bd9aa7547 BUGFIX: pg_rewind wasn't working when data page checksum is not enabled (#456)
pg_controldata output depends on postgres major version and in some cases some of the parameters are prefixed by 'Current ' for old postgres versions.
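
One way to make the parsing independent of that prefix is to normalise the keys before looking anything up. This is a sketch with a hypothetical helper name; the key used in the usage note is only an illustration.

```python
def normalize_controldata(data):
    """Strip the 'Current ' prefix that older versions put on some keys."""
    prefix = 'Current '
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in data.items()
    }
```

For example, `{'Current wal_level setting': 'hot_standby'}` becomes `{'wal_level setting': 'hot_standby'}`, so later lookups can use a single key name.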

Bug was introduced by commit 37c1552.
Fixes https://github.com/zalando/patroni/issues/455
2017-06-16 10:25:54 +02:00
Ants Aasma
a70b46ef13 Add watchdog support on Linux (#343)
Ensures that system gets rebooted before TTL runs out.

Initial version. Open questions:

    Do we want to disable watchdog while we are not master?
2017-06-01 16:53:46 +02:00
Alexander Kukushkin
e3a01727a9 Implement missing tests and add pg-10 support to wale_restore(#446)
in addition to that get rid of two modules and fix formatting of tests
2017-05-22 12:01:02 +02:00
Alexander Kukushkin
cd84dc82b6 Implement postgresql-10 support (#444)
Mainly it handles rename of xlog to wal.
In the API and inside DCS it is still named xlog (for compatibility).

* Address feedback
2017-05-19 17:04:53 +02:00
Alexander Kukushkin
7633b19213 Support change of superuser and replication credentials on reload (#445)
Fixes: https://github.com/zalando/patroni/issues/353
and: https://github.com/zalando/patroni/issues/443
2017-05-19 16:32:35 +02:00
Alexander Kukushkin
37c1552c0a Smart pg_rewind (#417)
Previously we were running pg_rewind only in a limited number of cases:
 * when we knew postgres was a master (no recovery.conf in data dir)
 * when we were doing a manual switchover to a specific node (no
   guarantee that this node is the most up-to-date)
 * when a given node has nofailover tag (it could be ahead of new master)

This approach was kind of working in most of the cases, but sometimes we
were executing pg_rewind when it was not necessary and in some other
cases we were not executing it although it was needed.

The main idea of this PR is to first figure out whether we really need
to run pg_rewind by analyzing the timeline ID, LSN and history file on
master and replica, and to run it only if it's needed.
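
The decision could be sketched like this. It is a simplified illustration with hypothetical names, not the logic of the PR itself: `history` stands for the (timeline, switch point) pairs parsed from the master's history file, and the rule applied is that a replica needs rewinding when it wrote WAL past the point where the master's history left the replica's timeline.

```python
def parse_lsn(lsn):
    """Convert an LSN such as '29/1A000178' to an integer."""
    high, low = lsn.split('/')
    return (int(high, 16) << 32) + int(low, 16)


def needs_rewind(local_timeline, local_lsn, master_timeline, history):
    """Decide whether pg_rewind is required before following the master."""
    if local_timeline == master_timeline:
        return False
    if local_timeline > master_timeline:
        return True
    for timeline, switch_lsn in history:
        if timeline == local_timeline:
            # Diverged if we are ahead of the fork point
            return parse_lsn(local_lsn) > parse_lsn(switch_lsn)
    # Our timeline is not part of the master's history at all
    return True
```

For example, a replica at `29/1A000178` on timeline 5 must be rewound when the master's history shows that timeline 5 was left at `29/1A000140`.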
2017-05-19 16:32:06 +02:00
Ants Aasma
644b741969 Add config editing to patronictl (#428)
Current UI to change cluster configuration is somewhat unfriendly, involving a curl command, knowing the REST API endpoint, knowing the specific syntax to call it with and writing a JSON document. I added two commands in this branch to make this a bit easier, `show-config` and `edit-config` (names are merely placeholders, any opinions on better ones?).

* `patronictl show-config clustername` fetches the config from DCS, formats it as YAML and outputs it.

* `patronictl edit-config clustername` fetches the config, formats it as YAML, invokes $EDITOR on it, then shows user the diff and after confirmation applies the changed config to DCS, guarding for concurrent modifications.

* `patronictl edit-config clustername --set synchronous_mode=true --set postgresql.use_slots=true` will set the specific key-value pairs.

There are also some UI capabilities I'm less sure of, but included them here as I already implemented them.

* If output is a tty then the diffs are colored. I'm not sure if this feature is cool enough to pull the weight of adding a dependency on cdiff. Or maybe someone knows of another more task focused diff coloring library?
* `patronictl edit-config clustername --pg work_mem=100MB` - Shorthand for `--set postgresql.parameters.work_mem=100MB`
* `patronictl edit-config clustername --apply changes.yaml` - apply changes from a yaml file.
* `patronictl edit-config clustername --replace new-config.yaml` - replace config with new version.
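
How `--set` and `--pg` pairs might be folded into the configuration can be sketched as follows. The function names are hypothetical and the value parsing is deliberately minimal (only `true`/`false` are converted); the real command may parse values differently.

```python
def parse_value(text):
    """Convert 'true'/'false' to booleans, keep everything else as text."""
    lowered = text.lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    return text


def apply_changes(config, set_args=(), pg_args=()):
    """Apply key=value pairs to a nested config dict and return it."""
    for pair in set_args:
        dotted_key, _, raw = pair.partition('=')
        keys = dotted_key.split('.')
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw)
    for pair in pg_args:
        # --pg is shorthand for postgresql.parameters.<name>; the name is
        # not split on dots because parameter names may contain them.
        name, _, raw = pair.partition('=')
        parameters = config.setdefault('postgresql', {}).setdefault('parameters', {})
        parameters[name] = parse_value(raw)
    return config
```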
2017-05-19 16:25:21 +02:00
Joar Wandborg
3241ec2504 Use csv.DictReader when parsing wal-e backup-list (#436)
wal-e outputs in CSV format using the 'excel-tab' dialect: 3164de6852/wal_e/operator/backup.py (L63)

The ISO date may be written with a space instead of 'T' as the delimiter between
date and time, which causes the old parsing to fail.
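
A minimal sketch of this parsing approach using the standard library. The function names are hypothetical, and the column names in the usage note are illustrative rather than taken from wal-e.

```python
import csv
import io
from datetime import datetime


def parse_backup_list(text):
    """Parse tab-separated backup-list output into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), dialect='excel-tab')
    return list(reader)


def parse_timestamp(value):
    """Accept both '2017-04-27T12:00:00...' and '2017-04-27 12:00:00...'."""
    # Normalise the delimiter and drop fractional seconds / zone suffix
    normalized = value.replace(' ', 'T')[:19]
    return datetime.strptime(normalized, '%Y-%m-%dT%H:%M:%S')
```

Given a header line such as `name<TAB>last_modified`, each row becomes a dict keyed by column name, and `parse_timestamp(row['last_modified'])` works for either delimiter.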
2017-04-27 14:33:22 +02:00
Alexander Kukushkin
44a7142a9d Synchronous mode strict (#435)
If synchronous_mode_strict==true then '*' will be written as synchronous_standby_names when the last replication host dies.
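
The rule can be sketched as a small function. The name is hypothetical, and returning an empty string in the non-strict case (that is, falling back to asynchronous replication) is an assumption of this sketch.

```python
def synchronous_standby_names(sync_candidate, strict):
    """Value to write into synchronous_standby_names.

    sync_candidate is the chosen replica name, or None if none is left.
    """
    if sync_candidate:
        return sync_candidate
    # '*' keeps the master waiting for any standby to reconnect
    return '*' if strict else ''
```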
2017-04-27 14:32:15 +02:00
Ants Aasma
856a13e24c Remove error spinning on etcd failure and reduce log spam (#429)
When all etcd servers refuse connections during watch the call will fail with an exception and will be immediately retried. This creates a huge amount of log spam potentially creating additional issues on top of losing the DCS. This patch takes note if etcd failures are repeating and starting from the second failure will sleep for a second before retrying. It additionally omits the stack trace after the first failure in a streak of failures.
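
The behaviour described above could be sketched like this. `watch_with_backoff` and its arguments are hypothetical and the sketch does not use any etcd client API; `watch` is any callable that raises on failure.

```python
import logging
import time

logger = logging.getLogger(__name__)


def watch_with_backoff(watch, max_attempts, sleep=time.sleep):
    """Retry watch(), sleeping one second from the second failure on."""
    failures = 0
    for _ in range(max_attempts):
        try:
            return watch()
        except Exception:
            failures += 1
            if failures == 1:
                # Full stack trace only for the first failure in a streak
                logger.exception('watch failed')
            else:
                logger.error('watch failed %d times in a row', failures)
                sleep(1)
    return None
```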
2017-04-20 12:40:15 +02:00
Alex Kerney
1d513e7e04 Release the leader key when the leader restarts with an empty data dir (#420)
* If the leader has an empty data directory it must have been recreated, so release the leader key
2017-04-18 12:45:48 +02:00
Alexander Kukushkin
1c5d5f1dae BUGFIX: pg_drop_replication_slot may not be called if slot is active (#427)
Default value of wal_sender_timeout is 60 seconds while we are trying to remove replication slot after 30 seconds (ttl=30). That means postgres might think that slot is still active and does nothing. Patroni at the same time was thinking that it was removed successfully.

If the drop replication slot query didn't return a single row, we must fetch the list of existing physical replication slots from postgres on the next iteration of the HA loop.

Fixes: issue #425
2017-04-18 12:45:24 +02:00
Oleksii Kliukin
d39f895082 Fix unit tests for Python 3.6 (#431)
Python 3.6 complains about "AttributeError: 'MockRequest' object has no attribute 'sendall'"
2017-04-18 12:44:42 +02:00
Oleksii Kliukin
875e450ff8 Retry when tagging EC2 and EBS with Postgres roles. (#418)
Retry if an error happens when setting Role or Name tags for the
EC2 instances or EBS volumes. The maximum retry interval is 15 seconds.
2017-03-24 16:51:24 +01:00
Alexander Kukushkin
3ece35c0a6 Reassemble postgresql parameters when major version became known (#395)
* Reassemble postgresql parameters when major version became known

Otherwise we were writing some "unknown" parameters into postgresql.conf
and postgres was refusing to start. Only 9.3 was affected.

In addition to that move rename of wal_level from hot_standby to replica
into get_server_parameters method. Now this rename is handled in a
single place.

* Bump etcd and consul versions
2017-02-16 17:07:21 +01:00
Alexander Kukushkin
1ed91a93c6 Handle EtcdEventIndexCleared and EtcdWatcherCleared exceptions (#387)
In this case it doesn't make sense to retry, because it brings nothing
but produces a lot of exceptions in the log...
2017-02-16 17:07:09 +01:00
Alexander Kukushkin
c6252bc004 Don't resolve url hostnames manually but monkey patch urllib3 (#385)
Replacing hostnames with ip addresses was causing certificate verification
to fail. Instead of doing that we will monkey patch the urllib3
functionality which does name resolution. It should work without
problems even for https connections.
2017-01-18 13:46:02 +01:00
Alexander Kukushkin
711d53980f Calling the self._load_machines_cache() method on timeout is causing a switch to
a new server every 5 minutes
2017-01-12 17:30:18 +01:00
Alexander Kukushkin
8fda957804 Restart former master in readonly only once when partitioned (#370) 2016-12-20 16:41:18 +01:00
Alexander Kukushkin
a5e79bce9d * bugfix: pass arguments to a callback 2016-12-16 15:44:04 +01:00
Alexander Kukushkin
8c0712047e Serialize callback execution (#366)
If the previous callback is still running - kill it
Also it will fix a problem of zombie processes when executing callbacks from the main thread.
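
A rough sketch of serialized callback execution, assuming callbacks are external commands. The class is hypothetical and much simpler than what the commit implements: it kills a callback that is still running and reaps the previous process before starting the next one, so that no zombie is left behind.

```python
import subprocess
import sys


class CallbackRunner(object):
    """Run at most one callback process at a time."""

    def __init__(self):
        self._process = None

    def call(self, command):
        if self._process is not None:
            if self._process.poll() is None:
                self._process.kill()  # previous callback is still running
            self._process.wait()      # reap it so that no zombie remains
        self._process = subprocess.Popen(command)
        return self._process


if __name__ == '__main__':
    runner = CallbackRunner()
    runner.call([sys.executable, '-c', 'import time; time.sleep(30)'])
    # The sleeping callback above is killed before this one starts
    runner.call([sys.executable, '-c', 'print("on_role_change")']).wait()
```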
2016-12-16 14:29:53 +01:00
Alexander Kukushkin
1e984c3f00 Take a max from xlog_receive and xlog_replay (#363) 2016-12-12 16:27:36 +01:00
Alexander Kukushkin
d138a8db17 AT for master_start_timeout + minor fixes (#361) 2016-12-09 12:02:41 +01:00