1541 Commits

Author SHA1 Message Date
Oleksii Kliukin
5e7345a2ca Release notes 1.4.5 (#762)
bump version update release notes
v1.4.5
2018-08-03 17:02:11 +02:00
Don Seiler
502094ee79 Log config change or not (#731)
This adds INFO log messages that clearly state whether configuration values were seen as changed by Patroni after SIGHUP/reload and warrant reloading (or, if nothing was changed, that no reloading is necessary).

This ended up being a lot simpler than I had imagined once I found postgresql.py:reload_config().

I added a log line in config.py:reload_local_configuration() since that function will short-circuit the process early if the local config wasn't changed. The final determination of whether or not values have changed and need reloading, however, is made in postgresql.py:reload_config().
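A minimal sketch of the decision being logged, assuming configurations are plain dicts of parameter names to values. `changed_parameters` and `log_reload_decision` are hypothetical helpers for illustration, not Patroni's `reload_config()`.

```python
import logging

logger = logging.getLogger(__name__)


def changed_parameters(old, new):
    """Return the sorted names of parameters that were added, removed or changed."""
    names = set(old) | set(new)
    return sorted(name for name in names if old.get(name) != new.get(name))


def log_reload_decision(old, new):
    """Log at INFO level whether a reload is warranted; return True if it is."""
    changes = changed_parameters(old, new)
    if changes:
        logger.info("Configuration changed (%s); reloading", ", ".join(changes))
    else:
        logger.info("No configuration changes detected; reload not necessary")
    return bool(changes)
```

Logging in both branches is the point of the change: an operator can always tell from the log whether a SIGHUP had any effect.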
2018-08-03 17:00:57 +02:00
Alexander Kukushkin
0c1ae6fbeb Respond 200 to master health-check only if update_lock was successful (#713)
If Patroni gets partitioned, it starts receiving stale information from the DCS.
We can't use this information to determine that we have the leader key.
Instead, we record the actual result of acquire/update lock in the Ha object and report as the leader only if it was successful.

P.S. Despite responding with 200 on `GET /master`, postgres was still running read-only.
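A sketch of the idea, assuming a single boolean is recorded per HA cycle. `HaState` and its methods are hypothetical names, and 503 is used here as the non-leader status.

```python
class HaState:
    """Hypothetical, simplified stand-in for the Ha object described above."""

    def __init__(self):
        self.lock_acquired_or_updated = False

    def record_lock_attempt(self, succeeded):
        # Remember the real outcome of acquire/update lock in this HA cycle,
        # instead of trusting possibly stale leader data read from the DCS.
        self.lock_acquired_or_updated = bool(succeeded)

    def has_lock(self):
        return self.lock_acquired_or_updated


def master_health_check_status(ha):
    """HTTP status for GET /master: 200 only if the last lock update succeeded."""
    return 200 if ha.has_lock() else 503
```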
2018-08-03 17:00:01 +02:00
Alexander Kukushkin
2fd2556050 Set role to demoted if postgres isn't running and no recovery.conf (#757)
In really rare cases it was causing the following behavior:
```
2018-07-31 10:35:30,302 INFO: starting as a secondary
2018-07-31 10:35:30,309 INFO: Lock owner: postgresql0; I am postgresql1
2018-07-31 10:35:30,310 INFO: Demoting master during restarting after failure
2018-07-31 10:35:30,381 INFO: postmaster pid=17709
2018-07-31 10:35:30,386 INFO: lost leader lock during restarting after failure
2018-07-31 10:35:30,388 ERROR: Exception during CHECKPOINT
```
2018-08-03 16:59:04 +02:00
Oleksii Kliukin
d47049ce0e Fix condition for the replica start due to pg_rewind in paused state. (#754)
Avoid starting the replica that had already executed pg_rewind before.

Fixes #753
2018-08-03 16:45:33 +02:00
Henning Jacobs
2f7c53031c Python 3.6 and 3.7 are now supported, too (#752)
2018-07-24 10:51:25 +02:00
Christoph Berg
a2c6ed5504 async is a keyword in python3.7 (#751)
* async is a keyword in python3.7

Setting up patroni (1.4.4-1) ...
    File "/usr/lib/python3/dist-packages/patroni/ha.py", line 610
      'offline':          dict(stop='fast', checkpoint=False, release=False, offline=True, async=False),
                                                                                               ^
  SyntaxError: invalid syntax

Fix #750 by replacing dict member "async" with "async_req".

* requirements.txt: Update for new kubernetes version compatible with 3.7
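A small demonstration of why the rename was needed, using only the standard `keyword` module and the built-in `compile()`; the names `OLD_SOURCE`, `compiles` and `offline_mode` are illustrative.

```python
import keyword

# From Python 3.7 on, "async" is a reserved keyword, so it can no longer be
# used as a keyword argument name such as dict(async=False).
OLD_SOURCE = "dict(stop='fast', checkpoint=False, release=False, offline=True, async=False)"


def compiles(source):
    """Return True if the expression compiles on the running interpreter."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# The renamed key is an ordinary identifier and works on every Python 3 version.
offline_mode = dict(stop='fast', checkpoint=False, release=False, offline=True, async_req=False)
```

On Python 3.7 or later, `compiles(OLD_SOURCE)` is False, which is the same `SyntaxError` shown in the traceback above.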
2018-07-23 20:42:33 +02:00
Oleksii Kliukin
00c2e1c2d0 Grant delete on endpoints and configmaps in RBAC. (#749)
'patronictl remove' deletes the cluster configuration (stored either in configmaps or endpoints) and cannot be run from the postgres pod without 'delete' on those objects being granted to the pod service account.
2018-07-23 20:39:46 +03:00
Don Seiler
f5927bad70 Add EnvironmentFile directive (#746)
Add an EnvironmentFile directive to read in a configuration file with environment variables. The "-" prefix means it can proceed if the file doesn't exist.

This would allow users to keep sensitive information like the SUPERUSER/REPLICATION passwords in the config file separate from a YAML file that might be deployed from source control.
2018-07-23 20:31:47 +03:00
Alexander Kukushkin
26466237b9 Update docker-compose example to postgres 10 (#737)
Some other changes are related to the new version of confd, which now
requires specifying etcd url instead of etcd host.
2018-07-23 16:41:17 +02:00
Tony Sorrentino
c8f9199988 Added setting state to "stopped" when a member is stopped in Ha.shutdown (#733)
Changes by @tonys66, review by @CyberDem0n
2018-07-23 14:59:39 +02:00
Alexander Kukushkin
2356af679b Convert query params from list to dict (#744)
Patroni relies on params to determine the timeout and number of retries when executing API requests to consul. Starting from v1.1.0, python-consul changed its internal API and started
using `list` instead of `dict` to pass query parameters. This change broke the "watch" functionality.
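A sketch of the normalisation, assuming python-consul passes either a dict or a list of `(key, value)` pairs. `index` and `wait` are Consul's blocking-query parameters; `normalize_query_params` is a hypothetical helper.

```python
def normalize_query_params(params):
    """Accept query params as a dict or as a list of (key, value) pairs.

    Older python-consul releases passed a dict; from v1.1.0 a list is used,
    so key lookups written for a dict silently stop matching.
    """
    if params is None:
        return {}
    return dict(params)


old_style = {'index': '42', 'wait': '10s'}
new_style = [('index', '42'), ('wait', '10s')]

# A membership test written for a dict finds nothing in a list of pairs,
assert 'wait' in old_style and 'wait' not in new_style
# but after normalisation both forms behave the same.
assert normalize_query_params(new_style) == normalize_query_params(old_style)
```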

Fixes https://github.com/zalando/patroni/issues/742 and
https://github.com/zalando/patroni/issues/734
2018-07-23 14:56:51 +02:00
Ants Aasma
3b633abd91 Improve logging when stale postmaster.pid matches running process (#738)
Currently the informational message logged is beyond confusing. This
improves the logging so there is some indication of what this message is
about and that it is somewhat normal. Changes by @ants
2018-07-17 16:46:22 +02:00
alago197
936a4238fb Update some descriptions for the REST API endpoints (#729)
* Update some descriptions for the REST API endpoints

By @alago197
2018-07-10 15:40:53 +02:00
Don Seiler
50a8114d0b Use enforced minimums in postgresX.yml files (#730)
Fix the discrepancy for the values of max_wal_senders and max_replication_slots between the sample postgres.yml files and hard-coded defaults in Patroni, bumping the former to 10.
Contributed by @dtseiler
2018-07-04 10:08:54 +02:00
Don Seiler
4e8709b266 Adding reload functionality (#726)
This allows the config to be reloaded via `systemctl reload patroni`, sending SIGHUP to the patroni process. Tested on CentOS.
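A sketch of the usual SIGHUP-driven reload pattern (POSIX only), assuming a main loop that polls a flag. The handler, the loop function and `simulate_systemctl_reload` are hypothetical; per the commit, `systemctl reload patroni` results in a SIGHUP being sent to the process.

```python
import os
import signal
import threading

# Set by the signal handler, consumed by the main loop.
reload_requested = threading.Event()


def handle_sighup(signum, frame):
    # Keep the handler minimal: flag the request and let the main loop reload.
    reload_requested.set()


signal.signal(signal.SIGHUP, handle_sighup)


def run_loop_iteration(reload_config):
    """One iteration of a hypothetical main loop; reload if SIGHUP arrived."""
    if reload_requested.is_set():
        reload_requested.clear()
        reload_config()


def simulate_systemctl_reload():
    """Send SIGHUP to this process, which is what triggers the reload."""
    os.kill(os.getpid(), signal.SIGHUP)
```

Deferring the actual reload to the main loop avoids doing non-trivial work inside a signal handler.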
2018-06-30 23:16:42 +02:00
Alexander Kukushkin
4128cba628 max_worker_processes parameter was introduced only in 9.4 (#724)
Exclude it from the list on 9.3 when building the effective configuration.
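A sketch of version-aware filtering, assuming the running version is compared as PostgreSQL's `server_version_num` (90300 for 9.3, 90400 for 9.4). The table and function names are hypothetical, and the table is an illustrative subset.

```python
# Lowest server_version_num in which each parameter exists; parameters not
# listed are assumed to exist in every supported version.
MIN_SERVER_VERSION = {
    'max_worker_processes': 90400,
}


def effective_parameters(parameters, server_version_num):
    """Drop parameters the running PostgreSQL version does not know about."""
    return {
        name: value
        for name, value in parameters.items()
        if server_version_num >= MIN_SERVER_VERSION.get(name, 0)
    }
```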
2018-06-26 13:48:16 +01:00
Don Seiler
959f254bfb Adding patronictl reload functionality to reload from yaml config file (#716)
Fixes https://github.com/zalando/patroni/issues/715
2018-06-20 10:09:10 +02:00
Alexander Kukushkin
8a3b78ca7b REST API thread can raise an exception during shutdown (#711)
Catch it and report it.
2018-06-14 13:17:50 +02:00
Oleksii Kliukin
41e5f58f2b Describe synchronous_mode_strict (#710)
* Describe synchronous_mode_strict

Per https://github.com/zalando/patroni/issues/709
2018-06-13 11:12:22 +02:00
Dmitry Dolgov
f0d23b0b14 Merge pull request #706 from zalando/feature/rename-create-replica-method
Rename create_replica_method to create_replica_methods
2018-06-12 14:16:54 +02:00
Alexander Kukushkin
cbd0a759c0 Relax kubernetes module version (#701)
Patroni is proven to work with 2.0.0, 3.0.0 and 6.0.0
2018-06-12 14:11:00 +02:00
Alexander Kukushkin
aadd39b0a4 Do crash recovery only when we are sure that postgres was running as master (#707)
pg_controldata reports one of the following states in this case:
* 'in production'
* 'shutting down'
* 'in crash recovery'
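A sketch of the check, assuming the textual `pg_controldata` output, where the state appears on a `Database cluster state:` line. The helper names are hypothetical.

```python
# Cluster states meaning postgres was running as a primary when it stopped.
STATES_REQUIRING_CRASH_RECOVERY = {'in production', 'shutting down', 'in crash recovery'}


def cluster_state(controldata_output):
    """Extract the 'Database cluster state' value from pg_controldata text output."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip() == 'Database cluster state':
            return value.strip()
    return None


def needs_crash_recovery(controldata_output):
    return cluster_state(controldata_output) in STATES_REQUIRING_CRASH_RECOVERY
```

Any other state, for example `shut down in recovery`, means the instance was not a running master, so crash recovery is skipped.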
2018-06-12 14:09:09 +02:00
Henning Jacobs
2537147810 #694 handle configuration error (#695)
It is possible to change a lot of parameters at runtime (including `restapi.listen`) by updating the Patroni config file and sending SIGHUP to the Patroni process.

If something was misconfigured, it was throwing a weird exception and breaking the `restapi` thread.

This PR makes the error message friendlier and avoids breaking the `restapi` thread.
2018-06-12 14:08:38 +02:00
Alexander Kukushkin
e939304001 Take and apply some parameters from controldata when starting as replica (#703)
* Take and apply some parameters from controldata when starting as replica

https://www.postgresql.org/docs/10/static/hot-standby.html#HOT-STANDBY-ADMIN
There is a set of parameters whose values on the replica must not be smaller than on the primary; otherwise the replica will refuse to start:
* max_connections
* max_prepared_transactions
* max_locks_per_transaction
* max_worker_processes

It might happen that the values of these parameters in the global configuration are not set high enough, which makes it impossible to start a replica without human intervention. This usually happens when we bootstrap a new cluster from a basebackup.

As a solution to this problem, we take the values of the above parameters from the pg_controldata output and, if the values in the global configuration are not high enough, apply the values taken from pg_controldata and set the `pending_restart` flag.
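A sketch of the adjustment, assuming pg_controldata output has already been parsed into a dict of label to string value. The label names in the mapping are an assumption about pg_controldata's output, and `adjust_replica_parameters` is a hypothetical helper.

```python
# Assumed mapping from postgresql.conf parameter names to pg_controldata labels.
CONTROLDATA_LABELS = {
    'max_connections': 'max_connections setting',
    'max_prepared_transactions': 'max_prepared_xacts setting',
    'max_locks_per_transaction': 'max_locks_per_xact setting',
    'max_worker_processes': 'max_worker_processes setting',
}


def adjust_replica_parameters(parameters, controldata):
    """Raise parameters to at least the primary's values recorded in controldata.

    Returns (adjusted_parameters, pending_restart). A parameter missing from
    the configuration is treated as too low and taken from controldata.
    """
    adjusted = dict(parameters)
    pending_restart = False
    for name, label in CONTROLDATA_LABELS.items():
        if label not in controldata:
            continue
        required = int(controldata[label])
        if int(adjusted.get(name, 0)) < required:
            adjusted[name] = required
            pending_restart = True
    return adjusted, pending_restart
```

Values that are already high enough are left untouched, so `pending_restart` is only set when the replica would otherwise refuse to start.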
2018-06-12 14:04:32 +02:00
Chris Fraser
aa18f70466 If set, use LD_LIBRARY_PATH when starting postgres (#698)
Fixes #697
2018-06-12 14:00:48 +02:00
Alexander Kukushkin
e405e4e03c Workaround to sporadic unit-test failures (#696)
Fixes https://github.com/zalando/patroni/issues/691
2018-06-12 14:00:10 +02:00
erthalion
3d80e49b38 Rename also in settings docs
2018-06-12 13:28:30 +02:00
erthalion
d037aa8afd Rename create_replica_method to create_replica_methods
To make it clear that it's actually an array
2018-06-12 11:33:13 +02:00
Björn Albers
e5f2511764 Add WorkingDirectory to systemd sample config. (#686)
Otherwise `initdb` fails because it tries to create the data directory in the root directory where the postgres user has no permissions.
2018-06-04 16:36:41 +02:00
Alexander Kukushkin
1de7c78c04 Release 1.4.4 (#683)
bump version and update release notes
v1.4.4
2018-05-22 14:46:19 +02:00
Alexander Kukushkin
041015037e Sync replication slots when we noticed a new postmaster process (#677)
Fixes: https://github.com/zalando/patroni/issues/674
2018-05-18 16:32:06 +02:00
Alexander Kukushkin
856552bd61 Sync replication slots and verify sysid after coming out of pause (#678)
Fixes https://github.com/zalando/patroni/issues/568
and https://github.com/zalando/patroni/issues/674
2018-05-18 12:18:49 +02:00
Oleksii Kliukin
4ce539ba1b Allow options to the basebackup built-in method. (#604)
Options should be specified in the basebackup section, which is optional.
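A sketch of how such an optional section could be turned into `pg_basebackup` command-line arguments, assuming the section is a list whose items are either bare option names or one-key mappings (a plain mapping is also accepted). The exact format Patroni accepts and the helper name are assumptions here.

```python
def basebackup_options_to_args(options):
    """Turn a basebackup option list into pg_basebackup long options.

    Plain strings become flags (--verbose); single-key dicts become
    key/value options (--max-rate=100M).
    """
    if not options:
        return []
    if isinstance(options, dict):
        options = [{key: value} for key, value in options.items()]
    args = []
    for option in options:
        if isinstance(option, dict):
            for key, value in option.items():
                args.append('--{0}={1}'.format(key, value))
        else:
            args.append('--{0}'.format(option))
    return args
```

For example, `['verbose', {'max-rate': '100M'}]` becomes `['--verbose', '--max-rate=100M']`, and a missing section yields no extra arguments.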
2018-05-18 12:18:35 +02:00
Oleksii Kliukin
1043376e6b Do not exit when encountering invalid system ID. (#669)
Do not exit when the cluster system ID is empty or is one that doesn't pass the validation check. In that case, the cluster most likely needs a reinit; mention it in the result message.
Avoid terminating Patroni, as otherwise reinit cannot happen.
2018-05-18 11:48:15 +02:00
Alexander Kukushkin
ed479fe585 Don't demote master if failed to update leader key in pause (#668)
Fixes https://github.com/zalando/patroni/issues/659
2018-05-18 11:19:56 +02:00
Alexander Kukushkin
5ce18a8045 Improve protection of DCS being accidentally wiped (#680)
We already have a lot of logic in place to prevent failover in such a case and restore all keys, but an accidental removal of the `/config` key was effectively switching off pause mode for one cycle of the HA loop.
2018-05-18 11:18:58 +02:00
Alexander Kukushkin
5296336f4a BUGFIX: postmaster start can fail if pid from postmaster.pid is alive (#681)
Upon start, the postmaster process performs various safety checks if there is a postmaster.pid file in the data directory. Although Patroni had already detected that the running process corresponding to the postmaster.pid is not a postmaster, the new postmaster might fail to start, because it thinks that postmaster.pid is already locked.
Important: unlinking postmaster.pid isn't an option in this case, because it has a lot of nasty race conditions.
Luckily there is a workaround for this problem: we can pass the pid from postmaster.pid in the `PG_GRANDPARENT_PID` environment variable and postmaster will ignore it.

This problem is more likely to be hit if you run Patroni and postgres in a docker container.
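A sketch of building the environment for the postmaster, assuming the stale pid has already been read from postmaster.pid. `postmaster_environment` is a hypothetical helper; only the `PG_GRANDPARENT_PID` variable comes from the commit.

```python
import os


def postmaster_environment(stale_pid=None, base_env=None):
    """Build the environment for starting postgres.

    If postmaster.pid points at a live process that is not a postmaster, pass
    that pid in PG_GRANDPARENT_PID so postmaster ignores it, instead of
    unlinking postmaster.pid (which is racy).
    """
    env = dict(os.environ if base_env is None else base_env)
    if stale_pid is not None:
        env['PG_GRANDPARENT_PID'] = str(stale_pid)
    return env

# The result would then be handed to the process launcher, for example
# subprocess.Popen(['postgres', '-D', data_dir], env=env).
```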
2018-05-18 11:18:27 +02:00
Cody Coons
3eeb4ed979 Added check for empty subsets (#670)
On Kubernetes 1.10.0 I experienced an issue where calls to `patch_or_create` were failing when bootstrapping a cluster. The call was failing because `self._leader_observed_subsets` was `None` instead of `[]`.
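A sketch of the defensive pattern, assuming the Endpoints object exposes a `subsets` attribute that may be `None`. `EndpointsStub` and `observed_subsets` are hypothetical.

```python
class EndpointsStub:
    """Hypothetical stand-in for a Kubernetes Endpoints object."""

    def __init__(self, subsets=None):
        # The API can hand back None rather than [] when there are no subsets.
        self.subsets = subsets


def observed_subsets(endpoints):
    """Always return a list so callers such as patch_or_create can iterate safely."""
    return endpoints.subsets or []
```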
2018-04-26 16:38:19 +02:00
Alexander Kukushkin
84f29caf92 Fix race condition in poll_failover_result (#658)
It didn't directly affect either failover or switchover, but in some rare cases it was reporting success too early, when the former leader released the lock: `Failed over to "None" instead of "desired-node"`

In addition, this commit improves logs and status messages by differentiating between failover and switchover.
2018-04-16 17:45:05 +02:00
Alexander Kukushkin
d78790b194 Abort start if attaching to running postgres and cluster not initialized (#661)
Patroni can attach itself to an already running PostgreSQL instance. If that is the first instance "seen" in the given cluster, Patroni for that instance will create the initialize key, grab the leader key and, if the instance is running as a replica, promote it.

Because of this behavior, when Patroni is added to every node of a cluster with a master and one or more replicas, it is imperative to start Patroni on the master node before the replicas.

This commit changes this behavior: Patroni will abort start if there is no initialize key in DCS and postgres is running as a replica.
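A sketch of the new startup guard, assuming the three inputs have already been determined from the DCS and the local postgres. `StartupAborted` and `check_can_attach` are hypothetical names.

```python
class StartupAborted(Exception):
    """Raised when it is unsafe for Patroni to attach to a running postgres."""


def check_can_attach(initialize_key_exists, postgres_running, running_as_replica):
    """Refuse to attach to a running replica of a cluster with no initialize key.

    Without this check the first node seen would grab the leader key and, if
    it were a replica, promote itself.
    """
    if postgres_running and running_as_replica and not initialize_key_exists:
        raise StartupAborted('cluster is not initialized and postgres is running as a replica')
```

Attaching to a running master, or to a replica of an already initialized cluster, still proceeds as before.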

Closes https://github.com/zalando/patroni/issues/655
2018-04-16 17:32:26 +02:00
Kostiantyn Nemchenko
3110090154 Minor corrections to the documentation. (#654)
2018-04-16 15:46:46 +02:00
Reinhard Nägele
20138af37a Link to official Helm chart (#660)
Changes the link from my outdated fork to the official Helm chart which is now up to date.
2018-04-16 15:45:53 +02:00
Dave Cramer
38ad394308 Use the word primary in favour of master (#663)
Primary is a better alternative.
2018-04-16 01:29:51 +02:00
Alexander Kukushkin
e375fac273 Treat postgres settings parameter names as case insensitive (#650)
Because they are indeed case insensitive.
Most of the parameters have snake_case names, but there are three exceptions to this rule: DateStyle, IntervalStyle and TimeZone.
In fact, if you specify timezone = 'some/tzn' it still works, but Patroni wasn't able to find 'timezone' in pg_settings and was stripping this parameter out.

We will use CaseInsensitiveDict to keep postgresql.parameters. This change affects only the "final" configuration. That means if you put some "duplicates" (work_mem vs WORK_MEM) into the Patroni yaml or into the cluster config, they are resolved only at the last stage; for example, you will still be able to see both values if you use `patronictl edit-config`.
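A minimal sketch of such a mapping, written from scratch for illustration; Patroni's own CaseInsensitiveDict may be implemented differently (the `requests` library ships a similar `requests.structures.CaseInsensitiveDict`).

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Mapping with case-insensitive keys that remembers the last spelling used."""

    def __init__(self, data=None):
        self._store = {}  # lower-cased key -> (original key, value)
        if data:
            self.update(data)

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)
```

With this, `timezone` finds a value stored as `TimeZone`, and setting `WORK_MEM` after `work_mem` collapses the "duplicate" into a single entry holding the last value.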

Fixes https://github.com/zalando/patroni/issues/649
2018-04-04 14:23:53 +02:00
Alexander Kukushkin
8c795ff0cf Pass dict object to touch_member instead of json encoded string (#651)
The DCS implementation will take care of encoding it.
Fixes https://github.com/zalando/patroni/issues/642
2018-04-04 13:45:44 +02:00
Don Seiler
140618abd2 Missing a word (#647)
In re Issue #639
2018-04-04 13:40:46 +02:00
Josh Berkus
3c05e2e984 Added references to the Slack channel in Readme and in contributing.rst. (#653)
2018-04-04 13:39:43 +02:00
bradnicholson
ca679a93b8 Make deleting recovery.conf optional. (#638)
pgBackRest's restore command generates the appropriate recovery.conf based
on the parameters you provide to pgBackRest. When pgBackRest's restore command is called
via Patroni's custom bootstrap, Patroni deletes that recovery.conf. Specifying the recovery.conf
information in the patroni.yml is less than ideal. It prevents leveraging pgBackRest's
work to ensure recovery.conf files are properly generated. It can also lead to transient
config data in the patroni.yml in certain restore cases, such as a PITR restore
of Cluster B to Cluster A, where the restore_command in A needs to reference B.

The parameter is optional.  The default behavior is to delete the recovery.conf.
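A sketch of the optional cleanup step, assuming a boolean flag. The name `keep_recovery_conf` and the helper are hypothetical here; consult the Patroni documentation for the real parameter name.

```python
import os


def remove_recovery_conf(data_dir, keep_recovery_conf=False):
    """Delete recovery.conf after a custom bootstrap unless asked to keep it.

    Returns True if a file was removed. Deleting remains the default.
    """
    if keep_recovery_conf:
        return False
    path = os.path.join(data_dir, 'recovery.conf')
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

Keeping the file lets a tool such as pgBackRest own the generated recovery.conf instead of duplicating its contents in patroni.yml.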

Fixes https://github.com/zalando/patroni/issues/637
2018-03-09 15:35:29 +01:00
Alexander Kukushkin
f500dbb0ff Release 1.4.3 (#635)
Bump version and update release notes
v1.4.3
2018-03-05 10:10:17 +01:00