1560 Commits

Author SHA1 Message Date
Alexander Kukushkin
4d9720a78d Use pg10 as base image and python3 instead of 2 2018-09-12 16:57:22 +02:00
Alexander Kukushkin
1f205f8191 Merge branch 'master' of github.com:zalando/patroni into feature/demo 2018-09-12 16:44:02 +02:00
Pavel Kirillov
2e9cb412e4 Register service in consul (#802)
Register service 'scope_name' with tag 'master' or 'replica'

example with scope 'pgsql-pgpi'
```
[root@pgpi1 ~]# host -t SRV pgsql-pgpi.service.consul. 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:

pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.
pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul.
[root@pgpi1 ~]# host -t SRV master.pgsql-pgpi.service.consul. 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:

master.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul.
[root@pgpi1 ~]# host -t SRV replica.pgsql-pgpi.service.consul. 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:

replica.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.
```

Fixes: https://github.com/zalando/patroni/issues/771
2018-09-07 15:17:56 +02:00
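
The DNS names queried in the #802 example follow a simple pattern: the scope is the service name, and the role tag is an optional prefix. A minimal sketch of that naming, assuming the default `consul` DNS domain; `consul_srv_name` is a hypothetical helper for illustration and is not part of Patroni:

```python
def consul_srv_name(scope, role=None, domain="consul"):
    """Build the Consul DNS name for a scope, optionally filtered by role tag.

    Hypothetical helper: only mirrors the names used in the example above.
    """
    if role is not None and role not in ("master", "replica"):
        raise ValueError("role must be 'master', 'replica' or None")
    base = "{0}.service.{1}.".format(scope, domain)
    # A tag, when given, is prepended to the service name.
    return base if role is None else "{0}.{1}".format(role, base)


if __name__ == "__main__":
    for role in (None, "master", "replica"):
        print(consul_srv_name("pgsql-pgpi", role))
```
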
Dmitry Dolgov
dd7c3c349f [WIP] Standby cluster implementation (#679)
Implementation of the "standby cluster" described in #657. A standby cluster consists
of a "standby leader", which replicates from a "remote master" (which is not
part of the current patroni cluster and can be anywhere), and cascade replicas,
which replicate from the corresponding standby leader. The "standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DCS; if it disappears, there will be an election of a new "standby
leader".
One can define such a cluster using the "standby_cluster" section in the patroni
config file. This section provides parameters for the standby cluster, which are
applied only once during bootstrap and can be changed only through DCS.
2018-09-07 10:10:56 +02:00
Alexander Kukushkin
4ca8a6e506 Make retries of calls to DCS consistent across implementations (#805)
In addition, do a small refactoring of zookeeper and consul and try to improve the stability of acceptance tests.
2018-09-06 08:37:26 +02:00
wilfriedroset
0136f252ab Add patronictl -k/--insecure flag and support for restapi cert (#790)
Fixes https://github.com/zalando/patroni/issues/785
2018-08-29 16:08:13 +02:00
anikin-aa
0e13677880 exclude members with nofailover tag (#798)
Exclude members with nofailover tag from `patronictl switchover/failover` output.
Fixes https://github.com/zalando/patroni/issues/769
2018-08-29 16:07:01 +02:00
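
The filtering described in #798 amounts to dropping every member whose tags include `nofailover`. A minimal sketch, assuming members are plain dicts with a `name` and an optional `tags` mapping; this shape and the `failover_candidates` name are illustrative assumptions, not Patroni's internal representation:

```python
def failover_candidates(members):
    """Return the names of members that may be offered as failover targets."""
    return [
        member["name"]
        for member in members
        # A missing tags mapping or a missing/false nofailover tag keeps the member.
        if not member.get("tags", {}).get("nofailover", False)
    ]


if __name__ == "__main__":
    cluster = [
        {"name": "pg0", "tags": {}},
        {"name": "pg1", "tags": {"nofailover": True}},
        {"name": "pg2"},
    ]
    print(failover_candidates(cluster))
```
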
anikin-aa
68c0d87d42 check if output lines of controldata are possible to split (#797)
Otherwise it fails with a scary stack trace
2018-08-29 16:06:06 +02:00
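
The check from #797 can be illustrated as follows: lines of `pg_controldata` output are only split when they actually contain a `:` separator, so a stray warning line no longer raises an exception. This is a sketch of the idea, not the code from the commit; `parse_controldata` is a hypothetical name:

```python
def parse_controldata(output):
    """Parse 'key: value' lines, silently skipping lines that cannot be split."""
    result = {}
    for line in output.splitlines():
        if ":" not in line:
            # Not a key/value line (for example a warning); ignore it.
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    sample = "some unexpected warning\nDatabase cluster state:   in production\n"
    print(parse_controldata(sample))
```
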
Alexander Kukushkin
90cf930036 Refactor REST API health-checks (#779)
Make it more readable and easy to understand.
Mostly it is needed to implement https://github.com/zalando/patroni/issues/772
2018-08-29 11:35:22 +02:00
Jan Mussler
2b87ae0cd0 Add member name to error message. (#792)
Analog to success message.
2018-08-29 11:30:51 +02:00
Alexander Kukushkin
5ca5dacaa9 Immediately reserve LSN upon creation of replication slot (#783)
This feature is available starting from 9.6
2018-08-29 11:30:01 +02:00
Alexander Kukushkin
87e9aab04c Improve tests (#778)
* Implement missing unit-tests
* Add acceptance tests for ISSUE #776
* Update list of classifiers, keywords and authors
2018-08-29 11:29:37 +02:00
Alexander Kukushkin
518df2bc49 Search new sync candidate among potential and async standbys (#794)
In synchronous_mode_strict we put '*' into synchronous_standby_names, which makes one connection 'sync' and other connections 'potential'.
The code picking up the correct sync standby didn't consider 'potential' as a good candidate.

Fixes: https://github.com/zalando/patroni/issues/789
2018-08-29 11:28:46 +02:00
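
The selection problem fixed in #794 can be sketched like this: given the `application_name` and `sync_state` of each connected standby, prefer a `sync` standby, then a `potential` one, then an `async` one. The preference order and the `pick_sync_candidate` helper are assumptions made for illustration; the actual logic in Patroni may differ:

```python
# Lower number means more preferred; 'potential' is now a valid candidate.
STATE_PRIORITY = {"sync": 0, "potential": 1, "async": 2}


def pick_sync_candidate(standbys):
    """Pick a standby name from (application_name, sync_state) pairs.

    Returns None when no standby is in a known state.
    """
    candidates = [
        (STATE_PRIORITY[state], name)
        for name, state in standbys
        if state in STATE_PRIORITY
    ]
    # Ties within the same state are broken by name to keep the result stable.
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(pick_sync_candidate([("pg2", "async"), ("pg1", "potential")]))
```
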
Alexander Kukushkin
a513a7bb68 Improve stability of acceptance tests (#780)
Lately, tests were failing because postgres/patroni were slow to pick the sync standby.
2018-08-29 11:13:18 +02:00
Alexander Kukushkin
715caaddf3 Remove .zappr.yaml (#795)
and switch to github approvals
2018-08-29 11:09:35 +02:00
Oleksii Kliukin
b165183503 Reset is_leader status on demote (#777)
Make sure demoted cluster member stops responding with code 200 on the  /master API call.

Issue a new minor release.

Fixes https://github.com/zalando/patroni/issues/776
v1.4.6
2018-08-14 17:08:08 +02:00
Dmitry Dolgov
b282a0f254 Add "cluster_unlocked" field (#764)
Add a field to the API to figure out whether a master exists from Patroni's point
of view. This can be useful when you have an alert based on Auto Scaling
Groups: the ASG decides to shut down the current master and spin up a
new instance, but the shutdown of the current master is stuck. In this situation
the current master is no longer part of the ASG, but patroni and Postgres
are still alive on the instance, which means a new replica will not be
promoted yet. This leads to a false alert saying that your cluster
doesn't have any master node.
2018-08-13 14:02:01 +02:00
Oleksii Kliukin
5e7345a2ca Release notes 1.4.5 (#762)
bump version update release notes
v1.4.5
2018-08-03 17:02:11 +02:00
Don Seiler
502094ee79 Log config change or not (#731)
This adds INFO log messages that clearly state if configuration values were seen as changed by Patroni after SIGHUP/reload and warrant reloading (or if nothing was changed and no reloading is necessary).

This ended up being a lot simpler than I had imagined once I found postgresql.py:reload_config().

I add a log line in config.py:reload_local_configuration() since that function will short-circuit the process early if the local config wasn't changed. But the final determination of whether or not values have changed and need reloading is in postgresql.py:reload_config().
2018-08-03 17:00:57 +02:00
Alexander Kukushkin
0c1ae6fbeb Respond 200 to master health-check only if update_lock was successful (#713)
If Patroni gets partitioned it starts receiving stale information from DCS.
We can't use this information to determine that we have the leader key.
Instead, we will record in Ha object the actual state of acquire/update lock and report as a leader only if it was successful.

P.S. despite responding with 200 on `GET /master` postgres was still running read-only.
2018-08-03 17:00:01 +02:00
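
The idea behind #713 is to remember the outcome of the last acquire/update of the leader lock instead of trusting possibly stale data read from DCS. A minimal sketch under stated assumptions: the `LeaderState` class, its method names and the 503 status for the non-leader case are illustrative and are not taken from the Patroni source:

```python
class LeaderState:
    """Tracks whether the most recent leader-lock operation succeeded."""

    def __init__(self):
        self._has_lock = False

    def record_lock_result(self, succeeded):
        # Called after every attempt to acquire or update the leader lock.
        self._has_lock = bool(succeeded)

    def master_status_code(self):
        # Report 200 only when the last lock operation was successful.
        return 200 if self._has_lock else 503


if __name__ == "__main__":
    state = LeaderState()
    state.record_lock_result(True)
    print(state.master_status_code())
    state.record_lock_result(False)  # e.g. the node got partitioned from DCS
    print(state.master_status_code())
```
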
Alexander Kukushkin
2fd2556050 Set role to demoted if postgres isn't running and no recovery.conf (#757)
In really rare cases it was causing the following behavior:
```
2018-07-31 10:35:30,302 INFO: starting as a secondary
2018-07-31 10:35:30,309 INFO: Lock owner: postgresql0; I am postgresql1
2018-07-31 10:35:30,310 INFO: Demoting master during restarting after failure
2018-07-31 10:35:30,381 INFO: postmaster pid=17709
2018-07-31 10:35:30,386 INFO: lost leader lock during restarting after failure
2018-07-31 10:35:30,388 ERROR: Exception during CHECKPOINT
```
2018-08-03 16:59:04 +02:00
Oleksii Kliukin
d47049ce0e Fix condition for the replica start due to pg_rewind in paused state. (#754)
Avoid starting the replica that had already executed pg_rewind before.

Fixes in #753
2018-08-03 16:45:33 +02:00
Henning Jacobs
2f7c53031c Python 3.6 and 3.7 are now supported, too (#752) 2018-07-24 10:51:25 +02:00
Christoph Berg
a2c6ed5504 async is a keyword in python3.7 (#751)
* async is a keyword in python3.7

Setting up patroni (1.4.4-1) ...
    File "/usr/lib/python3/dist-packages/patroni/ha.py", line 610
      'offline':          dict(stop='fast', checkpoint=False, release=False, offline=True, async=False),
                                                                                               ^
  SyntaxError: invalid syntax

Fix #750 by replacing dict member "async" with "async_req".

* requirements.txt: Update for new kubernetes version compatible with 3.7
2018-07-23 20:42:33 +02:00
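
The failure from #751 is easy to reproduce without Patroni: on Python 3.7 and later `async` is a reserved keyword, so it can no longer be used as a keyword argument name, while `async_req` is an ordinary identifier. A small sketch using the built-in `compile()`; the `compiles` helper is just for illustration:

```python
def compiles(source):
    """Return True if the expression is syntactically valid on this interpreter."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    # On Python 3.7+ the first expression is rejected, the second is accepted.
    print(compiles("dict(stop='fast', async=False)"))
    print(compiles("dict(stop='fast', async_req=False)"))
```
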
Oleksii Kliukin
00c2e1c2d0 Grant delete on endpoints and configmaps in RBAC. (#749)
'patronictl remove' deletes the cluster configuration (stored either in configmaps or endpoints) and cannot be run from the postgres pod w/o 'delete' on those objects being granted to the pod service account.
2018-07-23 20:39:46 +03:00
Don Seiler
f5927bad70 Add EnvironmentFile directive (#746)
Add an EnvironmentFile directive to read in a configuration file with environment variables. The "-" prefix means it can proceed if the file doesn't exist.

This would allow users to keep sensitive information like the SUPERUSER/REPLICATION passwords in the config file separate from a YAML file that might be deployed from source control.
2018-07-23 20:31:47 +03:00
Alexander Kukushkin
26466237b9 Update docker-compose example to postgres 10 (#737)
Some other changes are related to the new version of confd, which now
requires specifying etcd url instead of etcd host.
2018-07-23 16:41:17 +02:00
Tony Sorrentino
c8f9199988 Added setting state to "stopped" when a member is stopped in Ha.shutdown (#733)
Changes by @tonys66, review by @CyberDem0n
2018-07-23 14:59:39 +02:00
Alexander Kukushkin
2356af679b Convert query params from list to dict (#744)
Patroni relies on params to determine the timeout and number of retries when executing API requests to consul. Starting from v1.1.0, python-consul changed its internal API and started
using `list` instead of `dict` to pass query parameters. This change broke the "watch" functionality.

Fixes https://github.com/zalando/patroni/issues/742 and
https://github.com/zalando/patroni/issues/734
2018-07-23 14:56:51 +02:00
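
The conversion named in #744 can be sketched as a small normalisation step that accepts either form. It assumes the list form is a sequence of `(key, value)` pairs; `params_to_dict` is a hypothetical helper, not the function from the commit:

```python
def params_to_dict(params):
    """Normalise query parameters given as a dict, a list of pairs, or None."""
    if params is None:
        return {}
    # dict() copies an existing mapping and also accepts an iterable of pairs.
    return dict(params)


if __name__ == "__main__":
    print(params_to_dict([("wait", "10s"), ("index", 42)]))
    print(params_to_dict({"wait": "10s"}))
```

With a plain `dict` in hand, lookups such as `params.get('wait')` work regardless of which library version produced the parameters.
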
Ants Aasma
3b633abd91 Improve logging when stale postmaster.pid matches running process (#738)
Currently the informational message logged is beyond confusing. This
improves the logging so there is some indication what this message is
about and that it is somewhat normal. Changes by @ants
2018-07-17 16:46:22 +02:00
alago197
936a4238fb Update some descriptions for the REST API endpoints (#729)
* Update some descriptions for the REST API endpoints

By @alago197
2018-07-10 15:40:53 +02:00
Don Seiler
50a8114d0b Use enforced minimums in postgresX.yml files (#730)
Fix the discrepancy for the values of max_wal_senders and max_replication_slots between the sample postgres.yml files and hard-coded defaults in Patroni, bumping the former to 10.
Contributed by @dtseiler
2018-07-04 10:08:54 +02:00
Don Seiler
4e8709b266 Adding reload functionality (#726)
This allows the config to be reloaded via `systemctl reload patroni`, sending SIGHUP to the patroni process. Tested on CentOS.
2018-06-30 23:16:42 +02:00
Alexander Kukushkin
4128cba628 max_worker_processes parameter was introduced only in 9.4 (#724)
exclude it from the list on 9.3 when building effective configuration
2018-06-26 13:48:16 +01:00
Don Seiler
959f254bfb Adding patronictl reload functionality to reload from yaml config file (#716)
Fixes https://github.com/zalando/patroni/issues/715
2018-06-20 10:09:10 +02:00
Alexander Kukushkin
8a3b78ca7b Rest api thread can raise an exception during shutdown (#711)
catch it and report
2018-06-14 13:17:50 +02:00
Oleksii Kliukin
41e5f58f2b Describe synchronous_mode_strict (#710)
* Describe synchronous_mode_strict

Per https://github.com/zalando/patroni/issues/709
2018-06-13 11:12:22 +02:00
Dmitry Dolgov
f0d23b0b14 Merge pull request #706 from zalando/feature/rename-create-replica-method
Rename create_replica_method to create_replica_methods
2018-06-12 14:16:54 +02:00
Alexander Kukushkin
cbd0a759c0 Relax kubernetes module version (#701)
Patroni is proven to work with 2.0.0, 3.0.0 and 6.0.0
2018-06-12 14:11:00 +02:00
Alexander Kukushkin
aadd39b0a4 Do crash recovery only when we are sure that postgres was running as master (#707)
pg_controldata reports in this case:
* 'in production'
* 'shutting down'
* 'in crash recovery'
2018-06-12 14:09:09 +02:00
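
The condition from #707 can be illustrated by reading the `Database cluster state` line of the `pg_controldata` output and comparing it with the three states listed above. This is a sketch only; `was_running_as_master` is a hypothetical name:

```python
# States which, per the commit message, indicate postgres was running as master.
MASTER_STATES = ("in production", "shutting down", "in crash recovery")


def was_running_as_master(controldata_output):
    """Return True if the reported cluster state is one of MASTER_STATES."""
    for line in controldata_output.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.strip() == "Database cluster state":
            return value.strip() in MASTER_STATES
    # No state line found: do not assume the node was a master.
    return False


if __name__ == "__main__":
    print(was_running_as_master("Database cluster state:   in production"))
    print(was_running_as_master("Database cluster state:   shut down in recovery"))
```
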
Henning Jacobs
2537147810 #694 handle configuration error (#695)
It is possible to change a lot of parameters in runtime (including `restapi.listen`) by updating Patroni config file and sending SIGHUP to Patroni process.

If something was misconfigured it was throwing a weird exception and breaking `restapi` thread.

This PR improves friendliness of error message and avoids breaking of `restapi`.
2018-06-12 14:08:38 +02:00
Alexander Kukushkin
e939304001 Take and apply some parameters from controldata when starting as replica (#703)
* Take and apply some parameters from controldata when starting as replica

https://www.postgresql.org/docs/10/static/hot-standby.html#HOT-STANDBY-ADMIN
There is a set of parameters whose values on the replica must not be smaller than on the primary, otherwise the replica will refuse to start:
* max_connections
* max_prepared_transactions
* max_locks_per_transaction
* max_worker_processes

It might happen that the values of these parameters in the global configuration are not set high enough, which makes it impossible to start a replica without human intervention. Usually this happens when we bootstrap a new cluster from a basebackup.

As a solution to this problem, we take the values of the above parameters from the pg_controldata output and, if the values in the global configuration are not high enough, apply the values taken from pg_controldata and set the `pending_restart` flag.
2018-06-12 14:04:32 +02:00
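
The rule described in #703 can be sketched as taking, for each of the four parameters, the larger of the configured value and the value recorded in the control file, and flagging a pending restart whenever the control-file value wins. Both inputs are assumed to be plain dicts keyed by parameter name; that shape and the `effective_parameters` helper are illustrative assumptions, not Patroni's implementation:

```python
CONTROLDATA_PARAMETERS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_worker_processes",
)


def effective_parameters(configured, controldata):
    """Return (effective_config, pending_restart) for a starting replica."""
    effective = dict(configured)
    pending_restart = False
    for name in CONTROLDATA_PARAMETERS:
        required = controldata.get(name)
        if required is None:
            continue
        # Raise the value only when the global configuration is too low.
        if int(effective.get(name, 0)) < int(required):
            effective[name] = int(required)
            pending_restart = True
    return effective, pending_restart


if __name__ == "__main__":
    config = {"max_connections": 100, "max_worker_processes": 8}
    from_controldata = {"max_connections": 200, "max_worker_processes": 8}
    print(effective_parameters(config, from_controldata))
```
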
Chris Fraser
aa18f70466 If set, use LD_LIBRARY_PATH when starting postgres (#698)
Fixes #697
2018-06-12 14:00:48 +02:00
Alexander Kukushkin
e405e4e03c Workaround to sporadic unit-test failures (#696)
Fixes https://github.com/zalando/patroni/issues/691
2018-06-12 14:00:10 +02:00
erthalion
3d80e49b38 Rename also in settings docs 2018-06-12 13:28:30 +02:00
erthalion
d037aa8afd Rename create_replica_method to create_replica_methods
To make it clear that it's actually an array
2018-06-12 11:33:13 +02:00
Björn Albers
e5f2511764 Add WorkingDirectory to systemd sample config. (#686)
Otherwise `initdb` fails because it tries to create the data directory in the root directory where the postgres user has no permissions.
2018-06-04 16:36:41 +02:00
Alexander Kukushkin
1de7c78c04 Release 1.4.4 (#683)
bump version and update release notes
v1.4.4
2018-05-22 14:46:19 +02:00
Alexander Kukushkin
041015037e Sync replication slots when we noticed a new postmaster process (#677)
Fixes: https://github.com/zalando/patroni/issues/674
2018-05-18 16:32:06 +02:00
Alexander Kukushkin
856552bd61 Sync replication slots and verify sysid after coming out of pause (#678)
Fixes https://github.com/zalando/patroni/issues/568
and https://github.com/zalando/patroni/issues/674
2018-05-18 12:18:49 +02:00