Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DSC, in case if disappears there will be an election of a new "standby
leader".
One can define such a cluster using the section "standby_cluster" in patroni
config file. This section provides parameters for standby cluster, that will be
applied only once during bootstrap and can be changed only through DSC.
In synchronous_mode_strict we put '*' into synchronous_standby_names, which makes one connection 'sync' and the other connections 'potential'.
The code that picks the sync standby did not consider 'potential' connections as good candidates.
Fixes: https://github.com/zalando/patroni/issues/789
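A simplified sketch of the selection idea (not Patroni's actual code): with `synchronous_standby_names = '*'`, `pg_stat_replication` reports one member as 'sync' and the rest as 'potential', so both states have to be accepted.
```python
# Simplified sketch, not Patroni's implementation: pick a synchronous
# candidate from (application_name, sync_state) pairs as reported by
# pg_stat_replication, accepting 'potential' members instead of only 'sync'.
def pick_sync_candidate(rows):
    for wanted in ('sync', 'potential'):
        for name, state in rows:
            if state == wanted:
                return name
    return None

# Only plain 'async' members are rejected:
assert pick_sync_candidate([('node2', 'potential'), ('node3', 'async')]) == 'node2'
```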
Add a field to the API that shows whether a master exists from Patroni's point
of view. This is useful when an alert is based on Auto Scaling
Groups: the ASG decides to shut down the current master and spin up a
new instance, but the shutdown of the current master gets stuck. In this situation
the current master is no longer part of the ASG, yet Patroni and Postgres
are still alive on the instance, which means no replica will be
promoted yet. This leads to a false alert saying that your cluster
doesn't have any master node.
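A minimal monitoring sketch of how such a field could be consumed; the field name `cluster_unlocked` and the member URL are assumptions used for illustration, not necessarily the final API.
```python
# Minimal sketch: ask Patroni itself whether anybody holds the leader lock
# before alerting.  The 'cluster_unlocked' field name is an assumption here.
import json
from urllib.request import urlopen

def cluster_has_master(member_url='http://localhost:8008/patroni'):
    with urlopen(member_url, timeout=5) as resp:
        status = json.load(resp)
    return not status.get('cluster_unlocked', False)
```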
This adds INFO log messages that clearly state whether configuration values were seen as changed by Patroni after SIGHUP/reload and warrant reloading (or that nothing was changed and no reloading is necessary).
This ended up being a lot simpler than I had imagined once I found postgresql.py:reload_config().
I added a log line in config.py:reload_local_configuration(), since that function short-circuits the process early if the local config wasn't changed. But the final determination of whether values have changed and need reloading is in postgresql.py:reload_config().
If Patroni gets partitioned, it starts receiving stale information from the DCS.
We can't use this information to determine that we still hold the leader key.
Instead, we record in the Ha object the actual outcome of acquiring/updating the leader lock and report the node as a leader only if that operation was successful.
P.S. Despite responding with 200 on `GET /master`, Postgres was still running read-only.
In really rare cases it was causing the following behavior:
```
2018-07-31 10:35:30,302 INFO: starting as a secondary
2018-07-31 10:35:30,309 INFO: Lock owner: postgresql0; I am postgresql1
2018-07-31 10:35:30,310 INFO: Demoting master during restarting after failure
2018-07-31 10:35:30,381 INFO: postmaster pid=17709
2018-07-31 10:35:30,386 INFO: lost leader lock during restarting after failure
2018-07-31 10:35:30,388 ERROR: Exception during CHECKPOINT
```
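An illustrative sketch of the pattern described above (not Patroni's actual implementation): remember the result of the last lock operation and answer the leader question from that flag rather than from possibly stale DCS data.
```python
# Illustrative only; the real DCS calls and the real Ha class look different.
class Ha:
    def __init__(self, dcs):
        self.dcs = dcs
        self.has_lock = False  # outcome of the last acquire/update attempt

    def update_lock(self, member_name):
        # attempt_to_acquire_leader() stands in for the actual DCS request.
        self.has_lock = self.dcs.attempt_to_acquire_leader(member_name)
        return self.has_lock

    def is_leader(self):
        # Report leader only if the last lock operation really succeeded,
        # not merely because cached DCS data still shows our leader key.
        return self.has_lock
```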
* `async` is a keyword in Python 3.7, which breaks installation of Patroni 1.4.4 on it:
```
Setting up patroni (1.4.4-1) ...
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 610
    'offline': dict(stop='fast', checkpoint=False, release=False, offline=True, async=False),
                                                                                ^
SyntaxError: invalid syntax
```
Fix #750 by replacing the dict member "async" with "async_req" (see the sketch after this list).
* requirements.txt: Update to a newer kubernetes module version compatible with Python 3.7
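For illustration, the offending line from the traceback above with the stated rename applied (a sketch, not necessarily the exact final code; the container name is hypothetical):
```python
# Python 3.7 reserves "async", so the keyword argument is renamed to
# "async_req"; STOP_MODES is a hypothetical name used only for illustration.
STOP_MODES = {
    'offline': dict(stop='fast', checkpoint=False, release=False,
                    offline=True, async_req=False),
}
```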
'patronictl remove' deletes the cluster configuration (stored either in configmaps or endpoints) and cannot be run from the postgres pod without 'delete' permission on those objects being granted to the pod's service account.
Add an EnvironmentFile directive to read in a configuration file with environment variables. The "-" prefix means systemd can proceed if the file doesn't exist.
This allows users to keep sensitive information, like the SUPERUSER/REPLICATION passwords, in a config file that is separate from the YAML file which might be deployed from source control.
Patroni relies on params to determine the timeout and the number of retries when executing API requests to Consul. Starting from v1.1.0, python-consul changed its internal API and started
using a `list` instead of a `dict` to pass query parameters. This change broke the "watch" functionality.
Fixes https://github.com/zalando/patroni/issues/742 and
https://github.com/zalando/patroni/issues/734
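A sketch of the shape change (not Patroni's actual code), showing why dict-based lookups of query parameters had to be adapted:
```python
# python-consul < 1.1.0 passed query parameters as a dict:
old_params = {'wait': '30s', 'index': '1'}
# python-consul >= 1.1.0 passes them as a list of tuples:
new_params = [('wait', '30s'), ('index', '1')]

# Code that read, e.g., the blocking-query timeout with dict methods has to
# normalize the new shape first:
def get_wait(params):
    mapping = params if isinstance(params, dict) else dict(params)
    return mapping.get('wait')

assert get_wait(old_params) == get_wait(new_params) == '30s'
```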
Currently the informational message logged is beyond confusing. This
improves the logging so there is some indication of what the message is
about and that it is somewhat normal. Changes by @ants
Fix the discrepancy in the values of max_wal_senders and max_replication_slots between the sample postgres.yml files and the hard-coded defaults in Patroni, bumping the former to 10.
Contributed by @dtseiler
It is possible to change many parameters at runtime (including `restapi.listen`) by updating the Patroni config file and sending SIGHUP to the Patroni process.
If something was misconfigured, a cryptic exception was thrown and the `restapi` thread broke.
This PR makes the error message friendlier and avoids breaking the `restapi` thread.
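For context, a minimal sketch of triggering such a reload; the pid-file location is an assumption, not something defined by this change:
```python
# Send SIGHUP to the running Patroni process so it re-reads its config file.
# The pid-file path below is assumed for illustration only.
import os
import signal

with open('/var/run/patroni.pid') as f:
    os.kill(int(f.read().strip()), signal.SIGHUP)
```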
* Take and apply some parameters from pg_controldata when starting as a replica
https://www.postgresql.org/docs/10/static/hot-standby.html#HOT-STANDBY-ADMIN
There is a set of parameters whose values on the replica must not be smaller than on the primary, otherwise the replica will refuse to start:
* max_connections
* max_prepared_transactions
* max_locks_per_transaction
* max_worker_processes
It might happen that the values of these parameters in the global configuration are not set high enough, which makes it impossible to start a replica without human intervention. Usually this happens when we bootstrap a new cluster from a basebackup.
As a solution, we take the values of the above parameters from the pg_controldata output and, if the values in the global configuration are not high enough, apply the values from pg_controldata and set the `pending_restart` flag.
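A simplified sketch of the approach (not Patroni's actual implementation; the pg_controldata labels are assumed from typical output):
```python
import subprocess

# Mapping of assumed pg_controldata labels to the corresponding GUC names.
CONTROLDATA_KEYS = {
    'max_connections setting': 'max_connections',
    'max_prepared_xacts setting': 'max_prepared_transactions',
    'max_locks_per_xact setting': 'max_locks_per_transaction',
    'max_worker_processes setting': 'max_worker_processes',
}

def adjust_from_controldata(data_dir, config):
    """Bump too-low values in `config` in place; return True if a restart is pending."""
    out = subprocess.check_output(['pg_controldata', data_dir]).decode()
    controldata = dict(line.split(':', 1) for line in out.splitlines() if ':' in line)
    pending_restart = False
    for label, name in CONTROLDATA_KEYS.items():
        if label in controldata:
            required = int(controldata[label].strip())
            if int(config.get(name, 0)) < required:
                config[name] = required
                pending_restart = True
    return pending_restart
```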