patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-27 18:20:05 +00:00

Author	SHA1	Message	Date
Julien Riou	663026c34c	Use SSLContext to wrap REST API socket (#1039 ) Using `ssl.wrap_socket` is deprecated and was still allowing soon-to-be-deprecated protocols like TLS 1.1. Now using `SSLContext.create_default_context()` to produce a secure SSL context to wrap the REST API server's socket.	2019-04-23 11:23:22 +02:00
Alexander Kukushkin	51b085a76d	Don't wait until the previous callback finish is kill failed (#1036 ) Such wait was happening in the main thread and blocking HA loop. After all the executor thread was doing absolutely the same.	2019-04-15 15:49:06 +02:00
Alexander Kukushkin	7c0c9599fc	Remove psycopg2 from requirements (#1023 ) Recently released psycopg2 split into two different packages, psycopg2, and psycopg2-binary which could be installed at the same time into the same place on the filesystem. In order to decrease dependency hell problem, we let a user choose how to install psycopg2. There are a few options available and it is reflected in the documentation. This PR also changes the following behavior: * `pip install patroni` will fail if psycopg2 is not installed * Patroni will check psycopg2 upon start and fail if it can't be found or outdated. Closes https://github.com/zalando/patroni/issues/1021	2019-04-15 14:30:16 +02:00
Pavlo Golub	b53a29c022	Fix unit-tests for Windows (#1014 ) Closes #1013	2019-04-02 13:58:17 +02:00
Alexander Kukushkin	e38fe78b56	Fix callbacks behavior (mostly for standby cluster) (#998 ) First of all, this patch changes the behavior of `on_start`/`on_restart` callbacks, they will be called only when postgres is started or restarted without role changes. In case if the member is promoted or demoted only the `on_role_change` callback will be executed. `on_role_change` was never called for standby leader, only `on_start`/`on_restart` and with a wrong role argument. Before that `on_role_change` was never called for standby leader, only `on_start`/`on_restart` and with a wrong role argument. In addition to that, the REST API will return standby_leader role for the leader of the standby cluster. Closes https://github.com/zalando/patroni/issues/988	2019-03-29 10:28:07 +01:00
Alexander Kukushkin	680444ae13	Reduce lock time taken by dcs.get_cluster() (#989 ) `dcs.cluster` and `dcs.get_cluster()` are using the same lock resource and therefore when get_cluster call is slow due to the slowness of DCS it was also affecting the `dcs.cluster` call, which in return was making health-check requests slow.	2019-03-12 22:37:11 +01:00
Alexander Kukushkin	92720882aa	Reset is_leader flag for every removal of leader key (#990 ) This is the next improvement of #777	2019-03-12 22:10:46 +01:00
Alexander Kukushkin	13c88e8b7a	Replace self-execute with multiprocessing.Process (#994 ) In addition to that transfer postmaster pid to Patroni process with the help of multiprocessing.Pipe instead of using stdin-stdout pipes. Closes https://github.com/zalando/patroni/issues/992	2019-03-12 10:40:37 +01:00
Alexander Kukushkin	4a4258fc3f	Mock external resources (#995 ) unit tests should not accidentally hit running Postgres, DCS or filesystem unless we want it explicitly.	2019-03-12 10:39:42 +01:00
Alexander Kukushkin	c64d51f79c	Better support for static etcd cluster (#986 ) if the `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in the `etcd.hosts` and avoid doing topology discovery. Such mode might be useful when you know that you connect to the etcd cluster via the set of proxies or when th etcd cluster has static topology.	2019-03-07 11:36:36 +01:00
Alexander Kukushkin	e670122c80	Use create_replica_methods from standby_cluster for replica bootstrap (#981 ) It might happen that the standby cluster is configured to be created and replay WAL files from the different source than when it is running not in standby mode. This is necessary to avoid writing WAL files and backups into the old place after promotion. The easiest way to achieve such behavior is passing RemoteMember object to `Postgresql.clone` method instead of the usual Member object.	2019-02-21 11:37:50 +01:00
Alexander Kukushkin	0c516de147	Create headless service associated with $SCOPE-config endpoint (#958 ) if there is no service defined k8s assumes that endpoint is orphaned and removes it. Patroni tries to create the service only in case if use_endpoints is enabled if the following cases: 1. Upon start 2. When it tries to (re-)create the config endpoint If for some reason creation of the service has failed, Patroni will retry it on every cycle of HA loop. Usually it fails due to lack of permissions and if you don't want to give such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest.	2019-02-15 13:35:04 +01:00
Alexander Kukushkin	739329b590	Make it possible to automatically reinit the former master (#948 ) If the pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is wiping the data directory and reinitializing. So far Patroni was able to remove the data directory only after failed attempt to run pg_rewind. This commit fixes it. If the `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically. Fixes: https://github.com/zalando/patroni/issues/941	2019-01-30 12:37:21 +01:00
Alexander Kukushkin	2c128520cf	Python34 compatibility (#933 ) and some other minor fixes. Closes https://github.com/zalando/patroni/issues/932	2019-01-16 14:40:05 +01:00
Alexander Kukushkin	381a5b80d2	Release 1.5.4 (#931 ) * Bump version * Update release notes * Make it possible to configure registration of Service in Consul via env variables	2019-01-15 12:14:19 +01:00
Alexander Kukushkin	71dae6a905	Optionally consider node not healthy if it is not on the latest timeline (#892 ) The latest timeline is calculated from the `/history` key in DCS. In case there is no such key or it contains some garbage we consider the node healthy. Closes https://github.com/zalando/patroni/issues/890	2019-01-15 11:16:30 +01:00
Alexander Kukushkin	e080ded44b	Make logging configurable via YAML file (#927 ) It allows changing logging settings in runtime by updating config and doing reload or sending `SIGHUP` to the Patroni process. Important! Environment configuration names related to logging were renamed and documentation accordingly updated. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other variables related to logging, which were introduced only recently (between releases), will stop working. I think it is ok, since we didn't release the new version yet and therefore it is very unlikely that somebody is using them except authors of corresponding PRs. Example of log section in the config file: ```yaml log: dir: /where/to/write/patroni/logs # if not specified, write logs to stderr file_size: 50000000 # 50MB file_num: 10 # keep history of 10 files dateformat: '%Y-%m-%d %H:%M:%S' loggers: # increase log verbosity for etcd.client and urllib3 etcd.client: DEBUG urllib3: DEBUG ```	2019-01-15 08:42:13 +01:00
Alexander Kukushkin	994863c18d	Refactor wait_for_user_backends_to_close method (#917 ) 1. Log only debug level messages on any kind of error 2. Update regexp for matching postgres aux processes to make it compatible with postgres 11 Fixes https://github.com/zalando/patroni/issues/914	2019-01-14 14:55:45 +01:00
Dmitry Dolgov	11f7ceb521	Do not check types of standby_cluster configuration (#924 ) Simply allow valid keys	2019-01-14 14:16:15 +01:00
Alexander Kukushkin	f1d7ccf36e	Make sure we refresh session at least once per HA loop (#880 ) Fixes https://github.com/zalando/patroni/issues/879	2018-12-03 16:35:14 +01:00
Alexander Kukushkin	9bf074acfb	Compatibility with python3 (#883 ) Change of `loop_wait` was causing Patroni to disconnect from zookeeper and never reconnect back. The error was happening only with python3 due to a difference in implementation of `select.select` function.	2018-11-30 11:40:34 +01:00
Alexander Kukushkin	fb01aaebc5	Compatibility with kazoo-2.6.0 (#872 ) Recently 2.6.0 was release which changes the way how create_connection method is called. Before it was passing two arguments, and in the new version all argument names are specified explicitly.	2018-11-19 14:26:20 +01:00
Alexander Kukushkin	0f666e69f3	Prefix system tables, views and functions with pg_catalog (#845 ) and implement missing unit tests	2018-11-01 16:17:40 +01:00
Alexander Kukushkin	2efd97baab	Permanent replication slots (#819 ) Permanent replication slots are preserved on failover/switchover, that is Patroni on the new primary will create configured replication slots right after doing promote. Slots could be configured with the help of `patronictl edit-config`. The initial configuration could be also done in the `bootstrap.dcs` ```yaml slots: permanent_physical_1: type: physical permanent_logical_1: type: logical database: foo plugin: pgoutput ``` It is the responsibility of the operator to make sure that there are no clashes in names between replication slots automatically created by Patroni for members and permanent replication slots. Closes https://github.com/zalando/patroni/issues/656	2018-10-31 11:37:42 +01:00
Alexander Kukushkin	f70edefc65	A few bugfixes in the "standby cluster" workflow (#823 ) * Always run `pg_rewind` against the remote master * Always use the remote master as the source when "recovering" stopped standby leader * Use remote master as the source when "recovering" the node in the unhealthy cluster * Use the local dbname as the fallback when doing `pg_rewind` from the remote master * `no_replication_slot` is the allowed key in the `RemoteMember` object * Make it possible to "bootstrap" the new `standby_cluster` with existing (and valid) data directory. There is one prerequisite though, there should be no `patroni.dynamic.json` file in it!	2018-10-09 13:30:48 +02:00
Alexander Kukushkin	76d1b4cfd8	Minor fixes (#808 ) * Use `shutil.move` instead of `os.replace`, which is available only from 3.3 * Introduce standby-leader health-check and consul service * Improve unit tests, some lines were not covered * rename `assertEquals` -> `assertEqual`, due to deprecation warning	2018-09-19 16:32:33 +02:00
Pavel Kirillov	2e9cb412e4	Register service in consul (#802 ) Кegister service 'scope_name' with tag 'master' or 'replica' example with scope 'pgsql-pgpi' ```[root@pgpi1 ~]# host -t SRV pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul. pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul. [root@pgpi1 ~]# host -t SRV master.pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: master.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul. [root@pgpi1 ~]# host -t SRV replica.pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: replica.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.``` Fixes: https://github.com/zalando/patroni/issues/771	2018-09-07 15:17:56 +02:00
Dmitry Dolgov	dd7c3c349f	[WIP] Standby cluster implementation (#679 ) Implementation of "standby cluster" described in #657. Standby cluster consists of a "standby leader", that replicates from a "remote master" (which is not a part of current patroni cluster and can be anywhere), and cascade replicas, that replicate from the corresponding standby leader. "Standby leader" behaves pretty much like a regular leader, which means that it holds a leader lock in DSC, in case if disappears there will be an election of a new "standby leader". One can define such a cluster using the section "standby_cluster" in patroni config file. This section provides parameters for standby cluster, that will be applied only once during bootstrap and can be changed only through DSC.	2018-09-07 10:10:56 +02:00
Alexander Kukushkin	4ca8a6e506	Make retries of calls to DCS consistent across implementations (#805 ) in addition to that do a small refactoring of zookeeper and consul and try to improve the stability of AT	2018-09-06 08:37:26 +02:00
wilfriedroset	0136f252ab	Add patronictl -k/--insecure flag and suport for restapi cert (#790 ) Fixes https://github.com/zalando/patroni/issues/785	2018-08-29 16:08:13 +02:00
Alexander Kukushkin	90cf930036	Refactor REST API health-checks (#779 ) Make it more readable and easy to understand. Mostly it is needed to implement https://github.com/zalando/patroni/issues/772	2018-08-29 11:35:22 +02:00
Alexander Kukushkin	87e9aab04c	Improve tests (#778 ) * Implement missing unit-tests * Add acceptance tests for ISSUE #776 * Update list of classifiers, keywords and authors	2018-08-29 11:29:37 +02:00
Alexander Kukushkin	0c1ae6fbeb	Respond 200 to master health-check only if update_lock was successful (#713 ) If Patroni gets partitioned it starts receiving stale information from DCS. We can't use this information to determine that we have the leader key. Instead, we will record in Ha object the actual state of acquire/update lock and report as a leader only if it was successful. P.S. despite responding with 200 on `GET /master` postgres was still running read-only.	2018-08-03 17:00:01 +02:00
Alexander Kukushkin	8a3b78ca7b	Rest api thread can raise an exception during shutdown (#711 ) catch it and report	2018-06-14 13:17:50 +02:00
Dmitry Dolgov	f0d23b0b14	Merge pull request #706 from zalando/feature/rename-create-replica-method Rename create_replica_method to create_replica_methods	2018-06-12 14:16:54 +02:00
Alexander Kukushkin	aadd39b0a4	Do crash recovery only when we sure that postgres was running as master (#707 ) pg_controldata reports in this case: * 'in production' * 'shutting down' * 'in crash recovery'	2018-06-12 14:09:09 +02:00
Henning Jacobs	2537147810	#694 handle configuration error (#695 ) It is possible to change a lot of parameters in runtime (including `restapi.listen`) by updating Patroni config file and sending SIGHUP to Patroni process. If something was misconfigured it was throwing a weird exception and breaking `restapi` thread. This PR improves friendliness of error message and avoids breaking of `restapi`.	2018-06-12 14:08:38 +02:00
Alexander Kukushkin	e939304001	Take and apply some parameters from controldata when starting as replica (#703 ) * Take and apply some parameters from controldata when starting as replica https://www.postgresql.org/docs/10/static/hot-standby.html#HOT-STANDBY-ADMIN There is set of parameters which value on the replica must be not smaller than on the primary, otherwise replica will refuse to start: * max_connections * max_prepared_transactions * max_locks_per_transaction * max_worker_processes It might happen that values of these parameters in the global configuration are not set high enough, what makes impossible to start a replica without human intervention. Usually it happens when we bootstrap a new cluster from the basebackup. As a solution to this problem we will take values of above parameters from the pg_controldata output and in case if the values in the global configuration are not high enough, apply values taken from pg_controldata and set `pending_restart` flag.	2018-06-12 14:04:32 +02:00
Alexander Kukushkin	e405e4e03c	Workaround to sporadic unit-test failures (#696 ) Fixes https://github.com/zalando/patroni/issues/691	2018-06-12 14:00:10 +02:00
erthalion	d037aa8afd	Rename create_replica_method to create_replica_methods To make it clear that it's actually an array	2018-06-12 11:33:13 +02:00
Alexander Kukushkin	856552bd61	Sync replication slots and verify sysid after coming out of pause (#678 ) Fixes https://github.com/zalando/patroni/issues/568 and https://github.com/zalando/patroni/issues/674	2018-05-18 12:18:49 +02:00
Oleksii Kliukin	4ce539ba1b	Allow options to the basebackup built-in method. (#604 ) Options should be specified in the basebackup section, which is optional.	2018-05-18 12:18:35 +02:00
Oleksii Kliukin	1043376e6b	Do not exit when encountering invalid system ID. (#669 ) Do not exit when the cluster system ID is empty or the one that doesn't pass the validation check. In that case, the cluster most likely needs a reinit; mention it in the result message. Avoid terminating Patroni, as otherwise reinit cannot happen.	2018-05-18 11:48:15 +02:00
Alexander Kukushkin	ed479fe585	Don't demote master if failed to update leader key in pause (#668 ) Fixes https://github.com/zalando/patroni/issues/659	2018-05-18 11:19:56 +02:00
Alexander Kukushkin	5ce18a8045	Improve protection of DCS being accidentally wiped (#680 ) We already have a lot of logic in place to prevent failover in such case and restore all keys, but an accidental removal of `/config` key was effectively switching off pause mode for 1 cycle of HA loop.	2018-05-18 11:18:58 +02:00
Alexander Kukushkin	5296336f4a	BUGFIX: postmaster start can fail if pid from postmaster.pid is alive (#681 ) Upon start postmaster process performs various safety checks if there is a postmaster.pid file in the data directory. Although Patroni already detected that the running process corresponding to the postmaster.pid is not a postmaster, the new postmaster might fail to start, because it thinks that postmaster.pid is already locked. Important!!! Unlink of postmaster.pid isn't an option in this case, because it has a lot of nasty race conditions. Luckily there is a workaround to this problem, we can pass the pid from postmaster.pid in the `PG_GRANDPARENT_PID` environment variable and postmaster will ignore it. More likely to hit such problem if you run Patroni and postgres in the docker container.	2018-05-18 11:18:27 +02:00
Alexander Kukushkin	84f29caf92	Fix race condition in poll_failover_result (#658 ) It didn't affect directly neither failover nor switchover, but in some rare cases it was reporting it as a success too early, when the former leader released the lock: `Failed over to "None" instead of "desired-node"` In addition to that this commit improves logs and status messages by differentiating between failover and switchover.	2018-04-16 17:45:05 +02:00
Alexander Kukushkin	d78790b194	Abort start if attaching to running postgres and cluster not initiazlied (#661 ) Patroni can attach itself to an already running PostgreSQL instance. If that is the first instance "seen" in the given cluster, Patroni for that instance will create the initialize key, grab the leader key and, if the instance is running a replica, promote. Because of this behavior, when a cluster with a master and one or more replicas gets Patroni for each node, it is imperative to start running Patroni on the master node before getting to the replicas. This commit changes such weird behavior and will abort Patroni start if there is no initialize key in DCS and postgres is running as a replica. Closes https://github.com/zalando/patroni/issues/655	2018-04-16 17:32:26 +02:00
Alexander Kukushkin	3afd26101b	Single user mode was waiting for user input and never finish (#634 ) Regression was introduced in https://github.com/zalando/patroni/pull/576	2018-03-02 22:22:43 +01:00
Alexander Kukushkin	c04e7a1798	Write bootstrap.pg_hba into a pg_hba.conf after custom bootstrap (#632 ) Fixes https://github.com/zalando/patroni/issues/631	2018-02-26 18:48:56 +01:00

1 2 3 4 5 ...

553 Commits