The only python-etcd3 client that talks directly via gRPC still supports just a single endpoint, which is not great for high availability.
Since Patroni already uses a heavily hacked version of python-etcd with smart retries and auto-discovery out of the box, I decided to enhance the existing code with limited support for the v3 protocol via the gRPC-gateway. A configuration sketch follows the list of limitations below.
Unfortunately, watches via the gRPC-gateway require us to open and keep a second connection to etcd.
Known limitations:
* The minimum supported version is 3.0.4. On earlier versions transactions don't work due to bugs in the grpc-gateway, and without transactions we can't do atomic operations, e.g. taking the leader lock.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using the TLS Common Name. This is because the gRPC-proxy terminates TLS from its client, so all clients share the certificate of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
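Assuming the v3 support is configured through an `etcd3` section that mirrors the existing `etcd` one (host addresses below are placeholders), a minimal sketch could look like this:

```yaml
etcd3:
  # multiple endpoints keep high availability; smart retries and
  # auto-discovery are inherited from the existing python-etcd code
  hosts:
    - 10.0.0.1:2379
    - 10.0.0.2:2379
    - 10.0.0.3:2379
```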
* A new node can join the cluster dynamically and become part of the consensus.
* It is also possible to join only the Patroni cluster (without adding the node to the Raft consensus); just comment out or remove `raft.self_addr` for that.
* When a node joins the cluster, it uses the values from `raft.partner_addrs` only for initial discovery.
* It is possible to run Patroni and Postgres on two nodes, plus one node with `patroni_raft_controller` (without Patroni and Postgres). In such a setup, one node can temporarily be lost without affecting the primary. A configuration sketch is shown below.
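A minimal configuration sketch (addresses and paths are placeholders; `raft.self_addr` and `raft.partner_addrs` are the options described above):

```yaml
raft:
  data_dir: /var/lib/patroni/raft   # where the Raft log is persisted (assumed path)
  self_addr: 10.0.0.1:2222          # comment out or remove to join only the Patroni cluster
  partner_addrs:                    # used only for initial discovery
    - 10.0.0.2:2222
    - 10.0.0.3:2222
```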
- ``GET /replica?lag=<max-lag>``: replica check endpoint.
- ``GET /asynchronous?lag=<max-lag>`` or ``GET /async?lag=<max-lag>``: asynchronous standby check endpoint.
Checks replication latency and returns status code **200** only when the latency is below the specified value. For performance reasons, the key leader_optime from the DCS is used as the leader WAL position, and the latency is computed on the replica. Please note that the value in leader_optime might be a couple of seconds old (based on loop_wait).
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
They could be useful for eliminating "unhealthy" pods from the subsets addresses when a K8s service with label selectors is used.
Real-life example: the node where the primary was running has failed and is being shut down, so Patroni can't update (remove) the role label.
Therefore on OpenShift the leader service will have two pods assigned, one of them being the failed primary.
With the readiness probe defined, the failed primary pod will be excluded from the list.
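A sketch of such a probe, assuming the Patroni REST API listens on the default port 8008 and exposes the readiness endpoint (container name and timings are illustrative):

```yaml
# excerpt from a pod template spec
containers:
  - name: patroni
    readinessProbe:
      httpGet:
        path: /readiness   # returns 200 only while the member is healthy
        port: 8008
      initialDelaySeconds: 3
      periodSeconds: 10
```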
## Feature: Postgres stop timeout
A switchover/failover operation hangs on the signal_stop (or checkpoint) call when the postmaster doesn't respond or hangs for some reason (issue described in [1371](https://github.com/zalando/patroni/issues/1371)). This leads to service loss for an extended period of time, until the hung postmaster starts responding or is killed by some other actor.
### master_stop_timeout
The number of seconds Patroni is allowed to wait when stopping Postgres; effective only when synchronous_mode is enabled. When set to a value > 0, Patroni sends SIGKILL to the postmaster if the stop operation runs for longer than master_stop_timeout. Set the value according to your durability/availability trade-off. If the parameter is not set, or is set to a value <= 0, master_stop_timeout does not apply.
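For example, the parameter could be set with `patronictl edit-config` (the value 30 is only an illustration of the durability/availability trade-off):

```yaml
synchronous_mode: true
master_stop_timeout: 30  # seconds; SIGKILL the postmaster if stopping takes longer
```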
It is a common issue that the primary recycles WAL segments while one of the replicas is down for a long time. So far there were only two solutions to this problem, and neither of them is perfect:
1. Increase `wal_keep_segments`, but it is hard to guess a good value.
2. Use continuous archiving and PITR, but it is not always possible.
This PR introduces a way to solve the problem for static clusters with a fixed number of nodes and names that never change. You just need to list the names of all nodes in `slots`, so the primary will not remove a slot while the corresponding node is down (not registered in the DCS).
Of course, the primary will not create a permanent slot matching its own name.
Usage example: let's assume you have a cluster with nodes named *abc1*, *abc2*, and *abc3*.
You have to run `patronictl edit-config` and put the following snippet into the configuration:
```yaml
slots:
  abc1:
    type: physical
  abc2:
    type: physical
  abc3:
    type: physical
```
If the node *abc2* is the primary, it will always create slots for *abc1* and *abc3* even if they are not running, but will not create slot *abc2*.
Other nodes will behave the same.
Closes #280
The previous documentation was wrong and would produce the following error when used:
Exception when parsing list {[{"name": "postgresql", "port": 5432}]}
After removing the surrounding braces, the error goes away and the endpoint is updated with the correct port name.
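A sketch of the corrected setting, assuming it lives under the `kubernetes` section of the Patroni configuration:

```yaml
kubernetes:
  # a plain list, without the surrounding braces
  ports: [{"name": "postgresql", "port": 5432}]
```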
* make it possible to use client certificates with REST API
* define a separate PatroniRequest class which handles all communication
* refactor patronictl to use the new class
* make Ha use the new class instead of calling requests.get; the old call wasn't taking certificates and basic auth into account. A configuration sketch follows this list.
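A sketch of the corresponding configuration, assuming the certificate options live in the `restapi` and `ctl` sections (file paths are placeholders):

```yaml
restapi:
  certfile: /etc/patroni/server.pem   # REST API server certificate
  keyfile: /etc/patroni/server.key
  verify_client: required             # require client certificates
ctl:
  certfile: /etc/patroni/client.pem   # client certificate used by patronictl
  keyfile: /etc/patroni/client.key
  cacert: /etc/patroni/ca.pem
```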
Closes #898
A few times we observed that the Patroni HA loop was blocked for a few minutes because it could not write logs to stderr. This is a very rare condition, which so far we have hit only on K8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into an in-memory queue and later asynchronously flushed to stderr or a file from a separate thread.
The maximum queue size is configurable and the default value is 1000. This should be enough to keep more than one hour of log messages with default settings when the Patroni cluster operates normally (without big issues).
If the queue reaches its maximum size, further messages are discarded until the queue shrinks again. The number of discarded messages is reported in the log later.
In addition, the number of non-flushed and discarded messages (if any) is reported via the Patroni REST API as:
```json
"logger_queue_size": X,
"logger_records_lost": Y`
```
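The queue size can be tuned in the `log` section of the configuration; a minimal sketch with the default value:

```yaml
log:
  # upper bound on the number of log records buffered in memory
  max_queue_size: 1000
```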
* expose the current patroni version in DCS
* expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed
* other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT
* define `postgresql.authentication.rewind` with credentials for pg_rewind in the Patroni configuration file (see the sketch after this list)
* create a user for pg_rewind if Postgres is 11+
* grant EXECUTE on the functions required by pg_rewind to the rewind user
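A sketch of the credentials definition (username and password are placeholders):

```yaml
postgresql:
  authentication:
    rewind:
      username: rewind_user      # placeholder
      password: rewind_password  # placeholder
```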
This functionality works similarly to the `pg_hba`:
If `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf. However, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed externally and will not update the file.
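A sketch of a managed pg_ident (the mapping values are placeholders):

```yaml
postgresql:
  pg_ident:  # each entry becomes a line in pg_ident.conf
    - mymap ops_dba postgres   # MAPNAME SYSTEM-USERNAME PG-USERNAME
```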
Recent releases of psycopg2 were split into two different packages, psycopg2 and psycopg2-binary, which could be installed at the same time into the same place on the filesystem. To reduce the dependency-hell problem, we let the user choose how to install psycopg2. There are a few options available, and they are reflected in the documentation.
This PR also changes the following behavior:
* `pip install patroni` will fail if psycopg2 is not installed
* Patroni will check psycopg2 upon start and fail if it can't be found or is outdated.
Closes https://github.com/zalando/patroni/issues/1021
First of all, this patch changes the behavior of the `on_start`/`on_restart` callbacks: they are called only when Postgres is started or restarted without a role change. If the member is promoted or demoted, only the `on_role_change` callback is executed.
Before this change, `on_role_change` was never called for a standby leader; only `on_start`/`on_restart` were, and with a wrong role argument.
In addition, the REST API will return the standby_leader role for the leader of a standby cluster.
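A sketch of how the callbacks are wired up (the script path is a placeholder; Patroni passes the action, the role, and the cluster name as arguments):

```yaml
postgresql:
  callbacks:
    on_start: /usr/local/bin/patroni_callback.sh
    on_restart: /usr/local/bin/patroni_callback.sh
    on_role_change: /usr/local/bin/patroni_callback.sh
```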
Closes https://github.com/zalando/patroni/issues/988
If `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in `etcd.hosts` and avoid doing topology discovery. This mode might be useful when you know that you connect to the etcd cluster via a set of proxies, or when the etcd cluster has a static topology.
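A sketch of such a setup (proxy addresses are placeholders):

```yaml
etcd:
  use_proxies: true  # stick to the listed hosts, skip topology discovery
  hosts:
    - proxy1.example.com:2379
    - proxy2.example.com:2379
```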
If pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is to wipe the data directory and reinitialize.
So far Patroni was able to remove the data directory only after a failed attempt to run pg_rewind. This commit fixes that.
If `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically.
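A sketch of the relevant settings (here pg_rewind is disabled via `postgresql.use_pg_rewind`, an assumption for illustration):

```yaml
postgresql:
  use_pg_rewind: false
  remove_data_directory_on_diverged_timelines: true  # wipe and reinitialize instead
```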
Fixes: https://github.com/zalando/patroni/issues/941