Add a configuration option (`set_acls`) for Zookeeper DCS so that Kazoo will apply a default ACL for each znode that it creates. The intention is to improve security of the znodes when a single Zookeeper cluster is used as the DCS for multiple Patroni clusters.
Zookeeper [does not apply an ACL to child znodes](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_ZooKeeperAccessControl), so permissions can't be set at the `scope` level and then be inherited by other znodes that Patroni creates.
Kazoo instead [provides an option for configuring a default_acl](https://kazoo.readthedocs.io/en/latest/api/client.html#kazoo.client.KazooClient.__init__) that will be applied on node creation.
Example configuration in Patroni might then be:
```
zookeeper:
  set_acls:
    CN=principal1: [ALL]
    CN=principal2:
      - READ
```
Sphinx's add_stylesheet() has been deprecated for a long time and was removed in recent versions of Sphinx. If available, use add_css_file() instead.
Close #2079.
Add support for the ETCD SRV name suffix as per the description in the ETCD docs:
> The -discovery-srv-name flag additionally configures a suffix to the SRV name that is queried during discovery. Use this flag to differentiate between multiple etcd clusters under the same domain. For example, if discovery-srv=example.com and -discovery-srv-name=foo are set, the following DNS SRV queries are made:
>
> _etcd-server-ssl-foo._tcp.example.com
> _etcd-server-foo._tcp.example.com
All tests pass, but this has not been tested on a live ETCD system yet... Please take a look and send feedback.
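A minimal configuration sketch, assuming the suffix is exposed as `srv_suffix` under the `etcd` section (the option name and values here are assumptions based on the description above):
```
etcd:
  srv: example.com
  srv_suffix: foo  # results in SRV queries like _etcd-server-ssl-foo._tcp.example.com
```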
Resolves #2028
If configured, only IPs that match the rules will be allowed to call unsafe endpoints.
In addition, it is possible to automatically include the IPs of cluster members in the list.
If neither of the above is configured, the old behavior is retained.
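A configuration sketch, assuming the rules live under the `restapi` section as `allowlist` and `allowlist_include_members` (the option names and values here are illustrative assumptions):
```
restapi:
  allowlist:          # only these networks may call unsafe endpoints
    - 127.0.0.1/32
    - 10.0.0.0/8
  allowlist_include_members: true  # also allow IPs of cluster members from DCS
```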
Partially addresses https://github.com/zalando/patroni/issues/1734
Promoting the standby cluster requires updating load-balancer health checks, which is inconvenient and easy to forget.
To solve this, we change the behavior of the `/leader` health-check endpoint: it returns 200 regardless of whether PostgreSQL is running as the primary or as the standby_leader.
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots (see the example configuration after this list).
To reduce the number of interactions with DCS, a new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility, `/optime/leader` is still updated if there are members running an old Patroni version in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. To copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward.
3. Additional requirements:
To ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas that are used for streaming by replicas with logical slots.
4. When logical slots are copied to a replica, there is a time window during which it might not be safe to use them after promotion. Right now there is no protection against promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
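As a sketch of point 1, permanent logical slots might be declared in the global (DCS) configuration roughly like this; the slot name, database, and plugin are placeholders:
```
slots:
  cdc_slot:            # placeholder slot name
    type: logical
    database: mydb     # placeholder database
    plugin: pgoutput
```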
Compatibility:
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11, so for older Postgres versions Patroni will refuse to create logical slots on the primary.
The old "permanent slots" feature, which created logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
This commit makes it possible to configure the maximum lag (`maximum_lag_on_syncnode`) after which Patroni will "demote" a node from synchronous and replace it with another node.
The previous implementation always tried to stick to the same synchronous nodes, even if they were not the optimal ones.
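A sketch of how this might look in the dynamic (DCS) configuration, assuming the threshold is expressed in bytes like `maximum_lag_on_failover`:
```
synchronous_mode: true
maximum_lag_on_syncnode: 16777216  # 16MB (illustrative); a sync candidate lagging more than this is replaced
```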
If an existing instance was configured with WAL residing outside of
PGDATA, then currently a 'reinit' would lose such symlinks. So add some
information on that to draw attention to this corner-case issue
and also add the --waldir option to the sample `postgresql.basebackup`
configuration sections to increase visibility.
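For illustration, the sample `postgresql.basebackup` section with the --waldir option might look like this (the path is a placeholder):
```
postgresql:
  basebackup:
    - waldir: /pgwal/waldir  # placeholder path outside of PGDATA
```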
Discussion: https://github.com/zalando/patroni/issues/1817
The Python SSL library allows for the inclusion of a password in its "load_cert_chain" function when setting up an SSLContext[1].
This allows an encrypted key file in PEM representation to be loaded into the certificate chain.
This commit adds the optional "keyfile_password" parameter to the REST API block of configuration so that Patroni can load encrypted private keys when establishing its TLS socket.
It also adds the corresponding "PATRONI_RESTAPI_KEYFILE_PASSWORD" environment variable, which has the same effect.
[1] https://docs.python.org/3/library/ssl.html#ssl.SSLContext.load_cert_chain
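A configuration sketch with the new parameter; the file paths and password are placeholders:
```
restapi:
  certfile: /etc/patroni/rest.pem   # placeholder paths
  keyfile: /etc/patroni/rest.key
  keyfile_password: changeme        # password for the encrypted private key
```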
There are sometimes good reasons to manage replication slots externally
to Patroni. For example, a consumer may wish to manage its own slots (so
that it can more easily track when a failover has occurred and whether
it is ahead of or behind the WAL position on the new primary).
Additionally, tooling like pglogical replicates slots to all replicas so
that the current position can be maintained on failover targets (this
also aids consumers by supplying primitives with which they can verify
that data hasn't been lost or a split brain hasn't occurred relative to
the physical cluster).
To support these use cases, this new feature allows configuring Patroni
to entirely ignore sets of slots specified by any subset of name,
database, slot type, and plugin.
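A sketch of what such a configuration might look like, assuming the matchers live under an `ignore_slots` key in the dynamic configuration (the structure and all values here are assumptions):
```
ignore_slots:
  # a slot is ignored if it matches all listed properties; omit a property to match any value
  - name: pgl_subscription_slot   # placeholder
    type: logical
    database: mydb
    plugin: pglogical
```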
Previously, the only documentation for how to run tests was the
implementation in the Travis configuration file. Here we add
instructions and move the development dependencies into a separate
requirements.dev.txt file that is easy to use and shared with the
Travis config.
* update release notes
* bump version
* change the default alignment in patronictl table output to `left`
* add missing tests
* add missing pieces to the documentation
It is now also possible to point the configuration path to a directory instead of a file.
Patroni will find all yml files in the directory and apply them in sorted order.
Close https://github.com/zalando/patroni/issues/1669
Call a fencing script after acquiring the leader lock. If the script doesn't finish successfully, don't promote but remove the leader key instead.
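A minimal sketch, assuming the fencing script is configured as a `pre_promote` callback in the `postgresql` section (the option name and script path are assumptions):
```
postgresql:
  pre_promote: /usr/local/bin/fence_old_primary.sh  # must exit with 0 for promotion to proceed
```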
Close https://github.com/zalando/patroni/issues/1567
So far Patroni has been enforcing the same value of `wal_keep_segments` on all nodes in the cluster. If the parameter was missing from the global configuration, it used the default value of `8`.
In pg13 beta3 `wal_keep_segments` was renamed to `wal_keep_size`, and this broke Patroni.
If `wal_keep_segments` happens to be present in the configuration for pg13, Patroni will recalculate the value to `wal_keep_size`, assuming that `wal_segment_size` is 16MB. Sure, it is possible to get the real value of `wal_segment_size` from pg_control, but since we are dealing with a case of misconfiguration it is not worth spending time on it.
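For illustration, assuming the default 16MB segment size, a value configured like this on pg13 is recalculated as shown in the comment:
```
postgresql:
  parameters:
    wal_keep_segments: 8   # on pg13 Patroni translates this to wal_keep_size = 8 * 16MB = 128MB
```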
When running on K8s, Patroni communicates with the API via the `kubernetes` service, whose address is exposed via the
`KUBERNETES_SERVICE_HOST` environment variable. Like any other service, the `kubernetes` service is handled by `kube-proxy`, which, depending on its configuration, relies either on a userspace program or on `iptables` for traffic routing.
During a K8s upgrade, when master nodes are replaced, it is possible that `kube-proxy` doesn't update the service configuration in time, and as a result Patroni fails to update the leader lock and demotes Postgres.
In order to improve the user experience and get more control over the problem, we make it possible to bypass the `kubernetes` service and connect directly to the API nodes.
The strategy is very simple:
1. Resolve the list of IPs of API nodes from the `kubernetes` endpoint on every iteration of the HA loop.
2. Stick to one of these IPs for API requests.
3. Switch to a different IP if the currently used IP is no longer in the list.
4. If a request fails, switch to another IP and retry.
Such a strategy is already used for Etcd and has proven to work quite well.
To enable the feature, you need to either set `kubernetes.bypass_api_service` to `true` in the Patroni configuration file or set the `PATRONI_KUBERNETES_BYPASS_API_SERVICE` environment variable.
If for some reason `GET /default/endpoints/kubernetes` isn't allowed, Patroni will disable the feature.
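In the configuration file this looks like:
```
kubernetes:
  bypass_api_service: true
```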
The new parameter `synchronous_node_count` is used by Patroni to manage the number of synchronous standby databases. It is set to 1 by default and has no effect when synchronous_mode is set to off. When enabled, Patroni manages the precise number of synchronous standby databases based on `synchronous_node_count` and adjusts the state in DCS & synchronous_standby_names as members join and leave.
This functionality can be further extended in the future to support priority-based (FIRST n) and quorum-based (ANY n) synchronous replication.
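A sketch of the relevant dynamic configuration, with an illustrative count:
```
synchronous_mode: true
synchronous_node_count: 2  # keep two synchronous standbys
```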
The only python-etcd3 client that works directly via gRPC still supports only a single endpoint, which is not very nice for high availability.
Since Patroni already uses a heavily hacked version of python-etcd with smart retries and auto-discovery out of the box, I decided to enhance the existing code with limited support of the v3 protocol via gRPC-gateway.
Unfortunately, watches via gRPC-gateway require us to open and keep a second connection to etcd.
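A configuration sketch, assuming the new implementation is selected via an `etcd3` section that accepts the same connection options as `etcd` (the section name and hosts are assumptions):
```
etcd3:
  hosts:
    - 10.0.1.11:2379
    - 10.0.1.12:2379
    - 10.0.1.13:2379
```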
Known limitations:
* The minimum supported version is 3.0.4. On earlier versions transactions don't work due to bugs in grpc-gateway. Without transactions we can't do atomic operations, i.e. leader locks.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using the TLS Common Name, because gRPC-proxy terminates TLS from its clients, so all clients share the cert of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
* A new node can join the cluster dynamically and become part of the consensus.
* It is also possible to join only the Patroni cluster (without adding the node to the Raft consensus); just comment out or remove `raft.self_addr` for that.
* When a node joins the cluster, it uses the values from `raft.partner_addrs` only for initial discovery (see the configuration sketch after this list).
* It is possible to run Patroni and Postgres on two nodes plus one node with `patroni_raft_controller` (without Patroni and Postgres). In such a setup one can temporarily lose one node without affecting the primary.
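A configuration sketch for one of the Patroni nodes; the addresses and data directory are placeholders:
```
raft:
  data_dir: /var/lib/patroni/raft   # assumption: where the local Raft journal is stored
  self_addr: 10.0.2.11:5010         # remove to join only the Patroni cluster, not the Raft consensus
  partner_addrs:
    - 10.0.2.12:5010
    - 10.0.2.13:5010
```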
- ``GET /replica?lag=<max-lag>``: replica check endpoint.
- ``GET /asynchronous?lag=<max-lag>`` or ``GET /async?lag=<max-lag>``: asynchronous standby check endpoint.
Checks replication latency and returns status code **200** only when the latency is below the specified value. For performance reasons, the key leader_optime from DCS is used as the leader WAL position to compute the latency on the replica. Please note that the value in leader_optime might be a couple of seconds old (depending on loop_wait).
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
They can be useful to eliminate "unhealthy" pods from the subsets addresses when a K8s service with label selectors is used.
Real-life example: the node where the primary was running has failed and is being shut down, and Patroni can't update (remove) the role label.
Therefore on OpenShift the leader service will have two pods assigned, one of them being the failed primary.
With a readiness probe defined, the failed primary pod will be excluded from the list.
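A sketch of a Kubernetes readiness probe pointing at the Patroni REST API, assuming the default port 8008 and that the endpoint is exposed as `/readiness`:
```
readinessProbe:
  httpGet:
    path: /readiness
    port: 8008
  initialDelaySeconds: 3
  periodSeconds: 10
```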