The configuration parameter is `kubernetes.retriable_http_codes`, or the `PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES` environment variable.
These status codes are added to the default list of 500, 503, and 504.
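A minimal configuration sketch (the extra status code 502 is just an illustration):
```yaml
kubernetes:
  retriable_http_codes: [502]  # retried in addition to the default 500, 503, 504
```
The same can be achieved with `PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES=502`.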
Close https://github.com/zalando/patroni/issues/2536
* bump version
* update release notes
* removed Python 2.7, 3.4, 3.5, and 3.6 from supported versions in setup.py
* switched GH actions back to ubuntu-latest, removed tests with 2.7 and 3.6, and added 3.11
* minor fixes in the Citus documentation and behave tests
Keep as much backward compatibility as possible.
The following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit tests use only role = 'primary' instead of 'master' to verify that item 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the latter is not set (see the sketch after this list).
8. The documentation and some examples are updated.
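A sketch of the translation from item 7 (the value is just a placeholder):
```yaml
# a configuration that still uses the old name...
master_start_timeout: 300
# ...behaves as if it contained the new name instead:
# primary_start_timeout: 300
```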
Future plan: in the next major release switch the role name from `master` to `primary`, and maybe drop `master` altogether.
The Kubernetes implementation will require more work and will keep two labels in parallel. Label values should probably be configurable, as described in https://github.com/zalando/patroni/issues/2495.
A Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni clusters logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```
Here 0 is the Citus group of the coordinator, and 1, 2, etc. are worker groups.
Such a hierarchy allows reading the entire Citus cluster with a single call to the DCS (except for Zookeeper).
On the coordinator the get_cluster() method will read the entire Citus cluster, because it needs to discover workers. On a worker cluster it will read only the subtree of its own group.
Besides that, we introduce a new method, get_citus_coordinator(), which will be used only by worker clusters.
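To summarize which part of the tree each role reads (based on the description above):
```
coordinator (group 0): get_cluster()           -> /service/batman/    (entire tree, all groups)
worker      (group 1): get_cluster()           -> /service/batman/1/  (its own subtree only)
worker      (group 1): get_citus_coordinator() -> /service/batman/0/  (coordinator subtree)
```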
Since there are no hierarchical structures on K8s, we will use the Citus group suffix in all objects that Patroni creates.
E.g.
```
batman-0-leader # the leader config map for the coordinator
batman-0-config # the config map holding initialize, config, and history "keys"
...
batman-1-leader # the leader config map for worker group 1
batman-1-config
...
```
Citus integration is enabled in patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```
If enabled, Patroni will create the database and the citus extension in it, and will INSERT INTO `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', and 'sslkey' of the superuser if they are defined in the Patroni configuration file.
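For reference, a sketch of where these credentials come from in the Patroni configuration (values and paths below are placeholders):
```yaml
postgresql:
  authentication:
    superuser:
      username: postgres
      password: my-super-password
      sslcert: /etc/ssl/postgres/client.crt
      sslkey: /etc/ssl/postgres/client.key
```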
When a new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.
Besides that, Patroni takes over management of some Postgres GUCs (see the sketch after this list):
- `shared_preload_libraries` - Patroni ensures that "citus" is added in the first position
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid their accidental removal.
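Illustratively, on a node with `max_connections = 100` the effective settings would look roughly like this (a sketch only; users don't need to set these themselves):
```yaml
postgresql:
  parameters:
    shared_preload_libraries: citus   # "citus" is kept in the first position
    max_prepared_transactions: 200    # max_connections * 2 when unset or 0
    wal_level: logical                # required by Citus to move/split shards
```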
The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the citus_add_node() and citus_update_node() functions.
Patroni running on the coordinator provides a new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator; Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back, another call to the `POST /citus` endpoint is made; it performs another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, only with a short latency spike.
All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This allows more control and makes it possible to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
If enabled, failsafe_mode allows Patroni to cope with DCS outages.
In case of a DCS outage, the leader tries to contact all remaining members of the cluster via the REST API, and if all of them respond with success, the leader will not be demoted.
The failsafe_mode can be enabled by running
```sh
patronictl edit-config -s failsafe_mode=true
```
or by calling the `/config` REST API endpoint.
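For example, with the REST API (assuming Patroni's API is reachable on localhost:8008):
```sh
curl -s -XPATCH -d '{"failsafe_mode": true}' http://localhost:8008/config
```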
Co-authored-by: Polina Bungina <bungina@gmail.com>
* bump version
* update release notes
* run some behave tests on v15
* automate release process by building/pushing packages on tag creation and release publication
If replication slots are enabled, Patroni automatically creates them for any cluster member that is supposed to stream from a given node and for any permanent slot defined in the global configuration. If a member disappears from the DCS, Patroni automatically removes its replication slot. The same behavior applied in maintenance mode (pause).
This commit disables removal of replication slots that don't match Patroni's expectations while the cluster is paused.
Close https://github.com/zalando/patroni/issues/2314
This adds a paragraph to the Standby Cluster section clarifying that the standby cluster is independent of the primary cluster and not visible from the primary cluster's Patroni interface.
Close #2090
Add a configuration option (`set_acls`) for Zookeeper DCS so that Kazoo will apply a default ACL for each znode that it creates. The intention is to improve security of the znodes when a single Zookeeper cluster is used as the DCS for multiple Patroni clusters.
Zookeeper [does not apply an ACL to child znodes](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_ZooKeeperAccessControl), so permissions can't be set at the `scope` level and then be inherited by other znodes that Patroni creates.
Kazoo instead [provides an option for configuring a default_acl](https://kazoo.readthedocs.io/en/latest/api/client.html#kazoo.client.KazooClient.__init__) that will be applied on node creation.
Example configuration in Patroni might then be:
```
zookeeper:
  set_acls:
    CN=principal1: [ALL]
    CN=principal2:
      - READ
```
Sphinx's add_stylesheet() has been deprecated for a long time and has been removed in recent versions of Sphinx. If available, use add_css_file() instead.
Close #2079.
Add support for the etcd SRV name suffix, as described in the etcd docs:
> The -discovery-srv-name flag additionally configures a suffix to the SRV name that is queried during discovery. Use this flag to differentiate between multiple etcd clusters under the same domain. For example, if discovery-srv=example.com and -discovery-srv-name=foo are set, the following DNS SRV queries are made:
>
> _etcd-server-ssl-foo._tcp.example.com
> _etcd-server-foo._tcp.example.com
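A Patroni configuration sketch for this feature; the `srv_suffix` parameter name here is an assumption based on the description above, so check the settings documentation for the exact spelling:
```yaml
etcd:
  srv: example.com   # existing DNS SRV discovery domain
  srv_suffix: foo    # assumed option name; differentiates etcd clusters under the same domain
```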
All tests pass, but this has not been tested against a live etcd system yet... Please take a look and send feedback.
Resolves #2028
If configured, only IPs that match the rules are allowed to call unsafe endpoints.
In addition, it is possible to automatically include the IPs of cluster members in the list.
If neither of the above is configured, the old behavior is retained.
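A configuration sketch, assuming the new `restapi.allowlist` and `restapi.allowlist_include_members` options (the network below is just an example):
```yaml
restapi:
  allowlist:
    - 10.0.20.0/24
  allowlist_include_members: true  # also allow the IPs of cluster members
```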
Partially address https://github.com/zalando/patroni/issues/1734
Promoting a standby cluster requires updating load-balancer health checks, which is not very convenient and is easy to forget.
To solve this, we change the behavior of the `/leader` health-check endpoint: it returns 200 regardless of whether PostgreSQL is running as the primary or as the standby_leader.
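For example (assuming the REST API listens on localhost:8008):
```sh
# prints 200 both on the leader of a regular cluster and on the standby_leader of a standby cluster
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8008/leader
```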
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration (see the configuration sketch after this list), Patroni on the primary will not only create them, but will also periodically update the DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with the DCS, a new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility `/optime/leader` will still be updated if there are members running an old Patroni version in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward.
3. Additional requirements:
In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas that are used for streaming by replicas with logical slots.
4. When logical slots are copied to the replica, there is a timeframe during which it might not be safe to use them after promotion. Right now there is no protection against promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
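For reference, permanent logical slots are defined in the global (DCS) configuration roughly like this (slot name, database, and plugin below are placeholders):
```yaml
slots:
  my_logical_slot:
    type: logical
    database: mydb
    plugin: pgoutput
```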
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions, Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which created logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749