This commit is a breaking change:
1. `role` in DCS is written as "primary" instead of "master".
2. `role` in REST API responses is also written as "primary".
3. REST API no longer accepts role=master in requests (for example switchover/failover/restart endpoints).
4. `/metrics` REST API endpoint will no longer report `patroni_master`.
5. `patronictl` no longer accepts `--master` argument.
6. `no_master` option in declarative configuration of custom replica creation methods is no longer treated as a special option; please use `no_leader` instead.
7. `patroni_wale_restore` doesn't accept `--no_master` anymore.
8. `patroni_barman` doesn't accept `--role=master` anymore.
9. Callback scripts will be executed with `role=primary` instead of `role=master`.
10. On Kubernetes, Patroni will by default set the role label to `primary`. If you want to keep the old behavior and avoid downtime or a lengthy, complex migration, you can set `kubernetes.leader_label_value` and `kubernetes.standby_leader_label_value` to `master` (see the sketch after this list).
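A minimal sketch of keeping the old label values, based on the two options named above:
```yaml
kubernetes:
  leader_label_value: master
  standby_leader_label_value: master
```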
However, a few exceptions regarding master are still in place:
1. `GET /master` REST API endpoint will continue to work.
2. `master_start_timeout` and `master_stop_timeout` in global configuration are still accepted.
3. `master` tag is still preserved in Consul services in addition to `primary`.
Rationale for these exceptions: DBAs don't always fully control the infrastructure and may not be able to adjust the configuration.
- don't register secondaries with the `noloadbalance` tag.
- mention in the documentation that secondaries are also registered in `pg_dist_node`.
- update docker/kubernetes README files to include examples with secondaries being registered in `pg_dist_node`.
* All dockerfiles to use PG16 by default
* PGVERSION env in the test pipelines to 16.1-1 by default
* 11->14 in the dcs-pg mapping for test pipelines
* Code comments fixes
- Allow pip to modify an EXTERNALLY-MANAGED Python installation by passing --break-system-packages
- Build Citus for arm64
- Don't use PG_MAJOR argument
* bump version
* update release notes
* removed 2.7, 3.4, 3.5, and 3.6 from supported versions in setup.py
* switched GH actions back to ubuntu-latest, removed tests with 2.7 and 3.6, and added 3.11
* some little fixes in Citus documentation and behave tests
A Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni clusters logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```
where 0 is the Citus group of the coordinator and 1, 2, etc. are worker groups.
Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).
On the coordinator, the get_cluster() method reads the entire Citus cluster because it needs to discover workers; on a worker cluster it reads only the subtree of its own group.
Besides that, we introduce a new method, get_citus_coordinator(), which is used only by worker clusters.
Since there are no hierarchical structures on K8s, we will use the Citus group as a suffix in the names of all objects that Patroni creates.
E.g.
```
batman-0-leader # the leader config map for the coordinator
batman-0-config # the config map holding initialize, config, and history "keys"
...
batman-1-leader # the leader config map for worker group 1
batman-1-config
...
```
Citus integration is enabled in patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for the coordinator; 1, 2, etc. are for workers
```
If enabled, Patroni will create the database and the citus extension in it, and will insert into `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. the superuser's 'password', 'sslcert', and 'sslkey' if they are defined in the Patroni configuration file.
When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.
Besides that, Patroni takes over management of some Postgres GUCs (see the sketch after this list):
- `shared_preload_libraries` - Patroni ensures that "citus" is placed first
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is required by Citus to move/split shards. Under the hood, Citus creates/removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid their accidental removal.
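For orientation, a rough sketch of the net effect on a Citus node, expressed as Patroni configuration (illustrative only; Patroni derives these values at runtime rather than reading them from such a file, and the numbers shown are hypothetical):
```yaml
bootstrap:
  dcs:
    synchronous_mode: on              # added when a new coordinator/worker is bootstrapped
postgresql:
  parameters:
    shared_preload_libraries: citus   # "citus" is forced to the first position of the existing list
    max_prepared_transactions: 200    # example: max_connections (100) * 2, applied only if unset or 0
    wal_level: logical                # required by Citus to move/split shards
```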
The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the citus_add_node() and citus_update_node() functions.
Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When a worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator; Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back, another call to the `POST /citus` endpoint is made; it triggers another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator re-establishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, only with a short latency spike.
All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and allows rolling back a transaction in progress if its lifetime exceeds a certain threshold while other worker nodes need to be updated.
* bump version
* update release notes
* run some behave tests on v15
* automate release process by building/pushing packages on tag creation and release publication
This adds the `selector` field to the Patroni Kubernetes StatefulSet spec.
Without it, one gets errors like
```
error: error validating "patroni_k8s.yaml": error validating data: ValidationError(StatefulSet.spec): missing required field "selector" in io.k8s.api.apps.v1.StatefulSetSpec; if you choose to ignore these errors, turn validation off with --validate=false
```
(as mentioned in #1867)
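For reference, a minimal sketch of how the `selector` relates to the pod template labels (the names and the label key/value used here are only examples):
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: patronidemo
spec:
  serviceName: patronidemo
  replicas: 3
  selector:
    matchLabels:
      application: patroni      # must match .spec.template.metadata.labels
  template:
    metadata:
      labels:
        application: patroni
    spec:
      containers:
      - name: patroni
        image: patroni          # placeholder image
```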
When running on K8s, Patroni communicates with the API via the `kubernetes` service, whose address is exposed via the
`KUBERNETES_SERVICE_HOST` environment variable. Like any other service, the `kubernetes` service is handled by `kube-proxy`, which, depending on its configuration, relies either on a userspace program or on `iptables` for traffic routing.
During a K8s upgrade, when master nodes are replaced, it is possible that `kube-proxy` doesn't update the service configuration in time, and as a result Patroni fails to update the leader lock and demotes Postgres.
In order to improve the user experience and get more control over the problem, we make it possible to bypass the `kubernetes` service and connect directly to the API nodes.
The strategy is very simple:
1. Resolve the list of API node IPs from the `kubernetes` endpoint on every iteration of the HA loop.
2. Stick to one of these IPs for API requests.
3. Switch to a different IP if the IP currently in use is no longer in the list.
4. If a request fails, switch to another IP and retry.
Such a strategy is already used for Etcd and proven to work quite well.
In order to enable the feature, you need to set `kubernetes.bypass_api_service` to `true` in the Patroni configuration file or set the `PATRONI_KUBERNETES_BYPASS_API_SERVICE` environment variable.
If for some reason `GET /default/endpoints/kubernetes` isn't allowed, Patroni will disable the feature.
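A minimal sketch of enabling the feature from the configuration file:
```yaml
kubernetes:
  bypass_api_service: true
```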
Readiness probes could be useful to eliminate "unhealthy" pods from the endpoint subset addresses when a K8s service with label selectors is used.
Real-life example: the node where the primary was running has failed and is being shut down, and Patroni can't update (remove) the role label.
Therefore, on OpenShift the leader service will have two pods assigned, one of them being the failed primary.
With the readiness probe defined, the failed primary pod will be excluded from the list.
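For example, a readiness probe pointing at Patroni's REST API could look roughly like this (port 8008 and the `/readiness` endpoint are assumptions and must match your Patroni REST API configuration and version):
```yaml
readinessProbe:
  httpGet:
    path: /readiness
    port: 8008
    scheme: HTTP
  initialDelaySeconds: 3
  periodSeconds: 10
```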
OpenShift enforces `securityContext.fsGroup` for block devices and sets the group sticky bit on volumeMounts.
This leads to Patroni pods failing to start after the first restart:
> 2020-01-13 14:46:13.695 UTC [143] FATAL: data directory "/home/postgres/pgdata/pgroot/data" has invalid permissions
> 2020-01-13 14:46:13.695 UTC [143] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
An initContainer which fixes the OpenShift tampering solves the issue. I stole the solution from the stable postgres helm chart:
https://github.com/helm/charts/pull/14540/files
Tested on OpenShift v3.11
Note: This error does not occur when using shared filesystems (like NFS)
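A sketch of such an initContainer (image, volume name, and mount path are assumptions; the data directory path matches the log message above):
```yaml
initContainers:
- name: fix-pgdata-permissions
  image: busybox                # placeholder image
  command:
  - sh
  - -c
  - '[ -d /home/postgres/pgdata/pgroot/data ] && chmod 0700 /home/postgres/pgdata/pgroot/data || true'
  volumeMounts:
  - name: pgdata                # placeholder volume name
    mountPath: /home/postgres/pgdata
```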
If there is no service defined, K8s assumes that the endpoint is orphaned and removes it.
Patroni tries to create the service (only when `use_endpoints` is enabled) in the following cases:
1. Upon start
2. When it tries to (re-)create the config endpoint
If for some reason creation of the service fails, Patroni will retry it on every cycle of the HA loop. Usually it fails due to lack of permissions; if you don't want to grant such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest.
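A minimal sketch of such a manifest (name and port are illustrative; with `use_endpoints` the service typically carries no label selector, since Patroni manages the endpoint subsets itself):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: patronidemo       # must match the name of the leader endpoint
spec:
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
```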
- Update postgres docker image to the latest 11 version.
- Remove empty lines inside the `RUN` command to make the Dockerfile compatible with future docker versions.
- Set the `PATRONI_KUBERNETES_POD_IP` environment variable, which is required when _use_endpoints_ is enabled; otherwise a `KeyError` is raised [here](https://github.com/zalando/patroni/blob/master/patroni/dcs/kubernetes.py#L95) (see the snippet after this list).
- Set `EDITOR` environment variable to make configuration changes via `patronictl edit-config`.
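For reference, a typical way to populate that variable via the Kubernetes downward API (container spec excerpt):
```yaml
env:
- name: PATRONI_KUBERNETES_POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
```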
- It modifies the Dockerfile and entrypoint slightly to allow for OpenShift SCCs to operate correctly
- It adds 2 template examples that can be easily modified by changing parameters
Fixes #572
`patronictl remove` deletes the cluster configuration (stored either in ConfigMaps or Endpoints) and cannot be run from the Postgres pod without 'delete' permission on those objects being granted to the pod's service account.
Adds 3 resources that will properly set up the RBAC:
1. a service account, which is also assigned to the pods of the cluster, so that they use those particular permissions
2. a role, which holds only the permissions that Patroni members need to interact with the K8s cluster
3. a role binding, which ties the two former objects together.
The role and role binding were created using this tool https://github.com/liggitt/audit2rbac which looks at [audit logs](https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#advanced-audit) provided by the API server.
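For orientation, a minimal sketch of those three objects (names and the exact verb/resource lists are assumptions; the actual manifests are derived from the audit logs as described above):
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: patroni
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: patroni
rules:
- apiGroups: [""]
  resources: [configmaps, endpoints, pods, services]
  verbs: [get, list, watch, create, update, patch, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: patroni
subjects:
- kind: ServiceAccount
  name: patroni
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: patroni
```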
* Use ConfigMaps or Endpoints for leader election and to keep cluster state
* Label pods with a postgres role
* change the behavior of pip install: from now on it will not install all dependencies; you have to explicitly specify the DCS you want to use Patroni with, e.g. `pip install patroni[etcd,zookeeper,kubernetes]`