patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-27 18:20:05 +00:00

Author	SHA1	Message	Date
علی سالمی	5c4ee30dae	Add JSON log format to logging configuration (#2982) Patroni can now be configured as below to log in JSON format. ```yaml log: type: json format: - asctime: '@timestamp' - levelname: level - message - module - name: logger_name static_fields: app: patroni ``` This config produces the following log: ```json { "@timestamp": "2023-12-14 19:51:24,872", "level": "INFO", "message": "Lock owner: None; I am postgresql1", "module": "ha", "app": "patroni", "logger_name": "patroni.ha" } ```	2024-01-16 10:42:48 +01:00
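To illustrate how a `format` list like the one in #2982 could map log-record attributes to JSON keys, here is a minimal sketch using only the standard `logging` and `json` modules. The `JsonFormatter` class, its arguments, and the `demo()` helper are hypothetical names chosen for this example; this is not Patroni's actual implementation.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renames record attributes and adds static fields."""

    def __init__(self, fields, static_fields=None, datefmt=None):
        super().__init__(datefmt=datefmt)
        # Each entry is either a plain attribute name or an {attribute: alias} mapping.
        self.fields = []
        for item in fields:
            if isinstance(item, dict):
                self.fields.extend(item.items())
            else:
                self.fields.append((item, item))
        self.static_fields = dict(static_fields or {})

    def format(self, record):
        # Populate the derived attributes that logging.Formatter normally fills in.
        record.message = record.getMessage()
        record.asctime = self.formatTime(record, self.datefmt)
        out = {alias: getattr(record, attr, None) for attr, alias in self.fields}
        out.update(self.static_fields)
        return json.dumps(out)


def demo():
    """Format one record with the field list from the commit message."""
    formatter = JsonFormatter(
        fields=[{"asctime": "@timestamp"}, {"levelname": "level"},
                "message", "module", {"name": "logger_name"}],
        static_fields={"app": "patroni"},
    )
    record = logging.LogRecord(
        "patroni.ha", logging.INFO, "ha.py", 1,
        "Lock owner: %s; I am %s", ("None", "postgresql1"), None,
    )
    return json.loads(formatter.format(record))
```

Calling `demo()` returns a dictionary with the same keys as the sample log line above; the timestamp reflects the moment the record was created.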
Polina Bungina	71ccf91e36	Don't filter out contradictory nofailover tag (#2992) * Ensure that nofailover will always be used if both nofailover and failover_priority tags are provided * Call _validate_failover_tags from reload_local_configuration() as well * Properly check values in _validate_failover_tags(): the nofailover value should be cast to boolean, as is done wherever else it is accessed	2024-01-02 09:30:18 +01:00
Alexander Kukushkin	193c73f6b8	Make GlobalConfig really global (#2935) 1. extract `GlobalConfig` class to its own module 2. make the module instantiate the `GlobalConfig` object on load and replace its entry in sys.modules with this instance 3. don't pass `GlobalConfig` object around, but use `patroni.global_config` module everywhere. 4. move `ignore_slots_matchers`, `max_timelines_history`, and `permanent_slots` from `ClusterConfig` to `GlobalConfig`. 5. add `use_slots` property to global_config and remove duplicated code from `Cluster` and `Postgresql.ConfigHandler`. Besides that, improve readability of a couple of checks in ha.py and formatting of the `/config` key when saved from patronictl.	2023-11-24 09:26:05 +01:00
Mark Pekala	f5ee67fa1c	Feature: failover priority (#2780) The priority is configured with the `failover_priority` tag. Possible values range from `0` to infinity, where `0` means that the node will never become the leader, which is the same as the `nofailover` tag set to `true`. As a result, in the configuration file one should set only one of the `failover_priority` or `nofailover` tags. The failover priority kicks in only when more than one node has the same receive/replay LSN and these nodes are ahead of the other nodes in the cluster. In this case the node with the higher value of `failover_priority` is preferred. If there is a node with a higher receive/replay LSN, it will become the new leader even if it has a lower value of `failover_priority` (except when the priority is set to 0). Close https://github.com/zalando/patroni/issues/2759	2023-10-24 12:22:48 +02:00
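The selection rule described in #2780 can be sketched as a two-part sort key: the LSN decides first, and `failover_priority` only breaks ties among the most advanced nodes. The `Candidate` class and `pick_leader` function below are hypothetical and not taken from Patroni's code; the default priority of 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    lsn: int                     # receive/replay LSN as an integer
    failover_priority: int = 1   # assumed default; 0 means "never promote"
    nofailover: bool = False


def pick_leader(candidates):
    """Return the preferred candidate, or None if nobody is eligible."""
    # nofailover=True and failover_priority=0 both exclude a node entirely.
    eligible = [c for c in candidates
                if not c.nofailover and c.failover_priority > 0]
    if not eligible:
        return None
    # LSN dominates; priority only breaks ties among nodes with equal LSN.
    return max(eligible, key=lambda c: (c.lsn, c.failover_priority))
```

With two nodes at the same LSN the one with the higher priority wins, while a node that is further ahead wins regardless of its (non-zero) priority.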
Polina Bungina	efacc6c16b	Ignore synchronous_mode setting in a standby cluster (#2896 ) is_synchronous_mode() should always return False in standby clusters	2023-10-06 10:21:37 +02:00
Alexander Kukushkin	9283ebda64	Enforce loop_wait/retry_timeout/ttl rule (#2869 ) * hard-code minimal possible values * make adjustments if values are lower or if the rule is violated and show warnings * update documentation	2023-10-04 11:44:57 +02:00
Alexander Kukushkin	2bd821a768	Bugfix for GUC values with units (#2883) Despite being validated by `IntValidator`, some GUCs couldn't be cast directly to `int` because they include a unit suffix. Example: `128MB`. Close https://github.com/zalando/patroni/issues/2879	2023-09-26 08:31:55 +02:00
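The problem in #2883 is that `int("128MB")` raises `ValueError`. A minimal sketch of a unit-aware conversion for memory settings follows; `parse_memory_guc` and its `base_unit_kb` parameter are hypothetical names for this example and do not reflect Patroni's actual validator code.

```python
import re

# Multipliers relative to kB for PostgreSQL memory units (unit names are case-sensitive).
_UNITS_KB = {"B": 1 / 1024, "kB": 1, "MB": 1024, "GB": 1024 ** 2, "TB": 1024 ** 3}


def parse_memory_guc(value, base_unit_kb=1):
    """Convert a value such as '128MB' to an integer count of base units.

    base_unit_kb is the size of the GUC's base unit in kB (for example 8 for
    settings measured in 8kB blocks). Returns None if the value cannot be parsed.
    """
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(-?\d+)\s*([A-Za-z]*)\s*", str(value))
    if not match:
        return None
    number, unit = int(match.group(1)), match.group(2)
    if not unit:
        return number            # plain integer, already in base units
    if unit not in _UNITS_KB:
        return None              # unknown suffix
    return int(number * _UNITS_KB[unit] / base_unit_kb)
```

For example, `parse_memory_guc("128MB")` yields 131072 (kB), and with `base_unit_kb=8` the same value yields 16384 blocks.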
Polina Bungina	71863cedcb	Always store CMDLINE_OPTIONS config values as int (#2861 )	2023-09-14 18:34:45 +02:00
Alexander Kukushkin	01d07f86cd	Set permissions for files and directories created in PGDATA (#2781 ) Postgres supports two types of permissions: 1. owner only 2. group readable By default the first one is used because it provides better security. But, sometimes people want to run a backup tool with the user that is different from postgres. In this case the second option becomes very useful. Unfortunately it didn't work correctly because Patroni was creating files with owner access only permissions. This PR changes the behavior and permissions on files and directories that are created by Patroni will be calculated based on permissions of PGDATA. I.e., they will get group readable access when it is necessary. Close #1899 Close #1901	2023-08-02 13:15:43 +02:00
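The idea in #2781 of deriving permissions from PGDATA can be sketched with the standard `os` and `stat` modules. The helpers `modes_for` and `create_file` are hypothetical; the concrete modes (0700/0600 for owner only, 0750/0640 for group readable) follow the two layouts PostgreSQL supports, and the sketch assumes a POSIX filesystem.

```python
import os
import stat


def modes_for(pgdata):
    """Return (dir_mode, file_mode) derived from the permissions of pgdata."""
    group_readable = bool(os.stat(pgdata).st_mode & stat.S_IRGRP)
    if group_readable:
        return 0o750, 0o640   # group may read and traverse, but not write
    return 0o700, 0o600       # owner only


def create_file(pgdata, name, content=""):
    """Create a file inside pgdata with a mode matching the directory."""
    _, file_mode = modes_for(pgdata)
    path = os.path.join(pgdata, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, file_mode)
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.chmod(path, file_mode)  # make the result independent of the process umask
    return path
```

A backup tool running as a different user in the same group can then read files created this way whenever PGDATA itself is group readable.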
Alexander Kukushkin	94bfea1a81	Do not fail validation for a value that is fine (#2791 ) In issue #2735 it was discussed that there should be some warning around PostgreSQL parameters that do not pass validation. This commit ensures something is logged for parameters that fail validation and therefore fall back to default values. Close #2735 Close #2740	2023-07-31 11:35:30 +02:00
Polina Bungina	21e92fd166	Add env vars for custom bin names (#2706 )	2023-06-01 14:06:11 +02:00
Alexander Kukushkin	76b3b99de2	Enable pyright strict mode (#2652 ) - added pyrightconfig.json with typeCheckingMode=strict - added type hints to all files except api.py - added type stubs for dns, etcd, consul, kazoo, pysyncobj and other modules - added type stubs for psycopg2 and urllib3 with some little fixes - fixes most of the issues reported by pyright - remaining issues will be addressed later, along with enabling CI linting task	2023-05-09 09:38:00 +02:00
Alexander Kukushkin	6f357a4e17	Factor out global configuration into a dedicated class (#2628 ) It will help to avoid code duplications.	2023-04-03 08:09:29 +02:00
Alexander Kukushkin	c1bfb0e6d6	Remove python 2.7 support (#2571) - get rid of 2.7-specific modules: `six`, `ipaddress` - use Python3 unpacking operator - use `shutil.which()` instead of `find_executable()`	2023-03-13 17:00:04 +01:00
Alexander Kukushkin	eefa15b390	Make K8s retriable HTTP status code configurable (#2585 ) Configuration parameter is `kubernetes.retriable_http_codes` or `PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES` environment variable. These status codes are added to the default list of 500, 503, 504. Close https://github.com/zalando/patroni/issues/2536	2023-03-10 09:38:12 +01:00
Alexander Kukushkin	4c3af2d1a0	Change master->primary/leader/member (#2541 ) keep as much backward compatibility as possible. Following changes were made: 1. All internal checks are performed as `role in ('master', 'primary')` 2. All internal variables/functions/methods are renamed 3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`. 4. Logs are changed to use leader/primary/member/remote depending on the context 5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works. 6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`. 7. `master_(start\|stop)_timeout` is automatically translated to `primary_(start\|stop)_timeout` if the last one is not set. 8. updated the documentation and some examples Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether. The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.	2023-01-27 07:40:24 +01:00
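Items 1 and 7 of #2541 can be sketched as follows: role checks accept both names, and a legacy `master_*_timeout` value is copied to the `primary_*_timeout` key only when the new key is not set. Both function names are hypothetical and the code is an illustration, not Patroni's implementation.

```python
def is_primary_role(role):
    """Accept both the legacy and the new role name."""
    return role in ("master", "primary")


def translate_legacy_timeouts(config):
    """Copy master_*_timeout to primary_*_timeout unless the new name is already set."""
    result = dict(config)
    for action in ("start", "stop"):
        old, new = f"master_{action}_timeout", f"primary_{action}_timeout"
        if old in result and result.get(new) is None:
            result[new] = result[old]
    return result
```

An explicitly configured `primary_start_timeout` therefore always takes precedence over the legacy name.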
Alexander Kukushkin	4872ac51e0	Citus integration (#2504 ) Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together: ``` /service/batman/ /service/batman/0/ /service/batman/0/initialize /service/batman/0/leader /service/batman/0/members/ /service/batman/0/members/m1 /service/batman/0/members/m2 /service/batman/ /service/batman/1/ /service/batman/1/initialize /service/batman/1/leader /service/batman/1/members/ /service/batman/1/members/m1 /service/batman/1/members/m2 ... ``` Where 0 is a Citus group for coordinator and 1, 2, etc are worker groups. Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper). The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group. Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters. Since there is no hierarchical structures on K8s we will use the citus group suffix on all objects that Patroni creates. E.g. ``` batman-0-leader # the leader config map for the coordinator batman-0-config # the config map holding initialize, config, and history "keys" ... batman-1-leader # the leader config map for worker group 1 batman-1-config ... ``` Citus integration is enabled from patroni.yaml: ```yaml citus: database: citus group: 0 # 0 is for coordinator, 1, 2, etc are for workers ``` If enabled, Patroni will create the database, citus extension in it, and INSERTs INTO `pg_dist_authinfo` information required for Citus nodes to communicate between each other, i.e. 'password', 'sslcert', 'sslkey' for superuser if they are defined in the Patroni configuration file. When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section. 
Besides that, Patroni takes over management of some Postgres GUCs: - `shared_preload_libraries` - Patroni ensures that "citus" is placed first - `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2` - `wal_level` - automatically set to logical. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid accidental removal. The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the citus_add_node() and citus_update_node() functions. Patroni running on the coordinator provides a new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries. When the worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator, and Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker. Once the new leader is elected or Postgres is started again, the worker performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked. If clients don't run long transactions, the operation finishes without client-visible errors, with only a short latency spike. All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and makes it possible to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.	2023-01-24 16:14:58 +01:00
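The three GUC adjustments listed in #2504 can be sketched as a pure function over a parameter dictionary. `adjust_citus_gucs` is a hypothetical name, and the fallback of 100 for a missing `max_connections` is an assumption based on the PostgreSQL default; Patroni's real code may differ.

```python
def adjust_citus_gucs(parameters):
    """Return a copy of parameters with the Citus-related adjustments applied."""
    result = dict(parameters)

    # "citus" must come first in shared_preload_libraries.
    raw = str(result.get("shared_preload_libraries", ""))
    libs = [lib.strip() for lib in raw.split(",") if lib.strip()]
    libs = ["citus"] + [lib for lib in libs if lib != "citus"]
    result["shared_preload_libraries"] = ",".join(libs)

    # Unset or 0 -> max_connections * 2 (100 is an assumed fallback).
    if int(result.get("max_prepared_transactions") or 0) == 0:
        result["max_prepared_transactions"] = int(result.get("max_connections", 100)) * 2

    # Citus needs logical decoding to move/split shards.
    result["wal_level"] = "logical"
    return result
```

An explicitly configured non-zero `max_prepared_transactions` is left untouched, matching the "if not set or set to 0" wording above.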
Alexander Kukushkin	8f8e9c9b81	Introduce postgresql.proxy_address (#2437) It will be written to the member key in DCS as the `proxy_url` and can be used for service discovery.	2022-10-24 10:23:06 +02:00
Alexander Kukushkin	62aa1333cd	Implemented allowlist for REST API (#1959) If configured, only IPs that match the rules are allowed to call unsafe endpoints. In addition to that, it is possible to automatically include the IPs of cluster members in the list. If neither of the above is configured, the old behavior is retained. Partially address https://github.com/zalando/patroni/issues/1734	2021-07-05 09:43:56 +02:00
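The allowlist behavior described in #1959 can be sketched with the standard `ipaddress` module. The function `is_allowed` and its parameters are hypothetical; the sketch handles only IP addresses and CIDR networks, not hostnames, and is not Patroni's implementation.

```python
import ipaddress


def is_allowed(client_ip, allowlist, member_ips=(), include_members=False):
    """Decide whether client_ip may call an unsafe endpoint."""
    rules = list(allowlist)
    if include_members:
        rules.extend(member_ips)
    if not rules:
        return True  # nothing configured: the old behavior is retained
    addr = ipaddress.ip_address(client_ip)
    # strict=False lets a rule such as "10.0.0.1/16" be treated as its network.
    return any(addr in ipaddress.ip_network(rule, strict=False) for rule in rules)
```

A single address in the rules is treated as a /32 (or /128) network, so cluster member IPs can be appended to the list without special handling.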
Floris van Nee	98f50423ca	Add support for configuration directories (#1669 ) (#1671 ) It is now also possible to point the configuration path to a directory instead of a file. Patroni will find all yml files in the directory and apply them in sorted order Close https://github.com/zalando/patroni/issues/1669	2020-09-02 13:57:22 +02:00
Alexander Kukushkin	bfbc4860d5	PoC: Patroni on pure RAFT (#375 ) * new node can join the cluster dynamically and become a part of consensus * it is also possible to join only Patroni cluster (without adding the node to the raft), just comment or remove `raft.self_addr` for that * when the node joins the cluster it is using values from `raft.partner_addrs` only for initial discovery. * It is possible to run Patroni and Postgres on two nodes plus one node with `patroni_raft_controller` (without Patroni and Postgres). In such setup one can temporarily lose one node without affecting the primary.	2020-07-29 15:34:44 +02:00
Igor Yanchenko	726ee46111	Implemented patroni --version (#1291) That required a refactoring of the `Config` and `Patroni` classes. Now one has to explicitly create the instance of `Config` before creating `Patroni`. The Config file can optionally call the validate function.	2019-12-02 12:14:19 +01:00
Alexander Kukushkin	3d29cb7e50	Perform pg_ctl reload regardless of config changes (#1204 ) It is possible that some config files are not controlled by Patroni and when somebody is doing reload via REST API or by sending SIGHUP to Patroni process the usual expectation is that postgres will also be reloaded, but it didn't happen when there were no changes in the postgresql section of Patroni config. For example one might replace ssl_cert_file and ssl_key_file on the filesystem and starting from PostgreSQL 10 it just requires a reload, but Patroni wasn't doing it. In addition to that fix the issue with handling of `wal_buffers`. The default value depends on `shared_buffers` and `wal_segment_size` and therefore Patroni was exposing pending_restart when the new value in the config was explicitly set to -1 (default). Close https://github.com/zalando/patroni/issues/1198	2019-10-10 14:49:30 +02:00
Pavlo Golub	b53a29c022	Fix unit-tests for Windows (#1014 ) Closes #1013	2019-04-02 13:58:17 +02:00
Alexander Kukushkin	381a5b80d2	Release 1.5.4 (#931 ) * Bump version * Update release notes * Make it possible to configure registration of Service in Consul via env variables	2019-01-15 12:14:19 +01:00
Alexander Kukushkin	e080ded44b	Make logging configurable via YAML file (#927 ) It allows changing logging settings in runtime by updating config and doing reload or sending `SIGHUP` to the Patroni process. Important! Environment configuration names related to logging were renamed and documentation accordingly updated. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other variables related to logging, which were introduced only recently (between releases), will stop working. I think it is ok, since we didn't release the new version yet and therefore it is very unlikely that somebody is using them except authors of corresponding PRs. Example of log section in the config file: ```yaml log: dir: /where/to/write/patroni/logs # if not specified, write logs to stderr file_size: 50000000 # 50MB file_num: 10 # keep history of 10 files dateformat: '%Y-%m-%d %H:%M:%S' loggers: # increase log verbosity for etcd.client and urllib3 etcd.client: DEBUG urllib3: DEBUG ```	2019-01-15 08:42:13 +01:00
Dmitry Dolgov	11f7ceb521	Do not check types of standby_cluster configuration (#924 ) Simply allow valid keys	2019-01-14 14:16:15 +01:00
Alexander Kukushkin	76d1b4cfd8	Minor fixes (#808 ) * Use `shutil.move` instead of `os.replace`, which is available only from 3.3 * Introduce standby-leader health-check and consul service * Improve unit tests, some lines were not covered * rename `assertEquals` -> `assertEqual`, due to deprecation warning	2018-09-19 16:32:33 +02:00
Dmitry Dolgov	dd7c3c349f	[WIP] Standby cluster implementation (#679) Implementation of "standby cluster" described in #657. A standby cluster consists of a "standby leader", which replicates from a "remote master" (which is not part of the current Patroni cluster and can be anywhere), and cascade replicas, which replicate from the corresponding standby leader. The "standby leader" behaves pretty much like a regular leader, which means that it holds a leader lock in DCS; if it disappears, a new "standby leader" is elected. One can define such a cluster using the "standby_cluster" section in the Patroni config file. This section provides parameters for the standby cluster that will be applied only once during bootstrap and can be changed only through DCS.	2018-09-07 10:10:56 +02:00
Alexander Kukushkin	87e9aab04c	Improve tests (#778 ) * Implement missing unit-tests * Add acceptance tests for ISSUE #776 * Update list of classifiers, keywords and authors	2018-08-29 11:29:37 +02:00
Alexander Kukushkin	4328c15010	Make Patroni Kubernetes native (#500) * Use ConfigMaps or Endpoints for leader elections and to keep cluster state * Label pods with a postgres role * change behavior of pip install. From now on it will not install all dependencies; you have to explicitly specify the DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`	2017-12-08 16:55:00 +01:00
jouir	4ca94a5dab	Add config_dir option for configuration files location (#466) On Debian, the configuration files (postgresql.conf, pg_hba.conf, etc) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could override existing configuration files where they used to be. The default is to store configuration files in the data directory. This setting targets custom installations like Debian and any others that move configuration files out of the data directory. Fixes #465	2017-07-04 16:14:17 +02:00
Ants Aasma	1290b30b84	Introduce starting state and master start timeout. (#295 ) Previously pg_ctl waited for a timeout and then happily trodded on considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not and caused a race condition that resulted in either a failover or a crash recovery or a crash recovery interrupted by failover and a missed rewind. This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests. * Introduce timeout parameter to restart. When restart timeout is set master becomes eligible for failover after that timeout expires regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when node is a standby.	2016-12-08 14:44:27 +01:00
Alexander Kukushkin	b299b12f58	Various configuration parameters for etcd (#358) * Add https and auth support for etcd Also implement support of PATRONI_ETCD_URL and PATRONI_ETCD_SRV environment variables * Implement etcd.proxy, etcd.cacert, etcd.cert and etcd.key support Now it should be possible to set up a fully encrypted connection to etcd with authorization.	2016-12-06 16:40:21 +01:00
Alexander Kukushkin	ae88e7c96e	Document that every single zookeeper host:port MUST be quoted otherwise yaml library can not parse the list. And make visible yaml exception when trying to parse this list.	2016-06-29 14:25:50 +02:00
Alexander Kukushkin	49efb371f9	Make it possible to work without config.yml Most of the basic configuration could be done via ENV	2016-06-09 14:44:29 +02:00
Alexander Kukushkin	e9be5e8462	Configure exhibitor port via ENV	2016-06-09 11:40:10 +02:00
Alexander Kukushkin	b7d87f7d07	Implement possibility to configure Patroni via environment	2016-06-08 10:15:24 +02:00
Alexander Kukushkin	6700cd0aa6	Implement reload of config.yml with REST API call and acceptance tests for that	2016-05-26 17:09:40 +02:00
Alexander Kukushkin	ceace03646	Address codacy and travis issues	2016-05-25 14:49:33 +02:00
Alexander Kukushkin	7827951c8c	Dynamic configuration	2016-05-25 14:17:05 +02:00

41 Commits