The first one is available starting from PostgreSQL v13 and contains the
real write LSN. We prefer it over the value returned by
pg_last_wal_receive_lsn(), which is in fact the flush LSN.
The second one is available starting from PostgreSQL v9.6 and points to the
WAL flush position on the source host. For the primary it allows a more
accurate calculation of the replay lag, because the values stored in DCS are
updated only every loop_wait seconds.
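For illustration, a minimal sketch of the resulting preference (the helper is ours, assuming a psycopg-style cursor; the `written_lsn` column of `pg_stat_wal_receiver` exists since v13):
```python
def receive_lsn(cur, server_version: int):
    if server_version >= 130000:
        cur.execute("SELECT written_lsn FROM pg_catalog.pg_stat_wal_receiver")
        row = cur.fetchone()
        if row and row[0] is not None:
            return row[0]  # the real write LSN
    # fallback: pg_last_wal_receive_lsn() actually returns the flush LSN
    cur.execute("SELECT pg_catalog.pg_last_wal_receive_lsn()")
    return cur.fetchone()[0]
```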
Since `3.2.0` Patroni is able to create physical replication slots on replica nodes, in case the node at some moment becomes the primary.
There are two potential problems with having such slots:
1. They prevent recycling of WAL files.
2. They may affect vacuum on the primary if hot_standby_feedback is enabled.
The first class of issues is already addressed by periodically calling the pg_replication_slot_advance() function.
The second class of issues, however, doesn't show up instantly, but only when the old primary is switched to a replica. In this case physical replication slots that were at some moment active will hold a NOT NULL value of `xmin`, which is propagated to the primary via the hot_standby_feedback mechanism.
To address the second problem we detect that a physical replication slot is not supposed to be active but has a NOT NULL `xmin`, and drop/recreate it.
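A sketch of the detection (illustrative, not Patroni's actual code):
```python
# An inactive physical slot that still holds an xmin would feed the
# hot_standby_feedback mechanism, so it has to be dropped and recreated.
def slots_to_recreate(cur, expected_inactive: set) -> list:
    cur.execute("SELECT slot_name, active, xmin"
                "  FROM pg_catalog.pg_replication_slots"
                " WHERE slot_type = 'physical'")
    return [name for name, active, xmin in cur.fetchall()
            if name in expected_inactive and not active and xmin is not None]
```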
Close #3146, Close #3153
Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
Due to `postgres --describe-config` not showing GUCs defined with GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE, Patroni was always ignoring some GUCs that a user might want to have configured with non-default values.
- remove postgres --describe-config validation.
- define minor versions for availability bounds of some back-patched GUCs
1. All nodes with role == 'replica' and state == 'running' are registered. If the state isn't running, the node is removed.
2. In case of failover/switchover we always update the primary first.
3. When switching to a registered secondary we call citus_update_node() three times: rename the primary to primary-demoted, put the primary name into the promoted secondary's row, and put the promoted secondary's name into the primary's row (see the sketch below).
State transitions are produced by the transition() method. First of all the method makes sure that the actual primary is registered in the metadata. If the primary for a given group didn't change, the method registers new secondaries and removes secondaries that are gone. It prefers to use the citus_update_node() UDF to replace gone secondaries with added ones.
Communication protocol between primary nodes remains the same and all old features work without any changes.
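A rough sketch of the three citus_update_node() calls from step 3 (the helper and parameter names are illustrative, not Patroni's actual code):
```python
# Swap the two pg_dist_node rows, using a temporary "-demoted" name to avoid
# a conflict (a sketch, assuming a psycopg-style cursor).
def switch_to_secondary(cur, primary_id, secondary_id, primary, secondary, port=5432):
    # 1. rename the current primary to <name>-demoted
    cur.execute("SELECT citus_update_node(%s, %s, %s)",
                (primary_id, primary + '-demoted', port))
    # 2. put the primary name into the promoted secondary's row
    cur.execute("SELECT citus_update_node(%s, %s, %s)",
                (secondary_id, primary, port))
    # 3. put the promoted secondary's name into the primary's row
    cur.execute("SELECT citus_update_node(%s, %s, %s)",
                (primary_id, secondary, port))
```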
The last one is only available since psycopg 2.8, while the first one has been available since 2.0.8.
For backward compatibility we monkeypatch the connection object returned by psycopg3.
Close https://github.com/patroni/patroni/issues/3116
Let's consider the following replication setup:
```
primary->standby1->standby2(replicatefrom: standby1)
```
In this case the `primary` will not create a physical replication slot for standby2, because it is streaming from the `standby1`.
Things will look differently if we have the following dynamic configuration:
```yaml
slots:
  primary:
    type: physical
  standby1:
    type: physical
  standby2:
    type: physical
```
In this case `primary` will also have a `standby2` physical replication slot, which must periodically be advanced. So far this worked by taking the value of `xlog_location` from the `/members/standby2` key in DCS.
But when DCS is down and failsafe mode is activated, the `standby2` physical slot on the `primary` will not be moved, because there was no way to get the latest value of `xlog_location`.
This PR addresses the problem by making replica nodes return their `xlog_location` as the `lsn` header in the response to the `POST /failsafe` REST API request. The current primary will use these values to advance replication slots for nodes with the `replicatefrom` tag.
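A simplified sketch of the replica-side change (illustrative; Patroni's REST API is built on http.server request handlers):
```python
# The handler for POST /failsafe acknowledges the leader and reports the
# replica's current xlog_location in the "lsn" response header.
def answer_failsafe(handler, xlog_location: int):
    handler.send_response(200)
    handler.send_header('lsn', str(xlog_location))
    handler.end_headers()
```
The primary collects the `lsn` headers from all responses and uses them to advance the slots of members it cannot see in DCS.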
Provide info about the PG parameters that caused the "pending restart"
flag to be set. Both `patronictl list` and the `/patroni` REST API endpoint
now show the parameter names and the diff as the "pending restart
reason".
* Do not set pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though we will have this flag actually set in PG, this configuration parameter is irrelevant on primary and there is no actual need for restart)
* Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They can actually be changed manually (e.g. via ALTER SYSTEM), which causes the pending_restart state in PG, but Patroni always passes those params to the postmaster as command-line options anyway, and there they can only have one value: 'on' (except on the primary when performing a custom bootstrap).
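For reference, a sketch of a query that lists the offending parameters (illustrative, not the exact query Patroni runs; the pg_settings view has had the pending_restart column since PostgreSQL 9.5):
```python
# Parameters that require a restart, excluding the two GUCs that Patroni
# always controls via postmaster command-line options.
PENDING_RESTART_SQL = """
SELECT name, setting
  FROM pg_catalog.pg_settings
 WHERE pending_restart
   AND name NOT IN ('hot_standby', 'wal_log_hints')
"""
```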
This was introduced by #2990: the pod cannot be started and shows the
following logs:
```
2023-12-26 03:29:25.569 UTC [47] CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
2023-12-26 03:29:25.569 UTC [47] STATEMENT: DO $$
BEGIN
PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus';
IF NOT FOUND THEN
CREATE DATABASE "citus";
END IF;
END;$$
2023-12-26 03:29:25,570 ERROR: post_bootstrap
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap
self._postgresql.citus_handler.bootstrap()
File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap
cur.execute(sql.encode('utf-8'))
psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function
CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
```
---------
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
The main issue was that the configuration for the Citus handler and for DCS existed in two places, while ideally AbstractDCS should not know many details about what kind of MPP is in use.
To solve the problem we first dynamically create an object implementing the AbstractMPP interface, which serves as a configuration for DCS. Later this object is used to instantiate the class implementing the AbstractMPPHandler interface.
This is just a starting point, which does some heavy lifting. As a next step, all kinds of variables named after Citus in files other than patroni/postgresql/mpp/citus.py should be renamed.
In other words, this commit takes over the most complex part of #2940, which was never implemented.
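A conceptual sketch of the two-step construction (class names roughly follow the description above, but the code is illustrative, not the exact Patroni API):
```python
# The lightweight AbstractMPP object is all that DCS sees; the full handler
# is instantiated from it later, once the Postgresql object exists.
class AbstractMPP:
    def __init__(self, config: dict):
        self._config = config

class AbstractMPPHandler(AbstractMPP):
    def __init__(self, postgresql, config: dict):
        super().__init__(config)
        self._postgresql = postgresql

class Citus(AbstractMPP):
    def get_handler_impl(self, postgresql):
        return CitusHandler(postgresql, self._config)

class CitusHandler(AbstractMPPHandler, Citus):
    pass  # the real implementation manages pg_dist_node etc.

def get_mpp(config: dict) -> AbstractMPP:
    # a no-op "Null" implementation is used when Citus is not enabled
    return Citus(config['citus']) if 'citus' in config else AbstractMPP({})
```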
Co-authored-by: zhjwpku <zhjwpku@gmail.com>
Fix the case when a parameter value was changed and then reset back to
the initial value without restart - before this fix, the second change
was not reflected in the Postgres config.
This commit also includes the related unit test refactoring.
There are cases when Citus wants to have a connection to the local postgres. By default it uses `localhost` for that, which is not always available. To solve it we set the `citus.local_hostname` GUC to a custom value, the same one Patroni uses to connect to Postgres.
If restarted in pause, Patroni was discarding `synchronous_standby_names` from `postgresql.conf`, because in the internal cache this value was set to `None`. As a result synchronous replication transitioned to a broken state: no synchronous replicas according to `synchronous_standby_names`, and Patroni not selecting/setting new synchronous replicas (another bug).
To solve the problem of the broken initial state and to avoid similar issues with other GUCs, we will read the GUCs' values when Patroni joins a running Postgres.
Create permanent physical replication slots on standby nodes and use `pg_replication_slot_advance()` function to move them forward.
The `restart_lsn` is advanced based on values stored in the `/status` key by the primary node.
When a slot is created on a replica it could be ahead of the same slot on the primary, and therefore there is a period of time when it doesn't protect WAL files from being recycled.
1. make the _get_members_slots() method return data in the same format as the _get_permanent_slots() method
2. move conflicting-name handling from get_replication_slots() to the _get_members_slots() method
3. enrich the structure returned by get_replication_slots() with the LSN of permanent logical slots reported by the primary
4. use the added information in the SlotsHandler instead of fetching it from Cluster.slots
5. bugfix: don't try to advance a logical slot that doesn't match the required configuration
1. stop using the same cursor all the time; it creates problems when not carefully used from different threads.
2. introduce a query() method in the Connection class and make it return a result set when possible.
3. refactor most of the code that relies (directly or indirectly) on the Connection object to use the query() method as much as possible.
This refactoring helps with reducing code complexity and will help with the future introduction of a separate database connection for the REST API thread. The latter will improve reliability when the system is under significant stress, when simple monitoring queries take seconds to execute and the REST API starts blocking the main thread.
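A condensed sketch of the new approach (simplified, not the verbatim implementation):
```python
class Connection:
    def __init__(self, conn):
        self._connection = conn  # e.g. the object psycopg2.connect() returned

    def query(self, sql: str, *params):
        # a short-lived cursor per call instead of one shared cursor
        with self._connection.cursor() as cursor:
            cursor.execute(sql, params or None)
            if cursor.description is not None:  # the statement returned rows
                return cursor.fetchall()
```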
New patroni.py option that allows one to
* generate a patroni.yml configuration file with the values from a running cluster
* generate a sample patroni.yml configuration file
To do that we use the `pg_stat_get_wal_receiver()` function, which is available since 9.6. For older versions the `patronictl list` output and REST API responses remain as before.
If there is no WAL receiver process, we check whether `restore_command` is set and show the state as `in archive recovery`.
Example of `patronictl list` output:
```bash
$ patronictl list
+ Cluster: batman -------------+---------+---------------------+----+-----------+
| Member      | Host           | Role    | State               | TL | Lag in MB |
+-------------+----------------+---------+---------------------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running             | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | in archive recovery | 12 |         0 |
+-------------+----------------+---------+---------------------+----+-----------+
$ patronictl list
+ Cluster: batman -------------+---------+-----------+----+-----------+
| Member      | Host           | Role    | State     | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running   | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 12 |         0 |
+-------------+----------------+---------+-----------+----+-----------+
```
Example of REST API response:
```bash
$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544480,
    "replayed_location": 335544480,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "in archive recovery",
  "dcs_last_seen": 1688642069,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}
$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544816,
    "replayed_location": 335544816,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "streaming",
  "dcs_last_seen": 1688642089,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}
```
* Use YAML files to validate Postgres GUCs through Patroni.
Patroni used to have a static list of Postgres GUCs validators in
`patroni.postgresql.validator`.
One problem with that approach, for example, is that it would not
allow GUCs from custom Postgres builds to be validated/accepted.
The idea that we had to work around that issue was to move the
validators from the source code to an external and extendable source.
With that Patroni will start reading the current validators from that
external source plus whatever custom validators are found.
From this commit onwards Patroni will read and parse all YAML files
that are found under the `patroni/postgresql/available_parameters`
directory to build its Postgres GUCs validation rules.
All the details about how this works can be found in the docstring
of the introduced function `_load_postgres_gucs_validators`.
- added pyrightconfig.json with typeCheckingMode=strict
- added type hints to all files except api.py
- added type stubs for dns, etcd, consul, kazoo, pysyncobj and other modules
- added type stubs for psycopg2 and urllib3 with some little fixes
- fixed most of the issues reported by pyright
- remaining issues will be addressed later, along with enabling CI linting task
Keep as much backward compatibility as possible.
The following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the latter is not set.
8. updated the documentation and some examples
Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and will keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```
Where 0 is the Citus group of the coordinator and 1, 2, etc. are worker groups.
Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).
The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group.
Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters.
Since there are no hierarchical structures on K8s, we will use the Citus group suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader # the leader config map for the coordinator
batman-0-config # the config map holding initialize, config, and history "keys"
...
batman-1-leader # the leader config map for worker group 1
batman-1-config
...
```
Citus integration is enabled from patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```
If enabled, Patroni will create the database and the Citus extension in it, and will insert into `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', 'sslkey' for the superuser if they are defined in the Patroni configuration file.
When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.
Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that "citus" is added in the first position
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and those are automatically added by Patroni to the `ignore_slots` configuration to avoid accidental removal.
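A sketch of these adjustments (the helper name is ours; simplified, not Patroni's actual code):
```python
# Adjust the GUCs that Patroni manages when the Citus integration is enabled.
def adjust_citus_gucs(parameters: dict) -> None:
    # make sure "citus" occupies the first position in shared_preload_libraries
    spl = str(parameters.get('shared_preload_libraries') or '')
    others = [v.strip() for v in spl.split(',') if v.strip() and v.strip() != 'citus']
    parameters['shared_preload_libraries'] = ','.join(['citus'] + others)

    # unset or 0 -> max_connections * 2
    if not int(parameters.get('max_prepared_transactions') or 0):
        parameters['max_prepared_transactions'] = int(parameters['max_connections']) * 2

    # required by Citus to move/split shards via logical replication
    parameters['wal_level'] = 'logical'
```
E.g. `adjust_citus_gucs({'max_connections': 100})` would yield `max_prepared_transactions = 200`.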
The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using
citus_add_node() and citus_update_node() functions.
Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of restart or switchover, it calls the `POST /citus` endpoint on the coordinator and the Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or postgres is started back, another call to the `POST /citus` endpoint is made, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, only a short latency spike.
All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This allows more control: a transaction in progress can be rolled back if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
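A condensed sketch of the coordinator-side flow (illustrative only; it assumes an autocommit connection, so the transaction is opened and committed explicitly):
```python
# Called when a worker primary is about to shut down: the temporary bogus
# hostname pauses client connections that target this worker.
def pause_worker(cur, nodeid: int, host: str, port: int):
    cur.execute("BEGIN")
    cur.execute("SELECT citus_update_node(%s, %s, %s)",
                (nodeid, host + '-demoted', port))

# Called when the new worker leader is elected (or postgres is back up):
# restore the actual host/port and unblock the clients.
def resume_worker(cur, nodeid: int, host: str, port: int):
    cur.execute("SELECT citus_update_node(%s, %s, %s)", (nodeid, host, port))
    cur.execute("COMMIT")
```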
When the `synchronous_standby_names` GUC is changed, PostgreSQL almost immediately starts reporting the corresponding walsenders as synchronous, while in fact they may not have reached this state yet. To mitigate this problem we memorize the current flush LSN on the primary right after the change of `synchronous_standby_names` becomes visible and use it as an additional check for walsenders.
A walsender is counted as truly "sync" only when its write/flush/replay LSN has reached the memorized LSN and its `application_name` is known to be a part of `synchronous_standby_names`.
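A sketch of the resulting check (simplified):
```python
# A walsender only counts as truly synchronous when it is listed in
# synchronous_standby_names AND has caught up to the LSN memorized right
# after the GUC change became visible.
def truly_sync(walsender_lsn: int, memorized_flush_lsn: int,
               application_name: str, ssn_members: set) -> bool:
    return application_name in ssn_members and walsender_lsn >= memorized_flush_lsn
```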
The size of the PR is mostly related to refactoring and moving the code responsible for working with `synchronous_standby_names` and `pg_stat_replication` to a dedicated file.
The `parse_sync_standby_names()` function was mostly copied from #672.
If enabled, it will allow Patroni to cope with DCS outages.
In case of a DCS outage the leader tries to call all remaining members in the cluster via API and if all of them respond with success the leader will not be demoted.
The failsafe_mode can be enabled by running
```sh
patronictl edit-config -s failsafe_mode=true
```
or by calling the `/config` REST API endpoint.
Co-authored-by: Polina Bungina <bungina@gmail.com>
If the cluster is stable (no nodes are joining/leaving/lagging) we want to run at most one monitoring query per HA loop. So far this worked perfectly, except when synchronous_mode is enabled, in which case we run two additional queries:
1. SHOW synchronous_mode
2. SELECT ... FROM pg_stat_replication
In order to solve this, we include these "queries" in the common monitoring query if synchronous_mode is enabled.
In addition to that, make sure that `synchronous_standby_names` is reset on replicas that used to be a primary, and avoid using replicas which are not in the 'running' state.
P.S.: in the monitoring query we also extract the current value of synchronous_standby_names, because it will be useful for the quorum commit feature.
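Roughly the shape of the combined round trip (illustrative, not the exact SQL Patroni runs):
```python
# One query returns the current WAL position, the active value of
# synchronous_standby_names, and the state of all walsenders.
MONITOR_SQL = """
SELECT pg_catalog.pg_current_wal_lsn(),
       pg_catalog.current_setting('synchronous_standby_names'),
       (SELECT pg_catalog.json_agg(r) FROM (
            SELECT application_name, sync_state, replay_lsn
              FROM pg_catalog.pg_stat_replication) r)
"""
```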
Close https://github.com/zalando/patroni/issues/2469
When doing the leader race we need to check that the former primary isn't alive anymore. For that we relied on non-inclusive terms. In order to simplify future work on getting rid of all non-inclusive words, we change the check to rely on a difference in the format of the wal/xlog field: there is only "location" for the primary, while standbys have "replayed_location" + "received_location".
In addition to that, we start supporting the "wal" field as well as the deprecated "xlog".
Co-authored-by: Polina Bungina <bungina@gmail.com>
There is a known [vector of attack](https://pganalyze.com/blog/5mins-postgres-security-patch-releases-pgspot-pghostile): creating functions and/or operators in the public schema with the same name and signature as corresponding objects in `pg_catalog`.
Since Patroni heavily relies on superuser connections, we want to mitigate it by enforcing `search_path=pg_catalog` for all connections created by Patroni (except replication connections). This is achieved by introducing a new function that wraps psycopg.connect() and appends ` -c search_path=pg_catalog` to the `options` parameter.
In addition to that, we set connection.autocommit to True before returning it.
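A sketch of the wrapper idea (simplified; the real code supports both psycopg2 and psycopg):
```python
import psycopg2

def connect(**kwargs):
    # pin search_path for everything except replication connections
    if 'replication' not in kwargs:
        kwargs['options'] = (kwargs.get('options') or '') + ' -c search_path=pg_catalog'
    conn = psycopg2.connect(**kwargs)
    conn.autocommit = True
    return conn
```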
The logical slot on a replica is safe to use when the physical replication
slot on the primary:
1. has a nonzero/non-null `catalog_xmin`
2. has a `catalog_xmin` that is not newer (greater) than the `catalog_xmin` of any slot on the standby
3. has a `catalog_xmin` that is known to have overtaken the `catalog_xmin` of logical slots on the primary observed during `1`
If `1` doesn't hold, Patroni will run an additional check of whether `hot_standby_feedback` is actually in effect, and show a warning in case it is not.
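A sketch of checks `1` and `2` (illustrative; the xid comparison is simplified and ignores wraparound):
```python
def physical_slot_safe(primary_catalog_xmin, standby_catalog_xmins) -> bool:
    # 1. the primary's physical slot must hold a catalog_xmin at all;
    # 2. it must not be newer than any catalog_xmin on the standby
    return primary_catalog_xmin is not None and all(
        primary_catalog_xmin <= xmin for xmin in standby_catalog_xmins)
```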
A couple of times we have seen in the wild that the database for the permanent logical slots was changed in the Patroni config.
It resulted in the situation below.
On the primary:
1. The slot must be dropped before it can be created in a different DB.
2. Patroni fails to drop it because the slot is in use.
On a replica:
1. Patroni notices that the slot exists in the wrong DB and successfully drops it.
2. Patroni copies the existing slot from the primary by its name, with a Postgres restart.
The loop repeats as long as the "wrong" slot exists on the primary.
Basically, replicas are continuously restarting, which badly affects availability.
In order to solve the problem, we perform additional checks while copying replication slot files from the primary and discard them if `slot_type`, `database`, or `plugin` don't match our expectations.
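A sketch of the added validation (the helper name is ours):
```python
# A slot file copied from the primary is discarded unless its definition
# matches what the configuration expects.
def slot_matches(copied: dict, expected: dict) -> bool:
    return all(copied.get(key) == expected.get(key)
               for key in ('slot_type', 'database', 'plugin'))
```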
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with DCS, the new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` key will still be updated if there are members with old Patroni versions in the cluster.
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
When the logical slot already exists on the replica Patroni periodically calls `pg_replication_slot_advance()` function, which allows moving the slot forward.
3. Additional requirements:
In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.
4. When logical slots are copied from the primary to the replica, there is a timeframe when it might not be safe to use them after promotion. Right now there is no protection from promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
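A condensed sketch of the replica-side handling from `2` (illustrative, not Patroni's actual code):
```python
# Advance an existing logical slot to the LSN published in the /status key;
# otherwise fetch the slot state file from the primary (installing it locally
# requires a restart of the replica).
def sync_logical_slot(cur, name: str, target_lsn: str, exists: bool):
    if exists:
        cur.execute("SELECT pg_catalog.pg_replication_slot_advance(%s, %s)",
                    (name, target_lsn))
    else:
        cur.execute("SELECT pg_catalog.pg_read_binary_file(%s)",
                    ('pg_replslot/' + name + '/state',))
        return cur.fetchone()[0]
```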
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
So far Patroni was comparing the old value (in `pg_settings`) with the new value (from the Patroni configuration or from DCS) in order to figure out whether a reload or restart is required when a parameter has been changed. If a given parameter was missing from `pg_settings`, Patroni was ignoring it and not writing it into `postgresql.conf`.
If Postgres is not running, no validation was performed, and parameters and values were written into the config as is.
It is not a very common mistake, but people tend to mistype parameter names or values.
Also, it happens that some parameters are removed in specific Postgres versions and some new ones are added (e.g. `checkpoint_segments` was replaced with `min_wal_size` and `max_wal_size` in 9.5, or `wal_keep_segments` was replaced with `wal_keep_size` in 13).
Writing nonexistent parameters or invalid values into the `postgresql.conf` makes postgres unstartable.
This change doesn't solve the issue 100%, but at least it comes very close to that goal.
The only python-etcd3 client that works directly via gRPC still supports only a single endpoint, which is not very nice for high availability.
Since Patroni is already using a heavily hacked version of python-etcd with smart retries and auto-discovery out of the box, I decided to enhance the existing code with limited support of the v3 protocol via gRPC-gateway.
Unfortunately, watches via gRPC-gateway require us to open and keep a second connection to etcd.
Known limitations:
* The minimal supported version is 3.0.4. On earlier versions transactions don't work due to bugs in grpc-gateway, and without transactions we can't do atomic operations, i.e. take leader locks.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using TLS Common Name. This is because gRPC-proxy terminates TLS from its client so all the clients share a cert of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
The official python kubernetes client contains a lot of auto-generated code and is therefore very heavy, while we need only a small fraction of it.
The naive implementation that covers all the API methods we use takes about 250 LoC, and about half of it is responsible for handling configuration files.
Disadvantage: if somebody was using `patronictl` outside of the pod (on their machine), it might not work anymore (depending on the environment).
- ``GET /replica?lag=<max-lag>``: replica check endpoint.
- ``GET /asynchronous?lag=<max-lag>`` or ``GET /async?lag=<max-lag>``: asynchronous standby check endpoint.
Checks replication latency and returns status code **200** only when the latency is below the specified value. For performance reasons, the key leader_optime from DCS is used as the leader WAL position when computing the latency on the replica. Please note that the value in leader_optime might be a couple of seconds old (based on loop_wait).
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
The standby cluster doesn't know about leader elections in the main cluster and therefore the usual mechanisms of detecting divergences don't work. For example, it could happen that the standby cluster is ahead of the new primary of the main cluster and must be rewound.
There is a way to detect that a new timeline has been created: check for the presence of the history file in pg_wal. If the new file is there, we will start the usual procedure of making sure that we can continue streaming, or will run pg_rewind.
When deciding whether the running replica is able to stream from the new primary or must be rewound, we should use the replayed location; therefore we extract received and replayed locations independently.
Reuse the part of the query that extracts the timeline and locations in the REST API.
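A sketch of the timeline probe (illustrative; history files are named after the timeline in 8-digit hex):
```python
import os

# The appearance of the history file for the next timeline in pg_wal means
# the main cluster elected a new leader.
def new_timeline_detected(data_dir: str, current_timeline: int) -> bool:
    history_file = '%08X.history' % (current_timeline + 1)
    return os.path.exists(os.path.join(data_dir, 'pg_wal', history_file))
```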
So far Patroni was parsing `recovery.conf` or querying `pg_settings` in order to get the current values of recovery parameters. On PostgreSQL earlier than 12 it could easily happen that the value of `primary_conninfo` in `recovery.conf` has nothing to do with reality. Luckily for us, on PostgreSQL 9.6+ there is the `pg_stat_wal_receiver` view, which contains current values of `primary_conninfo` and `primary_slot_name`. The password field is masked though, but this is fine, because authentication happens only when the connection is opened. All other parameters we compare as usual.
Another advantage of `pg_stat_wal_receiver` is that it contains the current timeline, therefore on 9.6+ we don't need to use the replication connection trick if the walreceiver process is alive.
If there is no walreceiver process or it is not streaming, we stick to the old methods.
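For reference, a sketch of the values available while the walreceiver is alive (illustrative; the exact query Patroni runs may differ):
```python
# On 9.6+ a single view provides the walreceiver status, the current
# timeline, the (password-masked) conninfo and the slot name.
WAL_RECEIVER_SQL = """
SELECT status, received_tli, conninfo, slot_name
  FROM pg_catalog.pg_stat_wal_receiver
"""
```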
When Patroni is trying to figure out whether pg_rewind is necessary, it could write the content of the history file from the primary into the log. The history file grows with every failover/switchover and eventually starts taking up too many lines in the log, most of which are not very useful.
Instead of showing the raw data, we will show only 3 lines before the current replica timeline and 2 lines after.
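A sketch of the trimming (illustrative; each history file line starts with the timeline number):
```python
# Keep 3 lines before the line with the current replica timeline, the line
# itself, and 2 lines after it.
def trim_history(lines: list, replica_tl: int) -> list:
    idx = next((i for i, line in enumerate(lines)
                if line.split('\t')[0] == str(replica_tl)), len(lines))
    return lines[max(0, idx - 3):idx + 3]
```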