patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-28 10:20:05 +00:00

Author	SHA1	Message	Date
Alexander Kukushkin	a8cfd46801	Retry one time on Etcd3 auth error (#3026) But do it only if we didn't authenticate right before executing the request. Previously retries only happened when the caller was executed with `Retry.__call__()`, which is not the case for methods like `set_failover_value()` or `set_config_value()`. Also, it seems that existing watchers aren't affected, therefore we will not restart them after reauthentication. In addition to that, fix issues with `Retry.ensure_deadline(0)`: 1. the return value was ignored 2. we don't have to set the `Retry.deadline` attr, it is not used anywhere Close https://github.com/zalando/patroni/issues/3023	2024-03-07 12:01:35 +01:00
Alexander Kukushkin	bcfd8438a5	Abstract CitusHandler and decouple it from configuration (#2950) The main issue was that the configuration for the Citus handler and for DCS existed in two places, while ideally AbstractDCS should not know many details about what kind of MPP is in use. To solve the problem we first dynamically create an object implementing AbstractMPP interfaces, which is a configuration for DCS. Later this object is used to instantiate the class implementing the AbstractMPPHandler interface. This is just a starting point, which does some heavy lifting. As a next step, all variables named after Citus in files other than patroni/postgres/mpp/citus.py should be renamed. In other words, this commit takes over the most complex part of #2940, which was never implemented. Co-authored-by: zhjwpku <zhjwpku@gmail.com>	2023-12-21 08:58:26 +01:00
Alexander Kukushkin	9209a5a133	Refactor delete_leader interface (#2810 ) similar to https://github.com/zalando/patroni/pull/2690, but it helps mostly Consul implementation.	2023-08-11 10:19:29 +02:00
Alexander Kukushkin	af8e5f0d0f	Refactor update_leader interface (#2690 ) pass reference to a last known leader object in order to avoid obtaining it from the `AbstractDCS.cluster` cache. This change is useful for Consul, Etcd3 and Zookeeper implementations.	2023-05-25 14:21:05 +02:00
Alexander Kukushkin	7941c86775	Refactor write_sync_state() (#2669 ) Make it return the new `SyncState` object in order to avoid reading the new cluster state in the Ha.process_sync_replication(). Now it is a small optimization, but it will become very handy in the quorum commit feature.	2023-05-11 09:58:15 +02:00
Alexander Kukushkin	76b3b99de2	Enable pyright strict mode (#2652 ) - added pyrightconfig.json with typeCheckingMode=strict - added type hints to all files except api.py - added type stubs for dns, etcd, consul, kazoo, pysyncobj and other modules - added type stubs for psycopg2 and urllib3 with some little fixes - fixes most of the issues reported by pyright - remaining issues will be addressed later, along with enabling CI linting task	2023-05-09 09:38:00 +02:00
Polina Bungina	3fe2a7868a	Ignore D401 in flake8-docstrings (#2627) * Ignore D401 in flake8-docstrings * Fix newly reported flake8 issues, ignore the old W503 rule * rely on concatenation of adjacent strings * Format behave scripts * Reformat ha.py according to new rules Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>	2023-04-03 09:52:22 +02:00
Alexander Kukushkin	4c3af2d1a0	Change master->primary/leader/member (#2541) keep as much backward compatibility as possible. Following changes were made: 1. All internal checks are performed as `role in ('master', 'primary')` 2. All internal variables/functions/methods are renamed 3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`. 4. Logs are changed to use leader/primary/member/remote depending on the context 5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works. 6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`. 7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set. 8. updated the documentation and some examples Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether. The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.	2023-01-27 07:40:24 +01:00
Alexander Kukushkin	4872ac51e0	Citus integration (#2504) Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together: ``` /service/batman/ /service/batman/0/ /service/batman/0/initialize /service/batman/0/leader /service/batman/0/members/ /service/batman/0/members/m1 /service/batman/0/members/m2 /service/batman/ /service/batman/1/ /service/batman/1/initialize /service/batman/1/leader /service/batman/1/members/ /service/batman/1/members/m1 /service/batman/1/members/m2 ... ``` Where 0 is a Citus group for coordinator and 1, 2, etc are worker groups. Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper). The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group. Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters. Since there are no hierarchical structures on K8s we will use the citus group suffix on all objects that Patroni creates. E.g. ``` batman-0-leader # the leader config map for the coordinator batman-0-config # the config map holding initialize, config, and history "keys" ... batman-1-leader # the leader config map for worker group 1 batman-1-config ... ``` Citus integration is enabled from patroni.yaml: ```yaml citus: database: citus group: 0 # 0 is for coordinator, 1, 2, etc are for workers ``` If enabled, Patroni will create the database, citus extension in it, and INSERTs INTO `pg_dist_authinfo` information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', 'sslkey' for superuser if they are defined in the Patroni configuration file. When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section. 
Besides that, Patroni takes over management of some Postgres GUCs: - `shared_preload_libraries` - Patroni ensures that "citus" is placed first - `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2` - `wal_level` - automatically set to logical. It is used by Citus to move/split shards. Under the hood Citus is creating/removing replication slots and they are automatically added by Patroni to the `ignore_slots` configuration to avoid accidental removal. The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using citus_add_node() and citus_update_node() functions. Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries. When the worker primary needs to shut down Postgres because of restart or switchover, it calls the `POST /citus` endpoint on the coordinator and the Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker. Once the new leader is elected or postgres is started back, it performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked. If clients don't run long transactions, the operation finishes without client-visible errors, with only a short latency spike. All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and makes it possible to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.	2023-01-24 16:14:58 +01:00
Alexander Kukushkin	92d3e1c167	Introduce the failsafe key in DCS (#2485 ) Extracted from #2379	2022-12-13 11:35:06 +01:00
Alexander Kukushkin	6ad5fee99d	Raise DCSError when communication with DCS fails (#2484) Previously such an exception was raised only from the `get_cluster()` method, and now we will do the same from the `update_leader()` and `attempt_to_acquire_leader()` methods. These methods influence Postgres promotion and demotion and we want to distinguish between different types of failures, specifically whether calls have failed because DCS isn't accessible or due to a timeout. This commit is extracted from #2379	2022-12-13 11:06:55 +01:00
Alexander Kukushkin	816b66311b	A small fix in unit tests (#2427 ) Not all external resources were properly mocked	2022-10-13 10:53:13 +02:00
Alexander Kukushkin	fb06af9adb	Release 2.1.4 (#2322 ) - bump version - update release notes - implement missing unit-tests	2022-06-01 16:00:56 +02:00
Alexander Kukushkin	dc9ff4cb8a	Release 2.1.2 (#2136 ) * Implement missing unit-tests * Bump version * Update release notes	2021-12-03 15:49:57 +01:00
Alexander Kukushkin	93efa91bbd	Release 2.1.1 (#2039 ) * Update release notes * Bump version * Improve unit-test coverage	2021-08-19 15:44:37 +02:00
Tommy Li	ed0e308b9b	Support dynamically registering/deregistering as a consul service and changing tags (#1993 ) Close #1988	2021-08-17 16:38:10 +02:00
Alexander Kukushkin	c7173aadd7	Failover logical slots (#1820) Effectively, this PR consists of a few changes: 1. The easy part: If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots. In order to reduce the number of interactions with DCS the new `/status` key was introduced. It will contain the json object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` will be updated if there are members with old Patroni in the cluster. 2. The tricky part: On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls `pg_read_binary_file()` function. When the logical slot already exists on the replica Patroni periodically calls `pg_replication_slot_advance()` function, which allows moving the slot forward. 3. Additional requirements: In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots. 4. When logical slots are copied from the primary to the replica there is a timeframe when it might not be safe to use them after promotion. Right now there is no protection from promoting such a replica. But Patroni will show a warning with the names of the slots that might not be safe to use. Compatibility. The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary. 
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed. Close: https://github.com/zalando/patroni/issues/1749	2021-03-25 16:18:23 +01:00
Alexander Kukushkin	a9f86aa195	Add compatibility with python-consul2 (#1812) The good old python-consul has not been maintained for several years, therefore someone forked it under a different name, but the package files are installed into the same location as for the old one. The API of both modules is mostly compatible, therefore it wasn't hard to add support for both modules in Patroni. Taking into account that python-consul is not a direct requirement for Patroni, but an extra, the end-user now has a choice of what to install. Close https://github.com/zalando/patroni/issues/1810	2021-01-15 14:30:48 +01:00
Alexander Kukushkin	c6207933d1	Properly handle the exception raised from refresh_session (#1531) The `touch_member()` could be called from the finally block of the `_run_cycle()`. If it raised an exception, the whole Patroni process crashed. In order to avoid future crashes we wrap `_run_cycle()` into a try..except block and ask the user to report a BUG. Close https://github.com/zalando/patroni/issues/1529	2020-05-29 14:14:11 +02:00
Alexander Kukushkin	278bf9852b	Release 1.6.0 (#1131 ) * Implement missing tests and do a few minor fixes * Bump version to 1.6.0 * Update release notes	2019-08-05 15:08:04 +02:00
Alexander Kukushkin	a4bd6a9b4b	Refactor postgresql class (#1060 ) * Convert postgresql.py into a package * Factor out cancellable process into a separate class * Factor out connection handler into a separate class * Move postmaster into postgresql package * Factor out pg_rewind into a separate class * Factor out bootstrap into a separate class * Factor out slots handler into a separate class * Factor out postgresql config handler into a separate class * Move callback_executor into postgresql package This is just a careful refactoring, without code changes.	2019-05-21 16:02:47 +02:00
Alexander Kukushkin	680444ae13	Reduce lock time taken by dcs.get_cluster() (#989) `dcs.cluster` and `dcs.get_cluster()` use the same lock resource, so when the get_cluster call was slow due to the slowness of DCS it also affected the `dcs.cluster` call, which in turn made health-check requests slow.	2019-03-12 22:37:11 +01:00
Alexander Kukushkin	f1d7ccf36e	Make sure we refresh session at least once per HA loop (#880 ) Fixes https://github.com/zalando/patroni/issues/879	2018-12-03 16:35:14 +01:00
Pavel Kirillov	2e9cb412e4	Register service in consul (#802) Register service 'scope_name' with tag 'master' or 'replica' example with scope 'pgsql-pgpi' ```[root@pgpi1 ~]# host -t SRV pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul. pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul. [root@pgpi1 ~]# host -t SRV master.pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: master.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul. [root@pgpi1 ~]# host -t SRV replica.pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: replica.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.``` Fixes: https://github.com/zalando/patroni/issues/771	2018-09-07 15:17:56 +02:00
Alexander Kukushkin	4ca8a6e506	Make retries of calls to DCS consistent across implementations (#805 ) in addition to that do a small refactoring of zookeeper and consul and try to improve the stability of AT	2018-09-06 08:37:26 +02:00
Alexander Kukushkin	87e9aab04c	Improve tests (#778 ) * Implement missing unit-tests * Add acceptance tests for ISSUE #776 * Update list of classifiers, keywords and authors	2018-08-29 11:29:37 +02:00
Alexander Kukushkin	03c2a85d23	Expose current timeline in DCS and via API (#591) It is very easy to get current timeline on the master by executing ```sql SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int ``` Unfortunately the same method doesn't work when postgres is_in_recovery. Therefore we will use replication connection for that on the replicas. In order to avoid opening and closing replication connection on every HA loop we will cache the result if its value matches with the timeline of the master. Also this PR introduces a new key in DCS: `/history`. It will contain a json serialized object with timeline history in a format similar to the usual history files. The differences are: * Second column is the absolute wal position in bytes, instead of LSN * Optionally there might be a fourth column - timestamp, (mtime of history file)	2018-01-05 15:25:56 +01:00
Alexander Kukushkin	4328c15010	Make Patroni Kubernetes native (#500) * Use ConfigMaps or Endpoints for leader elections and to keep cluster state * Label pods with a postgres role * change behavior of pip install. From now on it will not install all dependencies, you have to specify explicitly DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`	2017-12-08 16:55:00 +01:00
V Aitvaras	ad7a1b8a16	Make it possible to provide datacenter configuration for Consul (#558 ) ```yaml consul: url: http://consul.host:8500 token: long-token-here dc: dev1-d1 ```	2017-11-06 16:44:30 +01:00
Alexander Kukushkin	8d926cbc86	Always send token in X-Consul-Token http header (#555 ) Fixes https://github.com/zalando/patroni/issues/552	2017-11-03 16:22:07 +01:00
Alexander Kukushkin	823a4d6b8e	Adjust session ttl if supplied value is smaller than minimum possible (#556) It could happen that the ttl provided in the Patroni configuration is smaller than the minimum supported by Consul. In such a case the Consul agent fails to create a new session and responds with 500 Internal Server Error, and the http body contains something like: "Invalid Session TTL '3000000000', must be between [10s=24h0m0s]". Without a session Patroni is not able to create member and leader keys in the Consul KV store, which means that the cluster becomes completely unhealthy. As a workaround we will handle such an exception, adjust the ttl to the minimum possible and retry session creation. In addition to that, make it possible to define a custom log format via the environment variable `PATRONI_LOGFORMAT`	2017-11-03 16:21:53 +01:00
Alexander Kukushkin	3919b322f4	Release 1.3.4 (#515 ) Fix documentation and update release notes	2017-09-08 10:56:09 +02:00
Alexander Kukushkin	5ef01cfdfa	Advanced configuration for Consul (#506 ) * possibility to specify client certs and cacert * possibility to specify token * compatibility with python-consul-0.7.1	2017-08-24 07:56:12 +02:00
Alexander Kukushkin	038b5aed72	Improve leader watch functionality (#356) Previously replicas were always watching the leader key (even if postgres was not running there). It was not a big issue, but it was not possible to interrupt such a watch when postgres started up or stopped successfully. It also delayed the update_member call, so we had somewhat stale information in DCS for up to `loop_wait` seconds. This commit changes that behavior. If the async_executor is busy starting, stopping or restarting postgres, we will not watch the leader key but instead wait for an event from async_executor for up to `loop_wait` seconds. The async executor will fire such an event only if the function it was calling returned something that evaluates to boolean True. Such functionality is really needed to change the way we decide whether pg_rewind is necessary. That decision will require a local postgres to be running, and for us it is really important to get such a notification as soon as possible.	2016-11-22 16:22:30 +01:00
Alexander Kukushkin	37b020e7a3	Various bugfixes and improvements: (#346 ) * Replace pytz.UTC with dateutil.tz.tzutc, it helps to reduce memory by more than 4Mb... * fix check of python version: 0x0300000 => 0x3000000 * Update leader key before restart and demote	2016-11-04 18:42:56 +02:00
Ants Aasma	7e53a604d4	Add synchronous replication support. (#314) Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions. To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over. We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions. On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS. When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock. Added acceptance tests and documentation. Implementation by @ants with extensive review by @CyberDem0n.	2016-10-19 16:12:51 +02:00
Alexander Kukushkin	1e573aec8f	Do session/renew call to Consul when update_leader is called (#336 )	2016-10-10 10:05:55 +02:00
Alexander Kukushkin	e38dfaf1ba	Call touch_member at the end of HA loop (#321 ) To make sure that we have up-to-date state of member in DCS after HA loop has changed something.	2016-09-27 16:25:11 +02:00
Alexander Kukushkin	298357c099	Implement retry and timeout strategy for consul (#305) ..the same way as for etcd Change HTTPClient implementation from using `requests.session` to `urllib3.PoolManager`, because reference implementation from python-consul didn't really work with timeouts and was blocking HA loop...	2016-09-27 16:24:30 +02:00
Alexander Kukushkin	5265e71fc2	Don't write leader optime into DCS if it didn't change (#319)	2016-09-21 14:22:22 +02:00
Alexander Kukushkin	c8b5003b86	Set __do_not_watch flag when ttl needs to be changed it's more readable comparing to `reset_cluster`	2016-06-01 13:41:49 +02:00
Alexander Kukushkin	b3ada161cf	Implement possibility to configure `retry_timeout` globally Previously it was hardcoded all over the place.	2016-05-31 10:30:53 +02:00
Alexander Kukushkin	45cbc8ca70	Implement acceptance test for dynamic configuration functionality and fix some bugs revealed by acceptance tests	2016-05-26 10:16:24 +02:00
Alexander Kukushkin	7827951c8c	Dynamic configuration	2016-05-25 14:17:05 +02:00
Alexander Kukushkin	0c2aad98a3	Move dcs implementations into dcs package	2016-05-19 10:57:18 +02:00
Alexander Kukushkin	eabfd82a5d	Implement Consul support	2016-04-27 10:59:01 +02:00

46 Commits
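
The timeline lookup quoted in commit 03c2a85d23 (#591) relies on the fact that the first eight hexadecimal characters of a WAL file name encode the timeline ID. A minimal Python sketch of the same arithmetic follows, for illustration only; the function name `timeline_from_walfile` and the sample file names are hypothetical and are not part of Patroni.

```python
def timeline_from_walfile(name: str) -> int:
    """Return the timeline ID encoded in a WAL file name.

    Mirrors the SQL expression from the commit message: take the first
    8 characters of the file name and interpret them as a hex number.
    Hypothetical helper, not a Patroni API.
    """
    if len(name) < 8:
        raise ValueError("WAL file name is too short: %r" % name)
    # int(..., 16) raises ValueError if the prefix is not valid hex
    return int(name[:8], 16)


if __name__ == "__main__":
    # Sample name made up for illustration: timeline 3
    print(timeline_from_walfile("000000030000000A000000FF"))
```

As the commit message notes, this approach only helps where the WAL file name is available; on replicas in recovery the commit uses a replication connection instead.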