There are two cases when libpq may search for "localhost":
1. When the host in the connection string is not specified and libpq is using the default socket directory path.
2. When the specified host matches the default socket directory path.
Since we don't know the value of the default socket directory path and effectively can't detect case 2, the best strategy to mitigate the problem is to add "localhost" if we detect that "host" is a unix socket directory (it starts with the '/' character).
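As an illustration, a minimal sketch of that strategy (the helper name and the surrounding host handling are hypothetical):

```python
def hosts_for_matching(host: str) -> list:
    """Hypothetical helper: expand a connection host into the values
    libpq may search for when matching this connection."""
    # libpq matches Unix-socket connections against the default socket
    # directory path or "localhost"; since the compiled-in default is
    # unknown to us, cover both by adding "localhost" explicitly
    if host.startswith('/'):  # a unix socket directory
        return [host, 'localhost']
    return [host]
```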
Close#3134
Due to `postgres --describe-config` not showing GUCs defined with GUC_NO_SHOW_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE, Patroni was always ignoring some GUCs that a user might want to configure with non-default values.
- remove `postgres --describe-config` validation
- define minor versions for availability bounds of some back-patched GUCs
- update the list of GUCs
- update the regex for filtering backend processes by name
- make `primary_conninfo` contain the `dbname` parameter
The last change is required for synchronizing logical replication slots by the slotsync worker and doesn't create problems on older versions.
When Patroni is packaged into a pyz (zip file), resources are not directly available on the filesystem, and therefore we can't always rely on os.listdir() and open() to enumerate and read them.
We are going to use importlib.resources to solve this problem, except on Python 3.8 and older, where the files() function needed to enumerate resources is not available. For legacy Python versions (3.8 actually reaches EOL in October 2024) we are going to use os.listdir() as a fallback.
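A minimal sketch of this approach (assuming the resources live in a `patroni.postgresql.available_parameters` package; the real code differs):

```python
import sys


def enumerate_resources():
    """Enumerate bundled resource files, whether running from a pyz
    archive or from a normal checkout (simplified sketch)."""
    if sys.version_info >= (3, 9):
        from importlib.resources import files
        # files() returns a Traversable that also works inside zip archives
        package = files('patroni.postgresql.available_parameters')
        return [resource.name for resource in package.iterdir() if resource.is_file()]
    # Python 3.8 and older: files() is not available, so fall back to
    # os.listdir(), which only works on a real filesystem
    import os
    from patroni.postgresql import available_parameters
    pkg_dir = os.path.dirname(available_parameters.__file__)
    return [name for name in os.listdir(pkg_dir) if not name.startswith('__')]
```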
Close#3017
Provide info about the PG parameters that caused the "pending restart" flag to be set. Both `patronictl list` and the `/patroni` REST API endpoint now show the parameter names and the diff as the "pending restart reason".
* Do not set the pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though this flag will actually be set in PG, the parameter is irrelevant on the primary and there is no actual need for a restart)
* Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They can actually be changed manually (e.g. via ALTER SYSTEM), which causes the pending_restart state in PG, but Patroni always passes those parameters to the postmaster as command-line options, where they can only have one value: 'on' (except on the primary when performing a custom bootstrap)
Fix the case when a parameter value was changed and then reset back to the initial value without a restart: before this fix, the second change was not reflected in the Postgres config.
This commit also includes the related unit test refactoring.
1. extract `GlobalConfig` class to its own module
2. make the module instantiate the `GlobalConfig` object on load and replace its entry in sys.modules with this instance (see the sketch below)
3. don't pass the `GlobalConfig` object around, but use the `patroni.global_config` module everywhere.
4. move `ignore_slots_matchers`, `max_timelines_history`, and `permanent_slots` from `ClusterConfig` to `GlobalConfig`.
5. add `use_slots` property to global_config and remove duplicated code from `Cluster` and `Postgresql.ConfigHandler`.
Besides that, improve the readability of a couple of checks in ha.py and the formatting of the `/config` key when saved from patronictl.
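A minimal sketch of the module-replacement trick from step 2 (simplified; everything except `use_slots` is illustrative):

```python
# patroni/global_config.py (simplified sketch)
import sys
import types


class GlobalConfig(types.ModuleType):
    """A module object that also behaves like the global configuration."""

    def __init__(self) -> None:
        super().__init__(__name__)
        self._config = {}

    def update(self, config) -> None:
        self._config = config or {}

    @property
    def use_slots(self) -> bool:
        # the single home for a check that used to be duplicated
        # in Cluster and Postgresql.ConfigHandler
        return bool(self._config.get('postgresql', {}).get('use_slots', True))


# Replace the module in sys.modules with the instance, so that
# "import patroni.global_config" yields the same object everywhere.
sys.modules[__name__] = GlobalConfig()
```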
In case archiving is enabled, the `Postgresql.latest_checkpoint_location()` method returns the LSN of the previous (SWITCH) record, which points to the beginning of the WAL file. This is done in order to make it possible to safely promote a replica that recovers WAL files from the archive and wasn't streaming when the primary was stopped (the primary doesn't archive this WAL file).
But in certain cases using the LSN pointing to the SWITCH record was causing an unnecessary pg_rewind if the replica didn't manage to replay the shutdown checkpoint record before it was promoted.
In order to mitigate the problem we need to check that the replica received/replayed exactly the shutdown checkpoint LSN. At the same time we will still write the LSN of the SWITCH record to the `/status` key when releasing the leader lock.
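Schematically (illustrative names; the real check uses the values reported by the replica):

```python
def ready_to_promote(received_lsn: int, replayed_lsn: int,
                     shutdown_checkpoint_lsn: int) -> bool:
    """Sketch: the replica must have received and replayed the shutdown
    checkpoint itself, not merely the preceding SWITCH record."""
    return (received_lsn >= shutdown_checkpoint_lsn
            and replayed_lsn >= shutdown_checkpoint_lsn)
```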
If restarted in pause, Patroni was discarding `synchronous_standby_names` from `postgresql.conf`, because in the internal cache this value was set to `None`. As a result synchronous replication transitioned to a broken state: no synchronous replicas according to `synchronous_standby_names`, and Patroni not selecting/setting new synchronous replicas (another bug).
To solve the problem of the broken initial state and to avoid similar issues with other GUCs, we will read the GUC's current value when Patroni joins a running Postgres.
Due to a race condition Patroni was falsely assuming that the standby should be restarted because some recovery parameters (primary_conninfo or similar) were changed.
Close https://github.com/zalando/patroni/issues/2834
Make it hold connection kwargs for local connections and have all `NamedConnection` objects use them automatically.
Also get rid of redundant `ConfigHandler.local_connect_kwargs`.
On top of that we will introduce a dedicated connection for the REST API thread.
1. stop using the same cursor all the time; it creates problems when not carefully used from different threads.
2. introduce a query() method in the Connection class and make it return a result set when possible.
3. refactor most of the code that relies (directly or indirectly) on the Connection object to use the query() method as much as possible.
This refactoring reduces code complexity and will help with the future introduction of a separate database connection for the REST API thread. The latter will improve reliability when the system is under significant stress, when simple monitoring queries take seconds to execute and the REST API starts blocking the main thread.
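A sketch of the query() idea, assuming psycopg2 and autocommit connections (the real class is more involved):

```python
import threading

import psycopg2


class Connection:

    def __init__(self, conn_kwargs: dict) -> None:
        self._conn_kwargs = conn_kwargs
        self._lock = threading.Lock()
        self._connection = None

    def get(self):
        # (re)connect lazily; psycopg2 sets .closed to a non-zero
        # value once the connection becomes unusable
        if self._connection is None or self._connection.closed != 0:
            self._connection = psycopg2.connect(**self._conn_kwargs)
            self._connection.autocommit = True
        return self._connection

    def query(self, sql: str, *params):
        # a short-lived cursor per call instead of one shared cursor,
        # which was problematic when used from different threads
        with self._lock:
            with self.get().cursor() as cursor:
                cursor.execute(sql, params or None)
                # return a result set when the statement produced one
                if cursor.description is not None:
                    return cursor.fetchall()
```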
For historical reasons (not available before 9.6) we used the `pg_current_wal_lsn()`/`pg_current_xlog_location()` functions to get the current WAL LSN on the primary. But this LSN is not necessarily synced to disk and could be lost if the primary node crashed.
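The flush variants of these functions return a position that is guaranteed to be synced to disk; assuming the fix switches to them, the version-dependent choice looks roughly like this:

```python
def wal_flush_lsn_function(major_version: int) -> str:
    """Name of the SQL function returning the WAL LSN synced to disk
    (illustrative sketch). pg_current_xlog_flush_location() appeared in
    9.6 and was renamed to pg_current_wal_flush_lsn() in PostgreSQL 10."""
    return ('pg_catalog.pg_current_wal_flush_lsn()' if major_version >= 100000
            else 'pg_catalog.pg_current_xlog_flush_location()')
```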
Postgres supports two types of permissions:
1. owner only
2. group readable
By default the first one is used because it provides better security. But sometimes people want to run a backup tool as a user different from postgres, and in this case the second option becomes very useful. Unfortunately it didn't work correctly, because Patroni was creating files with owner-only permissions.
This PR changes the behavior: permissions on files and directories created by Patroni will be calculated based on the permissions of PGDATA, i.e., they will get group read access when necessary.
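A simplified sketch of deriving creation modes from PGDATA (the function name is illustrative):

```python
import os
import stat


def modes_from_pgdata(pgdata: str):
    """Return (directory mode, file mode) derived from the permissions
    of PGDATA (simplified sketch, not the exact implementation)."""
    pgdata_mode = stat.S_IMODE(os.stat(pgdata).st_mode)
    if pgdata_mode & stat.S_IRWXG:
        # PGDATA is group accessible: create group-readable objects
        return 0o750, 0o640
    # default: owner-only access
    return 0o700, 0o600
```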
Close#1899
Close#1901
* Use YAML files to validate Postgres GUCs through Patroni.
Patroni used to have a static list of Postgres GUCs validators in
`patroni.postgresql.validator`.
One problem with that approach, for example, is that it would not
allow GUCs from custom Postgres builds to be validated/accepted.
To work around that issue, the idea was to move the validators from the source code to an external and extendable source.
With that Patroni will start reading the current validators from that
external source plus whatever custom validators are found.
From this commit onwards Patroni will read and parse all YAML files
that are found under the `patroni/postgresql/available_parameters`
directory to build its Postgres GUCs validation rules.
All the details about how this works can be found in the docstring of the introduced function `_load_postgres_gucs_validators`.
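A sketch of the loading logic, assuming every YAML file simply maps GUC names to a list of validator rules (the actual file format is documented in that docstring):

```python
import os

import yaml

# the directory shipped with Patroni that holds the validator files
AVAILABLE_PARAMETERS_DIR = os.path.join('patroni', 'postgresql', 'available_parameters')


def load_postgres_gucs_validators() -> dict:
    """Merge validator definitions from all YAML files (simplified sketch)."""
    validators = {}
    for name in sorted(os.listdir(AVAILABLE_PARAMETERS_DIR)):
        if name.endswith(('.yml', '.yaml')):
            with open(os.path.join(AVAILABLE_PARAMETERS_DIR, name)) as f:
                # assumed layout: {guc_name: [rule, rule, ...], ...}
                validators.update(yaml.safe_load(f) or {})
    return validators
```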
- added pyrightconfig.json with typeCheckingMode=strict
- added type hints to all files except api.py
- added type stubs for dns, etcd, consul, kazoo, pysyncobj and other modules
- added type stubs for psycopg2 and urllib3 with some little fixes
- fixed most of the issues reported by pyright
- remaining issues will be addressed later, along with enabling the CI linting task
The two cases we have in mind are:
* In spite of following all best practices client-side, logical replication connections can sometimes hang the Postgres shutdown sequence. We'd like to send SIGTERM to any misbehaving logical replication connections that remain after x seconds; these will inevitably get killed anyway on master stop timeout.
* remove "role=master" label on current primary when not using k8s as DCS. Waiting until after Postgres fully stops can sometimes be too long for this.
* Pause pgbouncer connections before switchover
Close#2596
For now it implements:
- CaseInsensitiveDict()
- CaseInsensitiveSet()
Update `patroni.postgresql.sync.parse_sync_standby_names()` to use `CaseInsensitiveSet()` instead of `CaseInsensitiveDict()`.
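A minimal sketch of what such a case-insensitive set can look like (simplified; not the exact implementation):

```python
from collections.abc import MutableSet


class CaseInsensitiveSet(MutableSet):
    """A set of strings with case-insensitive membership that remembers
    the last-added casing of every member."""

    def __init__(self, values=()):
        self._values = {}  # lower-cased member -> original casing
        for value in values:
            self.add(value)

    def __contains__(self, value):
        return isinstance(value, str) and value.lower() in self._values

    def __iter__(self):
        return iter(self._values.values())

    def __len__(self):
        return len(self._values)

    def add(self, value):
        self._values[value.lower()] = value

    def discard(self, value):
        self._values.pop(value.lower(), None)
```

With this, checks like `'NODE1' in CaseInsensitiveSet(['node1'])` hold regardless of how a standby name was capitalized in `synchronous_standby_names`.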
* Ignore D401 in flake8-docstrings
* Fix newly reported flake8 issues, ignore the old W503 rule
* rely on concatenation of adjacent strings
* Format behave scripts
* Reformat ha.py according to new rules
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
It could happen that Patroni was started before PGDATA was mounted. In this case Patroni can't determine the major Postgres version from the PG_VERSION file. Later, when PGDATA was mounted, Patroni was trying to create recovery.conf even if the actual Postgres major version was newer than 12.
To mitigate the problem we double-check that `Postgresql._major_version` is set before writing the recovery configuration or starting Postgres.
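A sketch of the guard (names are illustrative):

```python
def recovery_conf_filename(major_version: int) -> str:
    """Pick where recovery parameters go, refusing to guess while the
    major version is unknown (illustrative sketch of the double check)."""
    if not major_version:
        # PG_VERSION could not be read, e.g. PGDATA was not mounted yet
        raise RuntimeError('Postgres major version is not yet known')
    # v12+ keeps recovery settings in postgresql.conf, older versions
    # use a separate recovery.conf
    return 'postgresql.conf' if major_version >= 120000 else 'recovery.conf'
```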
Close https://github.com/zalando/patroni/issues/2434
For a long time Patroni has enforced that only one callback script runs at a time: if a new callback is executed while the old one is still running, the old one is killed (including all child processes).
Such behavior is fine for all callbacks except on_reload, because the latter may accidentally cancel important ones that, for example, update DNS or assign/remove a virtual IP.
To mitigate the problem we introduce a dedicated executor for on_reload callbacks, so that on_reload may only cancel another on_reload.
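Schematically (simplified; the real executor also kills the child processes of the cancelled callback):

```python
import subprocess


class CallbackExecutor:
    """Runs at most one callback at a time; starting a new one kills
    the previous (simplified sketch)."""

    def __init__(self) -> None:
        self._process = None

    def call(self, cmd) -> None:
        if self._process is not None and self._process.poll() is None:
            self._process.kill()
        self._process = subprocess.Popen(cmd)


# With two executors, on_reload can only ever cancel another on_reload.
callback_executor = CallbackExecutor()
on_reload_executor = CallbackExecutor()


def execute_callback(name: str, cmd) -> None:
    (on_reload_executor if name == 'on_reload' else callback_executor).call(cmd)
```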
Ref: https://github.com/zalando/patroni/issues/2445
Keep as much backward compatibility as possible.
The following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit tests use only role = 'primary' instead of 'master' to verify that item 1 works.
6. patronictl still supports the old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the latter is not set (see the sketch after this list).
8. The documentation and some examples are updated.
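A sketch of the translation from item 7 (illustrative):

```python
def translate_timeouts(config: dict) -> dict:
    """Copy master_(start|stop)_timeout to primary_(start|stop)_timeout
    when the latter is not set (sketch of item 7)."""
    for action in ('start', 'stop'):
        old_name = 'master_{0}_timeout'.format(action)
        new_name = 'primary_{0}_timeout'.format(action)
        if old_name in config and new_name not in config:
            config[new_name] = config[old_name]
    return config
```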
Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
When the `synchronous_standby_names` GUC is changed, PostgreSQL almost immediately starts reporting the corresponding walsenders as synchronous, while in fact they may not have reached this state yet. To mitigate this problem we memorize the current flush LSN on the primary right after the change of `synchronous_standby_names` becomes visible and use it as an additional check for walsenders.
A walsender will be counted as truly "sync" only when its write/flush/replay_lsn has reached the memorized LSN and its `application_name` is known to be part of `synchronous_standby_names`.
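Schematically, the additional check looks like this (the dict shape is illustrative; column names follow `pg_stat_replication`):

```python
def is_confirmed_sync(walsender: dict, memorized_flush_lsn: int, ssn_members) -> bool:
    """Count a walsender as truly synchronous only once it has caught up
    past the LSN memorized when synchronous_standby_names changed (sketch)."""
    return (walsender['application_name'] in ssn_members
            and all(walsender[col] is not None and walsender[col] >= memorized_flush_lsn
                    for col in ('write_lsn', 'flush_lsn', 'replay_lsn')))
```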
The size of the PR is mostly related to refactoring: the code responsible for working with `synchronous_standby_names` and `pg_stat_replication` was moved to a dedicated file.
The `parse_sync_standby_names()` function was mostly copied from #672.
If the cluster is stable (no nodes are joining/leaving/lagging) we want to run at most one monitoring query per HA loop. So far this worked perfectly, except when synchronous_mode is enabled, where we run two additional queries:
1. SHOW synchronous_mode
2. SELECT ... FROM pg_stat_replication
In order to solve this, we will include these "queries" in the common monitoring query when synchronous_mode is enabled.
In addition, make sure that `synchronous_standby_names` is reset on replicas that used to be a primary, and avoid using replicas that are not in the 'running' state.
P.S.: in the monitoring query we also extract the current value of synchronous_standby_names, because it will be useful for the quorum commit feature.
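Schematically, the combined query could look like this (simplified; the real monitoring query returns more columns):

```python
# executed once per HA loop when synchronous_mode is enabled (sketch)
SYNC_MONITORING_QUERY = """\
SELECT pg_catalog.pg_current_wal_lsn(),
       (SELECT setting FROM pg_catalog.pg_settings
         WHERE name = 'synchronous_standby_names'),
       (SELECT pg_catalog.json_agg(r) FROM (
            SELECT application_name, state, sync_state, replay_lsn
              FROM pg_catalog.pg_stat_replication) r)
"""
```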
Close https://github.com/zalando/patroni/issues/2469
Windows doesn't support `SIGTERM`, but our behave tests in the majority of cases rely on Patroni's graceful shutdown.
In order to emulate the behaviour we introduced a new REST API endpoint, `POST /sigterm`. The endpoint works only on Windows and only when the `BEHAVE_DEBUG` environment variable is set.
Besides that, some minor adjustments were made to behave tests, mainly related to backslash/slash handling.
In addition, improve test coverage on Windows by properly mocking filesystem access and avoiding calls to `subprocess.call()`. Specifically, symlink creation on Windows requires admin privileges and there is no `true.exe`.
When starting as a replica it may take some time before Postgres begins accepting new connections, and meanwhile it could happen that the leader has transitioned to a different member and `primary_conninfo` must be updated.
On pre-v12 Patroni regularly checks `recovery.conf` in order to verify that recovery parameters match the expectation. Starting from v12 recovery parameters were converted to GUCs, and Patroni gets their current values from the `pg_settings` view. The latter creates a problem when it takes more than a minute for Postgres to start accepting new connections.
Since Patroni attempts to execute at least `pg_is_in_recovery()` every HA loop, and it raises an exception while connections are not yet accepted, `check_recovery_conf()` effectively wasn't reachable until recovery finished, but this changed when #2082 was introduced.
As a result of #2082 we got the following behavior:
1. Up to v12 (not inclusive) everything was working as expected.
2. On v12 and v13 Patroni was restarting Postgres after 1 minute of recovery.
3. On v14+ `check_recovery_conf()` is not executed because the `replay_paused()` method raises an exception.
In order to properly handle changes of recovery parameters, or the leader transitioning to a different node, on v12+ we will rely on the cached values of recovery parameters until Postgres becomes ready to execute queries.
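A sketch of the fallback (method names are illustrative):

```python
def current_recovery_params(postgresql):
    """Prefer pg_settings once Postgres can answer queries; until then
    rely on the cached values Patroni wrote itself (sketch)."""
    if postgresql.major_version >= 120000 and not postgresql.is_ready():
        return postgresql.cached_recovery_params()
    return postgresql.read_recovery_params()  # recovery.conf or pg_settings
```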
Close https://github.com/zalando/patroni/issues/2289
This allows having multiple hosts in a standby_cluster and ensures that the standby leader follows the main cluster's new leader after a switchover.
Partially addresses #2189