- Implement a dedicated class that represents a manual failover request
- Move manual failover/switchover prechecks to a class method and use it
for both ctl and API
- Use a single parse_schedule function in both ctl and api
- Implement the has_members_eligible_to_promote method on the Ha class
- Fix the exception message for get_members with role='any'
- Don't set leader in failover key from patronictl failover
- Show a warning and execute a switchover if the leader option is provided to the patronictl failover command
- Be more precise in the log messages
- Allow failover to an async candidate in sync mode
- Check whether the candidate specified via the API is the same as the leader
- Fix and extend some tests
- Add documentation
Some special handling is required when changing any of these settings in a Postgres cluster that has standby nodes:
* `max_connections`
* `max_prepared_transactions`
* `max_locks_per_transaction`
* `max_wal_senders`
* `max_worker_processes`
If one attempts to decrease the `max_connections` dynamic setting and restart all nodes at the same time (primary and standbys), Patroni will refuse to apply the new value on the standbys and will require the user to restart them again later, once replication catches up.
That behavior is correct, but it is not documented.
This commit adds information to the documentation about that behavior and why it is required.
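The underlying rule can be illustrated with a minimal sketch (names and structure here are illustrative, not Patroni's actual implementation):
```python
# Illustrative sketch only. A standby may not apply a value lower than the
# one currently in effect on the primary until it has replayed the WAL
# record carrying the primary's own restart with the decreased value.

def standby_may_apply(new_value: int,
                      value_in_effect_on_primary: int,
                      standby_caught_up: bool) -> bool:
    """Decide whether a standby can safely restart with a changed setting."""
    if new_value >= value_in_effect_on_primary:
        return True  # increases are always safe on standbys
    # A premature decrease would make hot standby refuse to start
    # ("hot standby is not possible because ..."), so Patroni keeps the
    # old value and flags the node as pending restart until replication
    # catches up.
    return standby_caught_up
```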
References: PAT-166.
* Generate documentation of private members through sphinx docs
With this commit we make sphinx build API docs for the following
things, which were missing up to this point:
* `__init__` method of classes;
* "private" members (properties, functions, methods, attributes, etc.,
which name starts with an underscore);
* members that are missing a docstring, so we can still reference
them with links in the documentation.
The third point can be removed later, if we wish, when we reach a
point where everything has proper docstrings in the Patroni code base.
* Fix documentation problems found after enabling private methods in sphinx
* `:cvar:` is not a valid domain role. Replaced with `:attr:`.
* documentation for `consul.base.Consul.__init__` has a single-backtick
quoted string, which is interpreted as a reference that cannot be found.
Therefore, the docstring has been copied as a block quote.
* various list spacing and indentation problems.
* code blocks added where indentation is interpreted incorrectly
* literal string quoting issues.
---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
Co-authored-by: Matt Baker <matt.baker@enterprisedb.com>
1. make the _get_members_slots() method return data in the same format as the _get_permanent_slots() method (see the sketch below)
2. move conflicting-name handling from get_replication_slots() to the _get_members_slots() method
3. enrich the structure returned by get_replication_slots() with the LSN of permanent logical slots reported by the primary
4. use the added information in the SlotsHandler instead of fetching it from Cluster.slots
5. bugfix: don't try to advance a logical slot that doesn't match the required configuration
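A hypothetical illustration of the unified shape (field names are examples, not necessarily the exact keys Patroni uses):
```python
# Both _get_permanent_slots() and _get_members_slots() now contribute
# entries of the same shape, and get_replication_slots() can annotate
# permanent logical slots with the LSN reported by the primary.
slots = {
    'my_member': {'type': 'physical'},
    'my_logical_slot': {
        'type': 'logical',
        'database': 'mydb',
        'plugin': 'pgoutput',
        'lsn': 67108864,  # position reported by the primary
    },
}
```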
Due to a race condition Patroni was falsely assuming that the standby should be restarted because some recovery parameters (primary_conninfo or similar) were changed.
Close https://github.com/zalando/patroni/issues/2834
Sharing a single connection between the REST API and the main thread (doing heartbeats) was working mostly fine, except when Postgres became so slow that REST API queries started blocking the main loop.
If the dedicated REST API connection isn't available, we use the heartbeat connection as a fallback.
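A minimal sketch of the fallback idea, assuming a hypothetical connection pool with named connections:
```python
# Hypothetical names: 'restapi' is the dedicated connection, 'heartbeat'
# is the one used by the main loop.
def get_rest_api_connection(pool):
    conn = pool.get('restapi')
    if conn is not None and conn.is_alive():
        return conn
    # Fall back so a broken dedicated connection never makes the REST API
    # unavailable; the price is possible contention with heartbeats.
    return pool.get('heartbeat')
```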
* Use virtualenv to install tox in behave Dockerfile
An upstream change in the postgres docker image enforces Debian's restriction
on installing system-wide non-Debian python packages. Debian doesn't
provide tox>=4, so we need to install it with pip.
* Exclude all output directories generated using `tox-wrapper.sh`
The `tox-wrapper.sh` script created by `features/Dockerfile` creates
directories like features/output-tox-pg14-docker-behave-etcd-lin-973719674/
* Reduce footprint of tox behave docker image
When running with the leader lock Patroni was just setting the `role` label to `master`, so the `kubernetes.standby_leader_label_value` feature effectively never worked.
Now it is fixed, but in order not to introduce breaking changes we just update the default value of `standby_leader_label_value` to `master`.
`patronictl` is implemented using the `click` module, and that module uses the functions' docstrings for creating the help text.
As a consequence the docstring for the `ctl` function was being shown to the user, which doesn't make sense.
This PR fixes that issue by adding a user-friendly description to be shown by `patronictl --help`. We use a `\f` to tell `click` where to stop capturing text for the help output.
Note that `patronictl` commands are implemented using the `@ctl.command` decorator, and we always provide them with a `help` argument. Thus, none of the subcommands are affected by the aforementioned issue, only the entry point of the CLI.
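A minimal sketch of the pattern (not the exact Patroni docstring): `click` truncates the help text at the form-feed character, so everything after `\f` stays available for API docs without showing up in `--help`.
```python
import click

@click.group()
def ctl():
    """Command-line interface for interacting with a Patroni cluster.

    \f
    Everything below the form feed is kept for sphinx/API documentation
    but is not printed by ``patronictl --help``.
    """

if __name__ == '__main__':
    ctl()
```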
References: PAT-201.
1.0.3 removed support for the `top_level` configuration parameter, and builds
are now failing.
Besides that, remove the redundant pyyaml from requirements.docs.txt
Make it hold connection kwargs for local connections, and have all `NamedConnection` objects use them automatically.
Also get rid of the redundant `ConfigHandler.local_connect_kwargs`.
On top of that we will introduce a dedicated connection for the REST API thread.
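A minimal sketch of the arrangement, with hypothetical names:
```python
import psycopg2  # used here for illustration

class ConnectionPool:
    """Owns the kwargs for local connections; updated in one place."""
    def __init__(self):
        self.conn_kwargs = {}

class NamedConnection:
    """A named connection that picks up the shared kwargs automatically."""
    def __init__(self, pool, name):
        self._pool, self.name = pool, name

    def connect(self):
        # No per-connection kwargs plumbing: everything comes from the pool.
        return psycopg2.connect(**self._pool.conn_kwargs)
```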
(Almost) the same logic was used in three different places:
1. `Patroni` class
2. `Member` class
3. `_MemberStatus` class
Now they all inherit from the newly introduced `Tags` class.
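A simplified sketch of the shared base (tags such as `nofailover` and `replicatefrom` are real Patroni tags; the exact implementation differs):
```python
import abc
from typing import Any, Dict, Optional

class Tags(abc.ABC):
    """Common accessors for member tags, shared by all three classes."""

    @property
    @abc.abstractmethod
    def tags(self) -> Dict[str, Any]:
        """Subclasses provide the raw tags dict."""

    @property
    def nofailover(self) -> bool:
        return bool(self.tags.get('nofailover', False))

    @property
    def replicatefrom(self) -> Optional[str]:
        return self.tags.get('replicatefrom')
```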
* Add failsafe_mode_is_active to /patroni and /metrics
* Add patroni_primary to /metrics
* Add examples showing that failsafe_mode_is_active and cluster_unlocked
are only shown for /patroni when the value is "true"
* Update /patroni and /config examples
Cluster.get_replication_slots() didn't take into account that there cannot be logical replication slots on replicas of a standby cluster. It was only skipping logical slots for the standby_leader, but replicas were expecting that they would have to copy them over.
Also, on replicas in a standby cluster these logical slots were falsely added to the `_replication_slots` dict.
1. stop using the same cursor all the time; it creates problems when not carefully used from different threads.
2. introduce a query() method in the Connection class and make it return a result set when possible.
3. refactor most of the code that relies (directly or indirectly) on the Connection object to use the query() method as much as possible.
This refactoring reduces code complexity and will help with the future introduction of a separate database connection for the REST API thread. The latter will improve reliability when the system is under significant stress and simple monitoring queries take seconds to execute, causing the REST API to block the main thread.
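A rough sketch of the pattern (simplified, not the exact Patroni code): a short-lived cursor per call instead of one cursor shared across threads.
```python
class Connection:
    def __init__(self, conn):
        self._conn = conn  # e.g. a psycopg2 connection

    def query(self, sql, *params):
        # Open a fresh cursor per call; sharing one cursor across the
        # heartbeat and REST API threads is what caused trouble before.
        with self._conn.cursor() as cur:
            cur.execute(sql, params or None)
            # Return a result set when the statement produced one.
            return cur.fetchall() if cur.description is not None else None
```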
Prior to this commit `IntValidator` would always consider the value `0` invalid, even if it was in the allowed range.
The problem was in the following line, where `parse_int` could legitimately return `0`:
```python
value = parse_int(value, self.base_unit) or ""
```
With a parsed value of `0`, the `or ""` made the whole expression evaluate to an empty string.
As `parse_int` returns either an `int` if able to parse, or `None` otherwise, an `isinstance(value, int)` check is enough to error out when the value is not a valid `int`.
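A sketch of the fixed logic (simplified; the validator's actual error handling is omitted):
```python
value = parse_int(value, self.base_unit)
if not isinstance(value, int):
    # Not a valid integer for this parameter -- report a validation error.
    # A legitimate 0 now passes through instead of being swallowed by `or ""`.
    ...
```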
Closes #2817
Besides adding docstrings to `patroni.config`, a few side changes
have been applied:
* Reference the `config_file` property instead of the internal attribute
`_config_file` in the method `_load_config_file`;
* Have `_AUTH_ALLOWED_PARAMETERS[:2]` as the default value of the `params`
argument in the method `_get_auth` instead of using
`params or _AUTH_ALLOWED_PARAMETERS[:2]` in the body;
* Use `len(PATRONI_ENV_PREFIX)` instead of a hard-coded `8` when
removing the prefix from environment variable names (as shown below);
* Fix the documentation of the `wal_log_hints` setting. The previous docs
mentioned it was a dynamic setting that could be changed. However,
it is managed by Patroni, which forces the value `on`.
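For instance, the prefix-stripping change boils down to this (illustrative snippet):
```python
PATRONI_ENV_PREFIX = 'PATRONI_'

name = 'PATRONI_SCOPE'
# len(PATRONI_ENV_PREFIX) instead of a hard-coded 8 keeps the code
# correct even if the prefix ever changes.
suffix = name[len(PATRONI_ENV_PREFIX):]  # 'SCOPE'
```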
References: PAT-123.
Expanding on the addition of docstrings in code, this adds python module API docs to sphinx documentation.
A developer can preview what this might look like by running this locally:
```
tox -m docs
```
The option `-W` is added to the tox env so that warning messages are considered errors.
Adds doc generation using the above method to the test GitHub workflow to catch documentation problems on PRs.
Some docstrings have been reformatted and fixed to satisfy errors generated with the above setup.
For historical reasons (the flush-location functions were not available before 9.6) we used the `pg_current_wal_lsn()`/`pg_current_xlog_location()` functions to get the current WAL LSN on the primary. But this LSN is not necessarily synced to disk and could be lost if the primary node crashed.
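A sketch of the idea (not Patroni's actual code; the function names and the 9.6/10 version boundaries are PostgreSQL's):
```python
def primary_wal_lsn_query(server_version_num: int) -> str:
    """Pick the function reporting the *flushed* WAL position on the primary."""
    if server_version_num >= 100000:   # PostgreSQL 10+ renamed xlog -> wal
        return 'SELECT pg_catalog.pg_current_wal_flush_lsn()'
    if server_version_num >= 90600:    # 9.6 introduced the flush variant
        return 'SELECT pg_catalog.pg_current_xlog_flush_location()'
    # Before 9.6 there is no flush variant; the write position is the best
    # that is available, with the crash-safety caveat described above.
    return 'SELECT pg_catalog.pg_current_xlog_location()'
```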
It seems that a common pitfall for new users of Patroni is that the `bootstrap.dcs` section is only used to initialize the configuration in DCS. This moves the comment about this to an info block so it is more visible to the reader.
New patroni.py option that allows one to
* generate patroni.yml configuration file with the values from a running cluster
* generate a sample patroni.yml configuration file
* Refactor is_failover_possible()
Move all the member filtering inside the function.
* Remove the check_synchronous parameter
* Add a sync_mode_is_active() method and use it wherever appropriate
* Reduce nesting
---------
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
Consider the following scenario (with failsafe_mode enabled):
0. node1 (primary) - node2 (replica)
1. stop all etcd nodes; wait ttl seconds; start all etcd nodes (node2's failsafe state will contain the info about node1)
2. switch over to node2 (node2's failsafe state still contains the info about node1)
3. stop all etcd nodes; wait ttl seconds; start all etcd nodes
4. node2 will demote itself because it considers node1 to be the primary
Resetting failsafe state when running as a primary fixes the issue.
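A minimal sketch of the fix, with hypothetical names:
```python
def on_leader_heartbeat(ha) -> None:
    # While we hold the leader lock, any failsafe topology remembered from
    # our time as a replica (e.g. node1 above) is stale -- drop it.
    ha.failsafe_members = {}
```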
1. Unit tests should not really try accessing any external resources.
2. Doing so results in significant execution time of unit tests on Windows.
In addition, perform the request with a 3s timeout. Usually this is more than enough to figure out whether a resource is accessible.
Followup on #2724
This PR is an attempt at refactoring the docs about migrating to Patroni.
These are a few enhancements that we propose through this PR:
* The docs used to mention the procedure could only be performed on a single-node cluster. We changed that so the procedure considers a cluster composed of a primary and standbys;
* Teach how to deal with pre-existing replication slots;
* Explain how to create the user for `pg_rewind`, if the user intends to enable `use_pg_rewind`.
References: PAT-143.
Postgres supports two types of permissions:
1. owner only
2. group readable
By default the first one is used because it provides better security. But sometimes people want to run a backup tool as a user different from postgres. In this case the second option becomes very useful. Unfortunately, it didn't work correctly because Patroni was creating files with owner-only permissions.
This PR changes the behavior: permissions on files and directories created by Patroni will be calculated based on the permissions of PGDATA, i.e., they will get group-readable access when necessary.
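A minimal sketch of the idea (illustrative helper, not Patroni's actual function):
```python
import os
import stat

def modes_from_pgdata(pgdata: str):
    """Derive file and directory modes from the permissions of PGDATA."""
    mode = os.stat(pgdata).st_mode
    if mode & stat.S_IRWXG:      # PGDATA is group accessible (e.g. 0750)
        return 0o640, 0o750      # (file mode, directory mode)
    return 0o600, 0o700          # owner-only, the secure default
```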
Close #1899, close #1901
The docs of `slots` configuration used to have this mention:
```
my_slot_name: the name of replication slot. If the permanent slot name
matches with the name of the current primary it will not be created.
Everything else is the responsibility of the operator to make sure that
there are no clashes in names between replication slots automatically
created by Patroni for members and permanent replication slots.
```
However that is not true, in the sense that Patroni does not check for
clashes between `my_slot_name` and the names of replication slots created
for replicating changes among members. If you specify a slot name that
clashes with the name of a replication slot used by a member, it turns
out Patroni will make the slot permanent on the primary even if the member
key expires from the DCS.
Through this commit we also enhance the docs in terms of explaining that
permanent physical slots are maintained only on the primary, while logical
replication slots are copied from the primary to standbys.
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>