* Do not set the pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though PostgreSQL itself will report the flag, this parameter is irrelevant on the primary and there is no actual need for a restart)
* Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They can be changed manually (e.g. via ALTER SYSTEM), which puts PostgreSQL into the pending_restart state, but Patroni always passes these parameters to the postmaster as command-line options, where they can only have one value, 'on' (except on the primary during a custom bootstrap). A sketch of such a filtered query follows.
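A minimal sketch of the filtering, assuming a psycopg cursor and the `pg_settings.pending_restart` column; the actual Patroni query differs in details:
```python
# Sketch (not the exact Patroni query): list parameters that PostgreSQL reports
# as pending restart, excluding the ones Patroni always forces on the
# postmaster command line.
PARAMS_FORCED_BY_PATRONI = ['hot_standby', 'wal_log_hints']

def pending_restart_parameters(cursor):
    # pg_settings.pending_restart is available since PostgreSQL 9.5
    cursor.execute(
        "SELECT name FROM pg_catalog.pg_settings"
        " WHERE pending_restart AND NOT name = ANY(%s)",
        (PARAMS_FORCED_BY_PATRONI,))
    return [row[0] for row in cursor.fetchall()]
```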
The `AbstractConfigGenerator._format_config` method was missing a comma in the declaration of a tuple. As a consequence it was concatenating the strings `ctl` and `citus` instead of creating two separate items in the tuple.
There is currently no observed bug from this issue because the template configuration created by the `AbstractConfigGenerator.get_template_config` method doesn't include either the `ctl` or the `citus` key.
However, it is still worth closing the door on future bugs that would show up if, for example, we ever attempted to use either of those keys in the template.
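For illustration, the Python pitfall behind this fix is that adjacent string literals are concatenated when a comma is missing (the tuple contents below are made up, not the actual ones from `_format_config`):
```python
# A missing comma between string literals silently concatenates them.
broken = ('bootstrap', 'postgresql', 'ctl' 'citus')   # -> ('bootstrap', 'postgresql', 'ctlcitus')
fixed = ('bootstrap', 'postgresql', 'ctl', 'citus')   # -> four separate items
assert len(broken) == 3 and len(fixed) == 4
```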
References: PAT-231.
* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to a boolean, as is done wherever else it is accessed (see the sketch below)
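A rough sketch of what the check boils down to; the real implementation and its helper functions differ:
```python
import logging

logger = logging.getLogger(__name__)

def _validate_failover_tags(tags):
    """Sketch of the intended check (not the exact Patroni code): warn when the
    two tags contradict each other; nofailover takes precedence."""
    if 'nofailover' not in tags or 'failover_priority' not in tags:
        return
    # cast nofailover to a boolean the same way it is interpreted elsewhere
    nofailover = str(tags['nofailover']).lower() not in ('false', 'off', 'no', '0', '')
    if nofailover != (int(tags['failover_priority']) <= 0):
        logger.warning('Conflicting tags: nofailover=%s, failover_priority=%s;'
                       ' "nofailover" takes precedence',
                       tags['nofailover'], tags['failover_priority'])
```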
This was introduced by #2990: the pod cannot be started and shows the
following logs:
```
2023-12-26 03:29:25.569 UTC [47] CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
2023-12-26 03:29:25.569 UTC [47] STATEMENT: DO $$
BEGIN
PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus';
IF NOT FOUND THEN
CREATE DATABASE "citus";
END IF;
END;$$
2023-12-26 03:29:25,570 ERROR: post_bootstrap
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap
    self._postgresql.citus_handler.bootstrap()
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap
    cur.execute(sql.encode('utf-8'))
psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function
CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
```
---------
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
Consider a task: we want to create an extension _before_ citus in a database. Currently the `post_bootstrap` script is executed before the `CitusHandler.bootstrap()` method, which seems to allow doing that, but in fact `CitusHandler.bootstrap()` will fail to create the already existing database and as a result the whole bootstrap will fail.
Changing the order of execution of the `post_bootstrap` hook and `CitusHandler.bootstrap()` would not help, because it would not allow creating another extension _before_ citus. Therefore the only way to solve it is to make CREATE DATABASE and CREATE EXTENSION idempotent. This allows creating the citus database and all dependencies from the `post_bootstrap` hook.
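A minimal sketch of the idempotent approach, assuming psycopg2 (the actual Patroni implementation differs): since CREATE DATABASE cannot run inside a transaction block or a function, it is executed in autocommit mode and the "already exists" error is simply ignored, while CREATE EXTENSION is idempotent by itself thanks to IF NOT EXISTS.
```python
from psycopg2 import errors, sql

def ensure_database(conn, dbname):
    # CREATE DATABASE must run outside a transaction block
    conn.autocommit = True
    with conn.cursor() as cur:
        try:
            cur.execute(sql.SQL('CREATE DATABASE {}').format(sql.Identifier(dbname)))
        except errors.DuplicateDatabase:
            pass  # already created, e.g. by a post_bootstrap script

def ensure_extension(conn, extname):
    with conn.cursor() as cur:
        # CREATE EXTENSION is idempotent with IF NOT EXISTS
        cur.execute(sql.SQL('CREATE EXTENSION IF NOT EXISTS {}').format(sql.Identifier(extname)))
```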
Exclude the actual leader (not the passed leader argument) from the
candidates list in the `patronictl failover` prompt.
Abort `patronictl failover` execution if the specified candidate is
the same as the current cluster leader.
Fix the case where a parameter value was changed and then reset back to
the initial value without a restart - before this fix, the second change
was not reflected in the Postgres config.
This commit also includes the related unit test refactoring.
Consul doesn't provide any interface to immediately get the `ModifyIndex` of a key we have just updated, therefore we have to perform an explicit read operation. By default stale reads are allowed and sometimes we may read stale data; as a result the write_sync_state() call was considered failed. To mitigate the problem we switch to `consistent` reads when the read is executed right after an update of the `/sync` key.
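A simplified sketch of the read-after-write pattern; the `client` object, its methods, and the key path are illustrative placeholders, not the exact python-consul or Patroni API:
```python
def write_sync_state(client, value, prev_index):
    # compare-and-swap update of the /sync key
    if not client.kv_put('service/batman/sync', value, cas=prev_index):
        return None
    # Consul doesn't return the new ModifyIndex from the write, so read the key
    # back; a 'consistent' read avoids getting a stale value from a follower.
    index, item = client.kv_get('service/batman/sync', consistency='consistent')
    return item
```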
Close #2972
When deploying a new Citus cluster with Etcd v2, Patroni was failing to start with the following exception:
```python
2023-11-09 10:51:41,246 INFO: Selected new etcd server http://localhost:2379
Traceback (most recent call last):
  File "/home/akukushkin/git/patroni/./patroni.py", line 6, in <module>
    main()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 343, in main
    return patroni_main(args.configfile)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 237, in patroni_main
    abstract_main(Patroni, configfile)
  File "/home/akukushkin/git/patroni/patroni/daemon.py", line 172, in abstract_main
    controller = cls(config)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 66, in __init__
    self.ensure_unique_name()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 112, in ensure_unique_name
    cluster = self.dcs.get_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1654, in get_cluster
    cluster = self._get_citus_cluster() if self.is_citus_coordinator() else self.__get_patroni_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1638, in _get_citus_cluster
    cluster = groups.pop(CITUS_COORDINATOR_GROUP_ID, Cluster.empty())
AttributeError: 'Cluster' object has no attribute 'pop'
```
It has been broken since #2909.
In addition, fix the `_citus_cluster_loader()` interface by allowing it to return only a dict object.
When running in containers it is possible that the traffic is routed via `docker-proxy`, which listens on the port and accepts incoming connections.
This commit effectively sticks to the original solution from #2878.
If archiving is enabled, the `Postgresql.latest_checkpoint_location()` method returns the LSN of the prev (SWITCH) record, which points to the beginning of the WAL file. This is done to make it possible to safely promote a replica that recovers WAL files from the archive and wasn't streaming when the primary was stopped (the primary doesn't archive this WAL file).
However, in certain cases using the LSN pointing to the SWITCH record caused an unnecessary pg_rewind if the replica didn't manage to replay the shutdown checkpoint record before it was promoted.
To mitigate the problem we need to check that the replica received/replayed exactly the shutdown checkpoint LSN, while still writing the LSN of the SWITCH record to the `/status` key when releasing the leader lock.
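An illustrative sketch of the two LSNs involved; the names and class are made up for clarity, not the actual Patroni API:
```python
class ShutdownLsns:
    """Illustrative container, not real Patroni code."""

    def __init__(self, switch_record_lsn, shutdown_checkpoint_lsn):
        # still written to the /status key when the leader key is released
        self.status_key_lsn = switch_record_lsn
        # compared with the replica receive/replay LSN before promotion
        self.promote_check_lsn = shutdown_checkpoint_lsn

    def replica_needs_rewind(self, replica_receive_replay_lsn):
        # a replica that received/replayed the shutdown checkpoint record
        # can be promoted without pg_rewind
        return replica_receive_replay_lsn < self.promote_check_lsn
```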
This commit introduces a FAQ page to the docs. The idea is to answer
the most frequently asked questions up front, so users can get quick
answers without digging deep into the docs or having to go to
Slack/GitHub to clarify questions.
---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
Prior to this commit, if a user wanted to add parameters to the custom bootstrap script call, they had to configure Patroni like this:
```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script --arg1=value1 --arg2=value2 ...
```
This commit extends that so we achieve a similar behavior that is seen when using `create_replica_methods`, i.e., we also allow the following syntax:
```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script
    arg1: value1
    arg2: value2
```
All keys in the mapping that are not recognized by Patroni are treated as additional named arguments to be passed down to the `command` call, as sketched below.
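A minimal sketch of how the unrecognized keys can be turned into named arguments; the recognized-key list and the `--key=value` formatting here are assumptions for illustration, not the exact Patroni code:
```python
import shlex

# keys Patroni itself understands in the custom bootstrap section
# (illustrative subset; the real list may differ)
RECOGNIZED_KEYS = {'command', 'keep_existing_recovery_conf', 'recovery_conf', 'no_params'}

def build_custom_bootstrap_command(config):
    cmd = shlex.split(config['command'])
    for name, value in sorted(config.items()):
        if name not in RECOGNIZED_KEYS:
            cmd.append('--{0}={1}'.format(name, value))
    return cmd

# With the second YAML example above this yields something like:
# ['/path/to/my/custom_script', '--arg1=value1', '--arg2=value2']
```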
References: PAT-218.
- order sections similarly to the sample configs
- add warnings and comments to the `bootstrap.dcs` section
- add `tags` and `log` sections
- use discovered IPs in `postgresql.connect_address` and `postgresql.listen`
- set `wal_level` to `replica` for PostgreSQL 9.6+
- make unit tests pass with Python 3.6
- improve the config validator so it doesn't complain when some ints are strings in the YAML file
For now it will sit in the section about the Patroni configuration. We can later move it to (or reference it from) a new section where all the functionality of the `patroni` executable is described.
The priority is configured with the `failover_priority` tag. Possible values range from `0` to infinity, where `0` means the node will never become the leader, which is the same as setting the `nofailover` tag to `true`. As a result, only one of the `failover_priority` or `nofailover` tags should be set in the configuration file.
The failover priority kicks in only when more than one node has the same receive/replay LSN and is ahead of the other nodes in the cluster. In this case the node with the higher `failover_priority` value is preferred. If there is a node with a higher receive/replay LSN, it becomes the new leader even if it has a lower `failover_priority` (except when its priority is set to 0).
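A simplified sketch of the tie-breaking rule described above (not the actual Patroni leader-race code): WAL position always wins, `failover_priority` only breaks ties, and priority 0 excludes the node from the race entirely.
```python
def best_candidate(nodes):
    # priority 0 means the node never becomes the leader
    eligible = [n for n in nodes if n['failover_priority'] > 0]
    return max(eligible, key=lambda n: (n['wal_position'], n['failover_priority']), default=None)

nodes = [
    {'name': 'pg1', 'wal_position': 100, 'failover_priority': 1},
    {'name': 'pg2', 'wal_position': 100, 'failover_priority': 2},
    {'name': 'pg3', 'wal_position': 90,  'failover_priority': 5},
]
assert best_candidate(nodes)['name'] == 'pg2'  # highest priority among the most advanced nodes
```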
Close https://github.com/zalando/patroni/issues/2759
This commit changes the `patronictl` application so that its
`--dcs` argument can now receive a namespace.
Prior to this commit the format of that argument's value was
`DCS://HOST:PORT`.
From now on it accepts this format: `DCS://HOST:PORT/NAMESPACE`. Like all
other parts of the argument value, `NAMESPACE` is optional; if it is not
given, `patronictl` falls back to the value from the configuration
file, if any, or to `service`.
This change is especially useful when you are running a cluster in a
custom namespace from a machine where you don't have a configuration
file for Patroni or `patronictl`. It avoids having to create a
configuration file containing only the `namespace` field in that case.
Issue reported by: Shaun Thomas <shaun@bonesmoses.org>
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
The error is raised when Etcd is configured to use JWT auth tokens and the user database in Etcd is updated, because the update invalidates all tokens.
If retries are requested, try to get a new token and repeat the request, in a loop, until the request is successfully executed or `retry_timeout` is exhausted. This is the only way of solving the race condition, because yet another modification of the user database in Etcd might happen between authentication and executing the request.
If the request doesn't have to be retried immediately, set a flag so that the next API request performs authentication first, and let Patroni naturally repeat the request on the next heartbeat loop.
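A simplified sketch of this strategy; the `client` object, its attributes, and the exception class are illustrative placeholders, not the real Patroni or python-etcd API:
```python
import time

class InvalidAuthTokenError(Exception):
    """Hypothetical stand-in for the 'invalid auth token' error returned by Etcd."""

def handle_invalid_token(client, request, retry_timeout, retry):
    if not retry:
        # not retrying immediately: make the next API request authenticate
        # first and let the caller repeat it on the next heartbeat loop
        client.needs_reauthentication = True
        return None
    deadline = time.monotonic() + retry_timeout
    while True:
        client.authenticate()  # fetch a fresh JWT token
        try:
            return request()
        except InvalidAuthTokenError:
            # another user-database modification may have happened between
            # authentication and the request: keep retrying until the deadline
            if time.monotonic() >= deadline:
                raise
```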
Co-authored-by: Kenny Do <kedo@render.com>
Ref: https://github.com/zalando/patroni/pull/2911
PR #2909 removed the cache in the Zookeeper implementation of the DCS, so
the comment of get_cluster should be changed to 'Retrieve a fresh
view of DCS', since every implementation now does so.
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
- Fixed issues with the has_permanent_slots() method. It didn't take into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write only the LSNs of permanent slots to the status key (not of every slot that exists on the primary).
- Include pg_current_wal_flush_lsn() in the slots feedback, so that slots on standby nodes can be advanced.
- Improved behave tests:
  - Verify that permanent slots are properly created on standby nodes
  - Verify that permanent slots are properly advanced, including in DCS failsafe mode
  - Verify that only permanent slots are written to the `/status` key
* remove the check_psycopg() call from setup.py; when installing from a wheel it doesn't work anyway.
* call the check_psycopg() function before process_arguments(), because the latter tries to import psycopg and fails with a stacktrace, while the former shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras that will install the psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move the check_psycopg() function to __main__.py.
* introduce a new extra called `all`, which allows installing all dependencies at once (except the psycopg-related ones).
* use the `build` module to create sdist and bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
The cache creates a lot of problems and prevents implementing automatic retention of physical replication slots for members with a configurable retention policy.
Instead, just read the entire cluster from Zookeeper and use watchers only for the `/leader` and `/config` keys.
There are cases when Citus wants to open a connection to the local Postgres. By default it uses `localhost` for that, which is not always available. To solve this we set the `citus.local_hostname` GUC to a custom value, the same one Patroni uses to connect to Postgres.
1. Introduce DEBUG logs for callbacks
2. Configure the log format in behave tests to include the filename, line, and method name that triggered the callback, and enable DEBUG logs for the `patroni.postgresql.callback_executor` module.
P.S. Unfortunately this works only starting from Python 3.8, but it should be good enough for debugging purposes because 3.7 is already EOL.
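A small sketch of the logging setup this relies on; the helper function, message, and format string are illustrative, not the actual Patroni code. The `stacklevel` keyword is what requires Python 3.8+.
```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(filename)s:%(lineno)d %(funcName)s %(levelname)s: %(message)s')
logger = logging.getLogger('patroni.postgresql.callback_executor')

def on_role_change(cmd, cluster_name, role):
    # stacklevel=2 (Python 3.8+) makes filename/lineno/funcName in the record
    # point at the caller that triggered the callback rather than this helper.
    logger.debug('Executing callback %r (cluster=%s, role=%s)',
                 cmd, cluster_name, role, stacklevel=2)
```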