patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-28 02:20:04 +00:00

Author	SHA1	Message	Date
Polina Bungina	264b2e8be7	Re-enable SSL for MacOS GH action runners	2024-03-06 09:22:18 +01:00
Junwang Zhao	fd3e3ca472	add missing busybox install command (#3029 ) Before this patch, executing `ps` inside the container produced the following error: postgres@a97c9e438eae:~$ ps bash: ps: command not found Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>	2024-03-06 08:06:47 +01:00
zhjwpku	e131065d74	rename citus_handler to mpp_handler (#2991 ) Obey the following 5 meanings of the term _cluster_ in Patroni. 1. PostgreSQL cluster: a cluster of PostgreSQL instances which have the same system identifier. 2. MPP cluster: a cluster of PostgreSQL clusters in which one acts as the coordinator and the others act as workers. 3. Coordinator cluster: a PostgreSQL cluster which acts in the role of 'coordinator' within an MPP cluster. 4. Worker cluster: a PostgreSQL cluster which acts in the role of 'worker' within an MPP cluster. 5. Patroni cluster: any cluster managed by Patroni can be called a Patroni cluster, but we usually use this term to refer to a single PostgreSQL cluster or an MPP cluster.	2024-02-28 06:16:20 +01:00
Polina Bungina	bdd02324b4	Add pending restart reason information (#2978 ) Provide info about the PG parameters that caused "pending restart" flag to be set. Both `patronictl list` and `/patroni` REST API endpoint now show the parameters names and the diff as the "pending restart reason".	2024-02-14 08:54:20 +01:00
Israel	7adfc0dbe7	Patroni doesn't filter out some not allowed options from `pg_basebackup` (#3015 ) When running `pg_basebackup` to bootstrap a replica, Patroni sanitizes the custom user options that come from `postgresql.basebackup` configuration section using the `process_user_options` method. However, there is a bug in that method: it filters out not allowed options that are in the format `- setting`, but not the ones in the format `- setting: value` from `postgresql.basebackup`. An example of that issue is the `dbname` setting. If you specify something like this in the configuration file: ```yaml postgresql: basebackup: - dbname: "host=RANDOM" ``` You end up with `--dbname` being specified twice for `pg_basebackup`, with `--dbname='host=RANDOM'` taking precedence as it comes up later in the command. This commit fixes that issue by adding a `continue` statement when the setting in format `- setting: value` is not allowed, thus skipping it. --------- Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>	2024-02-06 08:36:11 +01:00
Polina Bungina	f6943a859d	Improve logging for Pg param change (#3008 ) * Convert old value to a human-readable format * Add log line about pg_controldata/global config mismatch that causes pending restart flag to be set	2024-01-29 10:44:25 +01:00
Alexander Kukushkin	e532f9dc38	Fix bugs introduced in the jsonlog implementation (#3006 ) 1. RotatingFileHandler is a child of StreamHandler, therefore we can't rely on `not isinstance(handler, logging.StreamHandler)`. 2. If the legacy version of `python-json-logger` is installed (that doesn't support rename_fields or static_fields), we want to do what is possible rather than fail with an exception. Besides that: 1. improve code coverage 2. make unit tests pass without python-json-logger installed or if only some old version is installed.	2024-01-29 10:37:15 +01:00
Alexander Kukushkin	688c85389c	Release v3.2.2 (#3007 ) - update release notes - bump Patroni version - bump pyright version and fix reported issues - improve compatibility with legacy psycopg2 Co-authored-by: Polina Bungina <bungina@gmail.com>	2024-01-17 08:31:08 +01:00
علی سالمی	5c4ee30dae	Add JSON log format to logging configuration (#2982 ) Now Patroni can be configured as below to log in JSON format. ```yaml log: type: json format: - asctime: '@timestamp' - levelname: level - message - module - name: logger_name static_fields: app: patroni ``` This config produces this log: ```json { "@timestamp": "2023-12-14 19:51:24,872", "level": "INFO", "message": "Lock owner: None; I am postgresql1", "module": "ha", "app": "patroni", "logger_name": "patroni.ha" } ```	2024-01-16 10:42:48 +01:00
Polina Bungina	266cdc4810	Fixes around pending_restart flag (#3003 ) * Do not set pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though we will have this flag actually set in PG, this configuration parameter is irrelevant on primary and there is no actual need for restart) * Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They actually can be changed manually (e.g. via ALTER SYSTEM) and it will cause the pending_restart state in PG but Patroni anyway always passes those params to postmaster as command line options. And there they only can have one value - 'on' (except on primary when performing custom bootstrap)	2024-01-16 10:32:28 +01:00
Alexander Kukushkin	2ac1efea54	Optimize priority failover behave tests (#3004 ) 1. get rid of useless sleep calls 2. call `POST /failover` on the node where we want to failover to	2024-01-15 12:03:14 +01:00
Alexander Kukushkin	5d8c2fb559	Restore recovery GUCs when joining running standby (#2998 ) Close https://github.com/zalando/patroni/issues/2993	2024-01-08 08:35:53 +01:00
Israel	4e5b2ee249	Close the doors for a possible future bug in the config generator (#3000 ) The `AbstractConfigGenerator._format_config` method was missing a comma in the declaration of a tuple. As a consequence it was concatenating the strings `ctl` and `citus` instead of creating two separate items in the tuple. There is currently no observed bug from that issue in the code because the template configuration created by the method `AbstractConfigGenerator.get_template_config` doesn't include either of `ctl` or `citus` keys. However, it is still important that we close the doors for possible future bugs that would come up if we ever attempt to use either of those keys in the template, for example. References: PAT-231.	2024-01-04 12:30:28 +01:00
Sophia Ruan	3390ee9dea	call freeze_support in main module to solve pyinstaller frozen issue (#2996 ) Close #2995	2024-01-04 12:30:03 +01:00
Polina Bungina	71ccf91e36	Don't filter out contradictory nofailover tag (#2992 ) * Ensure that nofailover will always be used if both nofailover and failover_priority tags are provided * Call _validate_failover_tags from reload_local_configuration() as well * Properly check values in the _validate_failover_tags(): nofailover value should be cast to boolean like it is done when accessed in other places	2024-01-02 09:30:18 +01:00
zhjwpku	8acefefc42	Fix Citus bootstrap - CREATE DATABASE cannot be executed from a function (#2994 ) This was introduced by #2990: the pod cannot be started and shows the following logs: ``` 2023-12-26 03:29:25.569 UTC [47] CONTEXT: SQL statement "CREATE DATABASE "citus"" PL/pgSQL function inline_code_block line 5 at SQL statement 2023-12-26 03:29:25.569 UTC [47] STATEMENT: DO $$ BEGIN PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus'; IF NOT FOUND THEN CREATE DATABASE "citus"; END IF; END;$$ 2023-12-26 03:29:25,570 ERROR: post_bootstrap Traceback (most recent call last): File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap self._postgresql.citus_handler.bootstrap() File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap cur.execute(sql.encode('utf-8')) psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function CONTEXT: SQL statement "CREATE DATABASE "citus"" PL/pgSQL function inline_code_block line 5 at SQL statement ``` --------- Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>	2023-12-29 09:01:46 +01:00
Alexander Kukushkin	dd548c4964	Create citus database and extension idempotently (#2990 ) Consider a task: we want to create an extension _before_ citus in a database. Currently the `post_bootstrap` script is executed before the `CitusHandler.bootstrap()` method, which seems to allow doing that, but in fact `CitusHandler.bootstrap()` will fail to create an already existing database and as a result the whole bootstrap will fail. Changing the order of execution of the `post_bootstrap` hook and `CitusHandler.bootstrap()` seems to be useless, because it will not allow creating another extension _before_ citus. Therefore the only way of solving it is making CREATE DATABASE and CREATE EXTENSION idempotent. This allows creating the citus database and all dependencies from the `post_bootstrap` hook.	2023-12-21 09:25:51 +01:00
Alexander Kukushkin	bcfd8438a5	Abstract CitusHandler and decouple it from configuration (#2950 ) The main issue was that the configuration for the Citus handler and for DCS existed in two places, while ideally AbstractDCS should not know many details about what kind of MPP is in use. To solve the problem we first dynamically create an object implementing the AbstractMPP interface, which is a configuration for DCS. Later this object is used to instantiate the class implementing the AbstractMPPHandler interface. This is just a starting point, which does some heavy lifting. As next steps, all kinds of variables named after Citus in files other than patroni/postgresql/mpp/citus.py should be renamed. In other words this commit takes over the most complex part of #2940, which was never implemented. Co-authored-by: zhjwpku <zhjwpku@gmail.com>	2023-12-21 08:58:26 +01:00
Alexander Kukushkin	5c3e1a693e	Implement validation of the `log` section (#2989 ) Somehow it was always forgotten.	2023-12-20 10:49:33 +01:00
Polina Bungina	206ee91b07	Exclude leader from failover candidates in ctl (#2983 ) Exclude actual leader (not the passed leader argument) from the candidates list in the `patronictl failover` prompt. Abort `patronictl failover` execution if candidate specified is the same as the current cluster leader	2023-12-20 09:54:04 +01:00
Polina Bungina	c1ee99d81d	Update PG version in a couple of places (#2986 ) * All dockerfiles to use PG16 by default * PGVERSION env in the test pipelines to 16.1-1 by default * 11->14 in the dcs-pg mapping for test pipelines * Code comments fixes	2023-12-18 10:44:05 +01:00
Polina Bungina	f0719d148c	Actually allow failover to an async candidate in sync mode (#2980 )	2023-12-13 08:40:47 +01:00
Polina Bungina	efdedc7049	Reload postgres config if a server param was reset (#2975 ) Fix the case when a parameter value was changed and then reset back to the initial value without restart - before this fix, the second change was not reflected in the Postgres config. This commit also includes the related unit test refactoring.	2023-12-06 15:57:05 +01:00
Alexander Kukushkin	bbddca6a76	Use consistent read when fetching just updated sync key (#2974 ) Consul doesn't provide any interface to immediately get `ModifyIndex` for the key that we just updated, therefore we have to perform an explicit read operation. By default stale reads are allowed and sometimes we may read stale data. As a result, the write_sync_state() call was considered failed. To mitigate the problem we switch to `consistent` reads when the read is executed after an update of the `/sync` key. Close #2972	2023-12-06 15:55:51 +01:00
Alexander Kukushkin	a4e0a2220d	Disable SSL for MacOS GH action runners (#2976 ) The latest runners release (20231127.1) somehow broke our tests. Connections to postgres are somehow failing with a strange error: ``` could not accept SSL connection: Socket operation on non-socket ```	2023-12-06 15:28:03 +01:00
Alexander Kukushkin	0e6a2ff3a9	Don't let replica restore initialize key when DCS was wiped (#2970 ) It was happening from the branch where Patroni was supposed to complain about converting a standalone PG cluster to be governed by Patroni and exit.	2023-12-05 08:30:20 +01:00
Alexander Kukushkin	6976939f09	Release/v3.2.1 (#2968 ) - bump version - bump pyright - update release notes	2023-11-30 16:50:42 +01:00
Waynerv	ef5f320602	Cache `postgres --describe-config` output results (#2967 ) We don't expect GUCs list to change for the same major version and don't expect major version to change while Patroni is running.	2023-11-30 12:02:42 +01:00
Sophia Ruan	47cadc9f63	Fix the issue that REST API returns unknown after postgres restart (#2956 ) Close #2955	2023-11-30 10:02:19 +01:00
Ali Mehraji	5a77cbb087	Update: etcd flags in command in docker-compose.yml and docker-compose-citus.yml (#2966 )	2023-11-30 09:45:07 +01:00
Alexander Kukushkin	92f4aa2ef9	Simplify methods related to replication slots in the Cluster class (#2958 ) Instead of passing around names, specific tags, and Postgres version just pass Postgresql object and objects implementing Tags interface. It should simplify implementation of #2842	2023-11-29 14:22:49 +01:00
Alexander Kukushkin	7c3ce78231	Fix Citus transaction rollback condition check (#2964 ) It seems that sometimes we get an exact match, which makes the behave tests fail.	2023-11-29 08:44:35 +01:00
Laotree	76e19ecfe2	Update README.rst (#2965 ) fix setting.rst link 404, from #2661	2023-11-29 08:43:07 +01:00
Alexander Kukushkin	9afaf6eb51	Don't pass around is_paused to sync_replication_slots (#2963 ) Oversight of #2935	2023-11-28 08:37:22 +01:00
Konstantin Demin	36e3dfbe41	update Dockerfiles (#2937 ) - better cleanup for vim - introduce dumb-init for patroni containers	2023-11-27 09:38:03 +01:00
zhjwpku	bb804074f7	[doc]: fix typos (#2961 )	2023-11-27 08:28:46 +01:00
zhjwpku	ed9d4750f9	fix typo and add gitignore entries (#2959 ) Split unrelated changes from #2940 Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>	2023-11-24 15:17:20 +01:00
Alexander Kukushkin	193c73f6b8	Make GlobalConfig really global (#2935 ) 1. extract `GlobalConfig` class to its own module 2. make the module instantiate the `GlobalConfig` object on load and replace its sys.modules entry with this instance 3. don't pass `GlobalConfig` object around, but use `patroni.global_config` module everywhere. 4. move `ignore_slots_matchers`, `max_timelines_history`, and `permanent_slots` from `ClusterConfig` to `GlobalConfig`. 5. add `use_slots` property to global_config and remove duplicated code from `Cluster` and `Postgresql.ConfigHandler`. Besides that, improve readability of a couple of checks in ha.py and formatting of the `/config` key when saved from patronictl.	2023-11-24 09:26:05 +01:00
Alexander Kukushkin	91327f943c	Factor out dynamic class finder/loader to a dedicated file (#2954 ) It could be reused to do the same for MPP modules/classes. Ref: #2940 and #2950	2023-11-23 17:04:23 +01:00
Ali Mehraji	ac6f6ae1c2	Add ETCDCTL_API=3 env to Dockerfiles and update docker/README.md (#2946 )	2023-11-22 08:55:51 +01:00
Alexander Kukushkin	70b0991e6a	Bump pyright to 1.1.336 (#2952 ) and fix newly reported issues	2023-11-20 10:22:52 +01:00
Alexander Kukushkin	5dab735534	Compatibility with ancient mock (#2951 ) Just in case someone still uses Ubuntu 18.04	2023-11-15 11:25:46 +01:00
Alexander Kukushkin	ecf158bce3	Get rid of pass_obj() in most of patronictl commands (#2945 ) The `obj` could be easily obtained with the help of `click.get_current_context().obj`. Introduced function `is_citus_cluster()` will simplify future refactoring to add support of other MPP databases. In addition to that refactor ctl.py unit tests by moving most of mocks to the global scope.	2023-11-14 13:44:54 +01:00
Alexander Kukushkin	1870dcd8f9	Fix bug with custom bootstrap (#2948 ) Patroni was falsely applying `--command` argument. Close https://github.com/zalando/patroni/issues/2947	2023-11-13 15:01:57 +01:00
Alexander Kukushkin	7370f70f13	Fix pg_rewind behavior with Postgres v16+ (#2944 ) The error message format was changed in `4ac30ba4f2`, which caused `pg_rewind` to be called by Patroni even when it was not necessary.	2023-11-10 09:23:45 +01:00
Alexander Kukushkin	1b96ae9c0a	Fix Etcd v2 with Citus (#2943 ) When deploying a new Citus cluster with Etcd v2 Patroni was failing to start with the following exception: ```python 2023-11-09 10:51:41,246 INFO: Selected new etcd server http://localhost:2379 Traceback (most recent call last): File "/home/akukushkin/git/patroni/./patroni.py", line 6, in <module> main() File "/home/akukushkin/git/patroni/patroni/__main__.py", line 343, in main return patroni_main(args.configfile) File "/home/akukushkin/git/patroni/patroni/__main__.py", line 237, in patroni_main abstract_main(Patroni, configfile) File "/home/akukushkin/git/patroni/patroni/daemon.py", line 172, in abstract_main controller = cls(config) File "/home/akukushkin/git/patroni/patroni/__main__.py", line 66, in __init__ self.ensure_unique_name() File "/home/akukushkin/git/patroni/patroni/__main__.py", line 112, in ensure_unique_name cluster = self.dcs.get_cluster() File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1654, in get_cluster cluster = self._get_citus_cluster() if self.is_citus_coordinator() else self.__get_patroni_cluster() File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1638, in _get_citus_cluster cluster = groups.pop(CITUS_COORDINATOR_GROUP_ID, Cluster.empty()) AttributeError: 'Cluster' object has no attribute 'pop' ``` It is broken since #2909. In addition to that fix `_citus_cluster_loader()` interface by allowing it to return only dict obj.	2023-11-09 11:09:38 +01:00
Alexander Kukushkin	3ffd598a1c	Do a real http request when performing name uniqueness check (#2942 ) When running in containers it is possible that the traffic is routed using `docker-proxy`, which listens on the port and accepts incoming connections. This commit effectively sticks to the original solution from #2878	2023-11-08 14:08:02 +01:00
Alexander Kukushkin	552e8643d9	Verify that replica nodes received checkpoint LSN on shutdown (#2939 ) If archiving is enabled, the `Postgresql.latest_checkpoint_location()` method returns the LSN of the prev (SWITCH) record, which points to the beginning of the WAL file. It is done in order to make it possible to safely promote a replica which recovers WAL files from the archive and wasn't streaming when the primary was stopped (primary doesn't archive this WAL file). But in certain cases using the LSN pointing to the SWITCH record was causing an unnecessary pg_rewind, if the replica didn't manage to replay the shutdown checkpoint record before it was promoted. In order to mitigate the problem we need to check that the replica received/replayed exactly the shutdown checkpoint LSN. But at the same time we will still write the LSN of the SWITCH record to the `/status` key when releasing the leader lock.	2023-11-07 11:05:54 +01:00
Israel	269b04be5d	Add a contrib script for remote Barman recovery (#2931 ) A contrib script, which can be used as a custom bootstrap method, or as a custom create replica method. The script communicates with the pg-backup-api on the Barman node so Patroni is able to restore a Barman backup remotely. The `--help` option of the script, along with the script docstring, should provide some context on how to fill in its parameters. Patroni docs were updated accordingly to share examples about how to configure the script as a custom bootstrap method, or as a custom create replica method. References: PAT-216.	2023-11-06 16:25:27 +01:00
Alexander Kukushkin	8adddb3467	Limit accepted values for --format argument (#2938 ) It used to accept any arbitrary string Close https://github.com/zalando/patroni/issues/2936	2023-11-03 13:02:39 +01:00
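
Two of the fixes listed above (4e5b2ee249 and e532f9dc38) hinge on Python behaviour that is easy to overlook: adjacent string literals are silently concatenated when a comma is missing, and `RotatingFileHandler` is a subclass of `StreamHandler`. The snippet below is a minimal sketch of both pitfalls and is not Patroni's actual code; the tuple contents other than `ctl` and `citus` are hypothetical.

```python
import logging
import logging.handlers

# Pitfall 1: a missing comma between two string literals.
# Python joins adjacent literals, so this tuple has two items, not three.
# 'restapi' is a hypothetical extra key used only for illustration.
keys_with_bug = ('restapi', 'ctl' 'citus')
keys_fixed = ('restapi', 'ctl', 'citus')

# Pitfall 2: RotatingFileHandler inherits from FileHandler, which in turn
# inherits from StreamHandler, so an isinstance/issubclass check against
# StreamHandler cannot tell file handlers apart from console handlers.
rotating_is_stream = issubclass(
    logging.handlers.RotatingFileHandler, logging.StreamHandler
)

print(keys_with_bug)       # ('restapi', 'ctlcitus')
print(keys_fixed)          # ('restapi', 'ctl', 'citus')
print(rotating_is_stream)  # True
```

A membership test such as `'ctl' in keys_with_bug` returns `False` for the buggy tuple, which is why the missing comma could go unnoticed until one of those keys is actually looked up.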


2375 Commits