* Do not set the pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though PostgreSQL itself will report the flag, this parameter is irrelevant on the primary and there is no actual need for a restart)
* Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They can be changed manually (e.g. via ALTER SYSTEM), which puts PostgreSQL into the pending_restart state, but Patroni always passes these parameters to the postmaster as command-line options, where they can only have one value, 'on' (except on the primary during a custom bootstrap). A sketch of such a filtered query follows.
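A minimal sketch of the filtering, assuming a psycopg cursor and the `pg_settings.pending_restart` column; the actual Patroni query differs in details:
```python
# Sketch (not the exact Patroni query): list parameters that PostgreSQL reports
# as pending restart, excluding the ones Patroni always forces on the
# postmaster command line.
PARAMS_FORCED_BY_PATRONI = ['hot_standby', 'wal_log_hints']

def pending_restart_parameters(cursor):
    # pg_settings.pending_restart is available since PostgreSQL 9.5
    cursor.execute(
        "SELECT name FROM pg_catalog.pg_settings"
        " WHERE pending_restart AND NOT name = ANY(%s)",
        (PARAMS_FORCED_BY_PATRONI,))
    return [row[0] for row in cursor.fetchall()]
```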
The `AbstractConfigGenerator._format_config` method was missing a comma in the declaration of a tuple. As a consequence it was concatenating the strings `ctl` and `citus` instead of creating two separate items in the tuple.
There is currently no observed bug from this issue because the template configuration created by the `AbstractConfigGenerator.get_template_config` method doesn't include either the `ctl` or the `citus` key.
However, it is still worth closing the door on future bugs that would show up if, for example, we ever attempted to use either of those keys in the template.
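For illustration, the Python pitfall behind this fix is that adjacent string literals are concatenated when a comma is missing (the tuple contents below are made up, not the actual ones from `_format_config`):
```python
# A missing comma between string literals silently concatenates them.
broken = ('bootstrap', 'postgresql', 'ctl' 'citus')   # -> ('bootstrap', 'postgresql', 'ctlcitus')
fixed = ('bootstrap', 'postgresql', 'ctl', 'citus')   # -> four separate items
assert len(broken) == 3 and len(fixed) == 4
```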
References: PAT-231.
* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to a boolean, as is done wherever else it is accessed (see the sketch below)
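A rough sketch of what the check boils down to; the real implementation and its helper functions differ:
```python
import logging

logger = logging.getLogger(__name__)

def _validate_failover_tags(tags):
    """Sketch of the intended check (not the exact Patroni code): warn when the
    two tags contradict each other; nofailover takes precedence."""
    if 'nofailover' not in tags or 'failover_priority' not in tags:
        return
    # cast nofailover to a boolean the same way it is interpreted elsewhere
    nofailover = str(tags['nofailover']).lower() not in ('false', 'off', 'no', '0', '')
    if nofailover != (int(tags['failover_priority']) <= 0):
        logger.warning('Conflicting tags: nofailover=%s, failover_priority=%s;'
                       ' "nofailover" takes precedence',
                       tags['nofailover'], tags['failover_priority'])
```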
This was introduced by #2990: the pod cannot be started and shows the
following logs:
```
2023-12-26 03:29:25.569 UTC [47] CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
2023-12-26 03:29:25.569 UTC [47] STATEMENT: DO $$
BEGIN
PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus';
IF NOT FOUND THEN
CREATE DATABASE "citus";
END IF;
END;$$
2023-12-26 03:29:25,570 ERROR: post_bootstrap
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap
    self._postgresql.citus_handler.bootstrap()
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap
    cur.execute(sql.encode('utf-8'))
psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function
CONTEXT: SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
```
---------
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
Consider a task: we want to create an extension _before_ citus in a database. Currently the `post_bootstrap` script is executed before the `CitusHandler.bootstrap()` method, which seems to allow doing that, but in fact `CitusHandler.bootstrap()` will fail to create the already existing database and as a result the whole bootstrap will fail.
Changing the order of execution of the `post_bootstrap` hook and `CitusHandler.bootstrap()` would not help, because it would not allow creating another extension _before_ citus. Therefore the only way to solve it is to make CREATE DATABASE and CREATE EXTENSION idempotent. This allows creating the citus database and all dependencies from the `post_bootstrap` hook.
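A minimal sketch of the idempotent approach, assuming psycopg2 (the actual Patroni implementation differs): since CREATE DATABASE cannot run inside a transaction block or a function, it is executed in autocommit mode and the "already exists" error is simply ignored, while CREATE EXTENSION is idempotent by itself thanks to IF NOT EXISTS.
```python
from psycopg2 import errors, sql

def ensure_database(conn, dbname):
    # CREATE DATABASE must run outside a transaction block
    conn.autocommit = True
    with conn.cursor() as cur:
        try:
            cur.execute(sql.SQL('CREATE DATABASE {}').format(sql.Identifier(dbname)))
        except errors.DuplicateDatabase:
            pass  # already created, e.g. by a post_bootstrap script

def ensure_extension(conn, extname):
    with conn.cursor() as cur:
        # CREATE EXTENSION is idempotent with IF NOT EXISTS
        cur.execute(sql.SQL('CREATE EXTENSION IF NOT EXISTS {}').format(sql.Identifier(extname)))
```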
Exclude the actual leader (not the passed leader argument) from the
candidates list in the `patronictl failover` prompt.
Abort `patronictl failover` execution if the specified candidate is
the same as the current cluster leader.
Fix the case where a parameter value was changed and then reset back to
the initial value without a restart - before this fix, the second change
was not reflected in the Postgres config.
This commit also includes the related unit test refactoring.
Consul doesn't provide any interface to immediately get the `ModifyIndex` of a key we have just updated, therefore we have to perform an explicit read operation. By default stale reads are allowed and sometimes we may read stale data; as a result the write_sync_state() call was considered failed. To mitigate the problem we switch to `consistent` reads when the read is executed right after an update of the `/sync` key.
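A simplified sketch of the read-after-write pattern; the `client` object, its methods, and the key path are illustrative placeholders, not the exact python-consul or Patroni API:
```python
def write_sync_state(client, value, prev_index):
    # compare-and-swap update of the /sync key
    if not client.kv_put('service/batman/sync', value, cas=prev_index):
        return None
    # Consul doesn't return the new ModifyIndex from the write, so read the key
    # back; a 'consistent' read avoids getting a stale value from a follower.
    index, item = client.kv_get('service/batman/sync', consistency='consistent')
    return item
```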
Close #2972
When deploying a new Citus cluster with Etcd v2, Patroni was failing to start with the following exception:
```python
2023-11-09 10:51:41,246 INFO: Selected new etcd server http://localhost:2379
Traceback (most recent call last):
  File "/home/akukushkin/git/patroni/./patroni.py", line 6, in <module>
    main()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 343, in main
    return patroni_main(args.configfile)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 237, in patroni_main
    abstract_main(Patroni, configfile)
  File "/home/akukushkin/git/patroni/patroni/daemon.py", line 172, in abstract_main
    controller = cls(config)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 66, in __init__
    self.ensure_unique_name()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 112, in ensure_unique_name
    cluster = self.dcs.get_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1654, in get_cluster
    cluster = self._get_citus_cluster() if self.is_citus_coordinator() else self.__get_patroni_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1638, in _get_citus_cluster
    cluster = groups.pop(CITUS_COORDINATOR_GROUP_ID, Cluster.empty())
AttributeError: 'Cluster' object has no attribute 'pop'
```
It has been broken since #2909.
In addition, fix the `_citus_cluster_loader()` interface by allowing it to return only a dict object.
When running in containers it is possible that the traffic is routed via `docker-proxy`, which listens on the port and accepts incoming connections.
This commit effectively sticks to the original solution from #2878.
If archiving is enabled, the `Postgresql.latest_checkpoint_location()` method returns the LSN of the prev (SWITCH) record, which points to the beginning of the WAL file. This is done to make it possible to safely promote a replica that recovers WAL files from the archive and wasn't streaming when the primary was stopped (the primary doesn't archive this WAL file).
However, in certain cases using the LSN pointing to the SWITCH record caused an unnecessary pg_rewind if the replica didn't manage to replay the shutdown checkpoint record before it was promoted.
To mitigate the problem we need to check that the replica received/replayed exactly the shutdown checkpoint LSN, while still writing the LSN of the SWITCH record to the `/status` key when releasing the leader lock.
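An illustrative sketch of the two LSNs involved; the names and class are made up for clarity, not the actual Patroni API:
```python
class ShutdownLsns:
    """Illustrative container, not real Patroni code."""

    def __init__(self, switch_record_lsn, shutdown_checkpoint_lsn):
        # still written to the /status key when the leader key is released
        self.status_key_lsn = switch_record_lsn
        # compared with the replica receive/replay LSN before promotion
        self.promote_check_lsn = shutdown_checkpoint_lsn

    def replica_needs_rewind(self, replica_receive_replay_lsn):
        # a replica that received/replayed the shutdown checkpoint record
        # can be promoted without pg_rewind
        return replica_receive_replay_lsn < self.promote_check_lsn
```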
This commit introduces a FAQ page to the docs. The idea is to answer
the most frequently asked questions up front, so users can get quick
answers without digging deep into the docs or having to go to
Slack/GitHub to clarify questions.
---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
Prior to this commit, if a user wanted to add parameters to the custom bootstrap script call, they had to configure Patroni like this:
```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script --arg1=value1 --arg2=value2 ...
```
This commit extends that so we achieve a similar behavior that is seen when using `create_replica_methods`, i.e., we also allow the following syntax:
```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script
    arg1: value1
    arg2: value2
```
All keys in the mapping that are not recognized by Patroni are treated as additional named arguments to be passed down to the `command` call, as sketched below.
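A minimal sketch of how the unrecognized keys can be turned into named arguments; the recognized-key list and the `--key=value` formatting here are assumptions for illustration, not the exact Patroni code:
```python
import shlex

# keys Patroni itself understands in the custom bootstrap section
# (illustrative subset; the real list may differ)
RECOGNIZED_KEYS = {'command', 'keep_existing_recovery_conf', 'recovery_conf', 'no_params'}

def build_custom_bootstrap_command(config):
    cmd = shlex.split(config['command'])
    for name, value in sorted(config.items()):
        if name not in RECOGNIZED_KEYS:
            cmd.append('--{0}={1}'.format(name, value))
    return cmd

# With the second YAML example above this yields something like:
# ['/path/to/my/custom_script', '--arg1=value1', '--arg2=value2']
```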
References: PAT-218.
- order sections similarly to the sample configs
- add warnings and comments to the `bootstrap.dcs` section
- add `tags` and `log` sections
- use discovered IPs in `postgresql.connect_address` and `postgresql.listen`
- set `wal_level` to `replica` for PostgreSQL 9.6+
- make unit tests pass with Python 3.6
- improve the config validator so it doesn't complain when some ints are strings in the YAML file
For now it will sit in the section about the Patroni configuration. We can later move it to (or reference it from) a new section where all the functionality of the `patroni` executable is described.
The priority is configured with the `failover_priority` tag. Possible values range from `0` to infinity, where `0` means the node will never become the leader, which is the same as setting the `nofailover` tag to `true`. As a result, only one of the `failover_priority` or `nofailover` tags should be set in the configuration file.
The failover priority kicks in only when more than one node has the same receive/replay LSN and is ahead of the other nodes in the cluster. In this case the node with the higher `failover_priority` value is preferred. If there is a node with a higher receive/replay LSN, it becomes the new leader even if it has a lower `failover_priority` (except when its priority is set to 0).
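A simplified sketch of the tie-breaking rule described above (not the actual Patroni leader-race code): WAL position always wins, `failover_priority` only breaks ties, and priority 0 excludes the node from the race entirely.
```python
def best_candidate(nodes):
    # priority 0 means the node never becomes the leader
    eligible = [n for n in nodes if n['failover_priority'] > 0]
    return max(eligible, key=lambda n: (n['wal_position'], n['failover_priority']), default=None)

nodes = [
    {'name': 'pg1', 'wal_position': 100, 'failover_priority': 1},
    {'name': 'pg2', 'wal_position': 100, 'failover_priority': 2},
    {'name': 'pg3', 'wal_position': 90,  'failover_priority': 5},
]
assert best_candidate(nodes)['name'] == 'pg2'  # highest priority among the most advanced nodes
```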
Close https://github.com/zalando/patroni/issues/2759
This commit changes the `patronictl` application so that its
`--dcs` argument can now receive a namespace.
Prior to this commit the format of that argument's value was
`DCS://HOST:PORT`.
From now on it accepts this format: `DCS://HOST:PORT/NAMESPACE`. Like all
other parts of the argument value, `NAMESPACE` is optional; if it is not
given, `patronictl` falls back to the value from the configuration
file, if any, or to `service`.
This change is especially useful when you are running a cluster in a
custom namespace from a machine where you don't have a configuration
file for Patroni or `patronictl`. It avoids having to create a
configuration file containing only the `namespace` field in that case.
Issue reported by: Shaun Thomas <shaun@bonesmoses.org>
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
The error is raised when Etcd is configured to use JWT auth tokens and the user database in Etcd is updated, because the update invalidates all tokens.
If retries are requested, try to get a new token and repeat the request, in a loop, until the request is successfully executed or `retry_timeout` is exhausted. This is the only way of solving the race condition, because yet another modification of the user database in Etcd might happen between authentication and executing the request.
If the request doesn't have to be retried immediately, set a flag so that the next API request performs authentication first, and let Patroni naturally repeat the request on the next heartbeat loop.
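A simplified sketch of this strategy; the `client` object, its attributes, and the exception class are illustrative placeholders, not the real Patroni or python-etcd API:
```python
import time

class InvalidAuthTokenError(Exception):
    """Hypothetical stand-in for the 'invalid auth token' error returned by Etcd."""

def handle_invalid_token(client, request, retry_timeout, retry):
    if not retry:
        # not retrying immediately: make the next API request authenticate
        # first and let the caller repeat it on the next heartbeat loop
        client.needs_reauthentication = True
        return None
    deadline = time.monotonic() + retry_timeout
    while True:
        client.authenticate()  # fetch a fresh JWT token
        try:
            return request()
        except InvalidAuthTokenError:
            # another user-database modification may have happened between
            # authentication and the request: keep retrying until the deadline
            if time.monotonic() >= deadline:
                raise
```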
Co-authored-by: Kenny Do <kedo@render.com>
Ref: https://github.com/zalando/patroni/pull/2911
PR #2909 removed the cache in the Zookeeper implementation of the DCS, so
the comment of get_cluster should be changed to 'Retrieve a fresh
view of DCS', since every implementation now does so.
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
- Fixed issues with the has_permanent_slots() method. It didn't take into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write only the LSNs of permanent slots to the status key (not of every slot that exists on the primary).
- Include pg_current_wal_flush_lsn() in the slots feedback, so that slots on standby nodes can be advanced.
- Improved behave tests:
  - Verify that permanent slots are properly created on standby nodes
  - Verify that permanent slots are properly advanced, including in DCS failsafe mode
  - Verify that only permanent slots are written to the `/status` key
* remove the check_psycopg() call from setup.py; when installing from a wheel it doesn't work anyway.
* call the check_psycopg() function before process_arguments(), because the latter tries to import psycopg and fails with a stacktrace, while the former shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras that will install the psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move the check_psycopg() function to __main__.py.
* introduce a new extra called `all`, which allows installing all dependencies at once (except the psycopg-related ones).
* use the `build` module to create sdist and bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
The cache creates a lot of problems and prevents implementing automatic retention of physical replication slots for members with a configurable retention policy.
Instead, just read the entire cluster from Zookeeper and use watchers only for the `/leader` and `/config` keys.
There are cases when Citus wants to open a connection to the local Postgres. By default it uses `localhost` for that, which is not always available. To solve this we set the `citus.local_hostname` GUC to a custom value, the same one Patroni uses to connect to Postgres.
1. Introduce DEBUG logs for callbacks
2. Configure the log format in behave tests to include the filename, line, and method name that triggered the callback, and enable DEBUG logs for the `patroni.postgresql.callback_executor` module.
P.S. Unfortunately this works only starting from Python 3.8, but it should be good enough for debugging purposes because 3.7 is already EOL.
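A small sketch of the logging setup this relies on; the helper function, message, and format string are illustrative, not the actual Patroni code. The `stacklevel` keyword is what requires Python 3.8+.
```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(filename)s:%(lineno)d %(funcName)s %(levelname)s: %(message)s')
logger = logging.getLogger('patroni.postgresql.callback_executor')

def on_role_change(cmd, cluster_name, role):
    # stacklevel=2 (Python 3.8+) makes filename/lineno/funcName in the record
    # point at the caller that triggered the callback rather than this helper.
    logger.debug('Executing callback %r (cluster=%s, role=%s)',
                 cmd, cluster_name, role, stacklevel=2)
```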