2350 Commits

Author SHA1 Message Date
Alexander Kukushkin
c8e32775df Release v3.2.2 (#3007)
- update release notes
- bump Patroni version
- bump pyright version and fix reported issues
- improve compatibility with legacy psycopg2

Co-authored-by: Polina Bungina <bungina@gmail.com>
v3.2.2
2024-01-17 08:35:35 +01:00
Polina Bungina
f2919f9c2f Fixes around pending_restart flag (#3003)
* Do not set pending_restart flag if hot_standby is set to 'off' during a custom bootstrap (even though we will have this flag actually set in PG, this configuration parameter is irrelevant on primary and there is no actual need for restart)
* Skip hot_standby and wal_log_hints when querying parameters pending restart on config reload. They actually can be changed manually (e.g. via ALTER SYSTEM) and it will cause the pending_restart state in PG but Patroni anyway always passes those params to postmaster as command line options. And there they only can have one value - 'on' (except on primary when performing custom bootstrap)
2024-01-16 10:44:30 +01:00
Alexander Kukushkin
f59c79740f Optimize priority failover behave tests (#3004)
1. get rid of useless sleep calls
2. call `POST /failover` on the node where we want to failover to
2024-01-15 12:24:42 +01:00
Alexander Kukushkin
2a64bfd459 Restore recovery GUCs when joining running standby (#2998)
Close https://github.com/zalando/patroni/issues/2993
2024-01-08 09:17:17 +01:00
Israel
23067d7ea7 Close the doors for a possible future bug in the config generator (#3000)
The `AbstractConfigGenerator._format_config` method was missing a comma in the declaration of a tuple. As a consequence it was concatenating the strings `ctl` and `citus` instead of creating two separate items in the tuple.

There is currently no observed bug from that issue in the code because the template configuration created by the method `AbstractConfigGenerator.get_template_config` doesn't include either of `ctl` or `citus` keys.

However, it is still important that we close the doors for possible future bugs that would come up if we ever attempt to use either of those keys in the template, for example.

References: PAT-231.
2024-01-05 10:17:07 +01:00
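The missing-comma pitfall described in 23067d7ea7 is easy to reproduce in isolation. The snippet below is a minimal illustration and not the code from `AbstractConfigGenerator._format_config`; the variable names and the extra `log` item are hypothetical.

```python
# Adjacent string literals are concatenated by the Python parser, so a
# missing comma silently merges two intended tuple items into one.
broken = ('log', 'ctl' 'citus')   # two items: 'log' and 'ctlcitus'
fixed = ('log', 'ctl', 'citus')   # three separate items

print(broken)
print(fixed)
print('ctl' in broken, 'ctl' in fixed)
```

Because the merged string is still a valid tuple item, nothing fails at import time; the problem only shows up when a membership check for `ctl` or `citus` unexpectedly returns `False`.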
Sophia Ruan
47063de46d call freeze_support in main module to solve pyinstaller frozen issue (#2996)
Close #2995
2024-01-05 10:17:01 +01:00
Polina Bungina
3e9bceac11 Don't filter out contradictory nofailover tag (#2992)
* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to boolean, as is done when it is accessed in other places
2024-01-05 10:16:52 +01:00
zhjwpku
9cc1f8e763 Fix Citus bootstrap - CREATE DATABASE cannot be executed from a function (#2994)
This was introduced by #2990: the pod cannot be started and shows the
following logs:

```
2023-12-26 03:29:25.569 UTC [47] CONTEXT:  SQL statement "CREATE DATABASE "citus""
        PL/pgSQL function inline_code_block line 5 at SQL statement
2023-12-26 03:29:25.569 UTC [47] STATEMENT:  DO $$
        BEGIN
            PERFORM * FROM pg_catalog.pg_database WHERE datname = 'citus';
            IF NOT FOUND THEN
                CREATE DATABASE "citus";
            END IF;
        END;$$
2023-12-26 03:29:25,570 ERROR: post_bootstrap
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/bootstrap.py", line 474, in post_bootstrap
    self._postgresql.citus_handler.bootstrap()
  File "/usr/local/lib/python3.11/dist-packages/patroni/postgresql/mpp/citus.py", line 401, in bootstrap
    cur.execute(sql.encode('utf-8'))
psycopg2.errors.ActiveSqlTransaction: CREATE DATABASE cannot be executed from a function
CONTEXT:  SQL statement "CREATE DATABASE "citus""
PL/pgSQL function inline_code_block line 5 at SQL statement
```
---------

Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
2024-01-05 10:16:26 +01:00
Alexander Kukushkin
d00f5a645b Create citus database and extension idempotently (#2990)
Consider a task: we want to create an extension _before_ citus in a database. Currently the `post_bootstrap` script is executed before the `CitusHandler.bootstrap()` method, which seems to allow doing that, but in fact `CitusHandler.bootstrap()` will fail to create the already existing database and, as a result, the whole bootstrap will fail.

Changing the order of execution of the `post_bootstrap` hook and `CitusHandler.bootstrap()` would be useless, because it would not allow creating another extension _before_ citus. Therefore the only way to solve it is to make CREATE DATABASE and CREATE EXTENSION idempotent. This allows creating the citus database and all dependencies from the `post_bootstrap` hook.
2024-01-05 10:14:45 +01:00
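The idea from d00f5a645b, combined with the constraint reported in 9cc1f8e763 (CREATE DATABASE cannot run inside a function), can be sketched as a client-side existence check followed by a plain CREATE DATABASE. This is only an illustration under stated assumptions, not the code in `patroni/postgresql/mpp/citus.py`: `create_database_idempotently` and `FakeCursor` are hypothetical names, and the cursor is assumed to belong to an autocommit connection.

```python
class FakeCursor:
    """In-memory stand-in for a DB-API cursor, used only for this demo."""

    def __init__(self, existing):
        self.existing = set(existing)
        self.statements = []
        self._row = None

    def execute(self, sql, params=None):
        self.statements.append(sql)
        if sql.startswith("SELECT"):
            self._row = (1,) if params[0] in self.existing else None
        elif sql.startswith("CREATE DATABASE"):
            self.existing.add(sql.split('"')[1])

    def fetchone(self):
        return self._row


def create_database_idempotently(cursor, dbname):
    """Create dbname unless it already exists; return True if it was created.

    Assumption: the cursor runs in autocommit mode, because CREATE DATABASE
    cannot be executed inside a transaction block or a function.
    """
    cursor.execute(
        "SELECT 1 FROM pg_catalog.pg_database WHERE datname = %s", (dbname,))
    if cursor.fetchone() is not None:
        return False
    # Identifiers cannot be passed as bound parameters, so quote manually.
    quoted = '"' + dbname.replace('"', '""') + '"'
    cursor.execute("CREATE DATABASE " + quoted)
    return True


cur = FakeCursor(existing=[])
print(create_database_idempotently(cur, "citus"))
print(create_database_idempotently(cur, "citus"))
```

A check followed by a create is not atomic, so two concurrent sessions could still race; a real implementation would also need to tolerate a "database already exists" error.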
Polina Bungina
15b57c5bdc Exclude leader from failover candidates in ctl (#2983)
Exclude actual leader (not the passed leader argument) from the
candidates list in the `patronictl failover` prompt.
Abort `patronictl failover` execution if candidate specified is
the same as the current cluster leader
2024-01-05 10:12:33 +01:00
Polina Bungina
f10e4805db Actually allow failover to an async candidate in sync mode (#2980)
2024-01-05 10:05:28 +01:00
Polina Bungina
3e0e91f905 Reload postgres config if a server param was reset (#2975)
Fix the case when a parameter value was changed and then reset back to
the initial value without restart - before this fix, the second change
was not reflected in the Postgres config.
This commit also includes the related unit test refactoring.
2024-01-05 09:56:01 +01:00
Alexander Kukushkin
51a148fcf3 Use consistent read when fetching just updated sync key (#2974)
Consul doesn't provide any interface to immediately get the `ModifyIndex` for the key that we just updated, therefore we have to perform an explicit read operation. By default stale reads are allowed and sometimes we may read stale data. As a result, the write_sync_state() call was considered failed. To mitigate the problem we switch to `consistent` reads when the read is executed after an update of the `/sync` key.

Close #2972
2024-01-05 09:55:43 +01:00
Alexander Kukushkin
c3697738b1 Disable SSL for MacOS GH action runners (#2976)
The latest runners release (20231127.1) somehow broke our tests. Connections to postgres are failing with a strange error:
```
could not accept SSL connection: Socket operation on non-socket
```
2024-01-05 09:55:27 +01:00
Alexander Kukushkin
722b4b72a8 Don't let replica restore initialize key when DCS was wiped (#2970)
It was happening in the branch where Patroni was supposed to complain about converting a standalone PG cluster to be governed by Patroni and exit.
2024-01-05 09:55:10 +01:00
Alexander Kukushkin
65b43c39fa Release/v3.2.1 (#2968)
- bump version
- bump pyright
- update release notes
v3.2.1
2023-11-30 16:51:21 +01:00
Waynerv
aea9a2b0ca Cache postgres --describe-config output results (#2967)
We don't expect the list of GUCs to change for the same major version and don't expect the major version to change while Patroni is running.
2023-11-30 12:07:06 +01:00
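The reasoning in aea9a2b0ca (the GUC list is stable per major version) maps naturally onto memoization keyed by major version. The sketch below uses `functools.lru_cache` to show the idea; it is not Patroni's implementation, and `run_describe_config` is a hypothetical stand-in for actually executing `postgres --describe-config`.

```python
import functools

calls = []  # records how often the expensive call actually runs


def run_describe_config(major_version):
    # Hypothetical stand-in for spawning `postgres --describe-config`
    # and parsing its output into a collection of GUC names.
    calls.append(major_version)
    return ("max_connections", "shared_buffers")


@functools.lru_cache(maxsize=None)
def get_gucs(major_version):
    """Return the GUC names for a major version, computing them only once."""
    return run_describe_config(major_version)


get_gucs(16)
get_gucs(16)  # served from the cache, no second subprocess
print(calls)
```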
Sophia Ruan
71ccd41915 Fix the issue that REST API returns unknown after postgres restart (#2956)
Close #2955
2023-11-30 10:16:51 +01:00
Alexander Kukushkin
49e4a6ed7d Fix Citus transaction rollback condition check (#2964)
It seems that sometimes we get an exact match, which makes the behave tests fail.
2023-11-30 09:02:50 +01:00
Alexander Kukushkin
ebd05871d9 Bump pyright to 1.1.336 (#2952)
and fix newly reported issues
2023-11-30 09:02:16 +01:00
Alexander Kukushkin
42cd803619 Fix bug with custom bootstrap (#2948)
Patroni was incorrectly applying the `--command` argument.

Close https://github.com/zalando/patroni/issues/2947
2023-11-30 09:01:47 +01:00
Alexander Kukushkin
bae72df5b1 Fix pg_rewind behavior with Postgres v16+ (#2944)
The error message format was changed in
4ac30ba4f2, which caused `pg_rewind` to be called by Patroni even when it was not necessary.
2023-11-30 09:01:41 +01:00
Alexander Kukushkin
f2a129f209 Fix Etcd v2 with Citus (#2943)
When deploying a new Citus cluster with Etcd v2 Patroni was failing to start with the following exception:
```python
2023-11-09 10:51:41,246 INFO: Selected new etcd server http://localhost:2379
Traceback (most recent call last):
  File "/home/akukushkin/git/patroni/./patroni.py", line 6, in <module>
    main()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 343, in main
    return patroni_main(args.configfile)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 237, in patroni_main
    abstract_main(Patroni, configfile)
  File "/home/akukushkin/git/patroni/patroni/daemon.py", line 172, in abstract_main
    controller = cls(config)
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 66, in __init__
    self.ensure_unique_name()
  File "/home/akukushkin/git/patroni/patroni/__main__.py", line 112, in ensure_unique_name
    cluster = self.dcs.get_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1654, in get_cluster
    cluster = self._get_citus_cluster() if self.is_citus_coordinator() else self.__get_patroni_cluster()
  File "/home/akukushkin/git/patroni/patroni/dcs/__init__.py", line 1638, in _get_citus_cluster
    cluster = groups.pop(CITUS_COORDINATOR_GROUP_ID, Cluster.empty())
AttributeError: 'Cluster' object has no attribute 'pop'
```

It has been broken since #2909.

In addition, fix the `_citus_cluster_loader()` interface by allowing it to return only a dict object.
2023-11-30 09:01:19 +01:00
Alexander Kukushkin
df0fd91614 Do a real http request when performing name uniqueness check (#2942)
When running in containers it is possible that the traffic is routed using `docker-proxy`, which listens on the port and accepts incoming connections.

This commit effectively sticks to the original solution from #2878
2023-11-30 09:01:11 +01:00
Alexander Kukushkin
43f23df974 Verify that replica nodes received checkpoint LSN on shutdown (#2939)
If archiving is enabled, the `Postgresql.latest_checkpoint_location()` method returns the LSN of the previous (SWITCH) record, which points to the beginning of the WAL file. This makes it possible to safely promote a replica that recovers WAL files from the archive and wasn't streaming when the primary was stopped (the primary doesn't archive this WAL file).

However, in certain cases using the LSN pointing to the SWITCH record caused an unnecessary pg_rewind if the replica didn't manage to replay the shutdown checkpoint record before it was promoted.

In order to mitigate the problem we need to check that replica received/replayed exactly the shutdown checkpoint LSN. But, at the same time we will still write LSN of the SWITCH record to the `/status` key when releasing the leader lock.
2023-11-30 09:01:05 +01:00
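The check described in 43f23df974 boils down to comparing two LSNs. The sketch below shows one way to compare textual LSNs of the form `X/Y` numerically; both function names are illustrative, and using `>=` as the comparison is an assumption of this sketch rather than a statement about Patroni's exact rule.

```python
def parse_lsn(lsn):
    """Convert a textual LSN such as '0/3000028' into an integer.

    The part before the slash is the high 32 bits, the part after it
    the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replica_reached_checkpoint(replica_lsn, shutdown_checkpoint_lsn):
    # Assumption: the replica is safe to promote without pg_rewind once it
    # has received/replayed at least the shutdown checkpoint LSN.
    return parse_lsn(replica_lsn) >= parse_lsn(shutdown_checkpoint_lsn)


print(replica_reached_checkpoint("0/3000028", "0/3000028"))
print(replica_reached_checkpoint("0/3000000", "0/3000028"))
```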
Alexander Kukushkin
42bf1f95a3 Limit accepted values for --format argument (#2938)
It used to accept any arbitrary string

Close https://github.com/zalando/patroni/issues/2936
2023-11-30 09:00:39 +01:00
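Restricting an option to a fixed set of values, as 42bf1f95a3 does for `--format`, is typically done declaratively in the argument parser. The sketch below uses the standard-library `argparse` purely for illustration; `patronictl`'s own option handling may be built differently, and the `FORMATS` tuple is an assumed list, not taken from the commit.

```python
import argparse

# Assumed set of output formats, for illustration only.
FORMATS = ("pretty", "tsv", "json", "yaml")


def build_parser():
    parser = argparse.ArgumentParser(prog="example-ctl")
    # `choices` makes the parser reject any value outside FORMATS.
    parser.add_argument("--format", "-f", choices=FORMATS, default="pretty")
    return parser


args = build_parser().parse_args(["--format", "json"])
print(args.format)
```

With `choices` in place, an unsupported value such as `--format xml` makes the parser print a usage error and exit instead of silently accepting an arbitrary string.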
Israel
23200daada Add a FAQ page to the docs (#2933)
This commit introduces a FAQ page to the docs. The idea is to get
most frequently asked questions answered beforehand, so the user
is able to get them answered quickly without going into detail in
the docs or having to go to Slack/GitHub to clarify questions.

---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-11-30 09:00:22 +01:00
Alexander Kukushkin
ce10e5fccc Release v3.2.0 (#2930)
- bump version
- bump pyright and apply fixes
- update release notes
v3.2.0
2023-10-25 16:13:30 +02:00
Israel
bb90feb393 Add support for additional parameters on custom bootstrap (#2927)
Prior to this commit, a user who wanted to add parameters to the custom bootstrap script call had to configure Patroni like this:

```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script --arg1=value1 --arg2=value2 ...
```

This commit extends that to achieve behavior similar to that of `create_replica_methods`, i.e., we also allow the following syntax:

```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script
    arg1: value1
    arg2: value2
```

All keys in the mapping that are not recognized by Patroni will be treated as additional named arguments to be passed down to the `command` call.

References: PAT-218.
2023-10-25 15:01:08 +02:00
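The mapping-to-arguments behavior described in bb90feb393 can be sketched as follows. This is a hedged illustration, not Patroni's code: `build_command` is a hypothetical helper, `RECOGNIZED_KEYS` is deliberately minimal and assumed, and the `--key=value` rendering is inferred from the first YAML example in the commit message.

```python
import shlex

# Assumed, intentionally incomplete set of keys that Patroni itself consumes.
RECOGNIZED_KEYS = {"command"}


def build_command(method_config):
    """Turn a custom bootstrap method mapping into an argument list."""
    cmd = shlex.split(method_config["command"])
    for key, value in method_config.items():
        if key not in RECOGNIZED_KEYS:
            # Unrecognized keys become named arguments of the script.
            cmd.append("--{0}={1}".format(key, value))
    return cmd


config = {
    "command": "/path/to/my/custom_script",
    "arg1": "value1",
    "arg2": "value2",
}
print(build_command(config))
```

Under these assumptions both YAML forms shown in the commit message resolve to the same script invocation.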
Alexander Kukushkin
3d527f5728 Improve formatting of generated config and validation of ints (#2928)
- order sections similar to sample configs
- add warnings and comments to `bootstrap.dcs` section.
- add `tags` and `log` sections.
- use discovered IPs in `postgresql.connect_address` and `postgresql.listen`
- set `wal_level` to `replica` for PostgreSQL 9.6+
- make unit tests pass with python 3.6
- improve config validator so it doesn't complain when some ints are strings in YAML file.
2023-10-25 14:23:57 +02:00
Polina Bungina
6c06f5cc96 Add initial docs for patroni --validate/generate config (#2929)
For now it will sit in the section about the Patroni configuration. We can later move it to (or reference from) a new section where all the functionality of the `patroni` executable will be described.
2023-10-25 14:20:17 +02:00
Mark Pekala
f5ee67fa1c Feature: failover priority (#2780)
The priority is configured with the `failover_priority` tag. Possible values range from `0` to infinity, where `0` means that the node will never become the leader, which is the same as the `nofailover` tag set to `true`. As a result, in the configuration file one should set only one of the `failover_priority` or `nofailover` tags.

The failover priority kicks in only when more than one node has the same receive/replay LSN and these nodes are ahead of the other nodes in the cluster. In this case the node with the higher value of `failover_priority` is preferred. If there is a node with a higher receive/replay LSN, it will become the new leader even if it has a lower value of `failover_priority` (except when the priority is set to 0).

Close https://github.com/zalando/patroni/issues/2759
2023-10-24 12:22:48 +02:00
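The ordering rule from f5ee67fa1c (LSN first, `failover_priority` only as a tie-breaker, `0` meaning never) can be expressed as a sort key. The sketch below is a simplified, centralized model for illustration only: in Patroni each node decides for itself whether to compete for the leader lock, and `pick_candidate`, the member dictionaries, and the default priority of `1` are assumptions of this example.

```python
DEFAULT_PRIORITY = 1  # assumed value used when the tag is absent


def is_eligible(member):
    # nofailover always wins over failover_priority when both are set.
    if member.get("nofailover"):
        return False
    return member.get("failover_priority", DEFAULT_PRIORITY) > 0


def pick_candidate(members):
    """Return the name of the preferred member, or None if none is eligible."""
    eligible = [m for m in members if is_eligible(m)]
    if not eligible:
        return None
    # Compare by LSN first; the priority only breaks ties between equal LSNs.
    best = max(
        eligible,
        key=lambda m: (m["lsn"], m.get("failover_priority", DEFAULT_PRIORITY)),
    )
    return best["name"]


members = [
    {"name": "node1", "lsn": 100, "failover_priority": 1},
    {"name": "node2", "lsn": 100, "failover_priority": 5},
    {"name": "node3", "lsn": 90, "failover_priority": 10},
    {"name": "node4", "lsn": 200, "failover_priority": 0},
]
print(pick_candidate(members))  # node2
```

Here `node4` is excluded despite the highest LSN because its priority is `0`, and `node3` loses despite the highest priority because it is behind on LSN.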
Israel
65030c56ee Add capability of specifying namespace through --dcs argument (#2926)
This commit changes the `patronictl` application in such a way its
`--dcs` argument is now able to receive a namespace.

Previous to this commit this was the format of that argument's value:
`DCS://HOST:PORT`.

From now on it accepts this format: `DCS://HOST:PORT/NAMESPACE`. As with all
previous parts of the argument value, `NAMESPACE` is optional, and if
not given `patronictl` will fall back to the value from the configuration
file, if any, or to `service`.

This change is specifically useful when you are running a cluster in a
custom namespace, and from a machine where you don't have a configuration
file for Patroni or `patronictl`. It saves you from having to create a
configuration file containing only the `namespace` field in that case.

Issue reported by: Shaun Thomas <shaun@bonesmoses.org>

Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-10-24 12:09:44 +02:00
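The `DCS://HOST:PORT/NAMESPACE` format from 65030c56ee has the shape of a URL, so it can be taken apart with the standard library. The sketch below is illustrative only and is not `patronictl`'s parser: `parse_dcs` is a hypothetical helper, it handles only values that include the `DCS://` prefix, and it skips the configuration-file fallback and goes straight to `service`.

```python
from urllib.parse import urlsplit


def parse_dcs(value, default_namespace="service"):
    """Split a DCS://HOST:PORT/NAMESPACE string into its parts."""
    parts = urlsplit(value)
    # An empty path means no namespace was given on the command line.
    namespace = parts.path.strip("/") or default_namespace
    return {
        "dcs": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "namespace": namespace,
    }


print(parse_dcs("etcd3://10.0.0.1:2379/my-namespace"))
print(parse_dcs("consul://localhost:8500"))
```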
Alexander Kukushkin
d471f1156d Handle AuthOldRevision error (#2913)
The error is raised if Etcd is configured to use JWT auth tokens and when the user database in Etcd is updated, because the update invalidates all tokens.

If retries are requested, try to get a new token and repeat the request. Repeat it in a loop until the request is successfully executed or until `retry_timeout` is exhausted. This is the only way of solving the race condition, because between authenticating and executing the request yet another modification of the user database in Etcd might happen.

If the request doesn't have to be retried immediately, set a flag that the next API request should perform the authentication first and let Patroni naturally repeat the request on the next heartbeat loop.

Co-authored-by: Kenny Do <kedo@render.com>
Ref: https://github.com/zalando/patroni/pull/2911
2023-10-23 14:00:37 +02:00
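The retry strategy described in d471f1156d (re-authenticate and repeat until success or until `retry_timeout` runs out) can be sketched as a small loop. Everything here is a stand-in for illustration: the `AuthOldRevision` class, `call_with_reauth`, and the `request`/`authenticate` callables are hypothetical and do not reflect Patroni's Etcd client API.

```python
import time


class AuthOldRevision(Exception):
    """Stand-in for the Etcd error raised when an auth token is outdated."""


def call_with_reauth(request, authenticate, retry_timeout):
    """Run request(), re-authenticating on AuthOldRevision until the deadline."""
    deadline = time.monotonic() + retry_timeout
    while True:
        try:
            return request()
        except AuthOldRevision:
            if time.monotonic() >= deadline:
                raise
            # The user database may change again between authenticating and
            # executing the request, hence the loop instead of a single retry.
            authenticate()


attempts = {"requests": 0, "auths": 0}


def flaky_request():
    attempts["requests"] += 1
    if attempts["requests"] < 3:
        raise AuthOldRevision()
    return "ok"


def fake_authenticate():
    attempts["auths"] += 1


print(call_with_reauth(flaky_request, fake_authenticate, retry_timeout=5))
print(attempts)
```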
Alexander Kukushkin
6d98944e73 Add warning to the sample config about bootstrap section (#2925)
People often try to change it and then ask why it doesn't work.
2023-10-23 10:03:18 +02:00
zhjwpku
6cfd90401e get rid of stale comment of get_cluster (#2922)
PR #2909 removed the cache in the Zookeeper implementation of DCS, so
the comment of get_cluster should be changed to 'Retrieve a fresh
view of DCS' since every implementation does so.

Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
2023-10-23 08:30:13 +02:00
GuanqunYang193
ce187bec38 Remove user creation related docs (#2920)
* Remove user creation related docs
* remove template
2023-10-23 08:29:09 +02:00
Alexander Kukushkin
c5fffb3c97 Further work on permanent physical slots (#2891)
- Fixed issues with the has_permanent_slots() method. It didn't take into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write to the status key only LSNs for permanent slots (not just for slots that exist on the primary).
  - Include pg_current_wal_flush_lsn() to slots feedback, so that slots on standby nodes could be advanced
- Improved behave tests:
  - Verify that permanent slots are properly created on standby nodes
  - Verify that permanent slots are properly advanced, including DCS failsafe mode
  - Verify that only permanent slots are written to the `/status`
2023-10-23 08:24:28 +02:00
zhjwpku
cb5f34b721 add some guide to run tests in different scopes (#2921)
Introduce ways to run tests in different scopes which should be helpful for beginners.
2023-10-23 08:17:53 +02:00
zhjwpku
260ab36f2e mock getaddrinfo in case test failure (#2918)
Close #2915
2023-10-17 19:53:19 +02:00
Alexander Kukushkin
fc67ba73f0 Allow to specify psycopg* in extras and switch to build (#2907)
* remove check_psycopg() call from the setup.py, when installing from wheel it doesn't work anyway.
* call the check_psycopg() function before process_arguments(), because the latter tries to import psycopg and fails with a stack trace, while the former shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras, that will install psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move check_psycopg() function to the __main__.py.
* introduce the new extra called `all`, it will allow to install all dependencies at once (except psycopg related).
* use the `build` module in order to create sdist bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
2023-10-17 14:46:15 +02:00
GuanqunYang193
60d8bc3a70 Add warning of removing user creation (#2893)
2023-10-17 13:04:59 +02:00
Alexander Kukushkin
e513f7f127 Attempt to reduce flakiness for recovery behave test on K8s (#2917)
wait until Postgres is properly started after the first crash before changing `primary_start_timeout` and killing it once again.
2023-10-17 11:27:41 +02:00
Alexander Kukushkin
aa3ebe0af8 Don't cache anything in Zookeeper implementation (#2909)
Cache creates a lot of problems and prevents implementing a feature of automatic retention of physical replication slots for members with configurable retention policy.

Just read the entire cluster from Zookeeper instead and use watchers only for the `/leader` and `/config` keys.
2023-10-17 08:56:31 +02:00
Alexander Kukushkin
c96e35c807 Enable Citus behave tests for Postgres v16 (#2914)
and reduce flakiness
2023-10-16 16:05:27 +02:00
André Litfin
88b35252c3 Update README.md to reflect changes in etcd v3 (#2912)
In etcdctl v3 the ls command is no longer present; it has to be replaced with etcdctl get --keys-only --prefix
2023-10-16 15:18:25 +02:00
Alexander Kukushkin
d93db20baa Set citus.local_hostname (#2903)
There are cases when Citus wants to have a connection to the local postgres. By default it uses `localhost` for that, which is not always available. To solve it we will set the `citus.local_hostname` GUC to a custom value, which is the same as the one Patroni uses to connect to Postgres.
2023-10-16 10:21:50 +02:00
Alexander Kukushkin
42976df86f Make it easier to debug callbacks (#2902)
1. Introduce DEBUG logs for callbacks
2. Configure log format in behave tests to include filename, line, and method name that triggered the callback and enable DEBUG logs for `patroni.postgresql.callback_executor` module.

P.S. unfortunately it works only starting from python 3.8, but it should be good enough for debugging purposes because 3.7 is already EOL.
2023-10-16 08:55:07 +02:00
zhjwpku
6f4c2fe132 %s/iter_dcs_modules/iter_dcs_classes/g (#2905)
2023-10-11 13:17:18 +02:00
Chris Bandy
588df5da05 Refine the documentation about custom_conf (#2901)
Some backticks in this section needed to be balanced.
2023-10-11 08:41:11 +02:00