Commit Graph

302 Commits

Author SHA1 Message Date
Alexander Kukushkin
db82a83eb4 Fix bug in member slots retention feature (#3142)
If `name` contains upper case or special characters the node was creating unused replication slot for itself.
2024-08-30 16:36:05 +02:00
Polina Bungina
3ecdf01b50 Release v4.0.0 (#3141)
- bump version
- update release notes
- adjust docs
- bump pyright version
- improve unit-test coverage
2024-08-29 14:37:13 +02:00
Sahil Naphade
c9322df095 Added a new flag to ignore unsuccessful bind (#3138) 2024-08-29 09:39:38 +02:00
Alexander Kukushkin
b470ade20e Change master->primary, take two (#3127)
This commit is a breaking change:
1. `role` in DCS is written as "primary" instead of "master".
2. `role` in REST API responses is also written as "primary".
3. REST API no longer accepts role=master in requests (for example switchover/failover/restart endpoints).
4. `/metrics` REST API endpoint will no longer report `patroni_master`.
5. `patronictl` no longer accepts `--master` argument.
6. `no_master` option in declarative configuration of custom replica creation methods is no longer treated as a special option, please use `no_leader` instead.
7. `patroni_wale_restore` doesn't accept `--no_master` anymore.
8. `patroni_barman` doesn't accept `--role=master` anymore.
9. callback scripts will be executed with role=primary instead of role=master
10. On Kubernetes Patroni by default will set role label to primary. In case if you want to keep old behavior and avoid downtime or lengthy complex migrations you can configure `kubernetes.leader_label_value` and `kubernetes.standby_leader_label_value` to `master`.

However, a few exceptions regarding master are still in place:
1. `GET /master` REST API endpoint will continue to work.
2. `master_start_timeout` and `master_stop_timeout` in global configuration are still accepted.
3. `master` tag is still preserved in Consul services in addition to `primary`.

Rationale for these exceptions: DBA doesn't always 100% control the infrastructure and can't adjust the configuration.
2024-08-28 17:19:00 +02:00
Alexander Kukushkin
8cdb0c25d9 Follow up on #2755 (#3137)
- don't register secondaries with `noloadbalance` tag.
- mention in the documentation that secondaries are also registered in `pg_dist_node`.
- update docker/kubernetes README files to include examples with secondaries being registered in `pg_dist_node`.
2024-08-27 09:34:12 +02:00
Alexander Kukushkin
6d65aa311a Configurable retention of members replication slots (#3108)
Current problem of Patroni that strikes many people is that it removes replication slot for member which key is expired from DCS. As a result, when the replica comes back from a scheduled maintenance WAL segments could be already absent, and it can't continue streaming without pulling files from archive.
With PostgreSQL 16 and newer we get another problem: logical slot on a standby node could be invalidated if physical replication slot on the primary was removed (and `pg_catalog` vacuumed).
The most problematic environment is Kubernetes, where slot is removed nearly instantly when member Pod is deleted.

So far, one of the recommended solutions was to configure permanent physical slots with names that match member names to avoid removal of replication slots. It works, but depending on environment might be non-trivial to implement (when for example members may change their names).

This PR implements support of `member_slots_ttl` global configuration parameter, that controls for how long member replication slots should be kept when the member key is absent. Default value is set to `30min`.
The feature is supported only starting from PostgreSQL 11 and newer, because we want to retain slots not only on the leader node, but on all nodes that could potentially become the new leader, and they should be moved forward using `pg_replication_slot_advance()` function.

One could disable feature and get back to the old behavior by setting `member_slots_ttl` to `0`.
2024-08-23 14:50:36 +02:00
Polina Bungina
31cf951b69 Remove patronictl failover --leader option (#3129)
Option has been deprecated and should be removed in the new major release
2024-08-16 10:18:18 +02:00
Polina Bungina
fc5a8ed01c Add synchronous_node_count to dynamic conf doc (#3124) 2024-08-13 15:28:37 +02:00
Alexander Kukushkin
384705ad97 Quorum based failover (#2668)
To enable quorum commit:
```diff
$ patronictl.py edit-config
--- 
+++ 
@@ -5,3 +5,4 @@
   use_pg_rewind: true
 retry_timeout: 10
 ttl: 30
+synchronous_mode: quorum

Apply these changes? [y/N]: y
Configuration changed
```

By default Patroni will use `ANY 1(list,of,stanbys)` in `synchronous_standby_names`. That is, only one node out of listed replicas will be used for quorum.
If you want to increase the number of quorum nodes it is possible to do it with:
```diff
$ patronictl edit-config
--- 
+++ 
@@ -6,3 +6,4 @@
 retry_timeout: 10
 synchronous_mode: quorum
 ttl: 30
+synchronous_node_count: 2

Apply these changes? [y/N]: y
Configuration changed
```

Good old `synchronous_mode: on` is still supported.

Close https://github.com/patroni/patroni/issues/664
Close https://github.com/zalando/patroni/pull/672
2024-08-13 08:51:01 +02:00
Alexander Kukushkin
56dba93c55 Implement support of log.mode. (#3122)
There was one oversight of #2781 - to influence external tools that Patroni could execute, we set global `umask` value based on permissions of the $PGDATA directory. As a result, it also influenced permissions of log files created by Patroni.

To address the problem we implement two measures:
1. Make `log.mode` configurable.
2. If the value is not set - calculate permissions from the original value of the umask setting.
2024-08-13 08:11:28 +02:00
Alexander Kukushkin
4456e267eb Patroni doesn't forece wal_log_hints anymore (#3109)
We forgot to update it in https://github.com/patroni/patroni/pull/3063
2024-07-22 09:42:37 +02:00
Polina Bungina
fbbd32a537 Release v3.3.2 (#3099)
* Update release notes, bump version
* Fix rn
* Bump pyright
2024-07-11 13:01:57 +02:00
RMT
f00826d6e6 Make `postgresql.parameter` documentation more clear (#3092) 2024-07-10 08:16:03 +02:00
Polina Bungina
7067d744c6 Fix release notes indent (#3088) 2024-06-17 18:41:00 +02:00
Polina Bungina
6b7ec49282 Release v3.3.1 (#3087)
* Update release notes
* Bump version
* Bump pyright version and solve reported issues

---------

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2024-06-17 17:45:10 +02:00
Polina Bungina
d4fd782038 Change all links and org references (#3086)
* Change all links and org references

* Update coverage status badge
2024-06-17 10:28:21 +02:00
Polina Bungina
6e1f9f7a6e Prepare repo migration (#3085) 2024-06-17 09:04:43 +02:00
Alexander Kukushkin
b6c5a12017 Fix infinite recursion in in replicatefrom tags (#3072)
Besides that:
1. fix problem with is_physical_slot() methods, it was returning false positives for logical slots.
2. Fix a little issue with replicatefrom docs.

Close https://github.com/zalando/patroni/issues/3068
2024-06-12 10:26:18 +02:00
jostaub
0a91948a49 Doc improvement: mention requirement to use gRPC gateway with EtcdV3 (#3073) 2024-06-11 12:12:00 +02:00
Paul_Kim
0a6c09e252 Make wal_log_hints configurable (#3063)
Close #1942
2024-05-24 09:55:26 +02:00
Alexander Kukushkin
ff99d29e6d Add date to every released version (#3057)
Going to GH releases and/or tags to get it is not very convinient.
2024-05-07 09:53:35 +02:00
Polina Bungina
9d231aeecd Fix readthedocs builds (#3046)
- Mitigate removal of the -E build option (rtd now reuses build env)
- Fix psycopg module's docs
- Properly remove modules/* docs for epub and latex

---------

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2024-04-08 08:28:20 +02:00
Alexander Kukushkin
48fbf64ea9 Release v3.3.0 (#3043)
* Make sure tests are not making external calls
and pass url with scheme to urllib3 to avoid warnings

* Make sure unit tests not rely on filesystem state

* Bump pyright and "solve" reported "issues"

Most of them are related to partially unknown types of values from empty
dict or list. To solve it for the empty dict we use `EMPTY_DICT` object of
newly introduced `_FrozenDict` class.

* Improve unit-tests code coverage

* Add release notes for 3.3.0

* Bump version

* Fix pyinstaller spec file

* python 3.6 compatibility

---------

Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
2024-04-04 17:51:26 +02:00
Alexander Kukushkin
d7454f7bcd Use target_session_attrs only when multiple hosts in standby_cluster (#3040)
Actually comment in the code was already saying that, but on practice it didn't happen.

It should help #3039
2024-04-02 11:59:57 +02:00
Grigory Smolkin
b09af642e6 Disable WAL streaming on standby node via new boolean tag "nostream" (#2842)
Add support for ``nostream`` tag. If set to ``true`` the node will not use replication protocol to stream WAL. It will rely instead on archive recovery (if ``restore_command`` is configured) and ``pg_wal``/``pg_xlog`` polling. It also disables copying and synchronization of permanent logical replication slots on the node itself and all its cascading replicas. Setting this tag on primary node has no effect.
2024-03-20 10:10:53 +01:00
Israel
014777b20a Refactor Barman scripts and add a sub-command to switch Barman config (#3016)
We currently have a script named `patroni_barman_recover` in Patroni, which is intended to be used as a custom bootstrap method, or as a custom replica creation method.

Now there is need of one more Barman related script in Patroni to handle switching of config models in Barman upon `on_role_change` events.

However, instead of creating another Patroni script, let's say `patroni_barman_config_switch`, and duplicating a lot of logic in the code, we decided to refactor the code so:

* Instead of two separate scripts (`patroni_barman_recover` and `patroni_barman_config_switch`), we have a single script (`patroni_barman`) with 2 sub-commands (`recover` and `config-switch`)

This is the overview of changes that have been performed:

* File `patroni.scripts.barman_recover` has been removed, and its logic has been split into a few files:
  * `patroni.scripts.barman.cli`: handles the entrypoint of the new `patroni_barman` command, exposing the argument parser and calling the appropriate functions depending on the sub-command
  * `patroni.scripts.barman.utils`: implements utilitary enums, functions and classes wich can be used by `cli` and by sub-commands implementation:
    * retry mechanism
    * logging set up
    * communication with pg-backup-api
  * `patroni.scripts.barman.recover`: implements the `recover` sub-command only
* File `patroni.tests.test_barman_recover` has been renamed as `patroni.tests.test_barman`
* File `patroni.scripts.barman.config_switch` was created to implement the `config-switch` sub-command only
* `setup.py` has been changed so it generates a `patroni_barman` application instead of `patroni_barman_recover`
* Docs and unit tests were updated accordingly

References: PAT-154.
2024-03-20 09:04:55 +01:00
Alexander Kukushkin
688c85389c Release v3.2.2 (#3007)
- update release notes
- bump Patroni version
- bump pyright version and fix reported issues
- improve compatibility with legacy psycopg2

Co-authored-by: Polina Bungina <bungina@gmail.com>
2024-01-17 08:31:08 +01:00
علی سالمی
5c4ee30dae Add JSON log format to logging configuration (#2982)
Now patroni can be configured as bellow to log in json format.

```yaml
log:
  type: json
  format:
    - asctime: '@timestamp'
    - levelname: level
    - message
    - module
    - name: logger_name
  static_fields:
    app: patroni
```

This config produce this log:

```json
{
  "@timestamp": "2023-12-14 19:51:24,872",
  "level": "INFO",
  "message": "Lock owner: None; I am postgresql1",
  "module": "ha",
  "app": "patroni",
  "logger_name": "patroni.ha"
}
```
2024-01-16 10:42:48 +01:00
Alexander Kukushkin
6976939f09 Release/v3.2.1 (#2968)
- bump version
- bump pyright
- update release notes
2023-11-30 16:50:42 +01:00
Israel
269b04be5d Add a contrib script for remote Barman recovery (#2931)
A contrib script, which can be used as a custom bootstrap method, or as a custom create replica method.

The script communicates with the pg-backup-api on the Barman node so Patroni is able to restore a Barman backup remotely.

The `--help` option of the script, along with the script docstring, should provide some context on how to use fill its parameters.

Patroni docs were updated accordingly to share examples about how to configure the script as a custom bootstrap method, or as a custom create replica method.

References: PAT-216.
2023-11-06 16:25:27 +01:00
Israel
d72f7cb259 Add a FAQ page to the docs (#2933)
This commit introduces a FAQ page to the docs. The idea is to get
most frequently asked questions answered before-hand, so the user
is able to get them answered quickly without going into detail in
the docs or having to go to Slack/GitHub to clarify questions.

---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-11-01 14:02:04 +01:00
Aras Mumcuyan
c3dce46830 Add ability to pass auth_data to zk client (#2932) 2023-10-30 11:46:36 +01:00
Alexander Kukushkin
ce10e5fccc Release v3.2.0 (#2930)
- bump version
- bump pyright and apply fixes
- update release notes
2023-10-25 16:13:30 +02:00
Israel
bb90feb393 Add support for additional parameters on custom bootstrap (#2927)
Previous to this commit, if a user would ever like to add parameters to the custom bootstrap script call, they would need to configure Patroni like this:

```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script --arg1=value1 --arg2=value2 ...
```

This commit extends that so we achieve a similar behavior that is seen when using `create_replica_methods`, i.e., we also allow the following syntax:

```
bootstrap:
  method: custom_method_name
  custom_method_name:
    command: /path/to/my/custom_script
    arg1: value1
    arg2: value2
```

All keys in the mapping which are not recognized by Patroni, will be dealt with as if they were additional named arguments to be passed down to the `command` call.

References: PAT-218.
2023-10-25 15:01:08 +02:00
Polina Bungina
6c06f5cc96 Add initial docs for patroni --validate/generate config (#2929)
For now it will sit in the section about the Patroni configuration. We can later move it to (or reference from) a new section where all the functionality of the `patroni` executable will be described.
2023-10-25 14:20:17 +02:00
Mark Pekala
f5ee67fa1c Feature: failover priority (#2780)
The priority is configured with `failover_priority` tag. Possible values are from `0` till infinity, where `0` means that the node will never become the leader, which is the same as `nofailover` tag set to `true`. As a result, in the configuration file one should set only one of `failover_priority` or `nofailover` tags.

The failover priority kicks in only when there are more than one node have the same receive/replay LSN and are ahead of other nodes in the cluster. In this case the node with higher value of `failover_priority` is preferred. If there is a node with higher values of receive/replay LSN, it will become the new leader even if it has lower value of `failover_priority` (except when priority is set to 0).

Close https://github.com/zalando/patroni/issues/2759
2023-10-24 12:22:48 +02:00
Israel
65030c56ee Add capability of specifying namespace through --dcs argument (#2926)
This commit changes the `patronictl` application in such a way its
`--dcs` argument is now able to receive a namespace.

Previous to this commit this was the format of that argument's value:
`DCS://HOST:PORT`.

From now on it accepts this format: `DCS://HORT:PORT/NAMESPACE`. As all
previous parts of the argument value, `NAMESPACE` is optional, and if
not given `patronictl` will fallback to the value from the configuration
file, if any, or to `service`.

This change is specifically useful when you are running a cluster in a
custom namespace, and from a machine where you don't have a configuration
file for Patroni or `patronictl`. It can avoid that you would have to
create a configuration file only with `namespace` filed in that case.

Issue reported by: Shaun Thomas <shaun@bonesmoses.org>

Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-10-24 12:09:44 +02:00
GuanqunYang193
ce187bec38 Remove user creation related docs (#2920)
* Remove user creation related docs
* remove template
2023-10-23 08:29:09 +02:00
Alexander Kukushkin
c5fffb3c97 Further work on permanent physical slots (#2891)
- Fixed issues with has_permanent_slots() method. It didn't took into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write to the status key only LSNs for permanent slots (not just for slots that exist on the primary).
  - Include pg_current_wal_flush_lsn() to slots feedback, so that slots on standby nodes could be advanced
- Improved behave tests:
  - Verify that permanent slots are properly created on standby nodes
  - Verify that permanent slots are properly advanced, including DCS failsafe mode
  - Verify that only permanent slots are written to the `/status`
2023-10-23 08:24:28 +02:00
zhjwpku
cb5f34b721 add some guide to run tests in different scopes (#2921)
Introduce ways to run tests in different scopes which should be helpful for beginners.
2023-10-23 08:17:53 +02:00
Alexander Kukushkin
fc67ba73f0 Allow to specify psycopg* in extras and switch to build (#2907)
* remove check_psycopg() call from the setup.py, when installing from wheel it doesn't work anyway.
* call check_psycopg() function before process_arguments(), because the last one is trying to import psycopg and fails with the stacktrace, while the first one shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras, that will install psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move check_psycopg() function to the __main__.py.
* introduce the new extra called `all`, it will allow to install all dependencies at once (except psycopg related).
* use the `build` module in order to create sdist bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
2023-10-17 14:46:15 +02:00
Alexander Kukushkin
d93db20baa Set citus.local_hostname (#2903)
There are cases when Citus wants to have a connection to the local postgres. By default it uses `localhost` for that, which is not alwasy available. To solve it we will set `citus.local_hostname` GUC to custom value, which is the same as Patroni uses to connect to Postgres.
2023-10-16 10:21:50 +02:00
Chris Bandy
588df5da05 Refine the documentation about custom_conf (#2901)
some back icks in this section needed to be balanced.
2023-10-11 08:41:11 +02:00
Alexander Kukushkin
9283ebda64 Enforce loop_wait/retry_timeout/ttl rule (#2869)
* hard-code minimal possible values
* make adjustments if values are lower or if the rule is violated and show warnings
* update documentation
2023-10-04 11:44:57 +02:00
Israel
a329a9d320 Add a documentation page for patronictl (#2874)
This PR introduces a documentation page for `patronictl` application.

We adopted a top-down approach when writing this document. We start by describing the outer most parts, and then keep writing new sections that specialize the knowledge.

We basically added a section called `patronictl` to the left menu. Inside that section we created a page with this structure:

- `patronictl`: describes what it is
    - `Configuraiton`: how to configure `patronictl`
    - `Usage`: how to use the CLI. Inside this section, there are subsections for each of the subcommands exposed by `patronictl`, and each of them are described using the following subsubsections:
        - `Synopsis`: syntax of the command and its positional and optional arguments
        - `Description`: a description of what the command does
        - `Parameters`: a detailed description of the arguments and how to use them
        - `Examples`: one or more examples of execution of the command

References: PAT-200.
2023-10-04 11:43:38 +02:00
Polina Bungina
27915984b4 Add contrib requirement for tests, small docs refactoring (#2887) 2023-09-27 12:19:58 +02:00
Alexander Kukushkin
a3b3e1bc1c Release v3.1.2 (#2885)
- bump version
- update release notes
2023-09-26 12:30:27 +02:00
Alexander Kukushkin
bc15813de0 Permanent physical slots on standby nodes (#2852)
Create permanent physical replication slots on standby nodes and use `pg_replication_slot_advance()` function to move them forward.

The `restart_lsn` is advanced based on values stored in the `/status` key by the primary node.

When slot is created on a replica it could be ahead the same slot on the primary and therefore there is some period of time when it doesn't protect WAL files from being recycled.
2023-09-20 16:50:37 +02:00
Alexander Kukushkin
18d9cb1124 Stick with sphinx_rtd_theme (#2873)
by default they are using something else
2023-09-20 14:59:15 +02:00
Alexander Kukushkin
66bdb1ae12 Release v3.1.1 (#2872)
* Bump version
* Update release notes
* Update contributing guidelines and tox.ini (include v16)
* Enable tests for `REL*` branches
2023-09-20 12:00:18 +02:00