Add support for ``nostream`` tag. If set to ``true`` the node will not use replication protocol to stream WAL. It will rely instead on archive recovery (if ``restore_command`` is configured) and ``pg_wal``/``pg_xlog`` polling. It also disables copying and synchronization of permanent logical replication slots on the node itself and all its cascading replicas. Setting this tag on primary node has no effect.
We currently have a script named `patroni_barman_recover` in Patroni, which is intended to be used as a custom bootstrap method, or as a custom replica creation method.
Now there is need of one more Barman related script in Patroni to handle switching of config models in Barman upon `on_role_change` events.
However, instead of creating another Patroni script, let's say `patroni_barman_config_switch`, and duplicating a lot of logic in the code, we decided to refactor the code so:
* Instead of two separate scripts (`patroni_barman_recover` and `patroni_barman_config_switch`), we have a single script (`patroni_barman`) with 2 sub-commands (`recover` and `config-switch`)
This is the overview of changes that have been performed:
* File `patroni.scripts.barman_recover` has been removed, and its logic has been split into a few files:
* `patroni.scripts.barman.cli`: handles the entrypoint of the new `patroni_barman` command, exposing the argument parser and calling the appropriate functions depending on the sub-command
* `patroni.scripts.barman.utils`: implements utilitary enums, functions and classes wich can be used by `cli` and by sub-commands implementation:
* retry mechanism
* logging set up
* communication with pg-backup-api
* `patroni.scripts.barman.recover`: implements the `recover` sub-command only
* File `patroni.tests.test_barman_recover` has been renamed as `patroni.tests.test_barman`
* File `patroni.scripts.barman.config_switch` was created to implement the `config-switch` sub-command only
* `setup.py` has been changed so it generates a `patroni_barman` application instead of `patroni_barman_recover`
* Docs and unit tests were updated accordingly
References: PAT-154.
A contrib script, which can be used as a custom bootstrap method, or as a custom create replica method.
The script communicates with the pg-backup-api on the Barman node so Patroni is able to restore a Barman backup remotely.
The `--help` option of the script, along with the script docstring, should provide some context on how to use fill its parameters.
Patroni docs were updated accordingly to share examples about how to configure the script as a custom bootstrap method, or as a custom create replica method.
References: PAT-216.
This commit introduces a FAQ page to the docs. The idea is to get
most frequently asked questions answered before-hand, so the user
is able to get them answered quickly without going into detail in
the docs or having to go to Slack/GitHub to clarify questions.
---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
Previous to this commit, if a user would ever like to add parameters to the custom bootstrap script call, they would need to configure Patroni like this:
```
bootstrap:
method: custom_method_name
custom_method_name:
command: /path/to/my/custom_script --arg1=value1 --arg2=value2 ...
```
This commit extends that so we achieve a similar behavior that is seen when using `create_replica_methods`, i.e., we also allow the following syntax:
```
bootstrap:
method: custom_method_name
custom_method_name:
command: /path/to/my/custom_script
arg1: value1
arg2: value2
```
All keys in the mapping which are not recognized by Patroni, will be dealt with as if they were additional named arguments to be passed down to the `command` call.
References: PAT-218.
For now it will sit in the section about the Patroni configuration. We can later move it to (or reference from) a new section where all the functionality of the `patroni` executable will be described.
The priority is configured with `failover_priority` tag. Possible values are from `0` till infinity, where `0` means that the node will never become the leader, which is the same as `nofailover` tag set to `true`. As a result, in the configuration file one should set only one of `failover_priority` or `nofailover` tags.
The failover priority kicks in only when there are more than one node have the same receive/replay LSN and are ahead of other nodes in the cluster. In this case the node with higher value of `failover_priority` is preferred. If there is a node with higher values of receive/replay LSN, it will become the new leader even if it has lower value of `failover_priority` (except when priority is set to 0).
Close https://github.com/zalando/patroni/issues/2759
This commit changes the `patronictl` application in such a way its
`--dcs` argument is now able to receive a namespace.
Previous to this commit this was the format of that argument's value:
`DCS://HOST:PORT`.
From now on it accepts this format: `DCS://HORT:PORT/NAMESPACE`. As all
previous parts of the argument value, `NAMESPACE` is optional, and if
not given `patronictl` will fallback to the value from the configuration
file, if any, or to `service`.
This change is specifically useful when you are running a cluster in a
custom namespace, and from a machine where you don't have a configuration
file for Patroni or `patronictl`. It can avoid that you would have to
create a configuration file only with `namespace` filed in that case.
Issue reported by: Shaun Thomas <shaun@bonesmoses.org>
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
- Fixed issues with has_permanent_slots() method. It didn't took into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write to the status key only LSNs for permanent slots (not just for slots that exist on the primary).
- Include pg_current_wal_flush_lsn() to slots feedback, so that slots on standby nodes could be advanced
- Improved behave tests:
- Verify that permanent slots are properly created on standby nodes
- Verify that permanent slots are properly advanced, including DCS failsafe mode
- Verify that only permanent slots are written to the `/status`
* remove check_psycopg() call from the setup.py, when installing from wheel it doesn't work anyway.
* call check_psycopg() function before process_arguments(), because the last one is trying to import psycopg and fails with the stacktrace, while the first one shows a nice human-readable error message.
* add psycopg2, psycopg2-binary, and psycopg3 extras, that will install psycopg2>=2.5.4, psycopg2-binary, or psycopg[binary]>=3.0.0 modules respectively.
* move check_psycopg() function to the __main__.py.
* introduce the new extra called `all`, it will allow to install all dependencies at once (except psycopg related).
* use the `build` module in order to create sdist bdist_wheel packages.
* update the documentation regarding psycopg and extras (dependencies).
There are cases when Citus wants to have a connection to the local postgres. By default it uses `localhost` for that, which is not alwasy available. To solve it we will set `citus.local_hostname` GUC to custom value, which is the same as Patroni uses to connect to Postgres.
This PR introduces a documentation page for `patronictl` application.
We adopted a top-down approach when writing this document. We start by describing the outer most parts, and then keep writing new sections that specialize the knowledge.
We basically added a section called `patronictl` to the left menu. Inside that section we created a page with this structure:
- `patronictl`: describes what it is
- `Configuraiton`: how to configure `patronictl`
- `Usage`: how to use the CLI. Inside this section, there are subsections for each of the subcommands exposed by `patronictl`, and each of them are described using the following subsubsections:
- `Synopsis`: syntax of the command and its positional and optional arguments
- `Description`: a description of what the command does
- `Parameters`: a detailed description of the arguments and how to use them
- `Examples`: one or more examples of execution of the command
References: PAT-200.
Create permanent physical replication slots on standby nodes and use `pg_replication_slot_advance()` function to move them forward.
The `restart_lsn` is advanced based on values stored in the `/status` key by the primary node.
When slot is created on a replica it could be ahead the same slot on the primary and therefore there is some period of time when it doesn't protect WAL files from being recycled.
- Don't set leader in failover key from patronictl failover
- Show warning and execute switchover if leader option is provided for patronictl failover command
- Be more precise in the log messages
- Allow to failover to an async candidate in sync mode
- Check if candidate is the same as the leader specified in api
- Fix and extend some tests
- Add documentation
Some special handling is required when changing either of these settings in a Postgres cluster that has standby nodes:
* `max_connections`
* `max_prepared_transactions`
* `max_locks_per_transaction`
* `max_wal_senders`
* `max_worker_processes`
If one attempts to decrease `max_connections` dynamic setting and restart all nodes at the same time (primary and standbys), Patroni will refuse to apply the new value on the standbys and require the user to restart it again later, once replication catches up.
That behavior is correct, but it is not documented.
This commit adds information to documentation about that behavior and why it's required.
References: PAT-166.
* Generate documentation of private members through sphinx docs
With this commit we make sphinx build API docs for the following
things, which were missing up to this point:
* `__init__` method of classes;
* "private" members (properties, functions, methods, attributes, etc.,
which name starts with an underscore);
* members that are missing a docstring, so we can still reference
them with links in the documentation.
The third point can be removed later, if we wish, when we reach a
point where everything has proper docstrings in the Patroni code base.
* Fix documentation problems found after enabling private methods in sphinx
* `:cvar:` is not a valid domain role. Replaced with `:attr:`.
* documentation for `consul.base.Consul.__init__` has a single backtick
quoted string which is interpreted as a reference which cannot be found.
Therefore, the docstring has been copied as a block quote.
* various list spacing problems and indentation problems.
* code blocks added where indentation is interpreted incorrectly
* literal string quoting issues.
---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
Co-authored-by: Matt Baker <matt.baker@enterprisedb.com>
When running with the leader lock Patroni was just setting the `role` label to `master` and effectively `kubernetes.standby_leader_label_value` feature never worked.
Now it is fixed, but in order to not introduce breaking changes we just update default value of the `standby_leader_label_value` to the `master`.
* Add failsafe_mode_is_active to /patroni and /metrics
* Add patroni_primary to /metrics
* Add examples showing that failsafe_mode_is_active and cluster_unlocked
are only shown for /patroni when the value is "true"
* Update /patroni and /config examples
Besides adding docstrings to `patroni.config`, a few side changes
have been applied:
* Reference `config_file` property instead of internal attribute
`_config_file` in method `_load_config_file`;
* Have `_AUTH_ALLOWED_PARAMETERS[:2]` as default value of `params`
argument in method `_get_auth` instead of using
`params or _AUTH_ALLOWED_PARAMETERS[:2]` in the body;
* Use `len(PATRONI_ENV_PREFIX)` instead of a hard-coded `8` when
removing the prefix from environment variable names;
* Fix documentation of `wal_log_hints` setting. The previous docs
mentioned it was a dynamic setting that could be changed. However
it is managed by Patroni, which forces `on` value.
References: PAT-123.
Expanding on the addition of docstrings in code, this adds python module API docs to sphinx documentation.
A developer can preview what this might look like by running this locally:
```
tox -m docs
```
The option `-W` is added to the tox env so that warning messages are considered errors.
Adds doc generation using the above method to the test GitHub workflow to catch documentation problems on PRs.
Some docstrings have been reformatted and fixed to satisfy errors generated with the above setup.
It seems that a common pitfall for new users of Patroni is that the `bootstrap.dcs` section is only used to initialize the configuration in DCS. This moves the comment about this to an info block so it is more visible to the reader.
This PR is an attempt of refactoring the docs about migration to Patroni.
These are a few enhancements that we propose through this PR:
* Docs used to mention the procedure can only be performed in a single-node cluster. We changed that so the procedure considers a cluster composed of primary and standbys;
* Teach how to deal with pre-existing replication slots;
* Explain how to create the user for `pg_rewind`, if user intends to enable `use_pg_rewind`.
References: PAT-143.
The docs of `slots` configuration used to have this mention:
```
my_slot_name: the name of replication slot. If the permanent slot name
matches with the name of the current primary it will not be created.
Everything else is the responsibility of the operator to make sure that
there are no clashes in names between replication slots automatically
created by Patroni for members and permanent replication slots.
```
However that is not true in the sense that Patroni does not check for
clashes between `my_slot_name` and the name of replication slots created
for replicating changes among members. If you specify a slot name that
clashes with the name of a replication slot used by a member, it turns
out Patroni will make the slot permanent in the primary even if the member
key expire from the DCS.
Through this commit we also enhance the docs in terms of explaining that
physical permanent slots are maintained only in the primary, while logical
replication slots are copied from primary to standbys.
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
1. Take client certificates only from the `ctl` section. Motivation: sometimes there are server-only certificates that can't be used as client certificates. As a result neither Patroni not patronictl work correctly even if `--insecure` option is used.
2. Document that if `restapi.verify_client` is set to `required` then client certificates **must** be provided in the `ctl` section.
3. Add support for `ctl.authentication` and prefer to use it over `restapi.authentication`.
4. Silence annoying InsecureRequestWarning when `patronictl -k` is used, so that behavior becomes is similar to `curl -k`.
To do that we use `pg_stat_get_wal_receiver()` function, which is available since 9.6. For older versions the `patronictl list` output and REST API responses remain as before.
In case if there is no wal receiver process we check if `restore_command` is set and show the state as `in archive recovery`.
Example of `patronictl list` output:
```bash
$ patronictl list
+ Cluster: batman -------------+---------+---------------------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+---------------------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 12 | |
| postgresql1 | 127.0.0.1:5433 | Replica | in archive recovery | 12 | 0 |
+-------------+----------------+---------+---------------------+----+-----------+
$ patronictl list
+ Cluster: batman -------------+---------+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 12 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 12 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
```
Example of REST API response:
```bash
$ curl -s localhost:8009 | jq .
{
"state": "running",
"postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
"role": "replica",
"server_version": 150003,
"xlog": {
"received_location": 335544480,
"replayed_location": 335544480,
"replayed_timestamp": null,
"paused": false
},
"timeline": 12,
"replication_state": "in archive recovery",
"dcs_last_seen": 1688642069,
"database_system_identifier": "7252327498286490579",
"patroni": {
"version": "3.0.3",
"scope": "batman"
}
}
$ curl -s localhost:8009 | jq .
{
"state": "running",
"postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
"role": "replica",
"server_version": 150003,
"xlog": {
"received_location": 335544816,
"replayed_location": 335544816,
"replayed_timestamp": null,
"paused": false
},
"timeline": 12,
"replication_state": "streaming",
"dcs_last_seen": 1688642089,
"database_system_identifier": "7252327498286490579",
"patroni": {
"version": "3.0.3",
"scope": "batman"
}
}
```
Until the last release, contributors' names were fully written on the
first occurence during that release. This meant that if Alexander had
four contributions in the release, we would use Alexander Kukushkin on
the first item in the release, and on all the others just Alexander.
This could, in some cases, create some confusion. For example, if there
are more than one contributor with the same first name that has more
than one contribution each.
For this reason, in release 3.0.3, we used the full names of contributors
on all the items from the release.
This patch is to amend the old release notes and have each entry with the
full name of the contributor.
Also fix typo with 2 spaces between first name and last name in one bug fix
Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>