Commit Graph

247 Commits

Author SHA1 Message Date
SK
80a03a4892 Enrich some endpoints with the scope and name (#2846)
- monitoring endpoints - added `name` to the `patroni`, next to the `scope` and `version`
- metrics endpoint - added name to labels
2023-09-05 07:24:17 +02:00
Alexander Kukushkin
6b7f914da7 Fix bug with kubernetes.standby_leader_label_value (#2832)
When running with the leader lock, Patroni was just setting the `role` label to `master`, so the `kubernetes.standby_leader_label_value` feature effectively never worked.

Now it is fixed, but in order not to introduce breaking changes we just update the default value of `standby_leader_label_value` to `master`.
2023-09-04 10:03:37 +02:00
Polina Bungina
7319d12026 Remove accidentally added .DS_Store (#2826)
And extend .gitignore
2023-08-21 07:50:45 +02:00
Polina Bungina
2ec9834c60 Update api examples (#2824)
* Add failsafe_mode_is_active to /patroni and /metrics
* Add patroni_primary to /metrics
* Add examples showing that failsafe_mode_is_active and cluster_unlocked
  are only shown for /patroni when the value is "true"
* Update /patroni and /config examples
2023-08-18 16:13:13 +02:00
Alexander Kukushkin
93be10a655 Remove Python 2 install instructions from docs/README (#2822)
docs/README.rst mainly duplicates README.rst and should also be changed. Besides that, remove the test/coverage badges.

followup on #2821
2023-08-17 16:17:34 +02:00
Israel
4138d0b830 Add docstrings to patroni.config (#2708)
Besides adding docstrings to `patroni.config`, a few side changes
have been applied:

* Reference `config_file` property instead of internal attribute
`_config_file` in method `_load_config_file`;
* Have `_AUTH_ALLOWED_PARAMETERS[:2]` as default value of `params`
argument in method `_get_auth` instead of using
`params or _AUTH_ALLOWED_PARAMETERS[:2]` in the body;
* Use `len(PATRONI_ENV_PREFIX)` instead of a hard-coded `8` when
removing the prefix from environment variable names;
* Fix documentation of `wal_log_hints` setting. The previous docs
mentioned it was a dynamic setting that could be changed. However
it is managed by Patroni, which forces `on` value.
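
As an illustration of the prefix-length point above (a minimal sketch, not the code from this commit): the helper name `strip_env_prefix` is hypothetical, and the prefix value `PATRONI_` is assumed from the hard-coded `8` mentioned in the message.

```python
# Assumed value: the commit only says the hard-coded length was 8.
PATRONI_ENV_PREFIX = 'PATRONI_'


def strip_env_prefix(name: str) -> str:
    """Return the environment variable name without the prefix, lower-cased.

    Hypothetical helper, used here only to show the technique.
    """
    if not name.startswith(PATRONI_ENV_PREFIX):
        raise ValueError(f'{name!r} does not start with {PATRONI_ENV_PREFIX!r}')
    # len(PATRONI_ENV_PREFIX) stays correct if the prefix ever changes,
    # whereas a literal 8 would silently cut at the wrong position.
    return name[len(PATRONI_ENV_PREFIX):].lower()
```

Deriving the offset from the constant keeps the slice and the prefix definition in one place.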

References: PAT-123.
2023-08-17 11:19:49 +02:00
Matt Baker
b7ea511511 Generate API docs from code with sphinx autodoc (#2699)
Expanding on the addition of docstrings in code, this adds python module API docs to sphinx documentation.

A developer can preview what this might look like by running this locally:

```
tox -m docs
```

The option `-W` is added to the tox env so that warning messages are considered errors.

Adds doc generation using the above method to the test GitHub workflow to catch documentation problems on PRs.

Some docstrings have been reformatted and fixed to satisfy errors generated with the above setup.
2023-08-17 10:27:33 +02:00
Matt Baker
82d2ef4878 Make docs more clear on changes to the bootstrap.dcs section of YAML config (#2811)
A common pitfall for new users of Patroni seems to be not realizing that the `bootstrap.dcs` section is only used to initialize the configuration in DCS. This moves the comment about this to an info block so it is more visible to the reader.
2023-08-11 10:31:31 +02:00
Alexander Kukushkin
84aac437c1 Release v3.1.0 (#2801)
- bump pyright and resolve reported issues
- bump Patroni version
- update release notes
2023-08-03 13:02:29 +02:00
Israel
48e3d31e1d Refactor docs about migration to Patroni (#2796)
This PR is an attempt to refactor the docs about migration to Patroni.

These are a few enhancements that we propose through this PR:

* Docs used to mention the procedure can only be performed in a single-node cluster. We changed that so the procedure considers a cluster composed of primary and standbys;
* Teach how to deal with pre-existing replication slots;
* Explain how to create the user for `pg_rewind`, if user intends to enable `use_pg_rewind`.

References: PAT-143.
2023-08-03 09:01:16 +02:00
Israel
018a2f4dd9 Enhance docs of slots dynamic configuration (#2797)
The docs of `slots` configuration used to have this mention:

```
my_slot_name: the name of replication slot. If the permanent slot name
matches with the name of the current primary it will not be created.
Everything else is the responsibility of the operator to make sure that
there are no clashes in names between replication slots automatically
created by Patroni for members and permanent replication slots.
```

However that is not true in the sense that Patroni does not check for
clashes between `my_slot_name` and the name of replication slots created
for replicating changes among members. If you specify a slot name that
clashes with the name of a replication slot used by a member, it turns
out Patroni will make the slot permanent in the primary even if the member
key expires from the DCS.

Through this commit we also enhance the docs in terms of explaining that
physical permanent slots are maintained only in the primary, while logical
replication slots are copied from primary to standbys.

Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-08-01 15:40:07 +02:00
Waynerv
0e19e3e98e Make pod role label configurable (#2659)
Close #2495
2023-07-25 10:29:04 +02:00
Alexander Kukushkin
06db296612 Fixes in patroni.request (#2768)
1. Take client certificates only from the `ctl` section. Motivation: sometimes there are server-only certificates that can't be used as client certificates. As a result neither Patroni nor patronictl work correctly even if the `--insecure` option is used.
2. Document that if `restapi.verify_client` is set to `required` then client certificates **must** be provided in the `ctl` section.
3. Add support for `ctl.authentication` and prefer to use it over `restapi.authentication`.
4. Silence the annoying InsecureRequestWarning when `patronictl -k` is used, so that the behavior becomes similar to `curl -k`.
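
The preference order from item 3 could look roughly like this. It is a sketch under assumptions: the function name `client_authentication` is hypothetical and the configuration is modelled as plain nested dictionaries.

```python
from typing import Any, Dict, Optional


def client_authentication(config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pick credentials for REST API client requests.

    Hypothetical helper: prefers `ctl.authentication` and falls back to
    `restapi.authentication` when the former is missing or empty.
    """
    ctl_auth = (config.get('ctl') or {}).get('authentication')
    if ctl_auth:
        return ctl_auth
    return (config.get('restapi') or {}).get('authentication')
```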
2023-07-25 08:48:18 +02:00
Alexander Kukushkin
a4d29eb99e Release v3.0.4 (#2754)
- update release notes
- bump version
- bump pyright version
2023-07-13 11:51:38 +02:00
Alexander Kukushkin
d46ca88e6b Make replication state visible on standbys (#2733)
To do that we use `pg_stat_get_wal_receiver()` function, which is available since 9.6. For older versions the `patronictl list` output and REST API responses remain as before.

If there is no wal receiver process, we check whether `restore_command` is set and show the state as `in archive recovery`.
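
The decision described above can be sketched as a small function. This is an illustration, not the commit's code: the function name and arguments are hypothetical, and returning `None` when neither condition holds is an assumption.

```python
from typing import Optional


def replication_state(wal_receiver_status: Optional[str],
                      restore_command: Optional[str]) -> Optional[str]:
    """Derive the state shown for a standby.

    wal_receiver_status: status reported for the wal receiver, or None
        when no wal receiver process exists.
    restore_command: value of the `restore_command` setting, if any.
    """
    if wal_receiver_status is not None:
        # A wal receiver exists, so report its status (e.g. 'streaming').
        return wal_receiver_status
    if restore_command:
        # No wal receiver, but WAL can still be fetched from the archive.
        return 'in archive recovery'
    # Assumption: nothing is reported when neither condition applies.
    return None
```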

Example of `patronictl list` output:
```bash
$ patronictl list
+ Cluster: batman -------------+---------+---------------------+----+-----------+
| Member      | Host           | Role    | State               | TL | Lag in MB |
+-------------+----------------+---------+---------------------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running             | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | in archive recovery | 12 |         0 |
+-------------+----------------+---------+---------------------+----+-----------+

$ patronictl list
+ Cluster: batman -------------+---------+-----------+----+-----------+
| Member      | Host           | Role    | State     | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running   | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 12 |         0 |
+-------------+----------------+---------+-----------+----+-----------+
```

Example of REST API response:
```bash
$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544480,
    "replayed_location": 335544480,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "in archive recovery",
  "dcs_last_seen": 1688642069,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}

$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544816,
    "replayed_location": 335544816,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "streaming",
  "dcs_last_seen": 1688642089,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}
```
2023-07-13 09:24:20 +02:00
Martín Marqués
e72d3ba79e Use full names for contributors in the release notes (#2725)
Until the last release, contributors' names were fully written on the
first occurrence during that release. This meant that if Alexander had
four contributions in the release, we would use Alexander Kukushkin on
the first item in the release, and on all the others just Alexander.

This could, in some cases, create some confusion. For example, if there
is more than one contributor with the same first name and each has more
than one contribution.

For this reason, in release 3.0.3, we used the full names of contributors
on all the items from the release.

This patch is to amend the old release notes and have each entry with the
full name of the contributor.

Also fix a typo with 2 spaces between first name and last name in one bug fix.

Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2023-07-04 18:53:53 +03:00
Andrey
74d78dbba2 Update request_queue_size feature authors (#2723)
Add Aleksei Sukhov to the authors
2023-06-26 08:11:09 +02:00
Alexander Kukushkin
6f91f4f4e2 Release v3.0.3 (#2719)
* Bump version
* Bump pyright version and fix newly reported issues
* Update release notes
* Fix typos, extend release process desc
* Add readthedocs configuration file v2
* Fix Dockerfile.citus files
2023-06-22 10:46:02 +02:00
Mark Pekala
43e2290fdf More beginner-friendly introduction (#2712)
Attempts to make progress on #2250 by rephrasing existing introduction and making sentences slightly shorter.
2023-06-21 11:45:56 +02:00
Polina Bungina
21e92fd166 Add env vars for custom bin names (#2706)
2023-06-01 14:06:11 +02:00
Israel
d11328020d Add support for custom Postgres binary names (#2692)
When using a custom Postgres distribution, the Postgres binaries may be compiled with names different from the ones used by the community Postgres distribution.

With that in mind we implemented a new set of settings for Patroni, so the user is able to override the default binary names with custom binary names through the new section postgresql.bin_name in the local configuration.
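
A rough sketch of how such an override might be resolved. The function name `binary_path` and the custom name `edb-postgres` are hypothetical, and the `postgresql` section is modelled as a plain dictionary with optional `bin_dir` and `bin_name` keys.

```python
import os
from typing import Any, Dict


def binary_path(postgresql_config: Dict[str, Any], program: str) -> str:
    """Return the path of a Postgres binary, honouring custom names.

    Falls back to the community name when `bin_name` has no entry for
    `program`, and to a bare name (PATH lookup) when `bin_dir` is unset.
    """
    custom_names = postgresql_config.get('bin_name') or {}
    name = custom_names.get(program, program)
    return os.path.join(postgresql_config.get('bin_dir') or '', name)


# Usage: only `postgres` is overridden, `pg_ctl` keeps its default name.
config = {'bin_dir': '/opt/pg/bin', 'bin_name': {'postgres': 'edb-postgres'}}
print(binary_path(config, 'postgres'))
print(binary_path(config, 'pg_ctl'))
```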

References: PAT-17.
2023-05-30 13:57:57 +02:00
Polina Bungina
822b6ec711 Subtle README fix (#2691)
Remove misleading words
2023-05-24 11:22:41 +02:00
Matt Baker
73797e8572 Add tox configuration for running multiple test envs (#2603)
2023-05-24 10:58:04 +02:00
Polina Bungina
d1fdb45179 Make bootstrap.initdb optional (#2685)
2023-05-24 09:28:57 +02:00
Polina Bungina
6c8a3b0d25 Remove bootstrap.pg_hba (#2684)
* Remove bootstrap.pg_hba
* Extend docs for postgresql.pg_hba/pg_ident
* Add postgresql.pg_hba/pg_ident to dynamic config docs

---------

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-05-24 09:01:56 +02:00
Polina Bungina
2f5bcbd877 Change PostgreSQL Slack invite link (#2680)
2023-05-23 08:17:51 +02:00
Polina Bungina
506b5bec48 Validate-config fixes (#2678)
- fix --validate-config not to error out if bin_dir is an empty string in the yaml config
- mention bin_dir optionality in the docs
- validate bin_dir even if it is not in the yaml config (add optional
  default value for Optional config params in validator)
- make rewind user optional
2023-05-15 13:40:22 +02:00
Polina Bungina
4e1b9937b9 Documentation improvements (#2661)
* Further nested lists rendering fixes
* Remove a couple of sphinx warnings
* Fix bootstrap.users.password description
* Boto->boto3 in README's
* Split configuration docs and move some lines across files
* Fix a typo
2023-05-04 07:24:37 +02:00
nrmn_2492
90ed581d87 Update pg_rewind user/password to optional in docs (#2650)
* Update REWIND_USERNAME, REWIND_PASSWORD optional tag in ENVIRONMENT.rst file
* Update settings.rst pg_rewind user/password optional
2023-05-03 15:39:09 +02:00
Le Duane
bebe6754fc Add before stop hook (#2642)
The cases we have in mind are:
* In spite of following all best practices client-side, logical replication connections can sometimes hang the Postgres shutdown sequence. We'd like to SIGTERM any misbehaving logical replication connections which remain after x seconds. These will inevitably get killed anyway on master stop timeout.
* Remove the "role=master" label on the current primary when not using k8s as DCS. Waiting until after Postgres fully stops can sometimes be too long for this.
* Pause pgbouncer connections before switchover

Close #2596
2023-04-27 13:07:32 +02:00
Chris Bandy
54b6d8186f Render nested lists correctly in settings docs (#2649)
The sections on this page have been rendering as description lists
rather than unordered lists.
2023-04-26 14:42:57 +02:00
Andrey
8a5d6ec74d Add "request_queue_size" option to REST API server (#2643)
Sets request queue size for TCP socket used by Patroni REST API. Once the queue is full, further requests get a "Connection denied" error. The default value is 5.
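
For context, Python's standard `socketserver.TCPServer` exposes a `request_queue_size` attribute (default 5) that is passed to `socket.listen()`. The sketch below shows how a configurable value could be applied before the server starts listening. The class and function names are hypothetical and this is not the code from the commit.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class QueueSizedServer(HTTPServer):
    """HTTP server whose listen backlog can be set per instance."""


def make_server(host: str, port: int, queue_size: int = 5) -> QueueSizedServer:
    # bind_and_activate=False delays bind()/listen(), so the backlog can be
    # changed before server_activate() calls socket.listen(request_queue_size).
    server = QueueSizedServer((host, port), BaseHTTPRequestHandler,
                              bind_and_activate=False)
    server.request_queue_size = queue_size
    return server


# Usage: bind and start listening explicitly once the backlog is set.
# server = make_server('127.0.0.1', 8008, queue_size=16)
# server.server_bind()
# server.server_activate()
# server.serve_forever()
```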
2023-04-12 10:25:14 +02:00
Alexander Kukushkin
39875f448c Release v3.0.2 (#2617)
- bump version
- update release notes
- update links to Postgres Slack
- simplify /sync health-check endpoint code
- update unit-tests to cover missing lines
2023-03-24 08:54:54 +01:00
T.v.Dein
60723f5fa4 Add metric to report about sync standby replica status (#2615)
Close #2613

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-03-23 09:32:29 +01:00
Víctor Oriol i Aguilar
36c17e944b high availability across multiple datacenter #2587 (#2598)
Documentation about how to deploy high availability across multiple datacenters.

Close #2587
2023-03-14 15:39:50 +01:00
Alexander Kukushkin
eefa15b390 Make K8s retriable HTTP status code configurable (#2585)
Configuration parameter is `kubernetes.retriable_http_codes` or `PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES` environment variable.

These status codes are added to the default list of 500, 503, 504.
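
A minimal sketch of the "added to the default list" behaviour, assuming the configured value arrives as an iterable of integers. The function name is hypothetical, and the way the environment variable is parsed is not covered here.

```python
from typing import FrozenSet, Iterable, Optional

# Defaults named in the commit message.
DEFAULT_RETRIABLE_HTTP_CODES: FrozenSet[int] = frozenset({500, 503, 504})


def retriable_http_codes(configured: Optional[Iterable[int]]) -> FrozenSet[int]:
    """Return the defaults extended with user-configured status codes."""
    if not configured:
        return DEFAULT_RETRIABLE_HTTP_CODES
    # Configured codes extend the defaults; they never replace them.
    return DEFAULT_RETRIABLE_HTTP_CODES | frozenset(int(c) for c in configured)
```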

Close https://github.com/zalando/patroni/issues/2536
2023-03-10 09:38:12 +01:00
Burak Ergen
89595babdf add "GET /metrics" rest_api.rst (#2576)
2023-03-02 09:40:54 +01:00
Polina Bungina
422047f105 Release 3.0.1 (#2561)
* Bump version
* Update release notes
* Return 3.6 to supported versions in setup.py
2023-02-16 08:51:47 +01:00
Alexander Kukushkin
8ac8ed6584 Update Citus link to the github.com repo (#2546)
Per suggestion from @clairegiordano
2023-02-02 11:50:19 +01:00
Alexander Kukushkin
7869f5e211 Release 3.0.0 (#2545)
* bump version
* update release notes
* removed 2.7, 3.4, 3.5, and 3.6 from supported versions in setup.py
* switched GH actions back to ubuntu-latest, removed tests with 2.7 and 3.6, and added 3.11
* some little fixes in Citus documentation and behave tests
2023-01-30 10:29:08 +01:00
Alexander Kukushkin
4c3af2d1a0 Change master->primary/leader/member (#2541)
Keep as much backward compatibility as possible.

The following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set.
8. updated the documentation and some examples
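
Items 1 and 7 can be illustrated with two small helpers. This is a sketch under assumptions: the function names are hypothetical and the configuration is modelled as a flat dictionary.

```python
from typing import Any, Dict, Optional


def is_primary_role(role: str) -> bool:
    """Accept both the old and the new role name (item 1)."""
    return role in ('master', 'primary')


def effective_timeout(config: Dict[str, Any], kind: str) -> Optional[int]:
    """Resolve the `start` or `stop` timeout (item 7).

    `primary_<kind>_timeout` wins when it is set; otherwise the legacy
    `master_<kind>_timeout` value is used.
    """
    value = config.get(f'primary_{kind}_timeout')
    if value is not None:
        return value
    return config.get(f'master_{kind}_timeout')
```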

Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
2023-01-27 07:40:24 +01:00
Alexander Kukushkin
4872ac51e0 Citus integration (#2504)
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```

Where 0 is a Citus group for coordinator and 1, 2, etc are worker groups.

Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).

The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group.

Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters.

Since there are no hierarchical structures on K8s, we will use the Citus group suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader  # the leader config map for the coordinator
batman-0-config  # the config map holding initialize, config, and history "keys"
...
batman-1-leader  # the leader config map for worker group 1
batman-1-config
...
```
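
The two naming schemes shown above can be sketched as follows. The helper names are hypothetical, and the `/service` namespace and `batman` scope are taken from the examples in this message.

```python
def dcs_key(namespace: str, scope: str, group: int, *parts: str) -> str:
    """Build a hierarchical DCS key for a Citus group."""
    return '/'.join([namespace.rstrip('/'), scope, str(group), *parts])


def k8s_object_name(scope: str, group: int, suffix: str) -> str:
    """Build a flat K8s object name carrying the Citus group suffix."""
    return f'{scope}-{group}-{suffix}'


# Usage: group 0 is the coordinator, groups 1, 2, ... are workers.
print(dcs_key('/service', 'batman', 0, 'leader'))         # /service/batman/0/leader
print(dcs_key('/service', 'batman', 1, 'members', 'm1'))  # /service/batman/1/members/m1
print(k8s_object_name('batman', 1, 'config'))             # batman-1-config
```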

Citus integration is enabled from patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```

If enabled, Patroni will create the database, create the citus extension in it, and INSERT INTO `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', 'sslkey' for the superuser if they are defined in the Patroni configuration file.

When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.

Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that "citus" is in the first place
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates/removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid accidental removal.
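
The three adjustments listed above could be expressed roughly as below. This is an illustrative sketch only: the function name is hypothetical, parameters are modelled as a plain dictionary, and falling back to 100 for a missing `max_connections` (the PostgreSQL default) is an assumption.

```python
from typing import Any, Dict


def adjust_gucs_for_citus(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` with the Citus-related GUCs enforced."""
    result = dict(params)

    # "citus" must come first in shared_preload_libraries.
    raw = str(result.get('shared_preload_libraries') or '')
    libs = [lib.strip() for lib in raw.split(',') if lib.strip()]
    libs = ['citus'] + [lib for lib in libs if lib != 'citus']
    result['shared_preload_libraries'] = ','.join(libs)

    # Unset or 0 becomes max_connections * 2.
    if int(result.get('max_prepared_transactions') or 0) == 0:
        # Assumption: 100 is used when max_connections is not given.
        result['max_prepared_transactions'] = int(result.get('max_connections', 100)) * 2

    # Citus needs logical decoding to move/split shards.
    result['wal_level'] = 'logical'
    return result
```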

The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the citus_add_node() and citus_update_node() functions.

Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of restart or switchover, it calls the `POST /citus` endpoint on the coordinator and the Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back up, it performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, with only a short latency spike.

All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and makes it possible to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
2023-01-24 16:14:58 +01:00
Polina Bungina
acecbe0d8f Fix a couple of linter problems, delete TODO.md (#2526)
Fix a couple of linter problems, remove trailing whitespaces

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-01-17 10:52:03 +01:00
Alexander Kukushkin
2ea0357854 DCS failsafe mode (#2379)
If enabled it will allow Patroni to cope with DCS outages.
In case of a DCS outage the leader tries to call all remaining members in the cluster via API and if all of them respond with success the leader will not be demoted.
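
The rule in the previous sentence can be sketched as follows. The function name and the `ping` callable are hypothetical stand-ins for the real API call, and treating an exception as a failed response is an assumption.

```python
from typing import Callable, Iterable


def leader_may_continue(members: Iterable[str],
                        ping: Callable[[str], bool]) -> bool:
    """Return True only if every remaining member confirms the leader.

    `ping(member)` should return True when the member's API answered
    with success.
    """
    for member in members:
        try:
            if not ping(member):
                return False
        except Exception:
            # Assumption: an unreachable member counts as a failure.
            return False
    return True
```

Requiring every member to answer, rather than a majority, means a single unreachable member is enough for the leader to be demoted.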

The `failsafe_mode` can be enabled by running
```sh
patronictl edit-config -s failsafe_mode=true
```

or by calling the `/config` REST API endpoint.

Co-authored-by: Polina Bungina <bungina@gmail.com>
2023-01-13 13:35:05 +01:00
Polina Bungina
650344fca8 Update Slack link in README.rst and CONTRIBUTING.rst (#2520)
* Update Slack link in README.rst and CONTRIBUTING.rst
2023-01-11 16:06:25 +01:00
Alexander Kukushkin
442bd3f434 Compatibility with some old modules (#2514)
- old click handles argument names differently
- old pytest doesn't like `from mock import call`

Bump version and update release notes.

Close: https://github.com/zalando/patroni/issues/2508
Close: https://github.com/zalando/patroni/issues/2512
2023-01-04 07:24:52 +01:00
Polina Bungina
bad158046e Release v2.1.6 (#2507)
* bump version
* update release notes

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2022-12-30 13:32:34 +01:00
Alexander Kukushkin
53f89faaab Release v2.1.5 (#2462)
* bump version
* update release notes
* run some behave tests on v15
* automate release process by building/pushing packages on tag creation and release publication
2022-11-28 10:45:04 +01:00
Alexander Kukushkin
8f8e9c9b81 Introduce postgresql.proxy_address (#2437)
It will be written to the member key in DCS as the `proxy_url` and could be useful for service discovery.
2022-10-24 10:23:06 +02:00
Jim Chanco Jr
84dc72b031 docs: Change term "Master" to "primary" or "leader" (#2417)
2022-09-29 08:48:49 +02:00