Commit Graph

32 Commits

Author SHA1 Message Date
Alexander Kukushkin
13cc86f851 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-11-24 14:42:15 +01:00
Israel
d72f7cb259 Add a FAQ page to the docs (#2933)
This commit introduces a FAQ page to the docs. The idea is to get
most frequently asked questions answered before-hand, so the user
is able to get them answered quickly without going into detail in
the docs or having to go to Slack/GitHub to clarify questions.

---------
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-11-01 14:02:04 +01:00
Alexander Kukushkin
5c6b34a757 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-10-10 09:57:04 +02:00
Israel
a329a9d320 Add a documentation page for patronictl (#2874)
This PR introduces a documentation page for `patronictl` application.

We adopted a top-down approach when writing this document. We start by describing the outermost parts, and then keep writing new sections that specialize the knowledge.

We basically added a section called `patronictl` to the left menu. Inside that section we created a page with this structure:

- `patronictl`: describes what it is
    - `Configuration`: how to configure `patronictl`
    - `Usage`: how to use the CLI. Inside this section, there are subsections for each of the subcommands exposed by `patronictl`, and each of them is described using the following subsubsections:
        - `Synopsis`: syntax of the command and its positional and optional arguments
        - `Description`: a description of what the command does
        - `Parameters`: a detailed description of the arguments and how to use them
        - `Examples`: one or more examples of execution of the command

References: PAT-200.
2023-10-04 11:43:38 +02:00
Alexander Kukushkin
67612f5667 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-09-12 09:07:11 +02:00
Polina Bungina
b31a4d55c9 Ensure strict failover/switchover definition difference (#2784)
- Don't set leader in failover key from patronictl failover
- Show warning and execute switchover if leader option is provided for patronictl failover command
- Be more precise in the log messages
- Allow to failover to an async candidate in sync mode
- Check if candidate is the same as the leader specified in api
- Fix and extend some tests
- Add documentation
2023-09-12 08:51:17 +02:00
Alexander Kukushkin
1ea5d6bfb7 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-09-11 14:32:41 +02:00
SK
80a03a4892 Enrich some endpoints with the scope and name (#2846)
- monitoring endpoints - added `name` to the `patroni`, next to the `scope` and `version`
- metrics endpoint - added name to labels
2023-09-05 07:24:17 +02:00
Alexander Kukushkin
8e24d72f98 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-08-22 10:07:35 +02:00
Polina Bungina
2ec9834c60 Update api examples (#2824)
* Add failsafe_mode_is_active to /patroni and /metrics
* Add patroni_primary to /metrics
* Add examples showing that failsafe_mode_is_active and cluster_unlocked
  are only shown for /patroni when the value is "true"
* Update /patroni and /config examples
2023-08-18 16:13:13 +02:00
Alexander Kukushkin
8f3c6d2ff6 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-08-17 12:04:14 +02:00
Israel
4138d0b830 Add docstrings to patroni.config (#2708)
Besides adding docstrings to `patroni.config`, a few side changes
have been applied:

* Reference `config_file` property instead of internal attribute
`_config_file` in method `_load_config_file`;
* Have `_AUTH_ALLOWED_PARAMETERS[:2]` as default value of `params`
argument in method `_get_auth` instead of using
`params or _AUTH_ALLOWED_PARAMETERS[:2]` in the body;
* Use `len(PATRONI_ENV_PREFIX)` instead of a hard-coded `8` when
removing the prefix from environment variable names;
* Fix documentation of `wal_log_hints` setting. The previous docs
mentioned it was a dynamic setting that could be changed. However
it is managed by Patroni, which forces `on` value.

References: PAT-123.
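
The third side change above (deriving the prefix length instead of hard-coding `8`) can be illustrated with a minimal sketch. The value `'PATRONI_'` for `PATRONI_ENV_PREFIX` is an assumption inferred from the hard-coded `8`, and `strip_prefix` is a hypothetical helper, not a function from `patroni.config`.

```python
# Assumed value: the commit only says the prefix used to be stripped with a hard-coded 8.
PATRONI_ENV_PREFIX = "PATRONI_"


def strip_prefix(env_name: str) -> str:
    """Return the lowercased variable name without the prefix (hypothetical helper)."""
    if not env_name.startswith(PATRONI_ENV_PREFIX):
        raise ValueError(f"{env_name!r} does not start with {PATRONI_ENV_PREFIX!r}")
    # len(...) stays correct even if the prefix constant ever changes.
    return env_name[len(PATRONI_ENV_PREFIX):].lower()
```

Deriving the offset from the constant keeps the slicing and the prefix definition from drifting apart.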
2023-08-17 11:19:49 +02:00
Alexander Kukushkin
893e460695 Address review feedback 2023-07-20 08:00:52 +02:00
Alexander Kukushkin
b0d8b21d49 Merge branch 'master' of github.com:zalando/patroni into feature/quorum-commit 2023-07-13 12:55:39 +02:00
Alexander Kukushkin
d46ca88e6b Make replication state visible on standbys (#2733)
To do that we use `pg_stat_get_wal_receiver()` function, which is available since 9.6. For older versions the `patronictl list` output and REST API responses remain as before.

If there is no WAL receiver process, we check whether `restore_command` is set and show the state as `in archive recovery`.

Example of `patronictl list` output:
```bash
$ patronictl list
+ Cluster: batman -------------+---------+---------------------+----+-----------+
| Member      | Host           | Role    | State               | TL | Lag in MB |
+-------------+----------------+---------+---------------------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running             | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | in archive recovery | 12 |         0 |
+-------------+----------------+---------+---------------------+----+-----------+

$ patronictl list
+ Cluster: batman -------------+---------+-----------+----+-----------+
| Member      | Host           | Role    | State     | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader  | running   | 12 |           |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 12 |         0 |
+-------------+----------------+---------+-----------+----+-----------+
```

Example of REST API response:
```bash
$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544480,
    "replayed_location": 335544480,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "in archive recovery",
  "dcs_last_seen": 1688642069,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}

$ curl -s localhost:8009 | jq .
{
  "state": "running",
  "postmaster_start_time": "2023-07-06 13:12:00.595118+02:00",
  "role": "replica",
  "server_version": 150003,
  "xlog": {
    "received_location": 335544816,
    "replayed_location": 335544816,
    "replayed_timestamp": null,
    "paused": false
  },
  "timeline": 12,
  "replication_state": "streaming",
  "dcs_last_seen": 1688642089,
  "database_system_identifier": "7252327498286490579",
  "patroni": {
    "version": "3.0.3",
    "scope": "batman"
  }
}
```
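
A client could read the new field from a response like the ones above as follows. This is a minimal sketch: `replication_state` is a hypothetical helper, and treating the field as absent on PostgreSQL versions older than 9.6 is an assumption based on the note that those responses remain as before.

```python
import json
from typing import Optional


def replication_state(response_body: str) -> Optional[str]:
    """Return the `replication_state` field of a REST API response, if present."""
    status = json.loads(response_body)
    # Assumption: the key is simply missing when Patroni cannot report the state.
    return status.get("replication_state")


# Trimmed copy of the second response shown above.
sample = '{"state": "running", "role": "replica", "timeline": 12, "replication_state": "streaming"}'
print(replication_state(sample))  # prints: streaming
```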
2023-07-13 09:24:20 +02:00
Alexander Kukushkin
d799be9638 update REST API
- Don't return 200 for quorum nodes in GET `/sync` endpoint
- New `GET /quorum` endpoint, returns 200 for quorum nodes
- Show Quorum Standby role for quorum nodes in `patronictl list`
2023-05-11 12:18:33 +02:00
Polina Bungina
4e1b9937b9 Documentation improvements (#2661)
* Further nested lists rendering fixes
* Remove a couple of sphinx warnings
* Fix bootstrap.users.password description
* Boto->boto3 in README's
* Split configuration docs and move some lines across files
* Fix a typo
2023-05-04 07:24:37 +02:00
T.v.Dein
60723f5fa4 Add metric to report about sync standby replica status (#2615)
Close #2613

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-03-23 09:32:29 +01:00
Burak Ergen
89595babdf add "GET /metrics" rest_api.rst (#2576) 2023-03-02 09:40:54 +01:00
Alexander Kukushkin
4c3af2d1a0 Change master->primary/leader/member (#2541)
keep as much backward compatibility as possible.

Following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set.
8. updated the documentation and some examples

Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
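
Items 1 and 7 of the list above can be sketched roughly as follows. Both functions are hypothetical illustrations, not Patroni's code, and dropping the old key after translation is an assumption the commit message does not confirm.

```python
def is_primary_role(role: str) -> bool:
    """Accept both the old and the new role name, as in item 1."""
    return role in ("master", "primary")


def translate_timeouts(config: dict) -> dict:
    """Map master_(start|stop)_timeout to primary_(start|stop)_timeout, as in item 7."""
    result = dict(config)
    for action in ("start", "stop"):
        old_key = f"master_{action}_timeout"
        new_key = f"primary_{action}_timeout"
        if old_key in result:
            old_value = result.pop(old_key)  # assumption: the old key is removed
            # The old value is used only if the new key is not already set.
            result.setdefault(new_key, old_value)
    return result
```

With this shape, an explicitly configured `primary_*_timeout` always wins over a leftover `master_*_timeout`.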
2023-01-27 07:40:24 +01:00
Alexander Kukushkin
88db6018ac Improve liveness probe (#2395)
The probe will start failing if the heartbeat loop has not run for longer than `ttl` on the primary or `2*ttl` on the replica.

Close https://github.com/zalando/patroni/issues/2388
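
The rule described above can be written as a small sketch. `liveness_status` and its parameters are hypothetical names, and returning `503` on failure is an assumption, since the commit message only says the probe starts failing.

```python
import time
from typing import Optional


def liveness_status(last_heartbeat: float, ttl: int, is_primary: bool,
                    now: Optional[float] = None) -> int:
    """Return an HTTP status code for a liveness probe (illustrative only)."""
    now = time.time() if now is None else now
    # The primary gets `ttl` seconds of slack, a replica gets `2*ttl`.
    allowed = ttl if is_primary else 2 * ttl
    return 200 if now - last_heartbeat <= allowed else 503
```

Passing `now` explicitly makes the rule easy to check without waiting for real time to elapse.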
2022-09-01 11:34:42 +02:00
Robert Cutajar
f92d975e7b #2021 add HEAD support - minimal (#2360) 2022-08-19 13:27:08 +02:00
Dennis4b
b42550aad4 Add /read-only-sync endpoint (#2305) (#2311)
`/read-only-sync` mirrors `/read-only`, but only returns `200` on a replica if this replica is a synchronous standby.
2022-05-30 17:09:43 +02:00
Christian Clauss
75e52226a8 Fix typos discovered by codespell (#1997) 2021-07-06 10:01:30 +02:00
Arman Jafari Tehrani
e48df9987d Add health check on user defined tags (#1964)
Close #1958
2021-06-23 08:30:10 +02:00
Alexander Kukushkin
03e71b6717 The /leader endpoint returns 200 if node holds the lock (#1917)
Promoting the standby cluster requires updating load-balancer health checks, which is not very convenient and easy to forget.
In order to solve it, we change the behavior of the `/leader` health-check endpoint. It will return 200 without taking into account whether PostgreSQL is running as the primary or the standby_leader.
2021-06-22 08:21:29 +02:00
Alexander Kukushkin
7bf60b64b0 Compatibility with PostgreSQL 13 (#1654)
So far Patroni was enforcing the same value of `wal_keep_segments` on all nodes in the cluster. If the parameter was missing from the global configuration it was using the default value `8`.
In pg13 beta3 the `wal_keep_segments` was renamed to the `wal_keep_size` and it broke Patroni.

If `wal_keep_segments` happens to be present in the configuration for pg13, Patroni will recalculate the value to `wal_keep_size`, assuming that the `wal_segment_size` is 16MB. Sure, it is possible to get the real value of `wal_segment_size` from pg_control, but since we are dealing with a case of misconfiguration it is not worth spending time on it.
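
The recalculation amounts to a single multiplication. The sketch below uses hypothetical names and follows the 16MB assumption stated above; rendering the result as a string with an `MB` suffix is also an assumption.

```python
ASSUMED_WAL_SEGMENT_SIZE_MB = 16  # the commit assumes the default segment size


def wal_keep_segments_to_size(wal_keep_segments: int) -> str:
    """Convert a `wal_keep_segments` count into a `wal_keep_size` value in MB."""
    return f"{wal_keep_segments * ASSUMED_WAL_SEGMENT_SIZE_MB}MB"


# The old default of 8 segments corresponds to 128MB.
print(wal_keep_segments_to_size(8))  # prints: 128MB
```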
2020-08-17 10:45:02 +02:00
ksarabu1
8a62999eaa replica & async rest API health check enhancement (#1599)
- ``GET /replica?lag=<max-lag>``: replica check endpoint.
- ``GET /asynchronous?lag=<max-lag>`` or ``GET /async?lag=<max-lag>``: asynchronous standby check endpoint.

Checks replication latency and returns status code **200** only when the latency is below a specified value. For performance reasons, the key leader_optime from DCS is used as the leader WAL position when computing latency on the replica. Please note that the value in leader_optime might be a couple of seconds old (based on loop_wait).

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
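
The lag check described above might look roughly like this. It is a sketch under stated assumptions, not the actual handler: `replica_check_status` is a hypothetical function, all positions are byte offsets, a lag exactly equal to the limit is treated as passing, and `503` as the failure code is assumed.

```python
def replica_check_status(leader_optime: int, replayed_location: int, max_lag: int) -> int:
    """Return 200 if the replica is within `max_lag` bytes of the leader, else 503."""
    # leader_optime comes from DCS and may be a few seconds old, so the replica
    # can briefly appear ahead of it; clamp the lag at zero in that case.
    lag = max(leader_optime - replayed_location, 0)
    return 200 if lag <= max_lag else 503
```

Because only the cached `leader_optime` is consulted, the check needs no round trip to the leader.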
2020-07-15 10:36:48 +02:00
Alexander Kukushkin
db8c634db3 Create readiness and liveness endpoints (#1590)
They could be useful to eliminate "unhealthy" pods from subsets addresses when a K8s service with label selectors is used.
Real-life example: the node where the primary was running has failed and is being shut down, and Patroni can't update (remove) the role label.
Therefore on OpenShift the leader service will have two pods assigned, one of them is a failed primary.
With the readiness probe defined, the failed primary pod will be excluded from the list.
2020-07-10 14:08:39 +02:00
Alexander Kukushkin
cbff544b9c Implement patronictl flush switchover (#1554)
It includes implementing the `DELETE /switchover` REST API endpoint.

Close https://github.com/zalando/patroni/issues/1376
2020-06-25 16:27:57 +02:00
Alexander Kukushkin
35a2ccf8a8 A couple of small fixes in docs (#1285)
* fix formatting in release notes
* fix patronictl reinit command name
2019-11-21 10:39:28 +01:00
Alexander Kukushkin
c1adbafbc5 Improve documentation (#1244)
* document tags
* move dynamic configuration out of `bootstrap.dcs`
* document REST API endpoints
2019-11-13 16:10:28 +01:00