120 Commits

Author SHA1 Message Date
Dennis4b
b42550aad4 Add /read-only-sync endpoint (#2305) (#2311)
`/read-only-sync` mirrors `/read-only`, but only returns `200` on a replica if this replica is a synchronous standby.
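
A minimal sketch of the decision described above, not Patroni's handler code. The function name, the role strings, the `503` failure code, and the assumption that the primary is accepted (as "mirrors `/read-only`" suggests) are all illustrative.

```python
def read_only_sync_status(role: str, is_sync_standby: bool) -> int:
    """Return the HTTP status a /read-only-sync style check would give."""
    if role in ("master", "primary"):
        # Assumption: like /read-only, the primary is always accepted.
        return 200
    if role == "replica" and is_sync_standby:
        # A replica passes only when it is a synchronous standby.
        return 200
    # Assumption: a failed health check is reported as 503.
    return 503
```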
2022-05-30 17:09:43 +02:00
Ants Aasma
0980838cb3 Fix port in use error on certificate replacement (#2185)
When switching certificates there is a race condition with a concurrent API request. If there is one active during the replacement period then the replacement will error out with a port in use error and Patroni gets stuck in a state without an active API server.

Fix is to call server_close after shutdown which will wait for already running requests to complete before returning.

Close #2184
2022-01-26 13:52:25 +01:00
Alexander Kukushkin
dc9ff4cb8a Release 2.1.2 (#2136)
* Implement missing unit-tests
* Bump version
* Update release notes
2021-12-03 15:49:57 +01:00
Alexander Kukushkin
fce889cd04 Compatibility with psycopg 3.0 (#2088)
By default `psycopg2` is preferred. The `psycopg>=3.0` will be used only if `psycopg2` is not available or its version is too old.
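
The preference order can be sketched as a pure function. The name `choose_driver` and the minimum `psycopg2` version are hypothetical (the commit message does not state the threshold); versions are passed in as tuples so the sketch runs without either driver installed.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Hypothetical minimum; the real threshold is not given in the commit message.
MIN_PSYCOPG2: Version = (2, 5, 4)


def choose_driver(psycopg2_version: Optional[Version],
                  psycopg_version: Optional[Version]) -> str:
    """Pick a driver name; None means the package is not installed."""
    # psycopg2 wins whenever it is present and recent enough.
    if psycopg2_version is not None and psycopg2_version >= MIN_PSYCOPG2:
        return "psycopg2"
    # Otherwise fall back to psycopg >= 3.0.
    if psycopg_version is not None and psycopg_version >= (3, 0):
        return "psycopg"
    raise RuntimeError("no suitable PostgreSQL driver found")
```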
2021-11-19 14:32:54 +01:00
Michael Banck
2f31e88bdc Add dcs_last_seen field to API (#2051)
This field notes the last time (as unix epoch) a cluster member has successfully communicated with the DCS. This is useful to identify and/or analyze network partitions.

Also, expose dcs_last_seen in the MemberStatus class and its from_api_response() method.
2021-09-22 10:01:35 +02:00
Alexander Kukushkin
62aa1333cd Implemented allowlist for REST API (#1959)
If configured, only IPs that match the rules are allowed to call unsafe endpoints.
In addition to that, it is possible to automatically include the IPs of cluster members in the list.
If neither of the above is configured the old behavior is retained.
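
The three rules above can be sketched with the standard `ipaddress` module. The function name and arguments are hypothetical, and this is not the code from the PR; it only illustrates the described behaviour.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowlist: Iterable[str],
               member_ips: Iterable[str] = ()) -> bool:
    """Decide whether client_ip may call an unsafe endpoint."""
    rules = list(allowlist) + list(member_ips)
    if not rules:
        # Nothing configured: retain the old behaviour and allow the call.
        return True
    addr = ipaddress.ip_address(client_ip)
    # A rule may be a single address or a CIDR network.
    return any(addr in ipaddress.ip_network(rule, strict=False)
               for rule in rules)
```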

Partially address https://github.com/zalando/patroni/issues/1734
2021-07-05 09:43:56 +02:00
Arman Jafari Tehrani
e48df9987d Add health check on user defined tags (#1964)
Close #1958
2021-06-23 08:30:10 +02:00
Alexander Kukushkin
03e71b6717 The /leader endpoint returns 200 if node holds the lock (#1917)
Promoting the standby cluster requires updating load-balancer health checks, which is not very convenient and easy to forget.
In order to solve it, we change the behavior of the `/leader` health-check endpoint. It will return 200 without taking into account whether PostgreSQL is running as the primary or the standby_leader.
2021-06-22 08:21:29 +02:00
Alexander Kukushkin
c7173aadd7 Failover logical slots (#1820)
Effectively, this PR consists of a few changes:

1. The easy part:
  If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
  In order to reduce the number of interactions with DCS the new `/status` key was introduced. It will contain the json object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` will be updated if there are members with old Patroni in the cluster.

2. The tricky part:
  On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
  When the logical slot already exists on the replica Patroni periodically calls the `pg_replication_slot_advance()` function, which allows moving the slot forward.

3. Additional requirements:
  In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.

4. When logical slots are copied from the primary to the replica there is a timeframe when it might not be safe to use them after promotion. Right now there is no protection from promoting such a replica. However, Patroni will show a warning with the names of the slots that might not be safe to use.

Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.

The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.

Close: https://github.com/zalando/patroni/issues/1749
2021-03-25 16:18:23 +01:00
Mark Mercado
09f2f579d7 Quick attempt at Prometheus (#1848)
Close https://github.com/zalando/patroni/issues/318
2021-03-04 12:37:29 +01:00
Alexander Kukushkin
b341ab2e2f Release 2.0.2 (#1851)
* bump version
* update release notes
* implement missing unit-test
2021-02-22 12:28:19 +01:00
Nicolas Limage
e2b2daf0b3 add missing shutdown_request (#1770)
This patch fixes the error handling of cases where there are runtime errors in `socketserver`.
For example, when creating a new thread (to handle a request) fails.

`get_request` handles ssl connections by replacing the new client socket by a tuple containing `(server_socket, new_client_socket)` in order to later deal with handshakes in `process_request_thread`

During the processing of a request, the socketserver `BaseServer` calls `handle_request`, calling the `_handle_request_noblock`, which is calling the following functions (https://github.com/python/cpython/blob/3.8/Lib/socketserver.py#L303):

```
request, client_address = get_request()
verify_request(request, client_address)
process_request(request, client_address)
handle_error(request, client_address)
shutdown_request(request)
```

- `get_request` is overloaded in patroni and returns `request` as a tuple in case of ssl calls
- `verify_request` defaults to `return True` and should be fixed if used but is fine in this case
- `process_request` just calls `process_request_thread` (which is overloaded in patroni and handles tuple-style requests)
- `handle_error` is overloaded in patroni and handles tuple-style requests
- but `shutdown_request` is not overloaded and thus missing support for tuple-style requests

This patch adds support for tuple-style requests in patroni api
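
A hedged sketch of the idea, using the standard `socketserver` module. The class name is hypothetical and the real patch may differ; the point is only that `shutdown_request` unwraps the `(server_socket, new_client_socket)` tuple before delegating to the base class.

```python
import socketserver


class TupleAwareServer(socketserver.TCPServer):
    """Hypothetical server whose get_request() may hand out
    (server_socket, client_socket) tuples for SSL connections."""

    def shutdown_request(self, request):
        # Unwrap a tuple-style request so the base class gets a plain
        # socket object it can shut down and close.
        if isinstance(request, tuple):
            _, request = request
        super().shutdown_request(request)
```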
2020-11-27 19:25:06 +01:00
Alexander Kukushkin
0a1f389686 Release 2.0.0 (#1680)
* update release notes
* bump version
* change the default alignment in patronictl table output to `left`
* add missing tests
* add missing pieces to the documentation
2020-09-02 15:35:04 +02:00
Yogesh Sharma
62463db5e2 Add support for user defined HTTP header to Patroni REST API response (#1645)
Close #1644
2020-08-26 17:37:02 +02:00
ksarabu1
1ab709c5f0 Multi Sync Standby Support (#1594)
The new parameter `synchronous_node_count` is used by Patroni to manage the number of synchronous standby databases. It is set to 1 by default. It has no effect when synchronous_mode is set to off. When enabled, Patroni manages the precise number of synchronous standby databases based on the parameter synchronous_node_count and adjusts the state in DCS & synchronous_standby_names as members join and leave.

This functionality can be further extended to support Priority (FIRST n) based synchronous replication & Quorum (ANY n) based synchronous replication in future.
2020-08-14 11:51:07 +02:00
Alexander Kukushkin
a9915fb3c9 Explicitly disallow patching non-existent config (#1639)
For DCS other than `kubernetes` it was failing with exception due to the `cluster.config` being `None`, but on Kubernetes it was happily creating the config annotation and preventing writing bootstrap configuration after the bootstrap finished.
2020-08-07 09:36:56 +02:00
ksarabu1
8a62999eaa replica & async rest API health check enhancement (#1599)
- ``GET /replica?lag=<max-lag>``: replica check endpoint.
- ``GET /asynchronous?lag=<max-lag>`` or ``GET /async?lag=<max-lag>``: asynchronous standby check endpoint.

Checks replication latency and returns status code **200** only when the latency is below a specified value. For performance reasons, the key leader_optime from DCS is used as the leader WAL position when computing latency on the replica. Please note that the value in leader_optime might be a couple of seconds old (based on loop_wait).
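
The check can be sketched as follows. The function name is hypothetical, WAL positions are assumed to be integer byte offsets, and both the inclusive boundary and the `503` failure code are assumptions rather than something the commit message states.

```python
def replica_check_status(leader_optime: int, replica_lsn: int,
                         max_lag: int) -> int:
    """Return 200 when the replica is within max_lag of the leader."""
    # leader_optime may be slightly stale, so a replica can appear to be
    # ahead of it; clamp negative lag to zero.
    lag = max(leader_optime - replica_lsn, 0)
    # Assumption: a lag exactly equal to max_lag still passes.
    return 200 if lag <= max_lag else 503
```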

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2020-07-15 10:36:48 +02:00
Alexander Kukushkin
db8c634db3 Create readiness and liveness endpoints (#1590)
They could be useful to eliminate "unhealthy" pods from subsets addresses when a K8s service with label selectors is used.
Real-life example: the node where the primary was running has failed and is being shut down, and Patroni can't update (remove) the role label.
Therefore on OpenShift the leader service will have two pods assigned, one of them is a failed primary.
With the readiness probe defined, the failed primary pod will be excluded from the list.
2020-07-10 14:08:39 +02:00
Alexander Kukushkin
7a13579973 Refactor tcp_keepalive code (#1578)
* Move it into a separate function
* set keepalive on the REST API socket

The function will be also used in #1162
2020-07-08 14:04:59 +02:00
Alexander Kukushkin
cbff544b9c Implement patronictl flush switchover (#1554)
It includes implementing the `DELETE /switchover` REST API endpoint.

Close https://github.com/zalando/patroni/issues/1376
2020-06-25 16:27:57 +02:00
Alexander Kukushkin
e95e54b94e Handle correctly health-checks for standby cluster (#1553)
Close https://github.com/zalando/patroni/issues/1388
2020-06-05 10:37:02 +02:00
Alexander Kukushkin
4f1a3e53cd Defer TLS handshake until thread has started (#1547)
The `SSLSocket` is immediately doing the handshake on accept. Effectively it blocks the whole API thread if the client-side doesn't send any data.
In order to solve the issue we defer the handshake until a thread serving request has started.

The solution is a bit hacky, but thread-safe.

Close https://github.com/zalando/patroni/issues/1545
2020-06-05 09:36:13 +02:00
Alexander Kukushkin
c2a78ee652 Bugfix: GET /cluster was showing stale member info in zookeeper (#1573)
The ZooKeeper implementation relies heavily on a cached version of the cluster view in order to minimize the number of requests. Having stale member information is fine for the Patroni workflow because it basically relies only on member names and tags.

The `GET /cluster` is a different case. Being exposed outside it might be used for monitoring purposes and therefore we should show the up-to-date members information.
2020-06-05 09:23:54 +02:00
Alexander Kukushkin
6a0d2924a0 Separate received and replayed location (#1514)
When making a decision whether the running replica is able to stream from the new primary or must be rewound we should use replayed location, therefore we extract received and replayed independently.

Reuse the part of the query that extracts the timeline and locations in the REST API.
2020-05-27 13:33:37 +02:00
Alexander Kukushkin
0693fe7dd0 Housekeeping (#1315)
* Reduce memory usage by patroni init process
* More cleanup in setup.py
* Implement missing tests
2019-12-04 11:28:46 +01:00
Alexander Kukushkin
367d787ff9 Implement /history and /cluster endpoints (#1191)
The /history endpoint shows the content of the `history` key in DCS
The /cluster endpoint shows all cluster members and some service info like pending and scheduled restarts or switchovers.

In addition to that implement `patronictl history`

Close #586
Close #675
Close #1133
2019-10-22 17:19:02 +02:00
Alexander Kukushkin
b666f5e4ed Refactor Patroni REST API communication (#1197)
* make it possible to use client certificates with REST API
* define a separate PatroniRequest class which handles all communication
* refactor patronictl to use the new class
* make Ha use the new class instead of calling requests.get. The old call wasn't taking into account certificates and basic-auth

Close #898
2019-10-11 10:16:33 +02:00
Alexander Kukushkin
3d29cb7e50 Perform pg_ctl reload regardless of config changes (#1204)
It is possible that some config files are not controlled by Patroni and when somebody is doing reload via REST API or by sending SIGHUP to Patroni process the usual expectation is that postgres will also be reloaded, but it didn't happen when there were no changes in the postgresql section of Patroni config.

For example one might replace ssl_cert_file and ssl_key_file on the filesystem and starting from PostgreSQL 10 it just requires a reload, but Patroni wasn't doing it.

In addition to that fix the issue with handling of `wal_buffers`. The default value depends on `shared_buffers` and `wal_segment_size` and therefore Patroni was exposing pending_restart when the new value in the config was explicitly set to -1 (default).

Close https://github.com/zalando/patroni/issues/1198
2019-10-10 14:49:30 +02:00
Alexander Kukushkin
4a24b79b73 IPv6 support (#1122)
fixes https://github.com/zalando/patroni/issues/1121
2019-08-02 11:34:29 +02:00
Alexander Kukushkin
37f03790cc Implement two-step logging (#1080)
A few times we observed that the Patroni HA loop was blocked for a few minutes due to not being able to write logs to stderr. This is a very rare condition which we have hit so far only on k8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into the in-memory queue and later they are asynchronously flushed to stderr or a file from a separate thread.

The maximum queue size is configurable and the default value is 1000. This should be enough to keep more than one hour of log messages with default settings and when Patroni cluster operates normally (without big issues).

If we hit the maximum size of the queue, further log messages will be discarded until the queue size is reduced. The number of discarded messages will be reported in the log later.

In addition to that, the number of non-flushed and discarded messages (if there are any), will be reported via Patroni REST API as:
```json
"logger_queue_size": X,
"logger_records_lost": Y
```
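
The two-step scheme can be sketched with a bounded queue and a drop counter. This is an illustration under assumptions, not Patroni's logger: the class and method names are hypothetical, and in a real setup `flush_to` would be called in a loop from a separate thread.

```python
import logging
import queue


class QueueingHandler(logging.Handler):
    """Step one: buffer records in memory instead of writing them."""

    def __init__(self, max_size: int = 1000):
        super().__init__()
        self.queue = queue.Queue(maxsize=max_size)
        self.records_lost = 0

    def emit(self, record):
        try:
            # Never block the caller: either enqueue or count the loss.
            self.queue.put_nowait(record)
        except queue.Full:
            self.records_lost += 1

    def flush_to(self, target: logging.Handler) -> int:
        """Step two: drain the queue into the real handler."""
        flushed = 0
        while True:
            try:
                record = self.queue.get_nowait()
            except queue.Empty:
                return flushed
            target.handle(record)
            flushed += 1
```

Here `queue.qsize()` and `records_lost` play the role of the two values reported via the REST API.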
2019-06-13 14:18:49 +02:00
wilfriedroset
2384d9e735 Add API route /health (#1079)
close #119
2019-06-11 15:22:52 +02:00
Alexander Kukushkin
a4bd6a9b4b Refactor postgresql class (#1060)
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package

This is just a careful refactoring, without code changes.
2019-05-21 16:02:47 +02:00
Julien Riou
663026c34c Use SSLContext to wrap REST API socket (#1039)
Using `ssl.wrap_socket` is deprecated and was still allowing soon-to-be-deprecated protocols like TLS 1.1.
Now using `ssl.create_default_context()` to produce a secure SSL context to wrap the REST API server's socket.
2019-04-23 11:23:22 +02:00
Alexander Kukushkin
7c0c9599fc Remove psycopg2 from requirements (#1023)
The recently released psycopg2 was split into two different packages, psycopg2 and psycopg2-binary, which could be installed at the same time into the same place on the filesystem. In order to reduce the dependency hell problem, we let the user choose how to install psycopg2. There are a few options available and they are reflected in the documentation.

This PR also changes the following behavior:
* `pip install patroni` will fail if psycopg2 is not installed
* Patroni will check psycopg2 upon start and fail if it can't be found or is outdated.

Closes https://github.com/zalando/patroni/issues/1021
2019-04-15 14:30:16 +02:00
Alexander Kukushkin
2c128520cf Python34 compatibility (#933)
and some other minor fixes.

Closes https://github.com/zalando/patroni/issues/932
2019-01-16 14:40:05 +01:00
Alexander Kukushkin
381a5b80d2 Release 1.5.4 (#931)
* Bump version
* Update release notes
* Make it possible to configure registration of Service in Consul via env variables
2019-01-15 12:14:19 +01:00
Alexander Kukushkin
71dae6a905 Optionally consider node not healthy if it is not on the latest timeline (#892)
The latest timeline is calculated from the `/history` key in DCS. In case there is no such key or it contains some garbage we consider the node healthy.
Closes https://github.com/zalando/patroni/issues/890
2019-01-15 11:16:30 +01:00
Alexander Kukushkin
f70edefc65 A few bugfixes in the "standby cluster" workflow (#823)
* Always run `pg_rewind` against the remote master
* Always use the remote master as the source when "recovering" stopped standby leader
* Use remote master as the source when "recovering" the node in the unhealthy cluster
* Use the local dbname as the fallback when doing `pg_rewind` from the remote master
* `no_replication_slot` is the allowed key in the `RemoteMember` object
* Make it possible to "bootstrap" the new `standby_cluster` with existing (and valid) data directory. There is one prerequisite though, there should be no `patroni.dynamic.json` file in it!
2018-10-09 13:30:48 +02:00
Alexander Kukushkin
76d1b4cfd8 Minor fixes (#808)
* Use `shutil.move` instead of `os.replace`, which is available only from 3.3
* Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* rename `assertEquals` -> `assertEqual`, due to deprecation warning
2018-09-19 16:32:33 +02:00
Alexander Kukushkin
90cf930036 Refactor REST API health-checks (#779)
Make it more readable and easy to understand.
Mostly it is needed to implement https://github.com/zalando/patroni/issues/772
2018-08-29 11:35:22 +02:00
Alexander Kukushkin
0c1ae6fbeb Respond 200 to master health-check only if update_lock was successful (#713)
If Patroni gets partitioned it starts receiving stale information from DCS.
We can't use this information to determine that we have the leader key.
Instead, we will record in Ha object the actual state of acquire/update lock and report as a leader only if it was successful.

P.S. despite responding with 200 on `GET /master` postgres was still running read-only.
2018-08-03 17:00:01 +02:00
Henning Jacobs
2537147810 #694 handle configuration error (#695)
It is possible to change a lot of parameters in runtime (including `restapi.listen`) by updating Patroni config file and sending SIGHUP to Patroni process.

If something was misconfigured it was throwing a weird exception and breaking `restapi` thread.

This PR improves friendliness of error message and avoids breaking of `restapi`.
2018-06-12 14:08:38 +02:00
Alexander Kukushkin
84f29caf92 Fix race condition in poll_failover_result (#658)
It didn't directly affect either failover or switchover, but in some rare cases it was reporting it as a success too early, when the former leader released the lock: `Failed over to "None" instead of "desired-node"`

In addition to that this commit improves logs and status messages by differentiating between failover and switchover.
2018-04-16 17:45:05 +02:00
Alexander Kukushkin
5668367181 Implement '/sync' and /async endpoints (#578)
They will respond with http status code 200 only when the node is running as a synchronous or asynchronous replica.

Fixes https://github.com/zalando/patroni/issues/189
Fixes https://github.com/zalando/patroni/issues/415
2018-01-05 15:28:40 +01:00
Alexander Kukushkin
03c2a85d23 Expose current timeline in DCS and via API (#591)
It is very easy to get current timeline on the master by executing
```sql
SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int
```

Unfortunately the same method doesn't work when postgres is_in_recovery. Therefore we will use replication connection for that on the replicas. In order to avoid opening and closing replication connection on every HA loop we will cache the result if its value matches with the timeline of the master.
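
For reference, the SQL expression above relies on the first eight hexadecimal digits of the WAL file name being the timeline ID. The same parsing in Python, with a hypothetical helper name:

```python
def timeline_from_walfile(walfile_name: str) -> int:
    """Parse the timeline ID from a WAL file name.

    The first 8 hex digits of the 24-character name encode the timeline.
    """
    return int(walfile_name[:8], 16)
```

For example, `timeline_from_walfile("000000030000000000000010")` gives `3`.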

Also this PR introduces a new key in DCS: `/history`. It will contain a json serialized object with timeline history in a format similar to the usual history files. The differences are:
* Second column is the absolute wal position in bytes, instead of LSN
* Optionally there might be a fourth column - timestamp, (mtime of history file)
2018-01-05 15:25:56 +01:00
Alexander Kukushkin
18786464a1 Rename failover to switchover and make new failover work without leader (#588)
In addition to that implement /switchover endpoint as an alias to /failover endpoint and implement more checks like:
* candidate must be provided for a failover
* switchover can't be scheduled in a pause state
* and so on

Fixes https://github.com/zalando/patroni/issues/585
Fixes https://github.com/zalando/patroni/issues/520
2018-01-05 15:17:56 +01:00
Alexander Kukushkin
3a96ffa718 Expose pause state of every member to DCS and via REST (#592)
and implement patronictl pause|resume --wait on top of that

Fixes https://github.com/zalando/patroni/issues/349
2018-01-05 15:16:45 +01:00
Alexander Kukushkin
0e01bb33bb Improve patronictl reinit (#576)
Make it possible to cancel a running task if you want to reinitialize replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel already running task if an attempt to trigger reinitialize has failed
2. if you are using `--force` argument with `patronictl reinit`
2018-01-04 10:31:44 +01:00
Alexander Kukushkin
23152a7fc4 synchronous_standby_names must be quoted with quote_ident (#505)
in addition to that implement additional checks around manual failover and recover when synchronous_mode is enabled

* Comparison must be case insensitive
2017-08-24 07:55:02 +02:00
Alexander Kukushkin
77aea03df9 Different bugfixes around pause state, mostly related to watchdog (#507)
* Do not send keepalives if watchdog is not active
* Avoid activating watchdog in a pause mode
* Set correct postgres state in pause mode
* Don't try to run queries from API if postgres is stopped
2017-08-24 07:53:32 +02:00