2020 Commits

Author SHA1 Message Date
Alexander Kukushkin
db8c634db3 Create readiness and liveness endpoints (#1590)
These endpoints can be used to exclude "unhealthy" pods from the subsets addresses when a K8s service with label selectors is used.
Real-life example: the node where the primary was running has failed and is being shut down, so Patroni can't update (remove) the role label.
As a result, on OpenShift the leader service will have two pods assigned, one of them being the failed primary.
With a readiness probe defined, the failed primary pod will be excluded from the list.
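A minimal sketch of how such a probe could call the new endpoint (the port is Patroni's default REST API port; the helper itself is illustrative, not from the commit):
```python
from urllib.request import urlopen
from urllib.error import URLError

def pod_is_ready(base_url='http://127.0.0.1:8008'):
    # A non-200 answer (or no answer at all, e.g. from a failed primary)
    # means K8s should drop the pod from the Endpoints subsets.
    try:
        return urlopen(base_url + '/readiness', timeout=2).getcode() == 200
    except (URLError, OSError):
        return False
```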
2020-07-10 14:08:39 +02:00
Alexander Kukushkin
7a13579973 Refactor tcp_keepalive code (#1578)
* Move it into a separate function
* set keepalive on the REST API socket

The function will also be used in #1162.
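Roughly, the extracted helper does something like the following (the option constants are Linux-specific and the timing values are illustrative assumptions, not Patroni's):
```python
import socket

def enable_keepalive(sock, idle=10, interval=5, cnt=3):
    # Probe an idle connection after `idle` seconds, every `interval`
    # seconds, and drop it after `cnt` failed probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, value in (('TCP_KEEPIDLE', idle), ('TCP_KEEPINTVL', interval), ('TCP_KEEPCNT', cnt)):
        if hasattr(socket, opt):  # not available on every platform
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt), value)
```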
2020-07-08 14:04:59 +02:00
Alexander Kukushkin
8eb01c77b6 Don't fire on_reload when promoting to standby_leader on 13+ (#1552)
PostgreSQL 13 finally introduced the possibility to change `primary_conninfo` without a restart; a reload is enough. However, when the role changes from `replica` to `standby_leader`, we want to call only the `on_role_change` callback and skip `on_reload`, because the two would duplicate each other.
2020-06-29 14:49:25 +02:00
Alexander Kukushkin
cbff544b9c Implement patronictl flush switchover (#1554)
It includes implementing the `DELETE /switchover` REST API endpoint.

Close https://github.com/zalando/patroni/issues/1376
2020-06-25 16:27:57 +02:00
Alexander Kukushkin
7f343c2c57 Try to fetch missing WAL if pg_rewind complains about it (#1561)
It can happen that a WAL segment required by `pg_rewind` no longer exists in `pg_wal`, so `pg_rewind` can't find the checkpoint location before the point of divergence.
Starting from PostgreSQL 13, `pg_rewind` can use the `restore_command` to fetch missing WALs, but we can do better than that.
On older PostgreSQL versions, Patroni parses the stdout and stderr of the failed rewind attempt, tries to fetch the missing WAL by calling the `restore_command` itself, and retries.
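A hypothetical sketch of that parse-and-fetch step (the regex, names, and %f/%p substitution are assumptions; only the overall flow comes from the commit):
```python
import re
import subprocess

def try_fetch_missing_wal(rewind_stderr, restore_command, pg_wal):
    # pg_rewind complains about the file it could not open;
    # WAL segment names are 24 hex characters.
    m = re.search(r'could not open file .*?([0-9A-F]{24})', rewind_stderr)
    if not m:
        return False
    segment = m.group(1)
    cmd = restore_command.replace('%f', segment).replace('%p', pg_wal + '/' + segment)
    return subprocess.call(cmd, shell=True) == 0  # on success, retry pg_rewind
```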
2020-06-25 16:24:21 +02:00
Alexander Kukushkin
e00acdf6df Fix possible race conditions in update_leader (#1596)
1. Between the get_cluster() and update_leader() calls, the K8s leader object might be updated from outside, so the resource version will no longer match (error code 409). Since we are watching for all changes, the ObjectCache will most likely have the most up-to-date version, and we take advantage of that. There is still a chance of hitting a race condition, but it is smaller than before. The other DCS implementations are free of this issue: Etcd updates are based on value comparison, while ZooKeeper and Consul rely on their session mechanisms.
2. If the update still fails, recheck the resource version of the leader object, verify that the current node is still the leader there, and repeat the call.

P.S. The leader race still relies on the version of the leader object as it was during the get_cluster() call.

In addition to that, the handling of K8s API errors is fixed: we should retry on 500, not on 502.
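A rough sketch of the retry described in point 2 above (all names are illustrative; this is not Patroni's actual K8s client code):
```python
class ApiConflict(Exception):
    """Stand-in for a K8s 409 Conflict response (illustrative only)."""

def update_leader_with_retry(patch_leader, cache, patch):
    # First attempt uses the resourceVersion read during get_cluster(); on a
    # 409 conflict, take the freshest version from the watch cache, check that
    # we still own the leader object, and repeat the call once.
    try:
        return patch_leader(patch)
    except ApiConflict:
        cached = cache.get('leader')
        if cached and cached.get('owner') == patch.get('owner'):
            patch['resource_version'] = cached['resource_version']
            return patch_leader(patch)
        raise
```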
Close https://github.com/zalando/patroni/issues/1589
2020-06-22 16:07:52 +02:00
Alexander Kukushkin
ee4bf79c11 Populate references and nodename in subsets addresses (#1591)
It makes subsets look exactly as if they were populated by a service with a label selector, and will help with https://github.com/zalando/postgres-operator/issues/340#issuecomment-587001109

Unit tests were refactored to minimize the number of mocks.
2020-06-16 12:56:20 +02:00
Maxim Fedotov
623b594539 patronictl add ability to print ASCII topology (#1576)
Example:
```bash
$ patronictl topology
+ Cluster: batman (6834835313225022118) -----+---------+----+-----------+------------------------------------------------+
| Member          |      Host      |   Role  |  State  | TL | Lag in MB | Tags                                           |
+-----------------+----------------+---------+---------+----+-----------+------------------------------------------------+
| postgresql0     | localhost:5432 |  Leader | running |  2 |           |                                                |
| + postgresql1   | localhost:5433 | Replica | running |  2 |       0.0 |                                                |
|   + postgresql2 | localhost:5434 | Replica | running |  2 |       0.0 | {nofailover: true, replicatefrom: postgresql1} |
+-----------------+----------------+---------+---------+----+-----------+------------------------------------------------+
```
2020-06-12 15:23:42 +02:00
ponvenkates
2d5c8e0067 Increasing maxsize in pool manager (#1575)
Close #1474
2020-06-11 16:33:00 +02:00
Alexander Kukushkin
e95e54b94e Handle correctly health-checks for standby cluster (#1553)
Close https://github.com/zalando/patroni/issues/1388
2020-06-05 10:37:02 +02:00
Alexander Kukushkin
4f1a3e53cd Defer TLS handshake until thread has started (#1547)
The `SSLSocket` performs the TLS handshake immediately on accept, which effectively blocks the whole API thread if the client side doesn't send any data.
To solve the issue, we defer the handshake until the thread serving the request has started.

The solution is a bit hacky, but thread-safe.
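The standard library supports deferring the handshake directly; a minimal sketch (certificate paths are placeholders):
```python
import ssl

ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
ctx.load_cert_chain('/etc/patroni/server.crt', '/etc/patroni/server.key')  # placeholders

def accept_without_handshake(raw_sock):
    # accept() returns immediately; no TLS bytes are exchanged yet
    return ctx.wrap_socket(raw_sock, server_side=True, do_handshake_on_connect=False)

def handle_request(conn):
    conn.do_handshake()  # now blocks only this worker thread, not the accept loop
    ...
```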

Close https://github.com/zalando/patroni/issues/1545
2020-06-05 09:36:13 +02:00
Alexander Kukushkin
1229cf2c16 Ignore hba_file and ident_file when they match with defaults (#1555)
It is possible to specify custom hba_file and ident_file in the postgresql configuration parameters, in which case Patroni considers these files to be managed externally. It can happen, however, that the locations of these files match the default locations of pg_hba.conf and pg_ident.conf. In this case we ignore the custom values and fall back to the default workflow, i.e. Patroni will overwrite them.

Close: https://github.com/zalando/patroni/issues/1544
2020-06-05 09:33:50 +02:00
Alexander Kukushkin
1b2491cedf Check basic-auth independently from client certificate (#1556)
This is an absolutely valid use case.
2020-06-05 09:25:33 +02:00
Alexander Kukushkin
80ed08a2bb Enforce synchronous_commit=local for post_init script (#1559)
Patroni had long been doing this before creating users, but post_init was an oversight. The change helps all utilities relying on libpq and reduces end-user confusion.
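One way to enforce this for everything a script runs over libpq is the PGOPTIONS environment variable; a sketch, assuming post_init is executed as a subprocess:
```python
import os
import subprocess

def run_post_init(script, connstring):
    # Any libpq-based utility the script invokes inherits this setting,
    # so nothing blocks on synchronous replication during bootstrap.
    env = dict(os.environ, PGOPTIONS='-c synchronous_commit=local')
    subprocess.check_call([script, connstring], env=env)
```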
2020-06-05 09:24:47 +02:00
Alexander Kukushkin
c2a78ee652 Bugfix: GET /cluster was showing stale member info in zookeeper (#1573)
The ZooKeeper implementation relies heavily on a cached version of the cluster view in order to minimize the number of requests. Having stale member information is fine for the Patroni workflow, because it basically relies only on member names and tags.

`GET /cluster` is a different case. Being exposed to the outside, it might be used for monitoring purposes, and therefore we should show up-to-date member information.
2020-06-05 09:23:54 +02:00
Tomáš Pospíšek
6406b39b77 add config section keys, improve verify_client documentation (#1549) 2020-06-03 09:55:21 +02:00
Alexander Kukushkin
76cfcf3ae8 Don't rely on deprecated flake8 setuptools entrypoint (#1557)
Define and use our own command class for that.
2020-06-03 09:54:04 +02:00
Сергей Бурладян
6e4ca1717c Correct CRLF after HTTP headers in OPTIONS request (#1570)
Close #1569
2020-06-02 08:55:49 +02:00
Alexander Kukushkin
cd1b2741fa Improve timeline divergence check (#1563)
We don't need to rewind when:
1. the replayed location of the former replica is not ahead of the switchpoint
2. the end of the checkpoint record of the former primary is the same as the switchpoint

In order to get the end of the checkpoint record, we use `pg_waldump` and parse its output.

Close https://github.com/zalando/patroni/issues/1493
2020-05-29 14:15:10 +02:00
Alexander Kukushkin
98c2081c67 Detect a new timeline in the standby cluster (#1522)
The standby cluster doesn't know about leader elections in the main cluster, so the usual mechanisms for detecting divergence don't work. For example, it could happen that the standby cluster is ahead of the new primary of the main cluster and must be rewound.
There is a way to detect that a new timeline has been created: check for the presence of a new history file in pg_wal. If the file is there, we start the usual procedure of making sure that we can continue streaming, or run pg_rewind.
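A sketch of that check (history files follow PostgreSQL's `%08X.history` naming; the helper is illustrative):
```python
import os

def new_timeline_detected(pg_wal, current_timeline):
    # A history file for a timeline newer than ours means the main cluster
    # has promoted a new primary while we were quietly replicating.
    fname = '%08X.history' % (current_timeline + 1)
    return os.path.exists(os.path.join(pg_wal, fname))
```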
2020-05-29 14:14:47 +02:00
Alexander Kukushkin
c6207933d1 Properly handle the exception raised from refresh_session (#1531)
`touch_member()` can be called from the finally block of `_run_cycle()`. If it raised an exception, the whole Patroni process crashed.
To avoid future crashes, we wrap `_run_cycle()` in a try..except block and ask the user to report a bug.

Close https://github.com/zalando/patroni/issues/1529
2020-05-29 14:14:11 +02:00
Mateusz Kowalski
798c46bc03 Handle IPv6 addresses in split_host_port (#1533)
This PR makes split_host_port return IPv6 addresses without the enclosing brackets,
because e.g. socket.* functions expect the host not to contain them when called with IPv6 addresses.
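A simplified sketch of the resulting behaviour (the real signature in Patroni may differ; IPv6 literals are assumed to arrive bracketed):
```python
def split_host_port(value, default_port=5432):
    host, _, port = value.rpartition(':')
    if not host:  # plain hostname or IPv4 address without a port
        return value, default_port
    # strip the brackets so socket.* functions accept the address
    return host.strip('[]'), int(port)

assert split_host_port('[fe80::1]:5432') == ('fe80::1', 5432)
```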

Close: #1532
2020-05-29 14:13:33 +02:00
Alexander Kukushkin
6a0d2924a0 Separate received and replayed location (#1514)
When deciding whether a running replica is able to stream from the new primary or must be rewound, we should use the replayed location, therefore we now extract the received and replayed locations independently.

Reuse the part of the query that extracts the timeline and locations in the REST API.
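A sketch of the kind of query this implies (not Patroni's exact SQL):
```python
# On PostgreSQL 10+; the 9.x spelling uses pg_last_xlog_*_location instead.
RECOVERY_LSN_SQL = """
SELECT pg_catalog.pg_last_wal_receive_lsn() AS received,
       pg_catalog.pg_last_wal_replay_lsn()  AS replayed
"""
```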
2020-05-27 13:33:37 +02:00
Alexander Kukushkin
881bba9e1c Sync HA loops of all pods in one cluster (#1515)
There is no expiry mechanism available on K8s, therefore we implement a soft leader lock: every pod "watches" for changes of the leader object, and when there are no changes during the TTL it starts the leader race.

Before we switched to the LIST+WATCH approach in #1189 and #1276, we watched only the leader object, and every time it was updated the main thread of the HA loop woke up. As a result, all replica pods were synchronized and started the leader race more or less at the same time.

The new approach left all pods "unsynchronized", and its biggest downside is that in the worst case it takes `ttl + loop_wait` to detect a leader failure.

This commit makes all pods in one cluster sync their HA loops again, based on updates of the leader object.
2020-05-15 18:04:59 +02:00
Alexander Kukushkin
ad5c686c11 Take advantage of pg_stat_wal_receiver (#1513)
So far Patroni has been parsing `recovery.conf` or querying `pg_settings` in order to get the current values of recovery parameters. On PostgreSQL earlier than 12 it could easily happen that the value of `primary_conninfo` in `recovery.conf` had nothing to do with reality. Luckily for us, PostgreSQL 9.6+ provides the `pg_stat_wal_receiver` view, which contains the current values of `primary_conninfo` and `primary_slot_name`. The password field is masked, though, but this is fine, because authentication happens only when the connection is opened. All other parameters we compare as usual.

Another advantage of `pg_stat_wal_receiver` is that it contains the current timeline, therefore on 9.6+ we don't need to use the replication connection trick as long as the walreceiver process is alive.

If there is no walreceiver process, or it is not streaming, we stick to the old methods.
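A sketch of the lookup this enables (column names are from the PostgreSQL documentation; the surrounding logic is an assumption):
```python
def walreceiver_status(cursor):
    # Returns (conninfo, slot_name, timeline) while the walreceiver is
    # streaming, or None, in which case the old methods are used instead.
    cursor.execute("SELECT conninfo, slot_name, received_tli"
                   " FROM pg_catalog.pg_stat_wal_receiver"
                   " WHERE status = 'streaming'")
    return cursor.fetchone()
```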
2020-05-15 18:04:24 +02:00
Alexander Kukushkin
08b3d5d20d Move ensure_clean_shutdown into rewind module (#1528)
Logically fits there better
2020-05-15 16:22:57 +02:00
Pavlo Golub
4cc6034165 Fix features/steps/standby_cluster.py under Windows (#1535)
Resolves #1534
2020-05-15 16:22:15 +02:00
Alexander Kukushkin
30aa355eb5 Shorten and beautify history log output (#1526)
When Patroni is trying to figure out whether pg_rewind is necessary, it may write the content of the history file from the primary into the log. The history file grows with every failover/switchover and eventually takes up too many lines in the log, most of which are not very useful.
Instead of showing the raw data, we now show only 3 lines before the current replica timeline and 2 lines after.
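A sketch of the windowing (Patroni's real code differs; history lines are assumed to be (timeline, lsn, reason) tuples):
```python
def shorten_history(history, replica_tl):
    # keep 3 lines before the line matching the replica's timeline
    # and 2 lines after it, dropping the rest
    i = next((n for n, line in enumerate(history) if line[0] >= replica_tl), len(history))
    return history[max(0, i - 3):i + 3]
```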
2020-05-15 16:14:25 +02:00
Alexander Kukushkin
7cf0b753ab Update optime/leader with checkpoint location after clean shut down (#1527)
Potentially this information can be used to make sure that there is no data loss on switchover.
2020-05-15 16:13:16 +02:00
Alexander Kukushkin
285bffc68d Use pg_rewind with --restore-target-wal on 13 if possible (#1525)
On PostgreSQL 13, check whether restore_command is configured and, if so, tell pg_rewind to use it.
2020-05-15 16:05:07 +02:00
Alexander Kukushkin
e6ef3c340a Wake up the main thread after checkpoint is done (#1524)
Replicas wait for a checkpoint indication via the leader's member key in DCS. The key is normally updated only once per HA loop.
Without waking the main thread up, replicas would have to wait up to `loop_wait` seconds longer than necessary.
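The wake-up boils down to interrupting the HA loop's sleep; a minimal sketch of the idea with a threading.Event (illustrative, not Patroni's code):
```python
import threading

wakeup = threading.Event()

def run_cycle():
    ...  # one iteration of the HA loop; updates the leader's member key

def ha_loop(loop_wait=10):
    while True:
        run_cycle()
        wakeup.wait(loop_wait)  # returns early once the checkpoint worker calls wakeup.set()
        wakeup.clear()
```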
2020-05-15 16:02:17 +02:00
Alexander Kukushkin
0d957076ca Improve compatibility with PostgreSQL 12 and 13 (#1523)
Two new connection parameters were introduced:
1. `gssencmode` in 12
2. `channel_binding` in 13
2020-05-13 13:13:25 +02:00
Takashi Idobe
ffb879c6e6 Adding --enable-v2=true flag to README.md (#1537)
Without supplying the --enable-v2=true flag to etcd on startup, Patroni cannot connect to etcd.

After running `etcd --data-dir=data/etcd` in one terminal and `patroni postgres0.yaml` in another, etcd starts fine, but the Postgres instance cannot find etcd:

```
patroni postgres0.yaml
2020-05-09 15:58:48,560 ERROR: Failed to get list of machines from http://127.0.0.1:2379/v2: EtcdException('Bad response : 404 page not found\n')
2020-05-09 15:58:48,560 INFO: waiting on etcd
```
If etcd is passed the flag `--enable-v2=true` on startup, everything works fine.
2020-05-13 12:35:42 +02:00
ksarabu1
2551684007 bugfix for attribute error during bootstrap (#1538)
An initial bootstrap performed by attaching Patroni to an already running Postgres was causing the following error:

```
  File "/xx/lib/python3.8/site-packages/patroni/ha.py", line 529, in update_cluster_history
    history = history[-self.cluster.config.max_timelines_history:]
AttributeError: 'NoneType' object has no attribute 'max_timelines_history'
```
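A sketch of the guard the fix implies (names mirror the traceback; the default limit is an assumption):
```python
def trim_history(cluster, history, default_max=10):
    # During initial bootstrap cluster.config can still be None,
    # so don't dereference it blindly.
    limit = cluster.config.max_timelines_history if cluster.config else default_max
    return history[-limit:] if limit else history
```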
2020-05-13 12:29:06 +02:00
Alexander Kukushkin
a6fbc2dd7b Handle the case when member conn_url is missing (#1510)
Close https://github.com/zalando/patroni/issues/1508
2020-05-13 12:26:39 +02:00
Alexander Kukushkin
703a129646 Don't try calling a non existing leader in patronictl pause (#1542)
While pausing a cluster without a leader on K8s, patronictl was showing warnings that member "None" could not be accessed.
2020-05-13 12:22:47 +02:00
Alexander Kukushkin
fe23d1f2d0 Release 1.6.5 (#1503)
* bump version
* update release notes
* implement missing unit-tests and format code.
v1.6.5
2020-04-23 16:02:01 +02:00
Feike Steenbergen
8b860d7528 Skip missing values from pg_controldata (#1501)
When calling controldata(), it may return an empty dictionary, which in
turn caused the following error:

    effective_configuration
        cvalue = parse_int(data[cname])
    KeyError: 'max_wal_senders setting'

Instead of letting this part crash, we now log the error and
continue.

This is the full output of the error:
```
2020-04-17 14:31:54,791 ERROR: Exception during execution of long running task restarting after failure
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/async_executor.py", line 97, in run
    wakeup = func(*args) if args else func()
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 707, in follow
    self.start(timeout=timeout, block_callbacks=change_role, role=role)
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 409, in start
    configuration = self.config.effective_configuration
  File "/usr/lib/python3/dist-packages/patroni/postgresql/config.py", line 983, in effective_configuration
    cvalue = parse_int(data[cname])
KeyError: 'max_wal_senders setting'
```
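A sketch of the tolerant lookup (key names come from the traceback above; the function itself is illustrative):
```python
import logging

def read_int_settings(data, names=('max_wal_senders setting', 'max_connections setting')):
    # Log and skip keys that pg_controldata did not report,
    # instead of letting a KeyError crash the restart.
    result = {}
    for name in names:
        if name in data:
            result[name] = int(data[name])
        else:
            logging.error('%s is missing from pg_controldata output', name)
    return result
```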
2020-04-23 12:51:38 +02:00
Alexander Kukushkin
be4c078d95 Etcd smart refresh members (#1499)
In dynamic environments it is common for etcd nodes to change their IP addresses during a rolling upgrade. If the etcd node Patroni is currently connected to is upgraded last, it can happen that the cached topology doesn't contain any live node anymore, so the request can't be retried and fails completely, usually resulting in a demotion of the primary.

To partially overcome the problem, Patroni already does a periodic (every 5 minutes) rediscovery of the etcd cluster topology, but with very fast node rotation there was still a chance of hitting the issue.

This PR is an attempt to address the problem. If the list of nodes is exhausted, Patroni will redo the initial discovery via an external mechanism, such as resolving A or SRV DNS records, and if the new list differs from the original, Patroni will use it as the new etcd cluster topology.

To deal with TCP issues, the connect_timeout is set to max(read_timeout/2, 1). This makes the list of members exhaust faster, but leaves time to perform the topology rediscovery and another attempt.

The third issue addressed by this PR: it can happen that the DNS names of the etcd nodes didn't change but their IP addresses are new, therefore we clean up the internal DNS cache when doing the topology rediscovery.

Besides that, this commit makes the `_machines_cache` property pretty much static: it is updated only when the topology has changed, which helps avoid concurrency issues.
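A sketch of the external rediscovery step via A records (SRV handling and the URL scheme are assumptions):
```python
import socket

def rediscover_etcd(hostname, port=2379):
    # Re-resolve the configured name and build a fresh machines list
    # to compare against the cached topology.
    addresses = sorted({info[4][0] for info in
                        socket.getaddrinfo(hostname, port, 0, socket.SOCK_STREAM)})
    return ['http://{0}:{1}'.format(a, port) for a in addresses]
```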
2020-04-23 12:51:05 +02:00
Alexander Kukushkin
80fbe90056 Issue CHECKPOINT explicitly after promote happened (#1498)
It is only safe to call pg_rewind on a replica when pg_control on the primary contains information about the latest timeline. Postgres usually does an immediate checkpoint right after promote, and in most cases this works just fine. Unfortunately, we regularly receive complaints that it takes too long (minutes) until the checkpoint is done and replicas can't perform the rewind, while doing the checkpoint manually helped immediately. So Patroni now does the same: once the promotion has happened and Postgres is no longer running in recovery, we explicitly issue a CHECKPOINT.

We intentionally don't use the AsyncExecutor here, because we want the HA loop to continue its normal flow.
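A sketch of the post-promote check-and-checkpoint, assuming a DB-API connection (and, as noted above, running outside the AsyncExecutor):
```python
def checkpoint_if_promoted(conn):
    with conn.cursor() as cur:
        cur.execute('SELECT pg_catalog.pg_is_in_recovery()')
        if not cur.fetchone()[0]:
            # promotion finished: force pg_control to learn the new timeline
            # now instead of waiting for the background checkpoint
            cur.execute('CHECKPOINT')
```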
2020-04-20 11:55:05 +02:00
ksarabu1
5fa912f8fa Make max history timelines in DCS configurable (#1491)
Close https://github.com/zalando/patroni/issues/1487
2020-04-17 16:27:38 +02:00
ksarabu1
fc962121d7 Reinitialize Bug fix with user defined tablespaces (#1494)
During reinit, Patroni was removing pgdata but leaving the user-defined tablespace directories in place, which caused Patroni to loop in reinit.
2020-04-17 07:49:55 +02:00
Alexander Kukushkin
337f9efc9e Improve patronictl list output (#1486)
The redundant `Cluster` column is now presented in the table header.

Depending on the output format, `Tags` are serialized differently:
* For the *pretty* format YAML is used, with every element on a new line
* For the *tsv* format YAML is also used, but with all elements on the same line (similar to JSON)
* For the *json* and *yaml* formats `Tags` are serialized into the appropriate format.

Examples of output in pretty formats:

```bash
$ patronictl list
+ Cluster: batman (6813309862653668387) +---------+----+-----------+---------------------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB | Tags                |
+-------------+----------------+--------+---------+----+-----------+---------------------+
| postgresql0 | 127.0.0.1:5432 | Leader | running |  3 |           | clonefrom: true     |
|             |                |        |         |    |           | noloadbalance: true |
|             |                |        |         |    |           | nosync: true        |
+-------------+----------------+--------+---------+----+-----------+---------------------+
| postgresql1 | 127.0.0.1:5433 |        | running |  3 |       0.0 |                     |
+-------------+----------------+--------+---------+----+-----------+---------------------+

$ patronictl list badclustername
+ Cluster: badclustername (uninitialized) ------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
```

Example in tsv format:

```bash
Cluster Member  Host    Role    State   TL      Lag in MB       Pending restart Tags
batman  postgresql0     127.0.0.1:5432  Leader  running 2
batman  postgresql1     127.0.0.1:5433          running 2       0               {clonefrom: true, nofailover: true, noloadbalance: true, replicatefrom: postgresql0}
batman  postgresql2     127.0.0.1:5434          running 2       0       *       {replicatefrom: postgres1}
```

In addition, the `patronictl list` command stops showing keys with empty values in the `json` and `yaml` formats.
Examples:

```yaml
$ patronictl list -f yaml
- Cluster: batman
  Host: 127.0.0.1:5432
  Member: postgresql0
  Role: Leader
  State: running
  TL: 2
- Cluster: batman
  Host: 127.0.0.1:5433
  Lag in MB: 0
  Member: postgresql1
  State: running
  TL: 2
  Tags:
    clonefrom: true
    nofailover: true
    noloadbalance: true
    replicatefrom: postgresql0
- Cluster: batman
  Host: 127.0.0.1:5434
  Lag in MB: 0
  Member: postgresql2
  Pending restart: '*'
  State: running
  TL: 2
  Tags:
    replicatefrom: postgres1
```

```json
$ patronictl list -f json | jq .
[
  {
    "Cluster": "batman",
    "Member": "postgresql0",
    "Host": "127.0.0.1:5432",
    "Role": "Leader",
    "State": "running",
    "TL": 2
  },
  {
    "Cluster": "batman",
    "Member": "postgresql1",
    "Host": "127.0.0.1:5433",
    "State": "running",
    "TL": 2,
    "Lag in MB": 0,
    "Tags": {
      "nofailover": true,
      "noloadbalance": true,
      "replicatefrom": "postgresql0",
      "clonefrom": true
    }
  },
  {
    "Cluster": "batman",
    "Member": "postgresql2",
    "Host": "127.0.0.1:5434",
    "State": "running",
    "TL": 2,
    "Lag in MB": 0,
    "Pending restart": "*",
    "Tags": {
      "replicatefrom": "postgres1"
    }
  }
]
```
2020-04-15 12:19:18 +02:00
ksarabu1
e3335bea1a Master stop timeout (#1445)
## Feature: Postgres stop timeout

A switchover/failover operation hangs on the signal_stop (or checkpoint) call when the postmaster doesn't respond or hangs for some reason (issue described in [1371](https://github.com/zalando/patroni/issues/1371)). This leads to service loss for an extended period of time, until the hung postmaster starts responding or is killed by some other actor.

### master_stop_timeout

The number of seconds Patroni is allowed to wait when stopping Postgres; effective only when synchronous_mode is enabled. When set to a value > 0 and synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation runs longer than master_stop_timeout. Set the value according to your durability/availability trade-off. If the parameter is not set, or is set to a value <= 0, master_stop_timeout does not apply.
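A sketch of these semantics in plain Python (not Patroni's implementation):
```python
def effective_master_stop_timeout(config):
    # Returns the number of seconds to wait before SIGKILL,
    # or None when the feature does not apply.
    timeout = config.get('master_stop_timeout') or 0
    return timeout if timeout > 0 and config.get('synchronous_mode') else None
```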
2020-04-15 12:18:49 +02:00
Alexander Kukushkin
27cda08ece Improve unit-tests (#1479)
* Fix tests that were failing on Windows and macOS
* Improve coverage
2020-04-09 10:34:35 +02:00
Alexander Kukushkin
369a93ce2a Handle cases when conn_url is not defined (#1482)
On K8s, when one of the Patroni pods is starting, there is no valid annotation yet, which could cause a failure in patronictl.
In addition, handle the case where the port isn't specified in the standby_cluster configuration.

Close https://github.com/zalando/patroni/issues/1100
Close https://github.com/zalando/patroni/issues/1463
2020-04-09 10:34:12 +02:00
Alexander Kukushkin
e58680f833 Convert recovery_min_apply_delay to ms before doing comparison (#1484)
Fixes https://github.com/zalando/patroni/issues/1483
2020-04-09 10:33:38 +02:00
Kaarel Moppel
d58006319b Patronictl - fail if a config file is specified explicitly but not found (#1467)
```bash
$ python3 patronictl.py -c postgresql0.yml list
Error: Provided config file postgresql0.yml not existing or no read rights. Check the -c/--config-file parameter
```
2020-04-01 15:52:43 +02:00
Kaarel Moppel
92d74af06e Better patronictl help text for the "flush" subcommand (#1466)
Right now it's not really clear from --help what the command does: "flushing" in a software
context usually means persisting, while here it is actually the opposite. Make
the intention more explicit.
2020-04-01 15:51:50 +02:00
0m1xa
80354f6484 Update Dockerfile (#1461)
On postgres:12 the find command without this pattern deletes the file i18n_ctype, which is needed by localedef, and localedef then exits with a code != 0.
2020-04-01 15:51:19 +02:00