Commit Graph

6 Commits

Author SHA1 Message Date
Alexander Kukushkin
680444ae13 Reduce lock time taken by dcs.get_cluster() (#989)
`dcs.cluster` and `dcs.get_cluster()` are using the same lock resource and therefore when get_cluster call is slow due to the slowness of DCS it was also affecting the `dcs.cluster` call, which in return was making health-check requests slow.
2019-03-12 22:37:11 +01:00
Alexander Kukushkin
0c516de147 Create headless service associated with $SCOPE-config endpoint (#958)
if there is no service defined k8s assumes that endpoint is orphaned and removes it.
Patroni tries to create the service only in case if use_endpoints is enabled if the following cases:
1. Upon start
2. When it tries to (re-)create the config endpoint

If for some reason creation of the service has failed, Patroni will retry it on every cycle of HA loop. Usually it fails due to lack of permissions and if you don't want to give such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest.
2019-02-15 13:35:04 +01:00
Alexander Kukushkin
2efd97baab Permanent replication slots (#819)
Permanent replication slots are preserved on failover/switchover, that is Patroni on the new primary will create configured replication slots right after doing promote.

Slots could be configured with the help of `patronictl edit-config`.
The initial configuration could be also done in the `bootstrap.dcs`

```yaml
slots:
  permanent_physical_1:
    type: physical
  permanent_logical_1:
    type: logical
    database: foo
    plugin: pgoutput
```

It is the responsibility of the operator to make sure that there are no clashes in names between replication slots automatically created by Patroni for members and permanent replication slots.

Closes https://github.com/zalando/patroni/issues/656
2018-10-31 11:37:42 +01:00
Alexander Kukushkin
a0c8491abb Don't swallow silently all errors from k8s API (#611)
Output exception trace to the logs when http status code == 403, something is wrong with permissions.

When http status code == 409 -- such error could be ignored, because object probably was created or updated by another process.

For all other http status codes it will also produce stack traces.

I hope it will help to debug issues similar to the https://github.com/zalando/patroni/issues/606
2018-01-26 09:57:17 +01:00
Alexander Kukushkin
03c2a85d23 Expose current timeline in DCS and via API (#591)
It is very easy to get current timeline on the master by executing
```sql
SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int
```

Unfortunately the same method doesn't work when postgres is_in_recovery. Therefore we will use replication connection for that on the replicas. In order to avoid opening and closing replication connection on every HA loop we will cache the result if its value matches with the timeline of the master.

Also this PR introduces a new key in DCS: `/history`. It will contain a json serialized object with timeline history in a format similar to the usual history files. The differences are:
* Second column is the absolute wal position in bytes, instead of LSN
* Optionally there might be a fourth column - timestamp, (mtime of history file)
2018-01-05 15:25:56 +01:00
Alexander Kukushkin
4328c15010 Make Patroni Kubernetes native (#500)
* Use ConfigMaps or Endpoins for leader elections and to keep cluster state
* Label pods with a postgres role
* change behavior of pip install. From now on it will not install all dependencies, you have to specify explicitly DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`
2017-12-08 16:55:00 +01:00