237 Commits

Author SHA1 Message Date
Alexander Kukushkin
96b75fa7cb Special handling of check_recovery_conf for v12+ (#2292)
When starting as a replica it may take some time before Postgres starts accepting new connections, and in the meantime the leader could transition to a different member, requiring `primary_conninfo` to be updated.

Before v12 Patroni regularly checks `recovery.conf` to verify that recovery parameters match the expectation. Starting from v12 recovery parameters were converted to GUCs and Patroni gets their current values from the `pg_settings` view. The latter creates a problem when it takes more than a minute for Postgres to start accepting new connections.

Since Patroni attempts to execute at least `pg_is_in_recovery()` on every HA loop, and that call raises an exception while Postgres is not yet accepting connections, `check_recovery_conf()` was effectively unreachable until recovery finished; this changed when #2082 was introduced.

As a result of #2082 we got the following behavior:
1. Before v12 everything was working as expected
2. v12 and v13 - Patroni restarts Postgres after 1 minute of recovery
3. v14+ - `check_recovery_conf()` is not executed because the `replay_paused()` method raises an exception.

In order to properly handle changes of recovery parameters, or the leader transitioning to a different node, on v12+ we will rely on the cached values of recovery parameters until Postgres becomes ready to execute queries.
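A minimal sketch of that fallback, assuming hypothetical helper names (`is_ready_for_queries()`, `get_guc_values()`, `_current_recovery_params`) that are not Patroni's actual API:

```python
# Hypothetical sketch of the cached-parameter fallback; names are illustrative.
def effective_recovery_params(self):
    if self.is_ready_for_queries():
        # Postgres accepts connections: read the authoritative values
        # from the pg_settings view (v12+ keeps recovery params as GUCs)
        self._current_recovery_params = self.get_guc_values(
            ('primary_conninfo', 'primary_slot_name', 'restore_command'))
    # Otherwise keep returning the values Patroni itself wrote to the
    # configuration, instead of treating "no answer" as a mismatch.
    return self._current_recovery_params
```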

Close https://github.com/zalando/patroni/issues/2289
2022-05-12 07:45:49 +02:00
Michael Banck
2d15e0dae6 Add target_session_attrs=read-write to standby_leader primary_conninfo (#2193)
This makes it possible to have multiple hosts in a standby_cluster and ensures that the standby leader follows the main cluster's new leader after a switchover.
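A rough illustration of the resulting connection string, assuming a hypothetical helper; Patroni's real `primary_conninfo` construction includes more parameters:

```python
# Hypothetical helper, not Patroni's actual code.
def standby_primary_conninfo(hosts, port, user, passfile):
    # listing every host of the source cluster plus target_session_attrs=read-write
    # lets libpq skip members that are not currently writable
    return ('host={0} port={1} user={2} passfile={3} '
            'target_session_attrs=read-write').format(
        ','.join(hosts), port, user, passfile)

# e.g. host=pg1,pg2,pg3 port=5432 ... target_session_attrs=read-write
print(standby_primary_conninfo(['pg1', 'pg2', 'pg3'], 5432, 'replicator', '/run/pgpass'))
```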

Partially addresses #2189
2022-02-10 15:50:14 +01:00
Alexander Kukushkin
fce889cd04 Compatibility with psycopg 3.0 (#2088)
By default `psycopg2` is preferred. The `psycopg>=3.0` will be used only if `psycopg2` is not available or its version is too old.
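A minimal sketch of that preference order; the version threshold and module handling here are illustrative rather than Patroni's exact code:

```python
# Illustrative only: prefer psycopg2, fall back to psycopg (3.x).
try:
    import psycopg2
    # psycopg2.__version__ looks like "2.9.3 (dt dec pq3 ...)"
    if tuple(map(int, psycopg2.__version__.split(' ')[0].split('.')[:2])) < (2, 5):
        raise ImportError('psycopg2 is too old')
    db_module = psycopg2
except ImportError:
    import psycopg  # psycopg>=3.0
    db_module = psycopg
```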
2021-11-19 14:32:54 +01:00
Alexander Kukushkin
250328b84b Use cached role as a fallback when postgres is slow (#2082)
In some extreme cases Postgres can be so slow that the normal monitoring query doesn't finish within a few seconds. This results in an exception being raised from the `Postgresql._cluster_info_state_get()` method, which could lead to postgres not being demoted in time.
In order to make this reliable we catch the exception and use the cached state of postgres (`is_running()` and `role`) to determine whether postgres is running as a primary.
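A hedged sketch of that fallback; the method body and the `'in_recovery'` key are illustrative, not Patroni's actual implementation:

```python
def is_primary(self):
    try:
        # normal path: ask Postgres directly
        return not self._cluster_info_state_get('in_recovery')
    except Exception:
        # Postgres is too slow to answer: fall back to the cached state
        # so the HA loop can still decide whether a demotion is needed
        return bool(self.is_running()) and self.role == 'master'
```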

Close https://github.com/zalando/patroni/issues/2073
2021-10-07 16:08:21 +02:00
Alexander Kukushkin
d394b63c9f Release the leader lock when pg_controldata reports "shut down" (#2067)
For various reasons, WAL archiving on the primary may get stuck or be significantly delayed. If we try to do a switchover or shut the primary down, the shutdown will take forever and will not finish until the whole backlog of WALs is processed.
In the meantime, Patroni keeps updating the leader lock, which prevents other nodes from starting the leader race even when it is known that they received/applied all changes.

The `Database cluster state:` changes to `"shut down"` after:
- all data is fsynced to disk and the latest checkpoint is written to WAL
- all streaming replicas have confirmed that they received all changes (including the latest checkpoint)

At the same time, the archiver process continues to do its job and the postmaster process is still running.

In order to solve this problem and make the switchover faster and more reliable when `archive_command` is slow or failing, Patroni will remove the leader key as soon as `pg_controldata` starts reporting PGDATA as cleanly `"shut down"` and it has verified that at least one replica received all changes. If no replica fulfills that condition, the leader key isn't removed and the old behavior is retained, i.e. Patroni keeps updating it.
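A simplified sketch of that check, assuming placeholder helpers such as `controldata()` and `latest_checkpoint_location()`; the real logic lives in Patroni's HA loop:

```python
def can_release_leader_key(self, cluster):
    # only act once pg_controldata reports a clean shutdown
    if self.controldata().get('Database cluster state') != 'shut down':
        return False
    checkpoint_lsn = self.latest_checkpoint_location()
    # release the lock only if at least one other member reported that it
    # received everything up to the shutdown checkpoint
    return any(m.data.get('xlog_location', 0) >= checkpoint_lsn
               for m in cluster.members if m.name != self.name)
```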
2021-10-05 10:55:35 +02:00
Alexander Kukushkin
0ceb59b49d Write prev LSN to before checkpoint to optime if wal_achive=on (#1889)
#1527 introduced a feature that updates `/optime/leader` with the location of the last checkpoint after Postgres is shut down cleanly.

If WAL archiving is enabled, Postgres always switches the WAL file before writing the shutdown checkpoint record. Normally this is not an issue, but for databases with little write activity it can make the visible replication lag equal to the size of a single WAL file, even though the previous WAL file is mostly empty and contains only a few records.

Therefore it should be safe to report the LSN of the SWITCH record before the shutdown checkpoint.
In order to do that, Patroni first gets the output of pg_controldata and, based on it, calls pg_waldump twice:
* The first call reads the checkpoint record (and verifies that this is really the shutdown checkpoint).
* The second call reads the previous record; if it is an 'xlog switch' (for 9.3 and 9.4) or 'SWITCH' (for 9.5+) record, the LSN of the SWITCH record is written to `/optime/leader`.

In case of any mismatch, or a failure to call pg_waldump or parse its output, the old behavior is retained, i.e. `Latest checkpoint location` from pg_controldata is used.
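A rough, hedged illustration of the two pg_waldump calls; the flag wiring and the record-description matching are simplified compared to the real implementation:

```python
import subprocess

def prev_switch_lsn(pg_waldump, wal_dir, timeline, checkpoint_lsn, prev_lsn):
    # read one record at each LSN and check its description
    for lsn, expected in ((checkpoint_lsn, 'CHECKPOINT_SHUTDOWN'), (prev_lsn, 'SWITCH')):
        try:
            out = subprocess.check_output([pg_waldump, '-p', wal_dir, '-t', str(timeline),
                                           '-s', lsn, '-n', '1']).decode()
        except (OSError, subprocess.CalledProcessError):
            return None  # fall back to "Latest checkpoint location"
        if expected not in out:
            return None
    return prev_lsn  # safe to report the SWITCH record LSN
```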

Close https://github.com/zalando/patroni/issues/1860
2021-07-05 09:29:39 +02:00
Alexander Kukushkin
6616acff58 Postpone writing postgresql.conf when joining running Postgres 12+ (#1956)
When joining already running Postgres, Patroni ensures that config files are set according to expectations.
With recovery parameters converted to GUCs in Postgres v12 this became a small problem, because when the `Postgresql` object is created it is not yet known where the given replica is supposed to stream from.
As a result, postgresql.conf was first written without recovery parameters, and on the next run of the HA loop Patroni noticed the inconsistency and updated the config one more time.

For Postgres v12 this is not a big issue, but for v13+ it resulted in an interruption of streaming replication.
2021-06-30 09:11:12 +02:00
Alexander Kukushkin
f3420e2db5 Compatibility with PostgreSQL 14 (#1926)
PostgreSQL 14 changed the behavior of replicas when certain parameters (for example `max_connections`) are changed (increased): https://github.com/postgres/postgres/commit/15251c0a.
Instead of immediately exiting, Postgres 14 pauses replication and waits for action from the operator.

Since `pg_is_wal_replay_paused()` returning `True` is the only indicator of such a change, Patroni on the replica will call `pg_wal_replay_resume()`, which causes replication to either continue or shut down (as it did previously).

So far Patroni never called `pg_wal_replay_resume()` on its own; therefore, to remain backward compatible, it will call it only for PostgreSQL 14+.
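A sketch of the version-gated call; `query()` and `major_version` stand in for Patroni's database access layer and are illustrative:

```python
def resume_wal_replay_if_paused(self):
    if self.major_version >= 140000 and \
            self.query('SELECT pg_is_wal_replay_paused()')[0][0]:
        # On 14+ a paused replay may mean the replica hit an increased
        # parameter such as max_connections; resuming either continues
        # replay or shuts the replica down, matching pre-14 behaviour.
        self.query('SELECT pg_wal_replay_resume()')
```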
2021-06-25 13:41:45 +02:00
Alexander Kukushkin
eaa98e71e3 Fix bug with unix socket connections (#1933)
When unix_socket_directories is not known, Patroni was immediately falling back to a TCP connection via localhost.

The bug was introduced in https://github.com/zalando/patroni/pull/1865
2021-05-10 09:53:25 +02:00
melrifa
6d6b504cb8 Add support for patroni replication user socket connection (#1865)
Close #1866
2021-04-20 09:43:05 +02:00
Alexander Kukushkin
9edbe7e3f7 Fix little issues with custom bootstrap (#1891)
1. Set hot_standby=off only when we do PITR
2. Restart postgres after PITR is done to avoid warnings
3. Address invalid config issue https://github.com/zalando/patroni/issues/1870#issuecomment-800088643
2021-03-29 08:06:12 +02:00
Alexander Kukushkin
c7173aadd7 Failover logical slots (#1820)
Effectively, this PR consists of a few changes:

1. The easy part:
  If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
  In order to reduce the number of interactions with DCS the new `/status` key was introduced. It will contain the json object with `optime` and `slots` keys. For backward compatibility the `/optime/leader` will be updated if there are members with old Patroni in the cluster.

2. The tricky part:
  On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
  When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward (see the sketch after this list).

3. Additional requirements:
  In order to ensure that the primary doesn't clean up tuples from pg_catalog that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.

4. When logical slots are copied to the replica there is a time frame during which it might not be safe to use them after promotion. Right now there is no protection against promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
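A simplified sketch of keeping a logical slot in sync on a replica; the method names (`copy_slot_file_from_primary()`, `request_restart()`) are illustrative, not the exact Patroni code:

```python
def sync_logical_slot(self, name, target_lsn):
    exists = self.query(
        "SELECT 1 FROM pg_replication_slots WHERE slot_name = %s", name)
    if exists:
        # the slot is already present: just move it forward
        self.query("SELECT pg_replication_slot_advance(%s, %s)", name, target_lsn)
    else:
        # copy the slot file from the primary via pg_read_binary_file()
        # and restart the replica so Postgres picks it up
        self.copy_slot_file_from_primary(name)
        self.request_restart()
```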

Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.

The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.

Close: https://github.com/zalando/patroni/issues/1749
2021-03-25 16:18:23 +01:00
Mark Mercado
09f2f579d7 Quick attempt at Prometheus (#1848)
Close https://github.com/zalando/patroni/issues/318
2021-03-04 12:37:29 +01:00
krishna
b3dc765e6d Choose synchronous nodes based on replication lag (#1786)
This commit makes it possible to configure the maximum lag (`maximum_lag_on_syncnode`) after which Patroni will "demote" the node from synchronous and replace it with another node.

The previous implementation always tried to stick to the same synchronous nodes, even if they were not the optimal ones.
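An illustrative sketch of lag-aware candidate selection; the member attributes and helper name are assumptions, not Patroni's data model:

```python
def pick_sync_candidates(members, leader_lsn, maximum_lag_on_syncnode, count):
    # drop members whose replay position trails the leader by more than the limit
    healthy = [m for m in members
               if maximum_lag_on_syncnode < 0
               or leader_lsn - m.replayed_lsn <= maximum_lag_on_syncnode]
    # prefer the members with the smallest lag
    healthy.sort(key=lambda m: leader_lsn - m.replayed_lsn)
    return healthy[:count]
```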
2021-02-02 15:45:02 +01:00
Alexander Kukushkin
89a15a2df4 Fix small issues with ignore-slots feature (#1797)
When there is no config key in DCS Patroni shouldn't try accessing ignore_slots, otherwise an exception is raised.

In addition to that implement missing unit-tests and fix linting issues in behave tests.
2020-12-16 18:10:12 +01:00
Alexander Kukushkin
0a1f389686 Release 2.0.0 (#1680)
* update release notes
* bump version
* change the default alignment in patronictl table output to `left`
* add missing tests
* add missing pieces to the documentation
2020-09-02 15:35:04 +02:00
Alexander Kukushkin
13e24d832d Advanced validation of PostgreSQL parameters (#1674)
So far Patroni compared the old value (from `pg_settings`) with the new value (from the Patroni configuration or from DCS) in order to figure out whether a reload or restart is required when a parameter has changed. If the given parameter was missing from `pg_settings`, Patroni ignored it and did not write it into `postgresql.conf`.

If Postgres was not running, no validation was performed and parameters and values were written into the config as-is.

It is not a very common mistake, but people tend to mistype parameter names or values.
Also, some parameters are removed in specific Postgres versions and new ones are added (e.g. `checkpoint_segments` was replaced with `min_wal_size` and `max_wal_size` in 9.5, and `wal_keep_segments` was replaced with `wal_keep_size` in 13).

Writing nonexistent parameters or invalid values into `postgresql.conf` makes postgres unstartable.
This change doesn't solve the issue 100%, but it comes very close.
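A toy illustration of version-aware parameter validation; the real rules live inside Patroni and cover many more parameters and value checks:

```python
# Toy, version-aware check; the data here is a tiny, illustrative subset.
REMOVED = {
    # parameter: (first version where it no longer exists, replacement)
    'checkpoint_segments': (90500, 'min_wal_size / max_wal_size'),
    'wal_keep_segments': (130000, 'wal_keep_size'),
}

def check_parameter(name, server_version):
    if name in REMOVED and server_version >= REMOVED[name][0]:
        return 'skip: {0} was replaced with {1}'.format(name, REMOVED[name][1])
    return 'ok'

print(check_parameter('wal_keep_segments', 130002))  # skip: ... wal_keep_size
```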
2020-09-01 16:26:57 +02:00
Sergey Dudoladov
950eff27ad Optional fencing script (pre_promote) (#1099)
Call a fencing script after acquiring the leader lock. If the script doesn't finish successfully, don't promote, but remove the leader key instead.
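A schematic sketch of that flow, assuming placeholder methods (`acquire_lock()`, `release_lock()`, `promote()`) rather than Patroni's real internals:

```python
import shlex
import subprocess

def promote_with_fencing(self, pre_promote_script):
    if not self.acquire_lock():
        return
    if pre_promote_script and subprocess.call(shlex.split(pre_promote_script)) != 0:
        # fencing failed: give up the leader key instead of promoting
        self.release_lock()
        return
    self.promote()
```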

Close https://github.com/zalando/patroni/issues/1567
2020-09-01 07:50:39 +02:00
Feike Steenbergen
e3bc546dd5 Move WAL and tablespaces after a failed init (#1631)
For init processes that use a symlinked WAL directory, or custom scripts that create new tablespaces, these directories should also be renamed after a failed init attempt. Currently the following errors occur if the first init attempt failed, even though a second one might succeed:

      fixing permissions on existing directory /var/lib/postgresql/data ... ok
      initdb: error: directory "/var/lib/postgresql/wal/pg_wal" exists but is not empty
      [...]
      File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1173, in post_bootstrap
        self.cancel_initialization()
      File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1168, in cancel_initialization
        raise PatroniException('Failed to bootstrap cluster')
      patroni.exceptions.PatroniException: 'Failed to bootstrap cluster'

The remove_data_directory function already does this when removing the data directory; the same kind of thing should also happen when moving a data directory aside.

To ensure the data directory can still be used, the symlinks will point to the renamed directories.
2020-08-17 16:12:33 +02:00
ksarabu1
1ab709c5f0 Multi Sync Standby Support (#1594)
The new parameter `synchronous_node_count` is used by Patroni to manage the number of synchronous standby databases. It is set to 1 by default and has no effect when synchronous_mode is off. When synchronous mode is enabled, Patroni maintains exactly `synchronous_node_count` synchronous standbys and adjusts the state in DCS and `synchronous_standby_names` as members join and leave.

This functionality can be further extended to support priority-based (FIRST n) and quorum-based (ANY n) synchronous replication in the future.
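A hedged sketch of how such a `synchronous_standby_names` value could be assembled from the configured count and the chosen members (illustrative only):

```python
def build_synchronous_standby_names(sync_members, synchronous_node_count):
    # quote member names so mixed-case names survive Postgres parsing
    quoted = ', '.join('"{0}"'.format(name) for name in sync_members)
    return '{0} ({1})'.format(min(synchronous_node_count, len(sync_members)), quoted)

# e.g. 2 ("node2", "node3")
print(build_synchronous_standby_names(['node2', 'node3'], 2))
```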
2020-08-14 11:51:07 +02:00
Alexander Kukushkin
3341c898ff Add Etcd v3 protocol support via api gRPC-gateway (#1162)
The only python-etcd3 client working directly via gRPC still supports only a single endpoint, which is not very nice for high-availability.

Since Patroni is already using a heavily hacked version of python-etcd with smart retries and auto-discovery out-of-the-box, I decided to enhance the existing code with limited support of v3 protocol via gRPC-gateway.

Unfortunately, watches via gRPC-gateway require us to open and keep a second connection to etcd.

Known limitations:
* The minimum supported version is 3.0.4. On earlier versions transactions don't work due to bugs in grpc-gateway, and without transactions we can't do atomic operations, i.e. leader locks.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using TLS Common Name. This is because gRPC-proxy terminates TLS from its client so all the clients share a cert of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
2020-07-31 14:33:40 +02:00
Alexander Kukushkin
ad5c686c11 Take advantage of pg_stat_wal_receiver (#1513)
So far Patroni parsed `recovery.conf` or queried `pg_settings` in order to get the current values of recovery parameters. On PostgreSQL earlier than 12 it could easily happen that the value of `primary_conninfo` in `recovery.conf` has nothing to do with reality. Luckily for us, on PostgreSQL 9.6+ there is a `pg_stat_wal_receiver` view which contains the current values of `primary_conninfo` and `primary_slot_name`. The password field is masked, though, but this is fine because authentication only happens when the connection is opened. All other parameters are compared as usual.

Another advantage of `pg_stat_wal_receiver` is that it contains the current timeline, therefore on 9.6+ we don't need to use the replication-connection trick if the walreceiver process is alive.

If there is no walreceiver process available or it is not streaming we will stick to old methods.
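A minimal illustration of consulting the view (9.6+); `query()` stands in for Patroni's database access layer and the comparison logic is omitted:

```python
def current_streaming_config(self):
    rows = self.query("SELECT conninfo, slot_name, received_tli"
                      " FROM pg_stat_wal_receiver WHERE status = 'streaming'")
    if rows:
        conninfo, slot_name, timeline = rows[0]
        return conninfo, slot_name, timeline
    return None  # no walreceiver streaming: fall back to the old methods
```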
2020-05-15 18:04:24 +02:00
Alexander Kukushkin
08b3d5d20d Move ensure_clean_shutdown into rewind module (#1528)
Logically fits there better
2020-05-15 16:22:57 +02:00
Alexander Kukushkin
30aa355eb5 Shorten and beautify history log output (#1526)
When Patroni is trying to figure out whether pg_rewind is necessary, it could write the content of the history file from the primary into the log. The history file grows with every failover/switchover and eventually takes up too many lines in the log, most of which are not very useful.
Instead of showing the raw data, we will show only 3 lines before the current replica timeline and 2 lines after.
2020-05-15 16:14:25 +02:00
Alexander Kukushkin
7cf0b753ab Update optime/leader with checkpoint location after clean shut down (#1527)
Potentially this information could be used in order to make sure that there is no data loss on switchover.
2020-05-15 16:13:16 +02:00
Alexander Kukushkin
0d957076ca Improve compatibility with PostgreSQL 12 and 13 (#1523)
There were two new connection parameters introduced:
1. `gssencmode` in 12
2. `channel_binding` in 13
2020-05-13 13:13:25 +02:00
Alexander Kukushkin
fe23d1f2d0 Release 1.6.5 (#1503)
* bump version
* update release notes
* implement missing unit-tests and format code.
2020-04-23 16:02:01 +02:00
ksarabu1
e3335bea1a Master stop timeout (#1445)
## Feature: Postgres stop timeout

A switchover/failover operation hangs on the signal_stop (or checkpoint) call when the postmaster doesn't respond or hangs for some reason (issue described in [1371](https://github.com/zalando/patroni/issues/1371)). This leads to service loss for an extended period of time until the hung postmaster starts responding or is killed by some other actor.

### master_stop_timeout

The number of seconds Patroni is allowed to wait when stopping Postgres; effective only when synchronous_mode is enabled. When set to > 0 and synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation runs for longer than the value set by master_stop_timeout. Set the value according to your durability/availability tradeoff. If the parameter is not set or is set to <= 0, master_stop_timeout does not apply.
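A schematic sketch of that behaviour, assuming placeholder helpers (`signal_stop()`, `postmaster_is_running()`, `postmaster_send_signal()`); the actual stop logic in Patroni is considerably more involved:

```python
import signal
import time

def stop_postgres(self, master_stop_timeout):
    # the timeout only applies when synchronous_mode is on and the value is > 0
    deadline = (time.time() + master_stop_timeout
                if self.synchronous_mode and master_stop_timeout and master_stop_timeout > 0
                else None)
    self.signal_stop('fast')
    while self.postmaster_is_running():
        if deadline and time.time() > deadline:
            # stop is taking too long: escalate to SIGKILL
            self.postmaster_send_signal(signal.SIGKILL)
            break
        time.sleep(1)
```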
2020-04-15 12:18:49 +02:00
Alexander Kukushkin
4a29caa9d3 On role change callback didn't fire on failed primary (#1420)
Bug was introduced in https://github.com/zalando/patroni/pull/703
Close https://github.com/zalando/patroni/issues/1418
2020-02-27 12:22:44 +01:00
Alexander Kukushkin
80ce61876e Don't create permanent physical slot with name of the primary (#1392)
It is a common problem that the primary recycles WALs while one of the replicas is down for a long time. So far there were only two solutions to this problem, and neither of them is perfect:
1. Increase `wal_keep_segments`, but it is hard to guess a good value.
2. Use continuous archiving and PITR, but it is not always possible.

This PR introduces a way to solve the problem for static clusters with a fixed number of nodes and names that never change. You just need to list the names of all nodes in `slots`, so the primary will not remove a slot when the corresponding node is down (not registered in DCS).
Of course, the primary will not create a permanent slot matching its own name.

Usage example: let's assume you have a cluster with nodes named *abc1*, *abc2*, and *abc3*.
You have to run `patronictl edit-config` and put the following snippet into the configuration:
```yaml
slots:
  abc1:
    type: physical
  abc2:
    type: physical
  abc3:
    type: physical
```

If the node *abc2* is the primary, it will always create slots for *abc1* and *abc3* even if they are not running, but will not create slot *abc2*.
Other nodes will behave the same.

Close #280
2020-02-20 10:07:43 +01:00
Alexander Kukushkin
1461d7d4b8 Allow certain recovery parameters be defined in the custom_conf (#1335)
Fixes https://github.com/zalando/patroni/issues/1333
2020-01-15 12:41:07 +01:00
Igor Yanchenko
ea76a40845 Make sure postgresql.pgpass is a file or it does not exist (#1337)
Also make sure that it is located in a writable directory.
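A toy check in the spirit of the change; the real validation is part of Patroni's config handling:

```python
import os

def validate_pgpass(path):
    # the file must either not exist yet or be a regular file
    if os.path.exists(path) and not os.path.isfile(path):
        raise ValueError('{0} exists but is not a regular file'.format(path))
    # and its directory must be writable so Patroni can (re)create it
    parent = os.path.dirname(os.path.abspath(path))
    if not os.access(parent, os.W_OK):
        raise ValueError('directory {0} is not writable'.format(parent))
```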
2020-01-15 12:40:41 +01:00
Alexander Kukushkin
16d1ffdde7 Update timeline on standby cluster (#1332)
Fixes https://github.com/zalando/patroni/issues/1031
2019-12-20 12:56:00 +01:00
Alexander Kukushkin
0693fe7dd0 Housekeeping (#1315)
* Reduce memory usage by patroni init process
* More cleanup in setup.py
* Implement missing tests
2019-12-04 11:28:46 +01:00
Alexander Kukushkin
e1d569ad75 Inherit CaseInsensitiveDict from urllib3 HTTPHeaderDict (#1302)
It might look like a hack, but the API is stable enough and didn't change in the past 3+ years.
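A minimal sketch of the approach, assuming the urllib3 1.x module layout:

```python
from urllib3._collections import HTTPHeaderDict

class CaseInsensitiveDict(HTTPHeaderDict):
    """Keys compare case-insensitively, courtesy of HTTPHeaderDict."""

d = CaseInsensitiveDict()
d['Content-Type'] = 'application/json'
print(d['content-type'])  # application/json
```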
2019-12-02 12:14:59 +01:00
Alexander Kukushkin
7793887ea7 Fix tests on windows (#1303)
and disable junit, it produces a deprecation warning
2019-11-27 14:57:33 +01:00
Alexander Kukushkin
5ea73d50ed Make it possible to apply some recovery params without restart (#1260)
Starting from PostgreSQL 12 the following recovery parameters can be changed without a restart, but Patroni didn't support that yet:
* archive_cleanup_command
* promote_trigger_file
* recovery_end_command
* recovery_min_apply_delay

In future postgres releases this list will be extended and Patroni will support it automatically.
2019-11-11 16:18:23 +01:00
Alexander Kukushkin
29ac77b6e7 Compare all recovery parameters (#1208)
Previously the check_recovery_conf() function only checked whether primary_conninfo had changed and never took the other recovery parameters into account.

Fixes https://github.com/zalando/patroni/issues/1201
2019-10-30 12:30:09 +01:00
Alexander Kukushkin
9e87b00d36 Kill callback child processes when it is necessary (#1242)
Not doing so makes it hard to implement callbacks in bash and can eventually lead to a situation where two callbacks are running at the same time. If we fail to kill the child process we still wait for it to finish.

The same problem could happen with a custom bootstrap, therefore if we happen to kill the custom bootstrap process we also kill all of its child processes.
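A hedged sketch of killing a process together with its children using psutil; Patroni's cancellable-process handling differs in detail:

```python
import psutil

def kill_process_tree(pid):
    try:
        proc = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return
    children = proc.children(recursive=True)
    for p in [proc] + children:
        try:
            p.kill()
        except psutil.NoSuchProcess:
            pass
    # wait a bit so we don't leave zombies behind
    psutil.wait_procs([proc] + children, timeout=5)
```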

Closes https://github.com/zalando/patroni/issues/1238
2019-10-29 12:44:18 +01:00
Alexander Kukushkin
828585079f Improve workflow when PGDATA is not empty during bootstrap (#1217)
Recently it has happened twice that people tried to deploy a new cluster while the postgres data directory wasn't empty and also wasn't valid. In this case Patroni was still creating the initialize key in DCS and trying to start postgres.
Now it will complain about a non-empty, invalid postgres data directory and exit.

Close https://github.com/zalando/patroni/issues/1216
2019-10-25 14:09:44 +02:00
Alexander Kukushkin
0947ac1e43 Fix race condition in postmaster_start_time() (#1243)
When it is executed outside of the main thread we need to create a new cursor object.
2019-10-24 11:23:34 +02:00
Alexander Kukushkin
f4623c4e8e Build recovery params in a separate method (#1219)
In addition to that, try to protect against the case when some recovery parameters are set in one of the included files, by explicitly setting their values to an empty string on postgres 12.

Simplifies https://github.com/zalando/patroni/pull/1208
2019-10-11 20:18:06 +02:00
Alexander Kukushkin
3d29cb7e50 Perform pg_ctl reload regardless of config changes (#1204)
It is possible that some config files are not controlled by Patroni. When somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that postgres will also be reloaded, but that didn't happen when there were no changes in the postgresql section of the Patroni config.

For example, one might replace ssl_cert_file and ssl_key_file on the filesystem; starting from PostgreSQL 10 this just requires a reload, but Patroni wasn't doing it.

In addition to that fix the issue with handling of `wal_buffers`. The default value depends on `shared_buffers` and `wal_segment_size` and therefore Patroni was exposing pending_restart when the new value in the config was explicitly set to -1 (default).

Close https://github.com/zalando/patroni/issues/1198
2019-10-10 14:49:30 +02:00
Alexander Kukushkin
1572c02ced Use passfile in the primary_conninfo instead of password (#1194)
Fixed a few minor issues related to the #1134 and #1122
Close https://github.com/zalando/patroni/issues/1185
2019-10-09 18:04:14 +02:00
Alexander Kukushkin
0b1b1e3b54 Compatibility with postgresql 12 (#1068)
* use `SHOW primary_conninfo` instead of parsing config file on pg12
* strip out standby and recovery parameters from postgresql.auto.conf before starting the postgres 12

Patroni config remains backward compatible.
Even though, for example, `restore_command` was converted to a GUC starting from postgresql 12, in the Patroni configuration you can still keep it in the `postgresql.recovery_conf` section.
If you put it into `postgresql.parameters.restore_command`, that will also work, but it is important not to mix both ways:
```yaml
# is OK
postgresql:
  parameters:
    restore_command: my_restore_command
    archive_cleanup_command: my_archive_cleanup_command

# is OK
postgresql:
  recovery_conf:
    restore_command: my_restore_command
    archive_cleanup_command: my_archive_cleanup_command

# is NOT ok
postgresql:
  parameters:
    restore_command: my_restore_command
  recovery_conf:
    archive_cleanup_command: my_archive_cleanup_command
```
2019-08-02 16:00:55 +02:00
Alexander Kukushkin
a4bd6a9b4b Refactor postgresql class (#1060)
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package

This is just a careful refactoring, without code changes.
2019-05-21 16:02:47 +02:00
Alexander Kukushkin
bba9066315 Make it possible to run pg_rewind without superuser on pg11+ (#1035)
* expose the current patroni version in DCS
* expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed
* other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT
* define `postgresql.authentication.rewind` with credentials for pg_rewind in Patroni configuration files.
* create user for pg_rewind if postgres is 11+
* grant execute on functions required for pg_rewind to rewind user
2019-05-02 14:07:26 +02:00
Alexander Kukushkin
f0b784fe7f Manage pg_ident.conf with Patroni (#1037)
This functionality works similarly to the `pg_hba`:
If `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf; however, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed from outside and will not update the file.
2019-04-23 16:16:53 +02:00
Pavlo Golub
b53a29c022 Fix unit-tests for Windows (#1014)
Closes #1013
2019-04-02 13:58:17 +02:00
Alexander Kukushkin
e38fe78b56 Fix callbacks behavior (mostly for standby cluster) (#998)
First of all, this patch changes the behavior of the `on_start`/`on_restart` callbacks: they will be called only when postgres is started or restarted without a role change. If the member is promoted or demoted, only the `on_role_change` callback will be executed.
Before that, `on_role_change` was never called for a standby leader; only `on_start`/`on_restart` were called, and with a wrong role argument.

In addition to that, the REST API will return the standby_leader role for the leader of the standby cluster.

Closes https://github.com/zalando/patroni/issues/988
2019-03-29 10:28:07 +01:00