Mirror of https://github.com/outbackdingo/patroni.git (synced 2026-01-27 10:20:10 +00:00)
Merge branch 'feature/failover-switchover-definition' into refactor/failover-limitations-checks
@@ -565,7 +565,7 @@ When calling ``/switchover`` endpoint candidate can be specified but is not requ

 In the JSON body of the ``POST`` request, you must specify at least the ``leader`` field and, optionally, the ``candidate`` and ``scheduled_at`` field if you want to schedule a switchover at a specific time.

-Depending on the situation request might finish with different HTTP status codes and bodies. Status code **200** is returned when the switchover or failover successfully completed. If the switchover was successfully scheduled, Patroni will return HTTP status code **202**. In case something went wrong, the error status code (one of **400**, **412**, or **503**) will be returned with some details in the response body.
+Depending on the situation, requests might return different HTTP status codes and bodies. Status code **200** is returned when the switchover or failover successfully completed. If the switchover was successfully scheduled, Patroni will return HTTP status code **202**. In case something went wrong, the error status code (one of **400**, **412**, or **503**) will be returned with some details in the response body.

 ``DELETE /switchover`` can be used to delete the currently scheduled switchover.

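As a rough illustration of the request described in this hunk, the sketch below builds a ``/switchover`` JSON body and classifies the documented status codes. The helper names (``build_switchover_body``, ``describe_status``), the outcome labels, and the sample timestamp are assumptions made for the example; they are not part of Patroni.

```python
import json
from typing import Optional


def build_switchover_body(leader: str,
                          candidate: Optional[str] = None,
                          scheduled_at: Optional[str] = None) -> str:
    """Return a JSON body for POST /switchover (hypothetical helper).

    Only ``leader`` is mandatory; ``candidate`` and ``scheduled_at`` are optional.
    """
    if not leader:
        raise ValueError('the "leader" field is required for a switchover')
    body = {'leader': leader}
    if candidate:
        body['candidate'] = candidate
    if scheduled_at:
        body['scheduled_at'] = scheduled_at
    return json.dumps(body)


def describe_status(status_code: int) -> str:
    """Map the documented HTTP status codes to a short outcome label."""
    if status_code == 200:
        return 'completed'   # switchover or failover finished
    if status_code == 202:
        return 'scheduled'   # switchover accepted for a later time
    if status_code in (400, 412, 503):
        return 'failed'      # details are expected in the response body
    return 'unexpected'


if __name__ == '__main__':
    # The timestamp is an arbitrary sample value.
    print(build_switchover_body('postgresql1', scheduled_at='2030-01-01T12:00+00'))
    print(describe_status(202))
```

The body can be sent with any HTTP client; the sketch deliberately stops before the network call so that it runs without a cluster.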
@@ -586,7 +586,7 @@ Depending on the situation request might finish with different HTTP status codes
     Successfully switched over to "postgresql2"


-**Example:** schedule a switchover from the leader to any other healthy standby in the cluster at a specific time
+**Example:** schedule a switchover from the leader to any other healthy standby in the cluster at a specific time.

 .. code-block:: bash

@@ -598,7 +598,7 @@ Depending on the situation request might finish with different HTTP status codes
 Failover
 ^^^^^^^^

-``/failover`` endpoint allows to perform a manual failover when there are no healthy nodes (e.g. to an asynchronous standby if all synchronous standbys are not healthy to promote). However there is no requirement for a cluster not to have leader - failover can also be run on a healthy cluster.
+``/failover`` endpoint can be used to perform a manual failover when there are no healthy nodes (e.g. to an asynchronous standby if all synchronous standbys are not healthy enough to promote). However, there is no requirement for a cluster not to have a leader - failover can also be run on a healthy cluster.

 In the JSON body of the ``POST`` request you must specify ``candidate`` field. If ``leader`` field is specified, switchover is triggered.

@@ -645,20 +645,20 @@ Healthy standby
 There are a couple of checks that a member of a cluster should pass to be able to participate in the leader race during a switchover or to become a leader as a failover/switchover candidate:

 - be reachable via Patroni API,
-- not to have ``nofailover`` tag,
+- not have ``nofailover`` tag set to ``true``,
 - have watchdog fully functional (if required by the configuration),
-- in case of a switchover or a failover in a healthy cluster, not to exceed maximum replication lag (``maximum_lag_on_failover`` :ref:`configuration parameter <dynamic_configuration>`),
-- in case of a switchover or a failover in a healthy cluster, not to have the timeline number smaller than the cluster timeline,
+- in case of a switchover or a failover in a healthy cluster, not exceed maximum replication lag (``maximum_lag_on_failover`` :ref:`configuration parameter <dynamic_configuration>`),
+- in case of a switchover or a failover in a healthy cluster, not have a timeline number smaller than the cluster timeline,
 - in :ref:`synchronous mode <synchronous_mode>`:

   - In case of a switchover (both with and without a candidate): be listed in the ``/sync`` key members.
   - For a failover in both healthy and unhealthy clusters, this check is omitted.

 .. warning::
-    In case of a failover in a cluster without a leader, a candidate will be allowed to promote even:
-        - if it is not in the ``/sync`` key members when synchronous mode is enabled,
-        - if its lag exceeds the maximum replication lag allowed,
-        - if it has the timeline number smaller than the cluster timeline.
+    In case of a failover in a cluster without a leader, a candidate will be allowed to promote even if:
+        - it is not in the ``/sync`` key members when synchronous mode is enabled,
+        - its lag exceeds the maximum replication lag allowed,
+        - it has a timeline number smaller than the cluster timeline.


 Restart endpoint
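
The "Healthy standby" checks in this hunk can be read as a single predicate. The sketch below is a simplified model under stated assumptions, not Patroni's implementation: the ``Member`` dataclass, its field names, the ``is_eligible`` function, and the default lag limit are all invented for illustration, and "healthy cluster" is taken to mean "a cluster that currently has a leader".

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Simplified view of a standby (all field names are illustrative)."""
    api_reachable: bool = True
    nofailover: bool = False
    watchdog_ok: bool = True
    lag: int = 0
    timeline: int = 1
    in_sync_key: bool = True


def is_eligible(member: Member, action: str, cluster_has_leader: bool, *,
                synchronous_mode: bool = False,
                watchdog_required: bool = False,
                maximum_lag_on_failover: int = 1048576,
                cluster_timeline: int = 1) -> bool:
    """Return True if ``member`` passes the documented checks for ``action``."""
    # Checks that apply in every case.
    if not member.api_reachable or member.nofailover:
        return False
    if watchdog_required and not member.watchdog_ok:
        return False
    # Lag and timeline are checked for a switchover, and for a failover
    # only when the cluster still has a leader.
    if action == 'switchover' or cluster_has_leader:
        if member.lag > maximum_lag_on_failover:
            return False
        if member.timeline < cluster_timeline:
            return False
    # Membership in the /sync key is required for a switchover only.
    if synchronous_mode and action == 'switchover' and not member.in_sync_key:
        return False
    return True


if __name__ == '__main__':
    lagging = Member(lag=10 ** 9)
    print(is_eligible(lagging, 'failover', cluster_has_leader=True))
    print(is_eligible(lagging, 'failover', cluster_has_leader=False))
```

The second call models the case from the warning: with no leader, a failover candidate is accepted despite its lag.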
@@ -1209,10 +1209,12 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s
         candidate = click.prompt('Candidate ' + str(candidate_names), type=str, default='')

     # We allow manual failover to an async node in the sync mode, so we better ask for the confirmation
-    if not force and action == 'failover':
-        if global_config.is_synchronous_mode and not cluster.sync.is_empty \
-                and not cluster.sync.matches(candidate, True) \
-                and not click.confirm(f'Are you sure you want to failover to the asynchronous node {candidate}'):
+    if all((not force,
+            action == 'failover',
+            global_config.is_synchronous_mode,
+            not cluster.sync.is_empty,
+            not cluster.sync.matches(candidate, True))):
+        if not click.confirm(f'Are you sure you want to failover to the asynchronous node {candidate}'):
             raise PatroniCtlException('Aborting ' + action)

     if action == 'switchover' and scheduled is None and not force:
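
The guard in this hunk follows a common shape: collect the preconditions with ``all(...)``, prompt only when every one holds, and abort when the operator declines. A minimal standalone sketch of that shape follows. ``AbortedError`` stands in for ``PatroniCtlException``, the ``confirm`` callable stands in for ``click.confirm``, and the function name and keyword parameters are hypothetical.

```python
from typing import Callable


class AbortedError(Exception):
    """Stand-in for PatroniCtlException in this sketch."""


def confirm_async_failover(candidate: str, action: str, *,
                           force: bool,
                           synchronous_mode: bool,
                           sync_is_empty: bool,
                           candidate_in_sync: bool,
                           confirm: Callable[[str], bool]) -> bool:
    """Prompt before a failover to an asynchronous node.

    Returns True if a prompt was shown and accepted, False if no prompt
    was needed. Raises AbortedError if the operator declines.
    """
    needs_confirmation = all((not force,
                              action == 'failover',
                              synchronous_mode,
                              not sync_is_empty,
                              not candidate_in_sync))
    if needs_confirmation:
        # Abort only when the answer is "no".
        if not confirm(f'Are you sure you want to failover to the asynchronous node {candidate}'):
            raise AbortedError('Aborting ' + action)
    return needs_confirmation


if __name__ == '__main__':
    prompted = confirm_async_failover('postgresql2', 'failover', force=False,
                                      synchronous_mode=True, sync_is_empty=False,
                                      candidate_in_sync=False,
                                      confirm=lambda message: True)
    print(prompted)
```

One design point worth keeping in mind: every element of the tuple passed to ``all`` is evaluated before ``all`` runs, so unlike a chain of ``and`` operators there is no short-circuiting between the conditions. Keeping the prompt in a nested ``if`` ensures it is only shown when the other conditions hold.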
@@ -1245,8 +1247,8 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s
         demote_msg = f', demoting current leader {cluster.leader.name}' if cluster.leader else ''
         if scheduled_at_str:
             # only switchover can be scheduled
-            if not click.confirm(f'Are you sure you want to schedule switchover of cluster \
-                                 {cluster_name} at {scheduled_at_str}{demote_msg}?'):
+            if not click.confirm(f'Are you sure you want to schedule switchover of cluster '
+                                 f'{cluster_name} at {scheduled_at_str}{demote_msg}?'):
                 raise PatroniCtlException('Aborting scheduled ' + action)
         else:
             if not click.confirm(f'Are you sure you want to {action} cluster {cluster_name}{demote_msg}?'):
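
When a long prompt is split across two adjacent string literals, Python joins the pieces with no separator, so the first piece has to end with a space. The sketch below illustrates this with the scheduling prompt; the function names and sample values are assumptions for the example only.

```python
def schedule_prompt(cluster_name: str, scheduled_at_str: str, demote_msg: str = '') -> str:
    """Build the scheduling prompt from two adjacent literals."""
    # The trailing space after "cluster" separates it from the cluster name.
    return ('Are you sure you want to schedule switchover of cluster '
            f'{cluster_name} at {scheduled_at_str}{demote_msg}?')


def schedule_prompt_without_space(cluster_name: str, scheduled_at_str: str) -> str:
    """Same prompt with the separating space left out, for comparison."""
    return ('Are you sure you want to schedule switchover of cluster'
            f'{cluster_name} at {scheduled_at_str}?')


if __name__ == '__main__':
    # Sample values; the timestamp is arbitrary.
    print(schedule_prompt('demo', '2030-01-01T12:00+00'))
    print(schedule_prompt_without_space('demo', '2030-01-01T12:00+00'))
```

A backslash continuation inside a single literal has the opposite problem: the indentation of the continued line becomes part of the string.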