Merge branch 'feature/failover-switchover-definition' into refactor/failover-limitations-checks
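
The confirmation guard reworked in ``patroni/ctl.py`` can be sketched in isolation as below. This is a minimal illustration, not Patroni's code: both helper names and their boolean parameters are hypothetical stand-ins for ``force``, ``action``, ``global_config.is_synchronous_mode``, ``cluster.sync.is_empty`` and ``cluster.sync.matches(candidate, True)``. The intended behaviour, taken from the removed lines, is that the prompt is shown only when every precondition holds, and the operation is aborted only when the user declines.

```python
def needs_async_failover_confirmation(force: bool, action: str,
                                      is_synchronous_mode: bool,
                                      sync_is_empty: bool,
                                      candidate_in_sync: bool) -> bool:
    """Return True when the user should be asked before failing over
    to a node that is not listed in the /sync key (hypothetical helper)."""
    return all((not force,
                action == 'failover',
                is_synchronous_mode,
                not sync_is_empty,
                not candidate_in_sync))


def should_abort(needs_confirmation: bool, user_confirmed: bool) -> bool:
    """Abort only if a confirmation was required and the user declined it."""
    return needs_confirmation and not user_confirmed


if __name__ == '__main__':
    ask = needs_async_failover_confirmation(
        force=False, action='failover', is_synchronous_mode=True,
        sync_is_empty=False, candidate_in_sync=False)
    print('ask for confirmation:', ask)
    print('abort when user declines:', should_abort(ask, user_confirmed=False))
    print('abort when user accepts:', should_abort(ask, user_confirmed=True))
```

Keeping the precondition check separate from the prompt makes it harder to lose the ``not`` in front of the confirmation call when the condition is reshaped.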

2026-01-27 10:20:10 +00:00 · 2023-08-27 22:12:15 +02:00
parent 439d292c60 293c1e4cd3
commit 5fca21849c
2 changed files with 18 additions and 16 deletions
--- a/docs/rest_api.rst
+++ b/docs/rest_api.rst
@@ -565,7 +565,7 @@ When calling ``/switchover`` endpoint candidate can be specified but is not requ

 In the JSON body of the ``POST`` request, you must specify at least the ``leader`` field and, optionally, the ``candidate`` and ``scheduled_at`` field if you want to schedule a switchover at a specific time.

-Depending on the situation request might finish with different HTTP status codes and bodies. Status code **200** is returned when the switchover or failover successfully completed. If the switchover was successfully scheduled, Patroni will return HTTP status code **202**. In case something went wrong, the error status code (one of **400**, **412**, or **503**) will be returned with some details in the response body.
+Depending on the situation, requests might return different HTTP status codes and bodies. Status code **200** is returned when the switchover or failover completes successfully. If the switchover was successfully scheduled, Patroni will return HTTP status code **202**. If something goes wrong, an error status code (one of **400**, **412**, or **503**) is returned with some details in the response body.

 ``DELETE /switchover`` can be used to delete the currently scheduled switchover.

@@ -586,7 +586,7 @@ Depending on the situation request might finish with different HTTP status codes
 	Successfully switched over to "postgresql2"


-**Example:** schedule a switchover from the leader to any other healthy standby in the cluster at a specific time
+**Example:** schedule a switchover from the leader to any other healthy standby in the cluster at a specific time.

 .. code-block:: bash

@@ -598,7 +598,7 @@ Depending on the situation request might finish with different HTTP status codes
 Failover
 ^^^^^^^^

-``/failover`` endpoint allows to perform a manual failover when there are no healthy nodes (e.g. to an asynchronous standby if all synchronous standbys are not healthy to promote). However there is no requirement for a cluster not to have leader - failover can also be run on a healthy cluster.
+The ``/failover`` endpoint can be used to perform a manual failover when there are no healthy nodes (e.g. to an asynchronous standby if none of the synchronous standbys is healthy enough to promote). However, there is no requirement for the cluster to be without a leader: failover can also be run on a healthy cluster.

 In the JSON body of the ``POST`` request you must specify ``candidate`` field. If ``leader`` field is specified, switchover is triggered. 

@@ -645,20 +645,20 @@ Healthy standby
 There are a couple of checks that a member of a cluster should pass to be able to participate in the leader race during a switchover or to become a leader as a failover/switchover candidate:

 - be reachable via Patroni API,
- not to have ``nofailover`` tag,
+- not have the ``nofailover`` tag set to ``true``,
 - have watchdog fully functional (if required by the configuration),
- in case of a switchover or a failover in a healthy cluster, not to exceed maximum replication lag (``maximum_lag_on_failover`` :ref:`configuration parameter <dynamic_configuration>`),
- in case of a switchover or a failover in a healthy cluster, not to have the timeline number smaller than the cluster timeline,
+- in case of a switchover or a failover in a healthy cluster, not exceed the maximum replication lag (``maximum_lag_on_failover`` :ref:`configuration parameter <dynamic_configuration>`),
+- in case of a switchover or a failover in a healthy cluster, not have a timeline number smaller than the cluster timeline,
 - in :ref:`synchronous mode <synchronous_mode>`:

  - In case of a switchover (both with and without a candidate): be listed in the ``/sync`` key members.
  - For a failover in both healthy and unhealthy clusters, this check is omitted.

 .. warning::
-    In case of a failover in a cluster without a leader, a candidate will be allowed to promote even:
-	- if it is not in the ``/sync`` key members when synchronous mode is enabled,
-	- if its lag exceeds the maximum replication lag allowed,
-	- if it has the timeline number smaller than the cluster timeline.
+    In case of a failover in a cluster without a leader, a candidate will be allowed to promote even if:
+	- it is not in the ``/sync`` key members when synchronous mode is enabled,
+	- its lag exceeds the maximum replication lag allowed,
+	- its timeline number is smaller than the cluster timeline.


 Restart endpoint
--- a/patroni/ctl.py
+++ b/patroni/ctl.py
@@ -1209,10 +1209,12 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s
        candidate = click.prompt('Candidate ' + str(candidate_names), type=str, default='')

    # We allow manual failover to an aync node in the sync mode, so we better ask for the confirmation
-    if not force and action == 'failover':
-        if global_config.is_synchronous_mode and not cluster.sync.is_empty \
-           and not cluster.sync.matches(candidate, True) \
-           and not click.confirm(f'Are you sure you want to failover to the asynchronous node {candidate}'):
+    if all((not force,
+            action == 'failover',
+            global_config.is_synchronous_mode,
+            not cluster.sync.is_empty,
+            not cluster.sync.matches(candidate, True))):
+        if not click.confirm(f'Are you sure you want to failover to the asynchronous node {candidate}'):
            raise PatroniCtlException('Aborting ' + action)

    if action == 'switchover' and scheduled is None and not force:
@@ -1245,8 +1247,8 @@ def _do_failover_or_switchover(obj: Dict[str, Any], action: str, cluster_name: s
        demote_msg = f', demoting current leader {cluster.leader.name}' if cluster.leader else ''
        if scheduled_at_str:
            # only switchover can be scheduled
-            if not click.confirm(f'Are you sure you want to schedule switchover of cluster \
-{cluster_name} at {scheduled_at_str}{demote_msg}?'):
+            if not click.confirm(f'Are you sure you want to schedule switchover of cluster '
+                                 f'{cluster_name} at {scheduled_at_str}{demote_msg}?'):
                raise PatroniCtlException('Aborting scheduled ' + action)
        else:
            if not click.confirm(f'Are you sure you want to {action} cluster {cluster_name}{demote_msg}?'):