VAULT-28478: Updates to autopilot docs (#28331)

* restructure
* update command
* fixes
* fix command flags
* revert makefile change
* remove tick log
@@ -35,54 +35,69 @@ $ curl \
 ```json
 {
-  "healthy": true,
   "failure_tolerance": 1,
+  "healthy": true,
+  "leader": "vault_1",
   "servers": {
-    "raft1": {
-      "id": "raft1",
-      "name": "raft1",
+    "vault_1": {
       "address": "127.0.0.1:8201",
-      "node_status": "alive",
+      "healthy": true,
+      "id": "vault_1",
       "last_contact": "0s",
+      "last_index": 63,
       "last_term": 3,
-      "last_index": 459,
-      "healthy": true,
-      "stable_since": "2021-03-19T20:14:11.831678-04:00",
+      "name": "vault_1",
+      "node_status": "alive",
+      "node_type": "voter",
+      "stable_since": "2024-08-29T16:02:45.639829+02:00",
       "status": "leader",
-      "meta": null
+      "version": "1.17.3"
     },
-    "raft2": {
-      "id": "raft2",
-      "name": "raft2",
-      "address": "127.0.0.2:8201",
-      "node_status": "alive",
-      "last_contact": "516.49595ms",
-      "last_term": 3,
-      "last_index": 459,
+    "vault_2": {
+      "address": "127.0.0.1:8203",
       "healthy": true,
-      "stable_since": "2021-03-19T20:14:19.831931-04:00",
+      "id": "vault_2",
+      "last_contact": "678.62575ms",
+      "last_index": 63,
+      "last_term": 3,
+      "name": "vault_2",
+      "node_status": "alive",
+      "node_type": "voter",
+      "stable_since": "2024-08-29T16:02:47.640976+02:00",
       "status": "voter",
-      "meta": null
+      "version": "1.17.3"
     },
-    "raft3": {
-      "id": "raft3",
-      "name": "raft3",
-      "address": "127.0.0.3:8201",
-      "node_status": "alive",
-      "last_contact": "196.706591ms",
-      "last_term": 3,
-      "last_index": 459,
+    "vault_3": {
+      "address": "127.0.0.1:8205",
       "healthy": true,
-      "stable_since": "2021-03-19T20:14:25.83565-04:00",
+      "id": "vault_3",
+      "last_contact": "3.969159375s",
+      "last_index": 63,
+      "last_term": 3,
+      "name": "vault_3",
+      "node_status": "alive",
+      "node_type": "voter",
+      "stable_since": "2024-08-29T16:02:49.640905+02:00",
       "status": "voter",
-      "meta": null
+      "version": "1.17.3"
     }
   },
-  "leader": "raft1",
-  "voters": ["raft1", "raft2", "raft3"],
-  "non_voters": null
+  "voters": [
+    "vault_1",
+    "vault_2",
+    "vault_3"
+  ]
 }
 ```
+
+The `failure_tolerance` of a cluster is the number of nodes in the cluster that could
+fail gradually without causing an outage.
+
+When verifying the health of your cluster, check the following fields of each server:
+- `healthy`: whether Autopilot considers this node healthy or not
+- `status`: the voting status of the node. This will be `voter`, `leader`, or [`non-voter`](/vault/docs/concepts/integrated-storage#non-voting-nodes-enterprise-only)
+- `last_index`: the index of the last applied Raft log. This should be close to the `last_index` value of the leader.
+- `version`: the version of Vault running on the server
+- `node_type`: the type of node. On CE, this will always be `voter`. See below for an explanation of Enterprise node types.
+
 ### Enterprise only
 Vault Enterprise will include additional output in its API response to indicate the current state of redundancy zones,
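
The health fields added above can be checked mechanically. A minimal sketch against the autopilot state endpoint (assuming `VAULT_ADDR` and `VAULT_TOKEN` are exported and `jq` is installed; the `.data` wrapping follows the standard Vault API response envelope):

```shell-session
$ curl --silent --header "X-Vault-Token: $VAULT_TOKEN" \
    "$VAULT_ADDR/v1/sys/storage/raft/autopilot/state" \
    | jq '{healthy: .data.healthy,
           statuses: (.data.servers | map_values(.status)),
           last_indexes: (.data.servers | map_values(.last_index))}'
```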
@@ -149,7 +164,7 @@ automated upgrade progress (if any), and optimistic failure tolerance.
       }
     },
     "status": "await-new-voters",
-    "target_version": "1.12.0",
+    "target_version": "1.17.5",
     "target_version_non_voters": [
      "vault_5"
    ]
@@ -161,6 +176,11 @@ automated upgrade progress (if any), and optimistic failure tolerance.
 }
 ```
 
+`optimistic_failure_tolerance` describes the number of healthy active and
+back-up voting servers that can fail gradually without causing an outage.
+
+@include 'autopilot/node-types.mdx'
+
 ## Get configuration
 
 This endpoint is used to get the configuration of the autopilot subsystem of Integrated Storage.
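
Reading the configuration back (the "Get configuration" endpoint described above) is a plain GET; a sketch under the same assumptions as the earlier example:

```shell-session
$ curl --silent --header "X-Vault-Token: $VAULT_TOKEN" \
    "$VAULT_ADDR/v1/sys/storage/raft/autopilot/configuration" \
    | jq '.data'
```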
@@ -203,31 +223,7 @@ This endpoint is used to modify the configuration of the autopilot subsystem of
 
 ### Parameters
 
-- `cleanup_dead_servers` `(bool: false)` - Controls whether to remove dead servers from
-  the Raft peer list periodically or when a new server joins. This requires that
-  `min_quorum` is also set.
-
-- `last_contact_threshold` `(string: "10s")` - Limit on the amount of time a server can
-  go without leader contact before being considered unhealthy.
-
-- `dead_server_last_contact_threshold` `(string: "24h")` - Limit on the amount of time
-  a server can go without leader contact before being considered failed. This
-  takes effect only when `cleanup_dead_servers` is `true`. This can not be set to a value
-  smaller than 1m. **We strongly recommend that this is kept at a high duration, such as a day,
-  as it being too low could result in removal of nodes that aren't actually dead.**
-
-- `max_trailing_logs` `(int: 1000)` - Amount of entries in the Raft Log that a server
-  can be behind before being considered unhealthy.
-
-- `min_quorum` `(int: 3)` - Minimum number of servers allowed in a cluster before
-  autopilot can prune dead servers. This should at least be 3. Applicable only for
-  voting nodes.
-
-- `server_stabilization_time` `(string: "10s")` - Minimum amount of time a server must
-  be in a stable, healthy state before it can be added to the cluster.
-
-- `disable_upgrade_migration` `(bool: false)` - Disables automatically upgrading Vault using
-  autopilot. (Enterprise-only)
+@include 'autopilot/config.mdx'
 
 ### Sample request
 

---
@@ -128,6 +128,13 @@ Usage: vault operator raft list-peers
 }
 ```
 
+Use the output of `list-peers` to ensure that your cluster is in an expected state.
+If you've removed a server using `remove-peer`, the server should no longer be
+listed in the `list-peers` output. If you've added a server using `add-peer` or
+through `retry_join`, check the `list-peers` output to see that it has been added
+to the cluster and (if the node has not been added as a non-voter)
+it has been promoted to a voter.
+
 ## remove-peer
 
 This command is used to remove a node from being a peer to the Raft cluster. In
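
For reference, `list-peers` output on the three-node example cluster used throughout these docs might look like the following sketch (node names and addresses are illustrative):

```shell-session
$ vault operator raft list-peers

Node       Address             State       Voter
----       -------             -----       -----
vault_1    127.0.0.1:8201      leader      true
vault_2    127.0.0.1:8203      follower    true
vault_3    127.0.0.1:8205      follower    true
```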
@@ -229,14 +236,9 @@ Subcommands:
 ### autopilot state
 
 Displays the state of the raft cluster under integrated storage as seen by
-autopilot. It shows whether autopilot thinks the cluster is healthy or not,
-and how many nodes could fail before the cluster becomes unhealthy ("Failure Tolerance").
+autopilot. It shows whether autopilot thinks the cluster is healthy or not.
 
-State includes a list of all servers by nodeID and IP address. Last Index
-indicates how close the state on each node is to the leader's.
+State includes a list of all servers by nodeID and IP address.
 
-A node can have a status of "leader", "voter", and
-"[non-voter](/vault/docs/concepts/integrated-storage#non-voting-nodes-enterprise-only)".
-
 ```text
 Usage: vault operator raft autopilot state
@@ -251,32 +253,58 @@ Usage: vault operator raft autopilot state
 ```text
 Healthy: true
 Failure Tolerance: 1
-Leader: raft1
+Leader: vault_1
 Voters:
-   raft1
-   raft2
-   raft3
+   vault_1
+   vault_2
+   vault_3
 Servers:
-   raft1
-      Name: raft1
+   vault_1
+      Name: vault_1
       Address: 127.0.0.1:8201
       Status: leader
       Node Status: alive
       Healthy: true
       Last Contact: 0s
       Last Term: 3
-      Last Index: 38
-   raft2
-      Name: raft2
-      Address: 127.0.0.2:8201
+      Last Index: 61
+      Version: 1.17.3
+      Node Type: voter
+   vault_2
+      Name: vault_2
+      Address: 127.0.0.1:8203
       Status: voter
       Node Status: alive
       Healthy: true
-      Last Contact: 2.514176729s
+      Last Contact: 564.765375ms
       Last Term: 3
-      Last Index: 38
+      Last Index: 61
+      Version: 1.17.3
+      Node Type: voter
+   vault_3
+      Name: vault_3
+      Address: 127.0.0.1:8205
+      Status: voter
+      Node Status: alive
+      Healthy: true
+      Last Contact: 3.814017875s
+      Last Term: 3
+      Last Index: 61
+      Version: 1.17.3
+      Node Type: voter
 ```
-Vault Enterprise will include additional output related to automated upgrades and redundancy zones.
+
+The "Failure Tolerance" of a cluster is the number of nodes in the cluster that could
+fail gradually without causing an outage.
+
+When verifying the health of your cluster, check the following fields of each server:
+- Healthy: whether Autopilot considers this node healthy or not
+- Status: the voting status of the node. This will be `voter`, `leader`, or [`non-voter`](/vault/docs/concepts/integrated-storage#non-voting-nodes-enterprise-only).
+- Last Index: the index of the last applied Raft log. This should be close to the "Last Index" value of the leader.
+- Version: the version of Vault running on the server
+- Node Type: the type of node. On CE, this will always be `voter`. See below for an explanation of Enterprise node types.
+
+Vault Enterprise will include additional output related to automated upgrades, optimistic failure tolerance, and redundancy zones.
 
 #### Example Vault enterprise output
 
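
For scripted checks of the same fields, the command also accepts the standard CLI `-format` flag; a sketch (the jq field names assume the JSON output mirrors the state API payload shown earlier):

```shell-session
$ vault operator raft autopilot state -format=json \
    | jq '.servers | map_values({status, healthy, last_index})'
```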
@@ -292,7 +320,7 @@ Redundancy Zones:
       Failure Tolerance: 1
 Upgrade Info:
    Status: await-new-voters
-   Target Version: 1.12.0
+   Target Version: 1.17.5
    Target Version Voters:
    Target Version Non-Voters: vault_5
    Other Version Voters: vault_1, vault_3
@@ -310,6 +338,11 @@ Upgrade Info:
    Other Version Non-Voters: vault_4
 ```
 
+"Optimistic Failure Tolerance" describes the number of healthy active and
+back-up voting servers that can fail gradually without causing an outage.
+
+@include 'autopilot/node-types.mdx'
+
 ### autopilot get-config
 
 Returns the configuration of the autopilot subsystem under integrated storage.
@@ -337,29 +370,49 @@ Usage: vault operator raft autopilot set-config [options]
 
 Flags applicable to this command are the following:
 
-- `cleanup-dead-servers` `(bool)` - Controls whether to remove dead servers from
+- `cleanup-dead-servers` `(bool: false)` - Controls whether to remove dead servers from
   the Raft peer list periodically or when a new server joins. This requires that
-  `min-quorum` is also set. Defaults to `false`.
+  `min-quorum` is also set.
 
-- `last-contact-threshold` `(string)` - Limit on the amount of time a server can
-  go without leader contact before being considered unhealthy. Defaults to `10s`.
+- `last-contact-threshold` `(string: "10s")` - Limit on the amount of time a server can
+  go without leader contact before being considered unhealthy.
 
-- `dead-server-last-contact-threshold` `(string)` - Limit on the amount of time
-  a server can go without leader contact before being considered failed.
-  This takes effect only when `cleanup_dead_servers` is set as `true`. Defaults to `24h`.
+- `dead-server-last-contact-threshold` `(string: "24h")` - Limit on the amount of time
+  a server can go without leader contact before being considered failed. This
+  takes effect only when `cleanup_dead_servers` is set. When adding new nodes
+  to your cluster, the `dead_server_last_contact_threshold` needs to be larger
+  than the amount of time that it takes to load a Raft snapshot, otherwise the
+  newly added nodes will be removed from your cluster before they have finished
+  loading the snapshot and starting up. If you are using an [HSM](/vault/docs/enterprise/hsm), your
+  `dead_server_last_contact_threshold` needs to be larger than the response
+  time of the HSM.
 
--> **Note:** A failed server that autopilot has removed from the raft configuration cannot rejoin the cluster without being reinitialized.
+<Warning>
 
-- `max-trailing-logs` `(int)` - Amount of entries in the Raft Log that a server
-  can be behind before being considered unhealthy. Defaults to `1000`.
+We strongly recommend keeping `dead_server_last_contact_threshold` at a high
+duration, such as a day, as it being too low could result in removal of nodes
+that aren't actually dead.
 
-- `min-quorum` `(int)` - Minimum number of servers that should always be present in a cluster.
-  Autopilot will not prune servers below this number. This should be set to the expected number
-  of voters in your cluster. There is no default.
+</Warning>
 
-- `server-stabilization-time` `(string)` - Minimum amount of time a server must be in a healthy state before it
+- `max-trailing-logs` `(int: 1000)` - Amount of entries in the Raft Log that a server
+  can be behind before being considered unhealthy. If this value is too low,
+  it can cause the cluster to lose quorum if a follower falls behind. This
+  value only needs to be increased from the default if you have a very high
+  write load on Vault and you see that it takes a long time to promote new
+  servers to becoming voters. This is an unlikely scenario and most users
+  should not modify this value.
+
+- `min-quorum` `(int)` - The minimum number of servers that should always be
+  present in a cluster. Autopilot will not prune servers below this number.
+  **There is no default for this value** and it should be set to the expected
+  number of voters in your cluster when `cleanup_dead_servers` is set as `true`.
+  Use the [quorum size guidance](/vault/docs/internals/integrated-storage#quorum-size-and-failure-tolerance)
+  to determine the proper minimum quorum size for your cluster.
+
+- `server-stabilization-time` `(string: "10s")` - Minimum amount of time a server must be in a healthy state before it
   can become a voter. Until that happens, it will be visible as a peer in the cluster, but as a non-voter, meaning it
-  won't contribute to quorum. Defaults to `10s`.
+  won't contribute to quorum.
 
-- `disable-upgrade-migration` `(bool)` - Controls whether to disable automated
-  upgrade migrations, an Enterprise-only feature. Defaults to `false`.
+- `disable-upgrade-migration` `(bool: false)` - Controls whether to disable automated
+  upgrade migrations, an Enterprise-only feature.
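
Putting several of the flags above together, a sketch of enabling dead-server cleanup on a three-voter cluster (the values are illustrative, not recommendations):

```shell-session
$ vault operator raft autopilot set-config \
    -cleanup-dead-servers=true \
    -dead-server-last-contact-threshold=24h \
    -min-quorum=3 \
    -server-stabilization-time=10s
```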

---
@@ -17,7 +17,7 @@ These two features were introduced in Vault 1.11.
 Server stabilization helps to retain the stability of the Raft cluster by safely
 joining new voting nodes to the cluster. When a new voter node is joined to an
 existing cluster, autopilot adds it as a non-voter instead, and waits for a
-pre-configured amount of time to monitor it's health. If the node remains to be
+pre-configured amount of time to monitor its health. If the node remains
 healthy for the entire duration of stabilization, then that node will be
 promoted as a voter. The server stabilization period can be tuned using
 `server_stabilization_time` (see below).
@@ -31,7 +31,7 @@ and `min_quorum` (see below).
 
 ## State API
 
-State API provides detailed information about all the nodes in the Raft cluster
+The [State API](/vault/api-docs/system/storage/raftautopilot#get-cluster-state) provides detailed information about all the nodes in the Raft cluster
 in a single call. This API can be used for monitoring for cluster health.
 
 ### Follower health
@@ -50,40 +50,7 @@ although dead server cleanup is not enabled by default. Upgrade of
 Raft clusters deployed with older versions of Vault will also transition to use
 Autopilot automatically.
 
-Autopilot exposes a [configuration
-API](/vault/api-docs/system/storage/raftautopilot#set-configuration) to manage its
-behavior. Autopilot gets initialized with the following default values. If these default values do not meet your expected autopilot behavior, don't forget to set them to your desired values.
-
-- `cleanup_dead_servers` - `false`
-  - This controls whether to remove dead servers from
-    the Raft peer list periodically or when a new server joins. This requires that
-    `min-quorum` is also set.
-
-- `dead_server_last_contact_threshold` - `24h`
-  - Limit on the amount of time
-    a server can go without leader contact before being considered failed. This
-    takes effect only when `cleanup_dead_servers` is set. **We strongly recommend
-    that this is kept at a high duration, such as a day, as it being too low could
-    result in removal of nodes that aren't actually dead.**
-
-- `min_quorum` - This doesn't default to anything and should be set to the expected
-  number of voters in your cluster when `cleanup_dead_servers` is set as `true`.
-  - Minimum number of servers that should always be present in a cluster.
-    Autopilot will not prune servers below this number.
-
-- `max_trailing_logs` - `1000`
-  - Amount of entries in the Raft Log that a server
-    can be behind before being considered unhealthy.
-
-- `last_contact_threshold` - `10s`
-  - Limit on the amount of time a server can go without leader contact before being considered unhealthy.
-
-- `server_stabilization_time` - `10s`
-  - Minimum amount of time a server must be in a healthy state before it can become a voter. Until that happens,
-    it will be visible as a peer in the cluster, but as a non-voter, meaning it won't contribute to quorum.
-
-- `disable_upgrade_migration` - `false`
-  - Controls whether to disable automated upgrade migrations, an Enterprise-only feature.
+@include 'autopilot/config.mdx'
 
 ~> **Note**: Autopilot in Vault does similar things to what autopilot does in
 [Consul](https://www.consul.io/). However, the configuration in these 2 systems
@@ -94,7 +61,7 @@ provide the autopilot functionality.
 
 ## Automated upgrades
 
-Automated Upgrades lets you automatically upgrade a cluster of Vault nodes to a new version as
+[Automated Upgrades](/vault/docs/enterprise/automated-upgrades) lets you automatically upgrade a cluster of Vault nodes to a new version as
 updated server nodes join the cluster. Once the number of nodes on the new version is
 equal to or greater than the number of nodes on the old version, Autopilot will promote
 the newer versioned nodes to voters, demote the older versioned nodes to non-voters,
@@ -104,7 +71,7 @@ nodes can be removed from the cluster.
 
 ## Redundancy zones
 
-Redundancy Zones provide both scaling and resiliency benefits by deploying non-voting
+[Redundancy Zones](/vault/docs/enterprise/redundancy-zones) provide both scaling and resiliency benefits by deploying non-voting
 nodes alongside voting nodes on a per availability zone basis. When using redundancy zones,
 each zone will have exactly one voting node and as many additional non-voting nodes as desired.
 If the voting node in a zone fails, a non-voting node will be automatically promoted to

---
@@ -60,6 +60,11 @@ API (both methods described below). When joining a node, the API address of the
 recommend setting the [`api_addr`](/vault/docs/concepts/ha#direct-access) configuration
 option on all nodes to make joining simpler.
 
+Always join nodes to a cluster one at a time and wait for the node to become
+healthy and (if applicable) a voter before continuing to add more nodes. The
+status of a node can be verified by performing a [`list-peers`](/vault/docs/commands/operator/raft#list-peers)
+command or by checking the [`autopilot state`](/vault/docs/commands/operator/raft#autopilot-state).
+
 #### `retry_join` configuration
 
 This method enables setting one, or more, target leader nodes in the config file.
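
A sketch of such a stanza with two candidate leaders (the hostnames and paths are hypothetical; `leader_api_addr` is the documented `retry_join` parameter):

```hcl
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault_4"

  # Try each candidate leader in turn until the join succeeds.
  retry_join {
    leader_api_addr = "https://vault_1.example.com:8200"
  }

  retry_join {
    leader_api_addr = "https://vault_2.example.com:8200"
  }
}
```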
@@ -95,9 +100,10 @@ provided, Vault will use [go-discover](https://github.com/hashicorp/go-discover)
 to automatically attempt to discover and resolve potential Raft leader
 addresses.
 
-See the go-discover
+Check the go-discover
 [README](https://github.com/hashicorp/go-discover/blob/master/README.md) for
-details on the format of the [`auto_join`](/vault/docs/configuration/storage/raft#auto_join) value.
+details on the format of the [`auto_join`](/vault/docs/configuration/storage/raft#auto_join)
+value per cloud provider.
 
 ```hcl
 storage "raft" {
@@ -167,6 +173,14 @@ $ vault operator raft remove-peer node1
 Peer removed successfully!
 ```
 
+#### Re-joining after removal
+
+If you have used `remove-peer` to remove a node from the Raft cluster, but you
+later want to have this same node re-join the cluster, you will need to delete
+any existing Raft data on the removed node before adding it back to the cluster.
+This will involve stopping the Vault process, deleting the data directory containing
+Raft data, and then restarting the Vault process.
+
 ### Listing peers
 
 To see the current peer set for the cluster you can issue a
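
On a systemd-managed node, that sequence might look like the following sketch (the unit name, data path, and leader address are assumptions; use the `path` from your own raft storage stanza):

```shell-session
$ systemctl stop vault
$ rm -rf /opt/vault/data/raft   # delete the Raft data under the storage path
$ systemctl start vault
$ vault operator raft join https://vault_1.example.com:8200
```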

---
@@ -36,3 +36,7 @@ wait to begin leadership transfer until it can ensure that there will be as much
 new Vault version as there was on the old Vault version.
 
 The status of redundancy zones can be monitored by consulting the [Autopilot state API endpoint](/vault/api-docs/system/storage/raftautopilot#get-cluster-state).
+
+## Optimistic Failure Tolerance
+
+@include 'autopilot/redundancy-zones.mdx'

---
@@ -271,6 +271,28 @@ For example, if you start with a 5-node cluster:
 You should always maintain quorum to limit the impact on failure tolerance when
 changing or scaling your Vault instance.
 
+### Redundancy Zones
+
+If you are using autopilot with [redundancy zones](/vault/docs/enterprise/redundancy-zones),
+the total number of servers will be different from the above, and is dependent
+on how many redundancy zones and servers per redundancy zone that you choose.
+
+@include 'autopilot/redundancy-zones.mdx'
+
+<Highlight title="Best practice">
+
+If you choose to use redundancy zones, we **strongly recommend** using at least 3
+zones to ensure failure tolerance.
+
+</Highlight>
+
+Redundancy zones | Servers per zone | Quorum size | Failure tolerance | Optimistic failure tolerance
+:--------------: | :--------------: | :---------: | :---------------: | :--------------------------:
+2                | 2                | 2           | 0                 | 2
+3                | 2                | 2           | 1                 | 3
+3                | 3                | 2           | 1                 | 5
+5                | 2                | 3           | 2                 | 6
+
 [consensus protocol]: https://en.wikipedia.org/wiki/Consensus_(computer_science)
 [consistency]: https://en.wikipedia.org/wiki/CAP_theorem
 ["Raft: In search of an Understandable Consensus Algorithm"]: https://raft.github.io/raft.pdf
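
Zone membership itself is set per node in the raft storage stanza; a sketch with hypothetical values (`autopilot_redundancy_zone` is the documented, Enterprise-only parameter):

```hcl
storage "raft" {
  path                      = "/opt/vault/data"
  node_id                   = "vault_1"
  # Nodes sharing a zone name form one redundancy zone.
  autopilot_redundancy_zone = "zone-a"
}
```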

---
website/content/partials/autopilot/config.mdx (new file, +53)
@@ -0,0 +1,53 @@
+Autopilot exposes a [configuration
+API](/vault/api-docs/system/storage/raftautopilot#set-configuration) to manage its
+behavior. These items cannot be set in Vault server configuration files.
+Autopilot gets initialized with the following default values. If these default
+values do not meet your expected autopilot behavior, don't forget to set them to your desired values.
+
+- `cleanup_dead_servers` `(bool: false)` - This controls whether to remove dead servers from
+  the Raft peer list periodically or when a new server joins. This requires that
+  `min-quorum` is also set.
+
+- `dead_server_last_contact_threshold` `(string: "24h")` - Limit on the amount of time
+  a server can go without leader contact before being considered failed. This
+  takes effect only when `cleanup_dead_servers` is set. When adding new nodes
+  to your cluster, the `dead_server_last_contact_threshold` needs to be larger
+  than the amount of time that it takes to load a Raft snapshot, otherwise the
+  newly added nodes will be removed from your cluster before they have finished
+  loading the snapshot and starting up. If you are using an [HSM](/vault/docs/enterprise/hsm), your
+  `dead_server_last_contact_threshold` needs to be larger than the response
+  time of the HSM.
+
+<Warning>
+
+We strongly recommend keeping `dead_server_last_contact_threshold` at a high
+duration, such as a day, as it being too low could result in removal of nodes
+that aren't actually dead.
+
+</Warning>
+
+- `min_quorum` `(int)` - The minimum number of servers that should always be
+  present in a cluster. Autopilot will not prune servers below this number.
+  **There is no default for this value** and it should be set to the expected
+  number of voters in your cluster when `cleanup_dead_servers` is set as `true`.
+  Use the [quorum size guidance](/vault/docs/internals/integrated-storage#quorum-size-and-failure-tolerance)
+  to determine the proper minimum quorum size for your cluster.
+
+- `max_trailing_logs` `(int: 1000)` - Amount of entries in the Raft Log that a
+  server can be behind before being considered unhealthy. If this value is too low,
+  it can cause the cluster to lose quorum if a follower falls behind. This
+  value only needs to be increased from the default if you have a very high
+  write load on Vault and you see that it takes a long time to promote new
+  servers to becoming voters. This is an unlikely scenario and most users
+  should not modify this value.
+
+- `last_contact_threshold` `(string: "10s")` - Limit on the amount of time a
+  server can go without leader contact before being considered unhealthy.
+
+- `server_stabilization_time` `(string: "10s")` - Minimum amount of time a server
+  must be in a healthy state before it can become a voter. Until that happens,
+  it will be visible as a peer in the cluster, but as a non-voter, meaning it
+  won't contribute to quorum.
+
+- `disable_upgrade_migration` `(bool: false)` - Disables automatically upgrading
+  Vault using autopilot (Enterprise-only)
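
Because these parameters live in the autopilot API rather than in server config files, they are typically applied with a POST to the configuration endpoint; a minimal sketch (payload values illustrative):

```shell-session
$ cat payload.json
{
  "cleanup_dead_servers": true,
  "dead_server_last_contact_threshold": "24h",
  "min_quorum": 3
}

$ curl --header "X-Vault-Token: $VAULT_TOKEN" \
    --request POST \
    --data @payload.json \
    "$VAULT_ADDR/v1/sys/storage/raft/autopilot/configuration"
```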
website/content/partials/autopilot/node-types.mdx (new file, +6)
@@ -0,0 +1,6 @@
+#### Enterprise Node Types
+- `voter`: The server is a Raft voter and contributing to quorum.
+- `read-replica`: The server is not a Raft voter, but receives a replica of all data.
+- `zone-voter`: The main Raft voter in a redundancy zone.
+- `zone-extra-voter`: An additional Raft voter in a redundancy zone.
+- `zone-standby`: A non-voter in a redundancy zone that can be promoted to a voter, if needed.
website/content/partials/autopilot/redundancy-zones.mdx (new file, +25)
@@ -0,0 +1,25 @@
+The majority of the voting servers in a cluster need to be available to agree on
+changes in configuration. If a voting node becomes unavailable and that causes
+the cluster to have fewer voting nodes than the quorum size, then Autopilot will not
+be able to promote a non-voter to become a voter. This is the **failure tolerance** of
+the cluster. Redundancy zones are not able to improve the failure tolerance of a
+cluster.
+
+Say that you have a cluster configured to have 2 redundancy zones and each zone
+has 2 servers within it (for a total of 4 nodes in the cluster). The quorum size
+is 2. If the zone voter in either of the redundancy zones becomes unavailable,
+the cluster does not have quorum and is not able to agree on the configuration
+change needed to promote the non-voter in the zone into a voter.
+
+Redundancy zones do improve the **optimistic failure tolerance** of a cluster.
+The optimistic failure tolerance is the number of healthy active and back-up
+voting servers that can fail gradually without causing an outage. If the Vault
+cluster is able to maintain a quorum of voting nodes, then the cluster has the
+capability to lose nodes gradually and promote the standby redundancy zone nodes
+to take the place of voters.
+
+For example, consider a cluster that is configured to have 3 redundancy zones
+with 2 nodes in each zone. If a voting node becomes unreachable, the zone standby
+in that zone is promoted. The cluster then maintains 3 voting nodes with 2 remaining
+standbys. The cluster can handle an additional 2 gradual failures before it loses
+quorum.