VAULT-28478: Updates to autopilot docs (#28331)

* restructure

* update command

* fixes

* fix command flags

* revert makefile change

* remove tick log
This commit is contained in:
miagilepner
2024-09-17 10:53:18 +02:00
committed by GitHub
parent c140470639
commit d00715d129
9 changed files with 289 additions and 149 deletions

View File

@@ -35,54 +35,69 @@ $ curl \
```json
{
"healthy": true,
"failure_tolerance": 1,
"healthy": true,
"leader": "vault_1",
"servers": {
"raft1": {
"id": "raft1",
"name": "raft1",
"vault_1": {
"address": "127.0.0.1:8201",
"node_status": "alive",
"healthy": true,
"id": "vault_1",
"last_contact": "0s",
"last_index": 63,
"last_term": 3,
"last_index": 459,
"healthy": true,
"stable_since": "2021-03-19T20:14:11.831678-04:00",
"name": "vault_1",
"node_status": "alive",
"node_type": "voter",
"stable_since": "2024-08-29T16:02:45.639829+02:00",
"status": "leader",
"meta": null
"version": "1.17.3"
},
"raft2": {
"id": "raft2",
"name": "raft2",
"address": "127.0.0.2:8201",
"node_status": "alive",
"last_contact": "516.49595ms",
"last_term": 3,
"last_index": 459,
"vault_2": {
"address": "127.0.0.1:8203",
"healthy": true,
"stable_since": "2021-03-19T20:14:19.831931-04:00",
"id": "vault_2",
"last_contact": "678.62575ms",
"last_index": 63,
"last_term": 3,
"name": "vault_2",
"node_status": "alive",
"node_type": "voter",
"stable_since": "2024-08-29T16:02:47.640976+02:00",
"status": "voter",
"meta": null
"version": "1.17.3"
},
"raft3": {
"id": "raft3",
"name": "raft3",
"address": "127.0.0.3:8201",
"node_status": "alive",
"last_contact": "196.706591ms",
"last_term": 3,
"last_index": 459,
"vault_3": {
"address": "127.0.0.1:8205",
"healthy": true,
"stable_since": "2021-03-19T20:14:25.83565-04:00",
"id": "vault_3",
"last_contact": "3.969159375s",
"last_index": 63,
"last_term": 3,
"name": "vault_3",
"node_status": "alive",
"node_type": "voter",
"stable_since": "2024-08-29T16:02:49.640905+02:00",
"status": "voter",
"meta": null
"version": "1.17.3"
}
},
"leader": "raft1",
"voters": ["raft1", "raft2", "raft3"],
"non_voters": null
"voters": [
"vault_1",
"vault_2",
"vault_3"
]
}
```
The `failure_tolerance` of a cluster is the number of nodes in the cluster that could
fail gradually without causing an outage.
When verifying the health of your cluster, check the following fields of each server:
- `healthy`: whether Autopilot considers this node healthy or not
- `status`: the voting status of the node. This will be `voter`, `leader`, or [`non-voter`](/vault/docs/concepts/integrated-storage#non-voting-nodes-enterprise-only)
- `last_index`: the index of the last applied Raft log. This should be close to the `last_index` value of the leader.
- `version`: the version of Vault running on the server
- `node_type`: the type of node. On CE, this will always be `voter`. See below for an explanation of Enterprise node types.
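One quick way to spot-check those fields across every server is to read the state endpoint and filter the response. The following is only a sketch: it assumes the `jq` utility is available and that `VAULT_ADDR` and `VAULT_TOKEN` are already set for the CLI.

```shell-session
$ vault read -format=json sys/storage/raft/autopilot/state \
    | jq '.data.servers[] | {name, healthy, status, last_index, version, node_type}'
```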
### Enterprise only
Vault Enterprise will include additional output in its API response to indicate the current state of redundancy zones,
@@ -149,7 +164,7 @@ automated upgrade progress (if any), and optimistic failure tolerance.
}
},
"status": "await-new-voters",
"target_version": "1.12.0",
"target_version": "1.17.5",
"target_version_non_voters": [
"vault_5"
]
@@ -161,6 +176,11 @@ automated upgrade progress (if any), and optimistic failure tolerance.
}
```
`optimistic_failure_tolerance` describes the number of healthy active and
back-up voting servers that can fail gradually without causing an outage.
@include 'autopilot/node-types.mdx'
## Get configuration
This endpoint is used to get the configuration of the autopilot subsystem of Integrated Storage.
@@ -203,31 +223,7 @@ This endpoint is used to modify the configuration of the autopilot subsystem of
### Parameters
@include 'autopilot/config.mdx'
### Sample request

View File

@@ -128,6 +128,13 @@ Usage: vault operator raft list-peers
}
```
Use the output of `list-peers` to ensure that your cluster is in an expected state.
If you've removed a server using `remove-peer`, the server should no longer be
listed in the `list-peers` output. If you've added a server using `add-peer` or
through `retry_join`, check the `list-peers` output to confirm that it has joined
the cluster and, if it was not added as a non-voter, that it has been promoted
to a voter.
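For example, a minimal check after adding a server (using a hypothetical node named `vault_4`) might look like the following; the node name is illustrative only.

```shell-session
# List the current peers and confirm the new node is present and a voter.
$ vault operator raft list-peers

# Or narrow the output to the node you just added (name is hypothetical).
$ vault operator raft list-peers | grep vault_4
```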
## remove-peer
This command is used to remove a node from being a peer to the Raft cluster. In
@@ -229,14 +236,9 @@ Subcommands:
### autopilot state
Displays the state of the raft cluster under integrated storage as seen by
autopilot. It shows whether autopilot thinks the cluster is healthy or not.

State includes a list of all servers by nodeID and IP address.
```text
Usage: vault operator raft autopilot state
@@ -249,34 +251,60 @@ Usage: vault operator raft autopilot state
#### Example output
```text
Healthy: true
Failure Tolerance: 1
Leader: vault_1
Voters:
   vault_1
   vault_2
   vault_3
Servers:
   vault_1
      Name: vault_1
      Address: 127.0.0.1:8201
      Status: leader
      Node Status: alive
      Healthy: true
      Last Contact: 0s
      Last Term: 3
      Last Index: 61
      Version: 1.17.3
      Node Type: voter
   vault_2
      Name: vault_2
      Address: 127.0.0.1:8203
      Status: voter
      Node Status: alive
      Healthy: true
      Last Contact: 564.765375ms
      Last Term: 3
      Last Index: 61
      Version: 1.17.3
      Node Type: voter
   vault_3
      Name: vault_3
      Address: 127.0.0.1:8205
      Status: voter
      Node Status: alive
      Healthy: true
      Last Contact: 3.814017875s
      Last Term: 3
      Last Index: 61
      Version: 1.17.3
      Node Type: voter
```
The "Failure Tolerance" of a cluster is the number of nodes in the cluster that could
fail gradually without causing an outage.
When verifying the health of your cluster, check the following fields of each server:
- Healthy: whether Autopilot considers this node healthy or not
- Status: the voting status of the node. This will be `voter`, `leader`, or [`non-voter`](/vault/docs/concepts/integrated-storage#non-voting-nodes-enterprise-only).
- Last Index: the index of the last applied Raft log. This should be close to the "Last Index" value of the leader.
- Version: the version of Vault running on the server
- Node Type: the type of node. On CE, this will always be `voter`. See below for an explanation of Enterprise node types.
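If you are watching the cluster during maintenance, such as a rolling node replacement, you can re-run this command periodically. A minimal sketch, assuming a Linux host with the `watch` utility:

```shell-session
# Refresh the autopilot state every 10 seconds while nodes are being replaced.
$ watch -n 10 vault operator raft autopilot state
```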
Vault Enterprise will include additional output related to automated upgrades, optimistic failure tolerance, and redundancy zones.
#### Example Vault enterprise output
@@ -292,7 +320,7 @@ Redundancy Zones:
Failure Tolerance: 1
Upgrade Info:
Status: await-new-voters
Target Version: 1.17.5
Target Version Voters:
Target Version Non-Voters: vault_5
Other Version Voters: vault_1, vault_3
@@ -310,6 +338,11 @@ Upgrade Info:
Other Version Non-Voters: vault_4
```
"Optimistic Failure Tolerance" describes the number of healthy active and
back-up voting servers that can fail gradually without causing an outage.
@include 'autopilot/node-types.mdx'
### autopilot get-config
Returns the configuration of the autopilot subsystem under integrated storage.
@@ -337,29 +370,49 @@ Usage: vault operator raft autopilot set-config [options]
Flags applicable to this command are the following:
- `cleanup-dead-servers` `(bool: false)` - Controls whether to remove dead servers from
  the Raft peer list periodically or when a new server joins. This requires that
  `min-quorum` is also set.

- `last-contact-threshold` `(string: "10s")` - Limit on the amount of time a server can
  go without leader contact before being considered unhealthy.

- `dead-server-last-contact-threshold` `(string: "24h")` - Limit on the amount of time
  a server can go without leader contact before being considered failed. This
  takes effect only when `cleanup_dead_servers` is set. When adding new nodes
  to your cluster, the `dead_server_last_contact_threshold` needs to be larger
  than the amount of time that it takes to load a Raft snapshot, otherwise the
  newly added nodes will be removed from your cluster before they have finished
  loading the snapshot and starting up. If you are using an [HSM](/vault/docs/enterprise/hsm), your
  `dead_server_last_contact_threshold` needs to be larger than the response
  time of the HSM.

<Warning>

We strongly recommend keeping `dead_server_last_contact_threshold` at a high
duration, such as a day, as it being too low could result in removal of nodes
that aren't actually dead.

</Warning>

- `max-trailing-logs` `(int: 1000)` - Amount of entries in the Raft Log that a server
  can be behind before being considered unhealthy. If this value is too low,
  it can cause the cluster to lose quorum if a follower falls behind. This
  value only needs to be increased from the default if you have a very high
  write load on Vault and you see that it takes a long time to promote new
  servers to becoming voters. This is an unlikely scenario and most users
  should not modify this value.

- `min-quorum` `(int)` - The minimum number of servers that should always be
  present in a cluster. Autopilot will not prune servers below this number.
  **There is no default for this value** and it should be set to the expected
  number of voters in your cluster when `cleanup_dead_servers` is set as `true`.
  Use the [quorum size guidance](/vault/docs/internals/integrated-storage#quorum-size-and-failure-tolerance)
  to determine the proper minimum quorum size for your cluster.

- `server-stabilization-time` `(string: "10s")` - Minimum amount of time a server must be in a healthy state before it
  can become a voter. Until that happens, it will be visible as a peer in the cluster, but as a non-voter, meaning it
  won't contribute to quorum.

- `disable-upgrade-migration` `(bool: false)` - Controls whether to disable automated
  upgrade migrations, an Enterprise-only feature.
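As a hedged example of combining several of these flags (the values are illustrative, not a recommendation for every cluster), enabling dead server cleanup on a five-voter cluster might look like:

```shell-session
$ vault operator raft autopilot set-config \
    -cleanup-dead-servers=true \
    -dead-server-last-contact-threshold=24h \
    -min-quorum=5 \
    -server-stabilization-time=30s
```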

View File

@@ -17,7 +17,7 @@ These two features were introduced in Vault 1.11.
Server stabilization helps to retain the stability of the Raft cluster by safely
joining new voting nodes to the cluster. When a new voter node is joined to an
existing cluster, autopilot adds it as a non-voter instead, and waits for a
pre-configured amount of time to monitor its health. If the node remains
healthy for the entire duration of stabilization, then that node will be
promoted as a voter. The server stabilization period can be tuned using
`server_stabilization_time` (see below).
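For example, if new nodes in your environment routinely need longer than the 10 second default to settle, you could lengthen the stabilization window; the value below is purely illustrative.

```shell-session
$ vault operator raft autopilot set-config -server-stabilization-time=30s
```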
@@ -31,7 +31,7 @@ and `min_quorum` (see below).
## State API
The [State API](/vault/api-docs/system/storage/raftautopilot#get-cluster-state) provides detailed information about all the nodes in the Raft cluster
in a single call. This API can be used for monitoring cluster health.
### Follower health
@@ -50,40 +50,7 @@ although dead server cleanup is not enabled by default. Upgrade of
Raft clusters deployed with older versions of Vault will also transition to use
Autopilot automatically.
@include 'autopilot/config.mdx'
~> **Note**: Autopilot in Vault does similar things to what autopilot does in
[Consul](https://www.consul.io/). However, the configuration in these 2 systems
@@ -94,7 +61,7 @@ provide the autopilot functionality.
## Automated upgrades
[Automated Upgrades](/vault/docs/enterprise/automated-upgrades) lets you automatically upgrade a cluster of Vault nodes to a new version as
updated server nodes join the cluster. Once the number of nodes on the new version is
equal to or greater than the number of nodes on the old version, Autopilot will promote
the newer versioned nodes to voters, demote the older versioned nodes to non-voters,
@@ -104,7 +71,7 @@ nodes can be removed from the cluster.
## Redundancy zones
[Redundancy Zones](/vault/docs/enterprise/redundancy-zones) provide both scaling and resiliency benefits by deploying non-voting
nodes alongside voting nodes on a per availability zone basis. When using redundancy zones,
each zone will have exactly one voting node and as many additional non-voting nodes as desired.
If the voting node in a zone fails, a non-voting node will be automatically promoted to

View File

@@ -60,6 +60,11 @@ API (both methods described below). When joining a node, the API address of the
recommend setting the [`api_addr`](/vault/docs/concepts/ha#direct-access) configuration
option on all nodes to make joining simpler.
Always join nodes to a cluster one at a time and wait for the node to become
healthy and (if applicable) a voter before continuing to add more nodes. The
status of a node can be verified with the [`list-peers`](/vault/docs/commands/operator/raft#list-peers)
command or by checking the [`autopilot state`](/vault/docs/commands/operator/raft#autopilot-state) output.
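A minimal sketch of that workflow follows; the active node address is a hypothetical example.

```shell-session
# On the new node: join it to the existing cluster.
$ vault operator raft join https://vault-active.example.com:8200

# From an existing node: confirm the new node is listed and, once the
# stabilization period has passed, that it has become a voter.
$ vault operator raft list-peers
$ vault operator raft autopilot state
```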
#### `retry_join` configuration
This method enables setting one, or more, target leader nodes in the config file.
@@ -95,9 +100,10 @@ provided, Vault will use [go-discover](https://github.com/hashicorp/go-discover)
to automatically attempt to discover and resolve potential Raft leader
addresses.
Check the go-discover
[README](https://github.com/hashicorp/go-discover/blob/master/README.md) for
details on the format of the [`auto_join`](/vault/docs/configuration/storage/raft#auto_join)
value per cloud provider.
```hcl
storage "raft" {
@@ -167,6 +173,14 @@ $ vault operator raft remove-peer node1
Peer removed successfully!
```
#### Re-joining after removal
If you have used `remove-peer` to remove a node from the Raft cluster, but later
want that same node to re-join the cluster, you will need to delete any existing
Raft data on the removed node before adding it back. To do so, stop the Vault
process, delete the data directory containing the Raft data, and restart the
Vault process.
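A rough sketch of those steps, assuming a systemd-managed Vault service and a raft storage `path` of `/opt/vault/data` (both are assumptions; substitute your own service manager and the `path` from your raft storage stanza):

```shell-session
# On the removed node: stop Vault and clear the old Raft state.
$ sudo systemctl stop vault
$ sudo rm -rf /opt/vault/data/*

# Restart Vault and re-join it to the cluster (address is illustrative).
$ sudo systemctl start vault
$ vault operator raft join https://vault-active.example.com:8200
```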
### Listing peers
To see the current peer set for the cluster you can issue a

View File

@@ -36,3 +36,7 @@ wait to begin leadership transfer until it can ensure that there will be as much
new Vault version as there was on the old Vault version.
The status of redundancy zones can be monitored by consulting the [Autopilot state API endpoint](/vault/api-docs/system/storage/raftautopilot#get-cluster-state).
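As a sketch, assuming `jq` is installed and the usual `VAULT_ADDR` and `VAULT_TOKEN` environment variables are set, the zone information can be pulled from that endpoint like so (the `redundancy_zones` field is only populated on Enterprise clusters that use zones):

```shell-session
$ vault read -format=json sys/storage/raft/autopilot/state \
    | jq '.data.redundancy_zones'
```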
## Optimistic Failure Tolerance
@include 'autopilot/redundancy-zones.mdx'

View File

@@ -271,6 +271,28 @@ For example, if you start with a 5-node cluster:
You should always maintain quorum to limit the impact on failure tolerance when
changing or scaling your Vault instance.
### Redundancy Zones
If you are using autopilot with [redundancy zones](/vault/docs/enterprise/redundancy-zones),
the total number of servers will be different from the above, and depends on
how many redundancy zones and how many servers per zone you choose.
@include 'autopilot/redundancy-zones.mdx'
<Highlight title="Best practice">
If you choose to use redundancy zones, we **strongly recommend** using at least 3
zones to ensure failure tolerance.
</Highlight>
Redundancy zones | Servers per zone | Quorum size | Failure tolerance | Optimistic failure tolerance
:--------------: | :--------------: | :---------: | :---------------: | :--------------------------:
2 | 2 | 2 | 0 | 2
3 | 2 | 2 | 1 | 3
3 | 3 | 2 | 1 | 5
5 | 2 | 3 | 2 | 6
[consensus protocol]: https://en.wikipedia.org/wiki/Consensus_(computer_science)
[consistency]: https://en.wikipedia.org/wiki/CAP_theorem
["Raft: In search of an Understandable Consensus Algorithm"]: https://raft.github.io/raft.pdf

View File

@@ -0,0 +1,53 @@
Autopilot exposes a [configuration
API](/vault/api-docs/system/storage/raftautopilot#set-configuration) to manage its
behavior. These items cannot be set in Vault server configuration files.
Autopilot is initialized with the following default values. If these defaults
do not match your expected autopilot behavior, set them to your desired values.
- `cleanup_dead_servers` `(bool: false)` - This controls whether to remove dead servers from
the Raft peer list periodically or when a new server joins. This requires that
`min-quorum` is also set.
- `dead_server_last_contact_threshold` `(string: "24h")` - Limit on the amount of time
a server can go without leader contact before being considered failed. This
takes effect only when `cleanup_dead_servers` is set. When adding new nodes
to your cluster, the `dead_server_last_contact_threshold` needs to be larger
than the amount of time that it takes to load a Raft snapshot, otherwise the
newly added nodes will be removed from your cluster before they have finished
loading the snapshot and starting up. If you are using an [HSM](/vault/docs/enterprise/hsm), your
`dead_server_last_contact_threshold` needs to be larger than the response
time of the HSM.
<Warning>
We strongly recommend keeping `dead_server_last_contact_threshold` at a high
duration, such as a day, as it being too low could result in removal of nodes
that aren't actually dead.
</Warning>
- `min_quorum` `(int)` - The minimum number of servers that should always be
present in a cluster. Autopilot will not prune servers below this number.
**There is no default for this value** and it should be set to the expected
number of voters in your cluster when `cleanup_dead_servers` is set as `true`.
Use the [quorum size guidance](/vault/docs/internals/integrated-storage#quorum-size-and-failure-tolerance)
to determine the proper minimum quorum size for your cluster.
- `max_trailing_logs` `(int: 1000)` - Amount of entries in the Raft Log that a
server can be behind before being considered unhealthy. If this value is too low,
it can cause the cluster to lose quorum if a follower falls behind. This
value only needs to be increased from the default if you have a very high
write load on Vault and you see that it takes a long time to promote new
servers to becoming voters. This is an unlikely scenario and most users
should not modify this value.
- `last_contact_threshold` `(string: "10s")` - Limit on the amount of time a
server can go without leader contact before being considered unhealthy.
- `server_stabilization_time` `(string: "10s")` - Minimum amount of time a server
must be in a healthy state before it can become a voter. Until that happens,
it will be visible as a peer in the cluster, but as a non-voter, meaning it
won't contribute to quorum.
- `disable_upgrade_migration` `(bool: false)` - Disables automatically upgrading
Vault using autopilot. (Enterprise-only)
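For reference, a sketch of applying a few of these values through the configuration API; the payload values are illustrative and the example assumes `VAULT_ADDR` and `VAULT_TOKEN` are exported.

```shell-session
$ cat > autopilot-config.json <<'EOF'
{
  "cleanup_dead_servers": true,
  "dead_server_last_contact_threshold": "24h",
  "min_quorum": 5,
  "server_stabilization_time": "30s"
}
EOF

$ curl \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request POST \
    --data @autopilot-config.json \
    $VAULT_ADDR/v1/sys/storage/raft/autopilot/configuration
```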

View File

@@ -0,0 +1,6 @@
#### Enterprise Node Types
- `voter`: The server is a Raft voter and contributing to quorum.
- `read-replica`: The server is not a Raft voter, but receives a replica of all data.
- `zone-voter`: The main Raft voter in a redundancy zone.
- `zone-extra-voter`: An additional Raft voter in a redundancy zone.
- `zone-standby`: A non-voter in a redundancy zone that can be promoted to a voter, if needed.
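To see which of these node types autopilot has assigned to each server, one option (a sketch that assumes `jq` and the standard Vault environment variables) is to filter the state endpoint:

```shell-session
$ vault read -format=json sys/storage/raft/autopilot/state \
    | jq '[.data.servers[] | {name, node_type}]'
```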

View File

@@ -0,0 +1,25 @@
The majority of the voting servers in a cluster need to be available to agree on
changes in configuration. If a voting node becomes unavailable and that causes
the cluster to have fewer voting nodes than the quorum size, then Autopilot will not
be able to promote a non-voter to become a voter. This is the **failure tolerance** of
the cluster. Redundancy zones are not able to improve the failure tolerance of a
cluster.
Say that you have a cluster configured to have 2 redundancy zones and each zone
has 2 servers within it (for a total of 4 nodes in the cluster). The quorum size
is 2. If the zone voter in either of the redundancy zones becomes unavailable,
the cluster does not have quorum and is not able to agree on the configuration
change needed to promote the non-voter in the zone into a voter.
Redundancy zones do improve the **optimistic failure tolerance** of a cluster.
The optimistic failure tolerance is the number of healthy active and back-up
voting servers that can fail gradually without causing an outage. If the Vault
cluster is able to maintain a quorum of voting nodes, then the cluster has the
capability to lose nodes gradually and promote the standby redundancy zone nodes
to take the place of voters.
For example, consider a cluster that is configured to have 3 redundancy zones
with 2 nodes in each zone. If a voting node becomes unreachable, the zone standby
in that zone is promoted. The cluster then maintains 3 voting nodes with 2 remaining
standbys. The cluster can handle an additional 2 gradual failures before it loses
quorum.