mirror of https://github.com/optim-enterprises-bv/vault.git (synced 2025-11-01 11:08:10 +00:00)
backport of commit 774d75e63e (#21294)
Co-authored-by: Sarah Chavis <62406755+schavis@users.noreply.github.com>
committed by GitHub
parent 5cefd4b0f7
commit c51ec68826
@@ -1,64 +1,64 @@
### API calls to update-primary may lead to data loss ((#update-primary-data-loss))

#### Affected versions

- All current versions of Vault

<Tip title="We are actively working on the underlying issue">

Look for **Fix a race condition with update-primary that could result in data
loss after a DR failover.** in a future changelog for the resolution.

</Tip>

#### Issue

The [update-primary](/vault/api-docs/system/replication/replication-performance#update-performance-secondary-s-primary)
endpoint temporarily removes all mount entries except for those that are managed
automatically by Vault (e.g. identity mounts). In certain situations, a race
condition between mount table truncation and replication repairs may lead to data
loss when updating secondary replication clusters.

Situations where the race condition may occur:

- **When the cluster has local data (e.g., PKI certificates, AppRole secret IDs)
  in shared mounts**.
  Calling `update-primary` on a performance secondary with local data in shared
  mounts may corrupt the merkle tree on the secondary. The secondary still
  contains all the previously stored data, but the corruption means that
  downstream secondaries will not receive the shared data and will interpret the
  update as a request to delete the information. If the downstream secondary is
  promoted before the merkle tree is repaired, the newly promoted secondary will
  not contain the expected local data. The missing data may be unrecoverable if
  the original secondary is lost or destroyed.
- **When the cluster has `Allow` paths defined.**
  As of Vault 1.0.3.1, startup, unseal, and calling `update-primary` all trigger a
  background job that looks at the current mount data and removes invalid entries
  based on path filters. When a secondary has `Allow` path filters, the cleanup
  code may misfire in the window of time after `update-primary` truncates the mount
  tables but before the mount tables are rewritten by replication. The cleanup
  code deletes data associated with the missing mount entries but does not modify
  the merkle tree. Because the merkle tree remains unchanged, replication will not
  know that the data is missing and needs to be repaired.

#### Workaround 1: PR secondary with local data in shared mounts

Watch for `cleaning key in merkle tree` in the TRACE log immediately after an
`update-primary` call on a PR secondary. The message indicates that the merkle
tree may be corrupt. Repair the merkle tree by issuing a
[replication reindex request](/vault/api-docs/system/replication#reindex-replication)
to the PR secondary.

If TRACE logs are no longer available, we recommend pre-emptively reindexing the
PR secondary as a precaution.

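The check and repair for this workaround can be sketched in Python. This is a minimal sketch, not an official tool. It assumes the log is available as plain text lines, that the reindex endpoint is reachable at `/v1/sys/replication/reindex` under the address of the PR secondary, and that the token is sent in the `X-Vault-Token` header. The function names and the sample log line are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Message named in this workaround; it appears at TRACE level.
TRACE_MARKER = "cleaning key in merkle tree"


def find_marker_lines(log_lines, marker=TRACE_MARKER):
    """Return (line_number, text) pairs for log lines that contain the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if marker in line
    ]


def build_reindex_request(vault_addr, token):
    """Build, but do not send, a POST to the replication reindex endpoint.

    Assumption: the endpoint path is /v1/sys/replication/reindex and an
    empty JSON body is acceptable. Check the linked API docs before use.
    """
    url = vault_addr.rstrip("/") + "/v1/sys/replication/reindex"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical sample lines, not real Vault output.
    sample_log = [
        "example line without the marker\n",
        "example [TRACE] line: cleaning key in merkle tree\n",
    ]
    hits = find_marker_lines(sample_log)
    if hits:
        request = build_reindex_request("https://vault.example.com:8200", "example-token")
        print("marker found on lines:", [number for number, _ in hits])
        print("would send:", request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` with the address of the PR secondary and a token that is permitted to call the endpoint.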
#### Workaround 2: PR secondary with "Allow" path filters

Watch for `deleted mistakenly stored mount entry from backend` in the INFO log.
Reindex the performance secondary to update the merkle tree with the missing
data and allow replication to disseminate the changes. **You will not be able to
recover local data on shared mounts (e.g., PKI certificates)**.

If INFO logs are no longer available, query the shared mount in question to
confirm whether your role and configuration data are present on the primary but
missing from the secondary.
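
The comparison between primary and secondary amounts to a set difference over what each cluster returns for the same mount. The following is a minimal sketch in Python. It assumes you have already listed the relevant keys (for example, role names) from each cluster by whatever means fits the mount. The function name and the sample key names are hypothetical.

```python
def missing_on_secondary(primary_keys, secondary_keys):
    """Return keys present on the primary but absent from the secondary, sorted."""
    return sorted(set(primary_keys) - set(secondary_keys))


if __name__ == "__main__":
    # Hypothetical key listings gathered separately from each cluster.
    primary = ["role-a", "role-b", "role-c"]
    secondary = ["role-a"]
    missing = missing_on_secondary(primary, secondary)
    if missing:
        print("present on primary but missing from secondary:", missing)
    else:
        print("no differences found for this mount")
```

A non-empty result for a shared mount suggests the cleanup code removed data on the secondary, in which case a reindex of the performance secondary is the next step.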