Attempt to improve explanation of the current metric so Operators don't think things are failing when they aren't (#27955)

Peter Wilson
2024-08-02 20:21:03 +01:00
committed by GitHub
parent 2dbb3d4dcb
commit 6b9261e1db
2 changed files with 14 additions and 10 deletions

@@ -1,15 +1,17 @@
 ### vault.audit.log_request_failure ((#vault-audit-log_request_failure))
 | Metric type | Value | Description |
-|-------------|--------|---------------------------------------------------------|
-| counter | number | Number of audit log request failures across all devices |
+|-------------|--------|-------------------------------------------------------------------------------------------|
+| gauge | number | Average (mean) number of audit log request failures across all devices during time period |
 The number of request failures is a **crucial metric**.
-A non-zero value for `vault.audit.log_request_failure` indicates that all your
-configured audit devices failed to log a request (or response). If Vault cannot
+A non-zero value for `vault.audit.log_request_failure` indicates that all
+the configured audit devices failed to log a request (or response). If Vault cannot
 properly audit a request, or the response to a request, the original request
 will fail.
+The `mean` value for this metric should be monitored, not the `count` which could be misleading.
 Refer to the Vault logs and any device-specific metrics to troubleshoot the
 failing audit log device.

@@ -1,15 +1,17 @@
 ### vault.audit.log_response_failure ((#vault-audit-log_response_failure))
 | Metric type | Value | Description |
-|-------------|--------|---------------------------------------------------------|
-| counter | number | Number of audit log response failures across all devices |
+|-------------|--------|--------------------------------------------------------------------------------------------|
+| gauge | number | Average (mean) number of audit log response failures across all devices during time period |
 The number of request failures is a **crucial metric**.
-A non-zero value for `vault.audit.log_response_failure` indicates that all of
-the configured audit log devices failed to log a response to a request to Vault. If Vault cannot
+A non-zero value for `vault.audit.log_response_failure` indicates that all
+the configured audit log devices failed to log a response to a request. If Vault cannot
 properly audit a request, or the response to a request, the original request
 will fail.
+The `mean` value for this metric should be monitored, not the `count` which could be misleading.
 Refer to the device-specific metrics and logs to troubleshoot the failing audit
 log device.
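
The updated text in both files recommends watching `mean` rather than `count`. The sketch below illustrates one way `count` can mislead. It assumes the metric is emitted once per audited request, with a value of 0 on success and 1 on failure. That emission pattern is an assumption for illustration and is not stated in the diff. The `AggregatedSample` class and `summarize` helper are hypothetical and are not part of Vault or any metrics library.

```python
from dataclasses import dataclass


@dataclass
class AggregatedSample:
    """Hypothetical per-interval aggregate of one metric (not a real Vault type)."""

    count: int = 0      # how many samples were recorded in the interval
    total: float = 0.0  # sum of the recorded sample values

    def record(self, value: float) -> None:
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # Average sample value; 0.0 when nothing was recorded.
        return self.total / self.count if self.count else 0.0


def summarize(outcomes):
    """Aggregate audit outcomes, assuming 0 is emitted on success and 1 on failure."""
    agg = AggregatedSample()
    for ok in outcomes:
        agg.record(0.0 if ok else 1.0)
    return agg


# 100 requests, all audited successfully.
healthy = summarize([True] * 100)
# 100 requests, 5 of which could not be audited.
degraded = summarize([True] * 95 + [False] * 5)

print("healthy: ", healthy.count, healthy.mean)
print("degraded:", degraded.count, degraded.mean)
```

Under this assumption, `count` is 100 in both intervals because it tracks how many samples were recorded, not how many failed. Only `mean` separates the healthy interval (zero) from the degraded one (non-zero), which would explain why an operator alerting on `count` might think audit logging is failing when it is not.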