Files
vault/builtin/logical/pki/path_tidy.go
Alexander Scheel 6e6914547e Let PKI tidy associate revoked certs with their issuers (#16871)
* Refactor tidy steps into two separate helpers

This refactors the tidy go routine into two separate helpers, making it
clear where the boundaries of each are: variables are passed into these
method and concerns are separated. As more operations are rolled into
tidy, we can continue adding more helpers as appropriate. Additionally,
as we move to make auto-tidy occur, we can use these as points to hook
into periodic tidying.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Refactor revInfo checking to helper

This allows us to validate whether or not a revInfo entry contains a
presently valid issuer, from the existing mapping. Coupled with the
changeset to identify the issuer on revocation, we can begin adding
capabilities to tidy to update this association, decreasing CRL build
time and increasing the performance of OCSP.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Refactor issuer fetching for revocation purposes

Revocation needs to gracefully handle using the old legacy cert bundle,
so fetching issuers (and parsing them) needs to be done slightly
differently than other places. Refactor this from revokeCert into a
common helper that can be used by tidy.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Allow tidy to associate revoked certs, issuers

When revoking a certificate, we need to associate the issuer that signed
its certificate back to the revInfo entry. Historically this was
performed during CRL building (and still remains so), but when running
without CRL building and with only OCSP, performance will degrade as the
issuer needs to be found each time.

Instead, allow the tidy operation to take over this role, allowing us to
increase the performance of OCSP and CRL in this scenario, by decoupling
issuer identification from CRL building in the ideal case.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add tests for tidy updates

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add changelog entry

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Add documentation on new tidy parameter, metrics

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

* Refactor tidy config into shared struct

Finish adding metrics, status messages about new tidy operation.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
2022-08-26 10:13:45 -07:00

532 lines
18 KiB
Go

package pki
import (
"context"
"crypto/x509"
"fmt"
"net/http"
"sync/atomic"
"time"
"github.com/armon/go-metrics"
"github.com/hashicorp/go-hclog"
"github.com/hashicorp/vault/sdk/framework"
"github.com/hashicorp/vault/sdk/helper/consts"
"github.com/hashicorp/vault/sdk/logical"
)
type tidyConfig struct {
CertStore bool `json:"tidy_cert_store"`
RevokedCerts bool `json:"tidy_revoked_certs"`
IssuerAssocs bool `json:"tidy_revoked_cert_issuer_associations"`
SafetyBuffer time.Duration `json:"safety_buffer"`
}
func pathTidy(b *backend) *framework.Path {
return &framework.Path{
Pattern: "tidy$",
Fields: map[string]*framework.FieldSchema{
"tidy_cert_store": {
Type: framework.TypeBool,
Description: `Set to true to enable tidying up
the certificate store`,
},
"tidy_revocation_list": {
Type: framework.TypeBool,
Description: `Deprecated; synonym for 'tidy_revoked_certs`,
},
"tidy_revoked_certs": {
Type: framework.TypeBool,
Description: `Set to true to expire all revoked
and expired certificates, removing them both from the CRL and from storage. The
CRL will be rotated if this causes any values to be removed.`,
},
"tidy_revoked_cert_issuer_associations": {
Type: framework.TypeBool,
Description: `Set to true to validate issuer associations
on revocation entries. This helps increase the performance of CRL building
and OCSP responses.`,
},
"safety_buffer": {
Type: framework.TypeDurationSecond,
Description: `The amount of extra time that must have passed
beyond certificate expiration before it is removed
from the backend storage and/or revocation list.
Defaults to 72 hours.`,
Default: 259200, // 72h, but TypeDurationSecond currently requires defaults to be int
},
},
Operations: map[logical.Operation]framework.OperationHandler{
logical.UpdateOperation: &framework.PathOperation{
Callback: b.pathTidyWrite,
ForwardPerformanceStandby: true,
},
},
HelpSynopsis: pathTidyHelpSyn,
HelpDescription: pathTidyHelpDesc,
}
}
func pathTidyStatus(b *backend) *framework.Path {
return &framework.Path{
Pattern: "tidy-status$",
Operations: map[logical.Operation]framework.OperationHandler{
logical.ReadOperation: &framework.PathOperation{
Callback: b.pathTidyStatusRead,
ForwardPerformanceStandby: true,
},
},
HelpSynopsis: pathTidyStatusHelpSyn,
HelpDescription: pathTidyStatusHelpDesc,
}
}
func (b *backend) pathTidyWrite(ctx context.Context, req *logical.Request, d *framework.FieldData) (*logical.Response, error) {
safetyBuffer := d.Get("safety_buffer").(int)
tidyCertStore := d.Get("tidy_cert_store").(bool)
tidyRevokedCerts := d.Get("tidy_revoked_certs").(bool) || d.Get("tidy_revocation_list").(bool)
tidyRevokedAssocs := d.Get("tidy_revoked_cert_issuer_associations").(bool)
if safetyBuffer < 1 {
return logical.ErrorResponse("safety_buffer must be greater than zero"), nil
}
bufferDuration := time.Duration(safetyBuffer) * time.Second
config := &tidyConfig{
CertStore: tidyCertStore,
RevokedCerts: tidyRevokedCerts,
IssuerAssocs: tidyRevokedAssocs,
SafetyBuffer: bufferDuration,
}
if !atomic.CompareAndSwapUint32(b.tidyCASGuard, 0, 1) {
resp := &logical.Response{}
resp.AddWarning("Tidy operation already in progress.")
return resp, nil
}
// Tests using framework will screw up the storage so make a locally
// scoped req to hold a reference
req = &logical.Request{
Storage: req.Storage,
}
b.startTidyOperation(req, config)
resp := &logical.Response{}
if !tidyCertStore && !tidyRevokedCerts && !tidyRevokedAssocs {
resp.AddWarning("No targets to tidy; specify tidy_cert_store=true or tidy_revoked_certs=true or tidy_revoked_cert_issuer_associations=true to start a tidy operation.")
} else {
resp.AddWarning("Tidy operation successfully started. Any information from the operation will be printed to Vault's server logs.")
}
return logical.RespondWithStatusCode(resp, req, http.StatusAccepted)
}
func (b *backend) startTidyOperation(req *logical.Request, config *tidyConfig) {
go func() {
defer atomic.StoreUint32(b.tidyCASGuard, 0)
b.tidyStatusStart(config)
// Don't cancel when the original client request goes away.
ctx := context.Background()
logger := b.Logger().Named("tidy")
doTidy := func() error {
if config.CertStore {
if err := b.doTidyCertStore(ctx, req, logger, config); err != nil {
return err
}
}
if config.RevokedCerts || config.IssuerAssocs {
if err := b.doTidyRevocationStore(ctx, req, logger, config); err != nil {
return nil
}
}
return nil
}
if err := doTidy(); err != nil {
logger.Error("error running tidy", "error", err)
b.tidyStatusStop(err)
} else {
b.tidyStatusStop(nil)
}
}()
}
func (b *backend) doTidyCertStore(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {
serials, err := req.Storage.List(ctx, "certs/")
if err != nil {
return fmt.Errorf("error fetching list of certs: %w", err)
}
serialCount := len(serials)
metrics.SetGauge([]string{"secrets", "pki", "tidy", "cert_store_total_entries"}, float32(serialCount))
for i, serial := range serials {
b.tidyStatusMessage(fmt.Sprintf("Tidying certificate store: checking entry %d of %d", i, serialCount))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "cert_store_current_entry"}, float32(i))
certEntry, err := req.Storage.Get(ctx, "certs/"+serial)
if err != nil {
return fmt.Errorf("error fetching certificate %q: %w", serial, err)
}
if certEntry == nil {
logger.Warn("certificate entry is nil; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting nil entry with serial %s: %w", serial, err)
}
b.tidyStatusIncCertStoreCount()
continue
}
if certEntry.Value == nil || len(certEntry.Value) == 0 {
logger.Warn("certificate entry has no value; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting entry with nil value with serial %s: %w", serial, err)
}
b.tidyStatusIncCertStoreCount()
continue
}
cert, err := x509.ParseCertificate(certEntry.Value)
if err != nil {
return fmt.Errorf("unable to parse stored certificate with serial %q: %w", serial, err)
}
if time.Now().After(cert.NotAfter.Add(config.SafetyBuffer)) {
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting serial %q from storage: %w", serial, err)
}
b.tidyStatusIncCertStoreCount()
}
}
metrics.SetGauge([]string{"secrets", "pki", "tidy", "cert_store_total_entries_remaining"}, float32(uint(serialCount)-b.tidyStatus.certStoreDeletedCount))
return nil
}
func (b *backend) doTidyRevocationStore(ctx context.Context, req *logical.Request, logger hclog.Logger, config *tidyConfig) error {
b.revokeStorageLock.Lock()
defer b.revokeStorageLock.Unlock()
// Fetch and parse our issuers so we can associate them if necessary.
sc := b.makeStorageContext(ctx, req.Storage)
issuerIDCertMap, err := fetchIssuerMapForRevocationChecking(sc)
if err != nil {
return err
}
rebuildCRL := false
revokedSerials, err := req.Storage.List(ctx, "revoked/")
if err != nil {
return fmt.Errorf("error fetching list of revoked certs: %w", err)
}
revokedSerialsCount := len(revokedSerials)
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_total_entries"}, float32(revokedSerialsCount))
fixedIssuers := 0
var revInfo revocationInfo
for i, serial := range revokedSerials {
b.tidyStatusMessage(fmt.Sprintf("Tidying revoked certificates: checking certificate %d of %d", i, len(revokedSerials)))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_current_entry"}, float32(i))
revokedEntry, err := req.Storage.Get(ctx, "revoked/"+serial)
if err != nil {
return fmt.Errorf("unable to fetch revoked cert with serial %q: %w", serial, err)
}
if revokedEntry == nil {
logger.Warn("revoked entry is nil; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "revoked/"+serial); err != nil {
return fmt.Errorf("error deleting nil revoked entry with serial %s: %w", serial, err)
}
b.tidyStatusIncRevokedCertCount()
continue
}
if revokedEntry.Value == nil || len(revokedEntry.Value) == 0 {
logger.Warn("revoked entry has nil value; tidying up since it is no longer useful for any server operations", "serial", serial)
if err := req.Storage.Delete(ctx, "revoked/"+serial); err != nil {
return fmt.Errorf("error deleting revoked entry with nil value with serial %s: %w", serial, err)
}
b.tidyStatusIncRevokedCertCount()
continue
}
err = revokedEntry.DecodeJSON(&revInfo)
if err != nil {
return fmt.Errorf("error decoding revocation entry for serial %q: %w", serial, err)
}
revokedCert, err := x509.ParseCertificate(revInfo.CertificateBytes)
if err != nil {
return fmt.Errorf("unable to parse stored revoked certificate with serial %q: %w", serial, err)
}
// Tidy operations over revoked certs should execute prior to
// tidyRevokedCerts as that may remove the entry. If that happens,
// we won't persist the revInfo changes (as it was deleted instead).
var storeCert bool
if config.IssuerAssocs {
if !isRevInfoIssuerValid(&revInfo, issuerIDCertMap) {
b.tidyStatusIncMissingIssuerCertCount()
revInfo.CertificateIssuer = issuerID("")
storeCert = true
if associateRevokedCertWithIsssuer(&revInfo, revokedCert, issuerIDCertMap) {
fixedIssuers += 1
}
}
}
if config.RevokedCerts {
// Only remove the entries from revoked/ and certs/ if we're
// past its NotAfter value. This is because we use the
// information on revoked/ to build the CRL and the
// information on certs/ for lookup.
if time.Now().After(revokedCert.NotAfter.Add(config.SafetyBuffer)) {
if err := req.Storage.Delete(ctx, "revoked/"+serial); err != nil {
return fmt.Errorf("error deleting serial %q from revoked list: %w", serial, err)
}
if err := req.Storage.Delete(ctx, "certs/"+serial); err != nil {
return fmt.Errorf("error deleting serial %q from store when tidying revoked: %w", serial, err)
}
rebuildCRL = true
storeCert = false
b.tidyStatusIncRevokedCertCount()
}
}
// If the entry wasn't removed but was otherwise modified,
// go ahead and write it back out.
if storeCert {
revokedEntry, err = logical.StorageEntryJSON("revoked/"+serial, revInfo)
if err != nil {
return fmt.Errorf("error building entry to persist changes to serial %v from revoked list: %v", serial, err)
}
err = req.Storage.Put(ctx, revokedEntry)
if err != nil {
return fmt.Errorf("error persisting changes to serial %v from revoked list: %v", serial, err)
}
}
}
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_total_entries_remaining"}, float32(uint(revokedSerialsCount)-b.tidyStatus.revokedCertDeletedCount))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_entries_incorrect_issuers"}, float32(b.tidyStatus.missingIssuerCertCount))
metrics.SetGauge([]string{"secrets", "pki", "tidy", "revoked_cert_entries_fixed_issuers"}, float32(fixedIssuers))
if rebuildCRL {
// Expired certificates isn't generally an important
// reason to trigger a CRL rebuild for. Check if
// automatic CRL rebuilds have been enabled and defer
// the rebuild if so.
config, err := sc.getRevocationConfig()
if err != nil {
return err
}
if !config.AutoRebuild {
if err := b.crlBuilder.rebuild(ctx, b, req, false); err != nil {
return err
}
}
}
return nil
}
func (b *backend) pathTidyStatusRead(_ context.Context, _ *logical.Request, _ *framework.FieldData) (*logical.Response, error) {
// If this node is a performance secondary return an ErrReadOnly so that the request gets forwarded,
// but only if the PKI backend is not a local mount.
if b.System().ReplicationState().HasState(consts.ReplicationPerformanceSecondary) && !b.System().LocalMount() {
return nil, logical.ErrReadOnly
}
b.tidyStatusLock.RLock()
defer b.tidyStatusLock.RUnlock()
resp := &logical.Response{
Data: map[string]interface{}{
"safety_buffer": nil,
"tidy_cert_store": nil,
"tidy_revoked_certs": nil,
"state": "Inactive",
"error": nil,
"time_started": nil,
"time_finished": nil,
"message": nil,
"cert_store_deleted_count": nil,
"revoked_cert_deleted_count": nil,
"missing_issuer_cert_count": nil,
},
}
if b.tidyStatus.state == tidyStatusInactive {
return resp, nil
}
resp.Data["safety_buffer"] = b.tidyStatus.safetyBuffer
resp.Data["tidy_cert_store"] = b.tidyStatus.tidyCertStore
resp.Data["tidy_revoked_certs"] = b.tidyStatus.tidyRevokedCerts
resp.Data["tidy_revoked_cert_issuer_associations"] = b.tidyStatus.tidyRevokedAssocs
resp.Data["time_started"] = b.tidyStatus.timeStarted
resp.Data["message"] = b.tidyStatus.message
resp.Data["cert_store_deleted_count"] = b.tidyStatus.certStoreDeletedCount
resp.Data["revoked_cert_deleted_count"] = b.tidyStatus.revokedCertDeletedCount
resp.Data["missing_issuer_cert_count"] = b.tidyStatus.missingIssuerCertCount
switch b.tidyStatus.state {
case tidyStatusStarted:
resp.Data["state"] = "Running"
case tidyStatusFinished:
resp.Data["state"] = "Finished"
resp.Data["time_finished"] = b.tidyStatus.timeFinished
resp.Data["message"] = nil
case tidyStatusError:
resp.Data["state"] = "Error"
resp.Data["time_finished"] = b.tidyStatus.timeFinished
resp.Data["error"] = b.tidyStatus.err.Error()
// Don't clear the message so that it serves as a hint about when
// the error occurred.
}
return resp, nil
}
func (b *backend) tidyStatusStart(config *tidyConfig) {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus = &tidyStatus{
safetyBuffer: int(config.SafetyBuffer / time.Second),
tidyCertStore: config.CertStore,
tidyRevokedCerts: config.RevokedCerts,
tidyRevokedAssocs: config.IssuerAssocs,
state: tidyStatusStarted,
timeStarted: time.Now(),
}
metrics.SetGauge([]string{"secrets", "pki", "tidy", "start_time_epoch"}, float32(b.tidyStatus.timeStarted.Unix()))
}
func (b *backend) tidyStatusStop(err error) {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.timeFinished = time.Now()
b.tidyStatus.err = err
if err == nil {
b.tidyStatus.state = tidyStatusFinished
} else {
b.tidyStatus.state = tidyStatusError
}
metrics.MeasureSince([]string{"secrets", "pki", "tidy", "duration"}, b.tidyStatus.timeStarted)
metrics.SetGauge([]string{"secrets", "pki", "tidy", "start_time_epoch"}, 0)
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "cert_store_deleted_count"}, float32(b.tidyStatus.certStoreDeletedCount))
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "revoked_cert_deleted_count"}, float32(b.tidyStatus.revokedCertDeletedCount))
if err != nil {
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "failure"}, 1)
} else {
metrics.IncrCounter([]string{"secrets", "pki", "tidy", "success"}, 1)
}
}
func (b *backend) tidyStatusMessage(msg string) {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.message = msg
}
func (b *backend) tidyStatusIncCertStoreCount() {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.certStoreDeletedCount++
}
func (b *backend) tidyStatusIncRevokedCertCount() {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.revokedCertDeletedCount++
}
func (b *backend) tidyStatusIncMissingIssuerCertCount() {
b.tidyStatusLock.Lock()
defer b.tidyStatusLock.Unlock()
b.tidyStatus.missingIssuerCertCount++
}
const pathTidyHelpSyn = `
Tidy up the backend by removing expired certificates, revocation information,
or both.
`
const pathTidyHelpDesc = `
This endpoint allows expired certificates and/or revocation information to be
removed from the backend, freeing up storage and shortening CRLs.
For safety, this function is a noop if called without parameters; cleanup from
normal certificate storage must be enabled with 'tidy_cert_store' and cleanup
from revocation information must be enabled with 'tidy_revocation_list'.
The 'safety_buffer' parameter is useful to ensure that clock skew amongst your
hosts cannot lead to a certificate being removed from the CRL while it is still
considered valid by other hosts (for instance, if their clocks are a few
minutes behind). The 'safety_buffer' parameter can be an integer number of
seconds or a string duration like "72h".
All certificates and/or revocation information currently stored in the backend
will be checked when this endpoint is hit. The expiration of the
certificate/revocation information of each certificate being held in
certificate storage or in revocation information will then be checked. If the
current time, minus the value of 'safety_buffer', is greater than the
expiration, it will be removed.
`
const pathTidyStatusHelpSyn = `
Returns the status of the tidy operation.
`
const pathTidyStatusHelpDesc = `
This is a read only endpoint that returns information about the current tidy
operation, or the most recent if none is currently running.
The result includes the following fields:
* 'safety_buffer': the value of this parameter when initiating the tidy operation
* 'tidy_cert_store': the value of this parameter when initiating the tidy operation
* 'tidy_revoked_certs': the value of this parameter when initiating the tidy operation
* 'tidy_revoked_cert_issuer_associations': the value of this parameter when initiating the tidy operation
* 'state': one of "Inactive", "Running", "Finished", "Error"
* 'error': the error message, if the operation ran into an error
* 'time_started': the time the operation started
* 'time_finished': the time the operation finished
* 'message': One of "Tidying certificate store: checking entry N of TOTAL" or
"Tidying revoked certificates: checking certificate N of TOTAL"
* 'cert_store_deleted_count': The number of certificate storage entries deleted
* 'revoked_cert_deleted_count': The number of revoked certificate entries deleted
* 'missing_issuer_cert_count': The number of revoked certificates which were missing a valid issuer reference
`