mirror of
https://github.com/optim-enterprises-bv/vault.git
synced 2025-11-03 03:58:01 +00:00
ACME Considerations Guide (#21225)
* Add notes on PKI performance and key types
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ACME Public Internet section
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note on importance of tidy
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note on cluster scalability
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note about server log location
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix ToC, finish public ACME discussion
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note on role restrictions and ACLs
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note on security considerations of ACME
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add consideration note about cluster URLs
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note on 90 day certificates
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add note about client counts and ACME
  Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>

---------

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
@@ -17,10 +17,19 @@ generating the CA to use with this secrets engine.
 - [Managed Keys](#managed-keys)
 - [One CA Certificate, One Secrets Engine](#one-ca-certificate-one-secrets-engine)
 - [Always Configure a Default Issuer](#always-configure-a-default-issuer)
 - [Key Types Matter](#key-types-matter)
+- [Cluster Performance and Key Types](#cluster-performance-and-key-types)
 - [Use a CA Hierarchy](#use-a-ca-hierarchy)
 - [Cross-Signed Intermediates](#cross-signed-intermediates)
-- [Keep certificate lifetimes short, for CRL's sake](#keep-certificate-lifetimes-short-for-crls-sake)
+- [Cluster URLs are Important](#cluster-urls-are-important)
+- [Automate Rotation with ACME](#automate-rotation-with-acme)
+- [ACME Stores Certificates](#acme-stores-certificates)
+- [ACME Role Restrictions Require EAB](#acme-role-restrictions-require-eab)
+- [ACME and the Public Internet](#acme-and-the-public-internet)
+- [ACME Errors are in Server Logs](#acme-errors-are-in-server-logs)
+- [ACME Security Considerations](#acme-security-considerations)
+- [ACME and Client Counting](#acme-and-client-counting)
+- [Keep Certificate Lifetimes Short, For CRL's Sake](#keep-certificate-lifetimes-short-for-crls-sake)
 - [NotAfter Behavior on Leaf Certificates](#notafter-behavior-on-leaf-certificates)
 - [Cluster Performance and Quantity of Leaf Certificates](#cluster-performance-and-quantity-of-leaf-certificates)
 - [You must configure issuing/CRL/OCSP information _in advance_](#you-must-configure-issuingcrlocsp-information-_in-advance_)
@@ -120,7 +129,7 @@ issuer's CRL. This means maintaining a default issuer is important for both
 backwards compatibility for issuing certificates and for ensuring revoked
 certificates land on a CRL.
 
-### Key Types Matter
+## Key Types Matter
 
 Certain key types have impacts on performance. Signing certificates from an RSA
 key will be slower than issuing from an ECDSA or Ed25519 key. Key generation
@@ -135,6 +144,60 @@ also be more expensive. Careful consideration of both issuer and issued key
 types can have meaningful impacts on performance of not only Vault, but
 systems using these certificates.
 
+### Cluster Performance and Key Types
+
+The [vault-benchmark](https://github.com/hashicorp/vault-benchmark) project
+can be used to measure the performance of a Vault PKI instance. In general,
+there are some considerations to be aware of:
+
+- RSA key generation is much slower and more variable than EC key
+  generation. If performance and throughput are a necessity, consider using
+  EC keys (including NIST P-curves and Ed25519) instead of RSA.
+
+- Signing requests (via `/pki/sign`) will be faster than issuance requests
+  (via `/pki/issue`), especially for RSA keys: Vault no longer needs to
+  generate key material and can instead sign the key material provided by
+  the client. The signing step is common to both endpoints, so key generation
+  is pure overhead if the client has a sufficiently secure source of entropy.
+
+- The CA's key type matters as well: using an RSA CA will result in an RSA
+  signature, which takes longer than signing with an ECDSA or Ed25519 CA.
+
+- Storage is an important factor: with [BYOC Revocation](/vault/api-docs/secret/pki#revoke-certificate),
+  using `no_store=true` still gives you the ability to revoke certificates,
+  and audit logs can be used to track issuance. Clusters using remote
+  storage (like Consul) over a slow network with `no_store=false` will
+  see additional latency on issuance. Adding leases for every issued
+  certificate compounds the problem.
+
+- Storing too many certificates results in longer `LIST /pki/certs` times,
+  as well as a longer time to tidy the instance. As such, for large-scale
+  deployments (>= 250k active certificates) it is recommended to use audit
+  logs to track certificates outside of Vault.
+
+As a general comparison on unspecified hardware, running `vault-benchmark` for
+`30s` against a local, single-node, raft-backed Vault instance:
+
+- Vault can issue 300k certificates using EC P-256 for CA & leaf keys and
+  without storage.
+
+- Switching to storing these leaves drops that number to 65k, and to only
+  20k with leases.
+
+- Using large, expensive 4096-bit RSA keys, Vault can only issue 160 leaves,
+  regardless of whether storage or leases are used. The 95th percentile key
+  generation time is above 10s.
+
+- In comparison, using P-521 keys, Vault can issue closer to 30k leaves
+  without leases and 18k with leases.
+
+These numbers are for example only, to represent the impact different key types
+can have on PKI cluster performance.
+
+The use of ACME adds additional latency into these numbers, both because
+certificates need to be stored and because challenge validation needs to
+be performed.
+
 ## Use a CA Hierarchy
 
 It is generally recommended to use a hierarchical CA setup, with a root
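To put the benchmark figures quoted in the hunk above into per-second terms, the arithmetic can be sketched as follows. The counts are the approximate ones from the text (a single 30-second run on unspecified hardware); the dictionary labels and the `per_second` helper are illustrative names and not part of vault-benchmark.

```python
# Approximate certificates issued in one 30-second run, as quoted in the text.
BENCHMARK_SECONDS = 30
issued = {
    "P-256, no storage": 300_000,
    "P-256, stored": 65_000,
    "P-256, stored with leases": 20_000,
    "RSA-4096": 160,
    "P-521, without leases": 30_000,
    "P-521, with leases": 18_000,
}


def per_second(count, seconds=BENCHMARK_SECONDS):
    """Convert a total count for the run into an average issuance rate."""
    return count / seconds


rates = {name: per_second(count) for name, count in issued.items()}

# How many times more certificates the fastest configuration issued
# compared with RSA-4096 over the same interval.
slowdown = issued["P-256, no storage"] / issued["RSA-4096"]

for name, rate in rates.items():
    print(f"{name:28s} {rate:10.1f} certs/s")
print(f"P-256 without storage vs RSA-4096: {slowdown:.0f}x")
```

As the text itself stresses, these are example figures only; the ratios between configurations are more informative than the absolute rates.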
@@ -176,7 +239,160 @@ can be constructed in the following order:
 All requests to this issuer for signing will now present the full cross-signed
 chain.
 
-## Keep certificate lifetimes short, for CRL's sake
+## Cluster URLs are Important
+
+In Vault 1.13, support for [templated AIA
+URLs](/vault/api-docs/secret/pki#enable_aia_url_templating-1)
+was added. With the [per-cluster URL
+configuration](/vault/api-docs/secret/pki#set-cluster-configuration) pointing
+to this Performance Replication cluster, AIA information will automatically
+point to the cluster that issued the certificate.
+
+In Vault 1.14, with ACME support, the same configuration is used for allowing
+ACME clients to discover the URL of this cluster.
+
+~> **Warning**: It is important to ensure that this configuration is
+up to date and maintained correctly, always pointing to the node's
+PR cluster address (which may be a load-balanced or a DNS round-robin
+address). If this configuration is not set on every Performance Replication
+cluster, certificate issuance (via REST and/or via ACME) will fail.
+
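The idea behind templated AIA URLs can be illustrated with a small sketch: one URL template is shared, and each Performance Replication cluster substitutes its own configured address. This is not Vault's implementation. The placeholder names `{{cluster_aia_path}}` and `{{issuer_id}}` and the `/issuer/<id>/der` path are assumptions based on my reading of the PKI API docs and should be checked there; the cluster names, hostnames, and issuer ID are hypothetical.

```python
# Hypothetical per-cluster AIA base paths, one per PR cluster.
CLUSTER_AIA_PATHS = {
    "pr-east": "https://pki-east.example.com/v1/pki",
    "pr-west": "https://pki-west.example.com/v1/pki/",
}

# Assumed template shape; placeholder names are taken from the API docs
# as I understand them and are not confirmed by the text above.
TEMPLATE = "{{cluster_aia_path}}/issuer/{{issuer_id}}/der"


def expand_aia_url(template, cluster_aia_path, issuer_id):
    """Substitute the cluster-specific path and issuer ID into the template."""
    return (
        template
        .replace("{{cluster_aia_path}}", cluster_aia_path.rstrip("/"))
        .replace("{{issuer_id}}", issuer_id)
    )


urls = {
    name: expand_aia_url(TEMPLATE, path, "example-issuer-id")
    for name, path in CLUSTER_AIA_PATHS.items()
}
for name, url in urls.items():
    print(name, url)
```

The sketch also shows why the warning matters: a cluster with no configured path has nothing to substitute, so no usable URL can be produced for certificates it issues.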
+## Automate Rotation with ACME
+
+In Vault 1.14, support for the [Automatic Certificate Management Environment
+(ACME)](https://datatracker.ietf.org/doc/html/rfc8555) protocol has been
+added to the PKI Engine. This is a standardized way to handle validation,
+issuance, rotation, and revocation of server certificates.
+
+Many ecosystems, from web servers like Caddy, Nginx, and Apache, to
+orchestration environments like Kubernetes (via cert-manager), natively
+support issuance via the ACME protocol. For deployments without native
+support, stand-alone tools like certbot support fetching and renewing
+certificates on behalf of consumers. Vault's PKI Engine only includes server
+support for ACME; no client functionality has been included.
+
+~> Note: Vault's PKI ACME server caps the certificate's validity at 90 days
+maximum, regardless of role and/or global limits. Shorter validity
+durations can be set by limiting the role's TTL to be under 90 days.
+Aligning with Let's Encrypt, we do not support the optional `NotBefore`
+and `NotAfter` order request parameters.
+
+### ACME Stores Certificates
+
+Because ACME requires stored certificates in order to function, the notes
+[below about automating tidy](#automate-crl-building-and-tidying) are
+especially important for the long-term health of the PKI cluster. ACME also
+introduces additional resource types (accounts, orders, authorizations, and
+challenges) that must be tidied via [the `tidy_acme=true`
+option](/vault/api-docs/secret/pki#tidy). Orders, authorizations, and
+challenges are [cleaned up based on the
+`safety_buffer`](/vault/api-docs/secret/pki#safety_buffer)
+parameter, but accounts can be kept longer past their last issued certificate
+by setting the [`acme_account_safety_buffer`
+parameter](/vault/api-docs/secret/pki#acme_account_safety_buffer).
+
+As a consequence of the above, and like the discussions in the [Cluster
+Scalability](#cluster-scalability) section, because these roles have
+`no_store=false` set, ACME can only issue certificates on the active nodes
+of PR clusters; standby nodes, if contacted, will transparently forward
+all requests to the active node.
+
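The tidy parameters named above can be assembled into a request along these lines. This is a sketch under stated assumptions: the mount path `pki`, the Vault address, the token, and the duration values are hypothetical, and `tidy_cert_store` is a tidy option taken from the API docs rather than from the text above. The request is only built here, not sent.

```python
import json
import urllib.request


def build_tidy_request(vault_addr, token, mount="pki",
                       safety_buffer="72h",
                       acme_account_safety_buffer="720h"):
    """Build (but do not send) a tidy request that includes ACME resources."""
    body = {
        # Assumption: also tidy the stored certificates that ACME requires.
        "tidy_cert_store": True,
        # Tidy ACME accounts, orders, authorizations, and challenges.
        "tidy_acme": True,
        # Applies to orders, authorizations, and challenges.
        "safety_buffer": safety_buffer,
        # Lets accounts outlive their last issued certificate for longer.
        "acme_account_safety_buffer": acme_account_safety_buffer,
    }
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/{mount}/tidy",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tidy_request("https://vault.example.com:8200", "example-token")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`; in practice the same parameters are more likely to be set once through the auto-tidy configuration than submitted by hand.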
+### ACME Role Restrictions Require EAB
+
+Because ACME by default has no external authorization engine and is
+unauthenticated from a Vault perspective, the use of roles with ACME
+in the default configuration is of limited value, as any ACME client
+can request certificates under any role by proving possession of the
+requested certificate identifiers.
+
+To solve this issue, there are two possible approaches:
+
+1. Use a restrictive [`allowed_roles`, `allowed_issuers`, and
+   `default_directory_policy` ACME
+   configuration](/vault/api-docs/secret/pki#set-acme-configuration)
+   to let only a single role and issuer be used. This prevents user
+   choice, allows some global restrictions to be placed on issuance,
+   and avoids requiring ACME clients to have (at initial setup) access
+   to a Vault token or other mechanism for acquiring a Vault EAB ACME token.
+2. Use a more permissive [configuration with
+   `eab_policy=always-required`](/vault/api-docs/secret/pki#eab_policy)
+   to allow more roles and let users select among them, but bind ACME clients
+   to a Vault token which can be suitably ACL'd to particular sets of
+   approved ACME directories.
+
+The choice of approach depends on the policies of the organization wishing
+to use ACME.
+
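The two approaches above differ only in the ACME configuration written to the mount, which can be sketched as two payloads. The parameter names come from the text; the `enabled` flag, the `role:<name>` form of `default_directory_policy`, and the role and issuer names are assumptions for illustration and should be verified against the ACME configuration API docs.

```python
import json


def restrictive_acme_config(role, issuer):
    """Approach 1: pin ACME to a single role and issuer, with no EAB needed."""
    return {
        "enabled": True,  # assumed flag for turning ACME on for the mount
        "allowed_roles": [role],
        "allowed_issuers": [issuer],
        # Assumed value format: force the default directory onto this role.
        "default_directory_policy": f"role:{role}",
    }


def eab_acme_config(roles):
    """Approach 2: allow several roles, but require EAB for every client."""
    return {
        "enabled": True,  # assumed flag, as above
        "allowed_roles": list(roles),
        "eab_policy": "always-required",
    }


# Hypothetical role and issuer names.
single = restrictive_acme_config("web-servers", "acme-intermediate")
multi = eab_acme_config(["web-servers", "internal-apis"])

print(json.dumps(single, indent=2))
print(json.dumps(multi, indent=2))
```

In the first payload the restriction lives entirely in the mount configuration; in the second it moves to the ACLs on the Vault token used to obtain each EAB token.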
+### ACME and the Public Internet
+
+Using ACME is possible over the public internet; public CAs like Let's Encrypt
+offer this as a service. Similarly, organizations running internal PKI
+infrastructure might wish to issue server certificates to pieces of
+infrastructure outside of their internal network boundaries, from a publicly
+accessible Vault instance. By default, without enforcing a restrictive
+`eab_policy`, this results in a complicated threat model: _any_ external
+client which can prove possession of a domain can issue a certificate under
+this CA, which might be considered more trusted by this organization.
+
+As such, we strongly recommend that publicly facing Vault instances (such as
+HCP Vault) require PKI mount operators to set a [restrictive
+`eab_policy=always-required` configuration](/vault/api-docs/secret/pki#eab_policy).
+System administrators of Vault instances can enforce this by [setting the
+`VAULT_DISABLE_PUBLIC_ACME=true` environment
+variable](/vault/api-docs/secret/pki#acme-external-account-bindings).
+
+### ACME Errors are in Server Logs
+
+Because the ACME client is not necessarily trusted (as account registration
+may not be tied to a valid Vault token when EAB is not used), many error
+messages end up in the Vault server logs out of security necessity. When
+troubleshooting issues with clients requesting certificates, first check
+the client's logs, if any (e.g., certbot will state the log location on
+errors), and then correlate with Vault server logs to identify the failure
+reason.
+
+### ACME Security Considerations
+
+ACME allows any client to use Vault to make some sort of external call;
+while the design of ACME attempts to minimize this scope and will prohibit
+issuance if incorrect servers are contacted, it cannot account for all
+possible remote server implementations. Vault's ACME server makes three
+types of requests:
+
+1. DNS requests for `_acme-challenge.<domain>`, which should be the least
+   invasive and safest.
+2. TLS ALPN requests for the `acme-tls/1` protocol, which should be
+   safely handled by the TLS stack before any application code is invoked.
+3. HTTP requests to `http://<domain>/.well-known/acme-challenge/<token>`,
+   which could be problematic based on server design; if all requests,
+   regardless of path, are treated the same and assumed to be trusted,
+   this could result in Vault being used to make (invalid) requests.
+   Ideally, any such server implementations should be updated to ignore
+   such ACME validation requests or to block access originating from Vault
+   to this service.
+
+In all cases, no information about the response presented by the remote
+server is returned to the ACME client.
+
+When running Vault on multiple networks, note that Vault's ACME server
+places no restrictions on requesting client/destination identifier
+validation paths; a client could use an HTTP challenge to force Vault to
+reach out to a server on a network it could otherwise not access.
+
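The three outbound request types listed above can be made concrete with a short sketch that derives the validation targets for a given identifier. The DNS name, ALPN protocol, and HTTP path are the ones from the text; the challenge type labels follow the ACME RFCs, while the function names and the example domain and token are hypothetical. The second helper shows one way a backend could recognize and ignore ACME validation requests, as the text suggests.

```python
ACME_HTTP_PREFIX = "/.well-known/acme-challenge/"


def validation_targets(domain, token):
    """Return what an ACME server would contact to validate `domain`."""
    return {
        # DNS lookup against a dedicated label under the domain.
        "dns-01": f"_acme-challenge.{domain}",
        # TLS connection to the host, negotiating a dedicated ALPN protocol.
        "tls-alpn-01": {"host": domain, "alpn_protocol": "acme-tls/1"},
        # Plain HTTP GET to a well-known path on the host.
        "http-01": f"http://{domain}{ACME_HTTP_PREFIX}{token}",
    }


def is_acme_http_challenge(path):
    """Let a backend service detect, and then ignore, ACME validation hits."""
    return path.startswith(ACME_HTTP_PREFIX)


targets = validation_targets("app.example.com", "example-token")
for challenge, target in targets.items():
    print(challenge, target)
```

Only the HTTP variant reaches application code on the remote host, which is why it is the one singled out as potentially problematic.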
+### ACME and Client Counting
+
+In Vault 1.14, ACME contributes differently to usage metrics than other
+interactions with the PKI Secrets Engine. Due to its use of unauthenticated
+requests (which do not generate Vault tokens), it would not be counted in
+the traditional [activity log APIs](/vault/api-docs/system/internal-counters#activity-export).
+Instead, certificates issued via ACME will be counted via their unique
+certificate identifiers (the combination of CN, DNS SANs, and IP SANs).
+These will create a stable identifier that will be consistent across
+renewals, other ACME clients, mounts, and namespaces, contributing to
+the activity log presently as a non-entity token attributed to the first
+mount which created that request.
+
+## Keep Certificate Lifetimes Short, For CRL's Sake
 
 This secrets engine aligns with Vault's philosophy of short-lived secrets. As
 such it is not expected that CRLs will grow large; the only place a private key