mirror of
https://github.com/optim-enterprises-bv/vault.git
synced 2025-11-02 19:47:54 +00:00
ACME Considerations Guide (#21225)
* Add notes on PKI performance and key types
* Add ACME Public Internet section
* Add note on importance of tidy
* Add note on cluster scalability
* Add note about server log location
* Fix ToC, finish public ACME discussion
* Add note on role restrictions and ACLs
* Add note on security considerations of ACME
* Add consideration note about cluster URLs
* Add note on 90 day certificates
* Add note about client counts and ACME

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
@@ -17,10 +17,19 @@ generating the CA to use with this secrets engine.
- [Managed Keys](#managed-keys)
- [One CA Certificate, One Secrets Engine](#one-ca-certificate-one-secrets-engine)
- [Always Configure a Default Issuer](#always-configure-a-default-issuer)
- [Key Types Matter](#key-types-matter)
  - [Cluster Performance and Key Types](#cluster-performance-and-key-types)
- [Use a CA Hierarchy](#use-a-ca-hierarchy)
- [Cross-Signed Intermediates](#cross-signed-intermediates)
- [Cluster URLs are Important](#cluster-urls-are-important)
- [Automate Rotation with ACME](#automate-rotation-with-acme)
  - [ACME Stores Certificates](#acme-stores-certificates)
  - [ACME Role Restrictions Require EAB](#acme-role-restrictions-require-eab)
  - [ACME and the Public Internet](#acme-and-the-public-internet)
  - [ACME Errors are in Server Logs](#acme-errors-are-in-server-logs)
  - [ACME Security Considerations](#acme-security-considerations)
  - [ACME and Client Counting](#acme-and-client-counting)
- [Keep Certificate Lifetimes Short, For CRL's Sake](#keep-certificate-lifetimes-short-for-crls-sake)
- [NotAfter Behavior on Leaf Certificates](#notafter-behavior-on-leaf-certificates)
- [Cluster Performance and Quantity of Leaf Certificates](#cluster-performance-and-quantity-of-leaf-certificates)
- [You must configure issuing/CRL/OCSP information _in advance_](#you-must-configure-issuingcrlocsp-information-_in-advance_)
@@ -120,7 +129,7 @@ issuer's CRL. This means maintaining a default issuer is important for both
backwards compatibility for issuing certificates and for ensuring revoked
certificates land on a CRL.

## Key Types Matter

Certain key types have impacts on performance. Signing certificates from an RSA
key will be slower than issuing from an ECDSA or Ed25519 key. Key generation
@@ -135,6 +144,60 @@ also be more expensive. Careful consideration of both issuer and issued key
types can have meaningful impacts on performance of not only Vault, but
systems using these certificates.

### Cluster Performance and Key Types

The [vault-benchmark](https://github.com/hashicorp/vault-benchmark) project
can be used to measure the performance of a Vault PKI instance. Some general
considerations to be aware of:

- RSA key generation is much slower and more variable than EC key
  generation. If performance and throughput are a necessity, consider using
  EC keys (including NIST P-curves and Ed25519) instead of RSA.

- Key signing requests (via `/pki/sign`) will be faster than issuance
  requests (via `/pki/issue`), especially for RSA keys: signing removes the
  need for Vault to generate key material, as Vault signs the key material
  provided by the client. The signing step is common to both endpoints, so
  key generation is pure overhead if the client has a sufficiently secure
  source of entropy.

- The CA's key type matters as well: using an RSA CA will result in an RSA
  signature and takes longer than an ECDSA or Ed25519 CA.

- Storage is an important factor: with [BYOC Revocation](/vault/api-docs/secret/pki#revoke-certificate),
  using `no_store=true` still gives you the ability to revoke certificates,
  and audit logs can be used to track issuance. Clusters using remote
  storage (like Consul) over a slow network and using `no_store=false` will
  see additional latency on issuance. Adding leases for every issued
  certificate compounds the problem.

- Storing too many certificates results in longer `LIST /pki/certs` times
  and longer times to tidy the instance. As such, for large-scale
  deployments (>= 250k active certificates), it is recommended to use audit
  logs to track certificates outside of Vault.

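As a minimal sketch of the storage and key type considerations above, the following builds a role request body that favors throughput. The field names `key_type`, `key_bits`, `no_store`, and `generate_lease` are role parameters of the PKI secrets engine; the chosen values are illustrative assumptions, not a recommendation for every deployment.

```python
import json


def build_role_payload(key_type: str = "ec", key_bits: int = 256,
                       no_store: bool = True,
                       generate_lease: bool = False) -> dict:
    """Return a request body for writing a PKI role.

    The defaults are an assumed throughput-oriented choice: EC P-256 keys,
    no certificate storage, and no per-certificate leases.
    """
    return {
        "key_type": key_type,
        "key_bits": key_bits,
        "no_store": no_store,
        "generate_lease": generate_lease,
    }


role_payload = build_role_payload()
print(json.dumps(role_payload, indent=2))
```

The resulting JSON could be sent as the body of a role write on the mount. Roles intended for ACME would need `no_store=false`, since ACME requires stored certificates.
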
As a general comparison on unspecified hardware, using `vault-benchmark` for
`30s` on a local, single-node, raft-backed Vault instance:

- Vault can issue 300k certificates using EC P-256 for CA & leaf keys and
  without storage.

- But switching to storing these leaves drops that number to 65k, and only
  20k with leases.

- Using large, expensive 4096-bit RSA keys, Vault can only issue 160 leaves,
  regardless of whether or not storage or leases were used. The 95th
  percentile key generation time is above 10s.

- In comparison, using P-521 keys, Vault can issue closer to 30k leaves
  without leases and 18k with leases.

These numbers are for example only, to represent the impact different key types
can have on PKI cluster performance.

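To make the comparison easier to read, the totals above can be converted into approximate per-second rates for the 30-second run. This is plain arithmetic on the figures quoted in the text; the scenario labels are shorthand introduced here for illustration.

```python
# Totals quoted in the text for a 30-second benchmark run.
RUN_SECONDS = 30

issued_in_run = {
    "P-256, no storage": 300_000,
    "P-256, stored": 65_000,
    "P-256, stored with leases": 20_000,
    "RSA-4096": 160,
    "P-521, without leases": 30_000,
    "P-521, with leases": 18_000,
}


def per_second(count: int, seconds: int = RUN_SECONDS) -> float:
    """Average issuance rate over the run."""
    return count / seconds


rates = {name: per_second(count) for name, count in issued_in_run.items()}

for name, rate in rates.items():
    print(f"{name:28s} {rate:10.1f} certs/s")
```

On these example figures, unstored P-256 issuance averages about 10,000 certificates per second, while 4096-bit RSA averages only about 5 per second.
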
The use of ACME adds latency to these numbers, both because certificates
need to be stored and because challenge validation needs to be performed.

## Use a CA Hierarchy

It is generally recommended to use a hierarchical CA setup, with a root
@@ -176,7 +239,160 @@ can be constructed in the following order:
All requests to this issuer for signing will now present the full cross-signed
chain.

## Cluster URLs are Important

In Vault 1.13, support for [templated AIA
URLs](/vault/api-docs/secret/pki#enable_aia_url_templating-1)
was added. With the [per-cluster URL
configuration](/vault/api-docs/secret/pki#set-cluster-configuration) pointing
to this Performance Replication cluster, AIA information will automatically
point to the cluster that issued the certificate.

In Vault 1.14, with ACME support, the same configuration is used for allowing
ACME clients to discover the URL of this cluster.

~> **Warning**: It is important to ensure that this configuration is
up to date and maintained correctly, always pointing to the node's
PR cluster address (which may be a load-balanced or DNS round-robin
address). If this configuration is not set on every Performance Replication
cluster, certificate issuance (via REST and/or via ACME) will fail.

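As a rough sketch of what this configuration might look like, the following builds request bodies for the per-cluster configuration and for templated AIA URLs. The hostname and mount name are hypothetical, and the template variables (`{{cluster_path}}`, `{{cluster_aia_path}}`, `{{issuer_id}}`) are written here as I understand the templating documentation; verify them against the API docs for your Vault version.

```python
import json


def cluster_config(external_addr: str, mount: str = "pki") -> dict:
    """Body for the per-cluster configuration of one PR cluster.

    `external_addr` is the externally reachable address of this cluster
    (possibly a load balancer); the value used below is hypothetical.
    """
    base = f"{external_addr.rstrip('/')}/v1/{mount.strip('/')}"
    return {"path": base, "aia_path": base}


def templated_urls() -> dict:
    """Body for AIA URL configuration using per-cluster templating (assumed names)."""
    return {
        "enable_templating": True,
        "issuing_certificates": ["{{cluster_aia_path}}/issuer/{{issuer_id}}/der"],
        "crl_distribution_points": ["{{cluster_aia_path}}/issuer/{{issuer_id}}/crl/der"],
        "ocsp_servers": ["{{cluster_path}}/ocsp"],
    }


cfg = cluster_config("https://pr1.vault.example.com:8200/")
print(json.dumps(cfg, indent=2))
print(json.dumps(templated_urls(), indent=2))
```

Each Performance Replication cluster would need its own cluster configuration body, since the address differs per cluster, while the templated URL configuration can be shared.
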
## Automate Rotation with ACME

In Vault 1.14, support for the [Automatic Certificate Management Environment
(ACME)](https://datatracker.ietf.org/doc/html/rfc8555) protocol has been
added to the PKI Engine. This is a standardized way to handle validation,
issuance, rotation, and revocation of server certificates.

Many ecosystems, from web servers like Caddy, Nginx, and Apache, to
orchestration environments like Kubernetes (via cert-manager), natively
support issuance via the ACME protocol. For deployments without native
support, stand-alone tools like certbot support fetching and renewing
certificates on behalf of consumers. Vault's PKI Engine only includes server
support for ACME; no client functionality has been included.

~> Note: Vault's PKI ACME server caps the certificate's validity at 90 days
maximum, regardless of role and/or global limits. Shorter validity
durations can be set by limiting the role's TTL to under 90 days.
Aligning with Let's Encrypt, we do not support the optional `NotBefore`
and `NotAfter` order request parameters.

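The cap described in the note can be illustrated as taking the smallest of the configured limits and the 90-day ACME maximum. This is only a sketch of the arithmetic, not Vault's implementation; the function name is hypothetical.

```python
from datetime import timedelta

# ACME maximum validity stated in the note above.
ACME_MAX_VALIDITY = timedelta(days=90)


def effective_acme_validity(*configured_limits: timedelta) -> timedelta:
    """Smallest of the ACME cap and any configured limits (e.g. a role TTL)."""
    return min((ACME_MAX_VALIDITY, *configured_limits))


print(effective_acme_validity(timedelta(days=30)))   # role TTL below the cap
print(effective_acme_validity(timedelta(days=365)))  # role TTL above the cap
```

A role TTL of 30 days yields 30-day certificates, while a role TTL of one year is still limited to 90 days.
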
### ACME Stores Certificates

Because ACME requires stored certificates in order to function, the notes
[below about automating tidy](#automate-crl-building-and-tidying) are
especially important for the long-term health of the PKI cluster. ACME also
introduces additional resource types (accounts, orders, authorizations, and
challenges) that must be tidied via [the `tidy_acme=true`
option](/vault/api-docs/secret/pki#tidy). Orders, authorizations, and
challenges are [cleaned up based on the
`safety_buffer`](/vault/api-docs/secret/pki#safety_buffer)
parameter, but accounts can be kept longer past their last issued certificate
by setting the [`acme_account_safety_buffer`
parameter](/vault/api-docs/secret/pki#acme_account_safety_buffer).

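A tidy request covering ACME resources might be assembled as in the sketch below. The parameter names come from the tidy API referenced above; the duration values are illustrative assumptions and should be chosen to match your own retention needs.

```python
import json


def build_acme_tidy_payload(safety_buffer: str = "72h",
                            acme_account_safety_buffer: str = "720h") -> dict:
    """Body for a tidy operation that also cleans up ACME resources.

    The two duration values are example settings, not recommendations.
    """
    return {
        "tidy_cert_store": True,     # remove expired stored certificates
        "tidy_revoked_certs": True,  # remove expired revocation entries
        "tidy_acme": True,           # accounts, orders, authorizations, challenges
        "safety_buffer": safety_buffer,
        "acme_account_safety_buffer": acme_account_safety_buffer,
    }


tidy_payload = build_acme_tidy_payload()
print(json.dumps(tidy_payload, indent=2))
```

Here accounts are retained for longer than orders, authorizations, and challenges because `acme_account_safety_buffer` is set to a larger value than `safety_buffer`.
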
As a consequence of the above, and as discussed in the [Cluster
Scalability](#cluster-scalability) section, because these roles have
`no_store=false` set, ACME can only issue certificates on the active nodes
of PR clusters; standby nodes, if contacted, will transparently forward
all requests to the active node.

### ACME Role Restrictions Require EAB

Because ACME by default has no external authorization engine and is
unauthenticated from a Vault perspective, the use of roles with ACME
in the default configuration is of limited value, as any ACME client
can request certificates under any role by proving possession of the
requested certificate identifiers.

To solve this issue, there are two possible approaches:

1. Use a restrictive [`allowed_roles`, `allowed_issuers`, and
   `default_directory_policy` ACME
   configuration](/vault/api-docs/secret/pki#set-acme-configuration)
   to let only a single role and issuer be used. This prevents user
   choice, allows some global restrictions to be placed on issuance,
   and avoids requiring ACME clients to have (at initial setup) access
   to a Vault token or other mechanism for acquiring a Vault EAB ACME token.
2. Use a more permissive [configuration with
   `eab_policy=always-required`](/vault/api-docs/secret/pki#eab_policy)
   to allow more roles and let users select the roles, but bind ACME clients
   to a Vault token which can be suitably ACL'd to particular sets of
   approved ACME directories.

The choice of approach depends on the policies of the organization wishing
to use ACME.

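The two approaches could translate into ACME configuration bodies along the following lines. The field names are those of the ACME configuration referenced above; the role name `acme-servers` and issuer name `acme-intermediate` are hypothetical, and the exact value formats should be checked against the API documentation.

```python
import json


def restrictive_acme_config(role: str, issuer: str) -> dict:
    """Approach 1: pin ACME to a single role and issuer, without EAB."""
    return {
        "enabled": True,
        "allowed_roles": [role],
        "allowed_issuers": [issuer],
        "default_directory_policy": f"role:{role}",
        "eab_policy": "not-required",
    }


def eab_acme_config() -> dict:
    """Approach 2: allow any role and issuer, but always require EAB."""
    return {
        "enabled": True,
        "allowed_roles": ["*"],
        "allowed_issuers": ["*"],
        "eab_policy": "always-required",
    }


# Hypothetical role and issuer names, for illustration only.
approach_one = restrictive_acme_config("acme-servers", "acme-intermediate")
approach_two = eab_acme_config()
print(json.dumps(approach_one, indent=2))
print(json.dumps(approach_two, indent=2))
```

In the second approach, which ACME directories a client may use would then be governed by the ACL policies attached to the Vault token that created its EAB token.
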
### ACME and the Public Internet

Using ACME is possible over the public internet; public CAs like Let's Encrypt
offer this as a service. Similarly, organizations running internal PKI
infrastructure might wish to issue server certificates to pieces of
infrastructure outside of their internal network boundaries, from a publicly
accessible Vault instance. By default, without enforcing a restrictive
`eab_policy`, this results in a complicated threat model: _any_ external
client which can prove possession of a domain can obtain a certificate under
this CA, which might be considered more trusted by this organization.

As such, we strongly recommend that publicly facing Vault instances (such as
HCP Vault) require PKI mount operators to use a [restrictive
`eab_policy=always-required` configuration](/vault/api-docs/secret/pki#eab_policy).
System administrators of Vault instances can enforce this by [setting the
`VAULT_DISABLE_PUBLIC_ACME=true` environment
variable](/vault/api-docs/secret/pki#acme-external-account-bindings).

### ACME Errors are in Server Logs

Because the ACME client is not necessarily trusted (as account registration
may not be tied to a valid Vault token when EAB is not used), many error
messages end up in the Vault server logs out of security necessity. When
troubleshooting issues with clients requesting certificates, first check
the client's logs, if any (e.g., certbot will state the log location on
errors), and then correlate with Vault server logs to identify the failure
reason.

### ACME Security Considerations

ACME allows any client to use Vault to make some sort of external call;
while the design of ACME attempts to minimize this scope and will prohibit
issuance if incorrect servers are contacted, it cannot account for all
possible remote server implementations. Vault's ACME server makes three
types of requests:

1. DNS requests for `_acme-challenge.<domain>`, which should be the least
   invasive and safest.
2. TLS ALPN requests for the `acme-tls/1` protocol, which should be
   safely handled by the TLS layer before any application code is invoked.
3. HTTP requests to `http://<domain>/.well-known/acme-challenge/<token>`,
   which could be problematic depending on server design; if all requests,
   regardless of path, are treated the same and assumed to be trusted,
   this could result in Vault being used to make (invalid) requests.
   Ideally, any such server implementations should be updated to ignore
   such ACME validation requests or to block access originating from Vault
   to this service.

In all cases, no information about the response presented by the remote
server is returned to the ACME client.

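To make the three request types concrete, the sketch below derives the validation target for each challenge type from a domain and token. The challenge type names and the use of port 443 for TLS-ALPN follow the ACME standards (RFC 8555 and RFC 8737); the function itself is a hypothetical helper and does not reflect Vault's internal code.

```python
def challenge_targets(domain: str, token: str) -> dict:
    """Where an ACME server would reach out to validate `domain`."""
    return {
        # DNS TXT record lookup
        "dns-01": f"_acme-challenge.{domain}",
        # TLS connection negotiating the ACME ALPN protocol: (host, port, ALPN)
        "tls-alpn-01": (domain, 443, "acme-tls/1"),
        # Plain HTTP GET to the well-known path
        "http-01": f"http://{domain}/.well-known/acme-challenge/{token}",
    }


targets = challenge_targets("app.example.com", "example-token")
for challenge_type, target in targets.items():
    print(challenge_type, "->", target)
```

Operators can use a listing like this to decide which destinations Vault's network egress rules should permit or block.
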
When running Vault on multiple networks, note that Vault's ACME server
places no restrictions on requesting client/destination identifier
validation paths; a client could use an HTTP challenge to force Vault to
reach out to a server on a network that the client could otherwise not access.

### ACME and Client Counting

In Vault 1.14, ACME contributes differently to usage metrics than other
interactions with the PKI Secrets Engine. Due to its use of unauthenticated
requests (which do not generate Vault tokens), it is not counted in
the traditional [activity log APIs](/vault/api-docs/system/internal-counters#activity-export).
Instead, certificates issued via ACME are counted via their unique
certificate identifiers (the combination of CN, DNS SANs, and IP SANs).
These create a stable identifier that is consistent across
renewals, other ACME clients, mounts, and namespaces, contributing to
the activity log presently as a non-entity token attributed to the first
mount which created that request.

## Keep Certificate Lifetimes Short, For CRL's Sake

This secrets engine aligns with Vault's philosophy of short-lived secrets. As
such it is not expected that CRLs will grow large; the only place a private key