
1774 Commits

Author SHA1 Message Date
Rachel Culpepper
9ebcbf6a0c vault-24597: add key types and key creation for CMAC (#25967)
* add key types for cmac for transit key creation

* add test for key creation

* fix test logic and add cases

* fix logic for hmac

* add go doc

* fix key size and add check for HMAC key
2024-04-19 09:39:59 -05:00
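A minimal sketch of CMAC key creation, assuming hypothetical key type names `aes128-cmac`, `aes192-cmac` and `aes256-cmac`. This is not the transit implementation. CMAC (NIST SP 800-38B) is built on a block cipher, so the key material is simply an AES key of the matching length.

```go
package main

import (
	"crypto/aes"
	"crypto/rand"
	"fmt"
)

// cmacKeySizes maps hypothetical key type names to AES key lengths in bytes.
var cmacKeySizes = map[string]int{
	"aes128-cmac": 16,
	"aes192-cmac": 24,
	"aes256-cmac": 32,
}

// newCMACKey generates random key material for keyType and checks that it
// is usable as an AES key.
func newCMACKey(keyType string) ([]byte, error) {
	size, ok := cmacKeySizes[keyType]
	if !ok {
		return nil, fmt.Errorf("unsupported CMAC key type %q", keyType)
	}
	key := make([]byte, size)
	if _, err := rand.Read(key); err != nil {
		return nil, fmt.Errorf("generating key: %w", err)
	}
	// aes.NewCipher rejects any length other than 16, 24 or 32 bytes.
	if _, err := aes.NewCipher(key); err != nil {
		return nil, fmt.Errorf("invalid AES key: %w", err)
	}
	return key, nil
}
```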
Scott Miller
fd9e113c82 Use a less strict URL validation for PKI issuing and crl distribution urls (#26477)
* Use a less strict URL validation for PKI issuing and crl distribution urls

* comma handling

* limit to ldap

* remove comma hack

* changelog

* Add unit test validating ldap CRL urls

---------

Co-authored-by: Steve Clark <steven.clark@hashicorp.com>
2024-04-18 17:35:33 +00:00
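A sketch of relaxing URL validation only for LDAP URLs. The function name and the stricter HTTP(S) rule are assumptions for illustration. Go's `net/url` parser accepts LDAP CRL URLs that have an empty host and a DN-style path containing commas.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateDistributionURL is a hypothetical validator for issuing-certificate
// and CRL distribution URLs.
func validateDistributionURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL %q: %w", raw, err)
	}
	switch u.Scheme { // url.Parse lowercases the scheme
	case "ldap":
		// LDAP CRL URLs often have an empty host and a DN with commas in
		// the path, so only require that the URL parses.
		return nil
	case "http", "https":
		if u.Host == "" {
			return fmt.Errorf("URL %q is missing a host", raw)
		}
		return nil
	default:
		return fmt.Errorf("URL %q has unsupported scheme %q", raw, u.Scheme)
	}
}
```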
Peter Wilson
8bee54c89d VAULT-24452: audit refactor (#26460)
* Refactor audit code into audit package
* remove builtin/audit
* removed unrequired files
2024-04-18 08:25:04 +01:00
Christopher Swenson
961bf20bdb Use enumer to generate String() methods for most enums (#25705)
We have many hand-written String() methods (and similar) for enums.
These require more maintenance and are more error-prone than using
automatically generated methods. In addition, the auto-generated
versions can be more efficient.

Here, we switch to using https://github.com/loggerhead/enumer, itself
a fork of https://github.com/diegostamigni/enumer (no longer maintained)
and ultimately derived from the de facto standard tool
https://pkg.go.dev/golang.org/x/tools/cmd/stringer.
We use this fork of enumer for Go 1.20+ compatibility and because
we require the `-transform` flag to be able to generate
constants that match our current code base.

Some enums were not targeted for this change:
2024-04-17 11:14:14 -07:00
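Generated methods can be more efficient because of the shape of code that stringer-family generators emit. The sketch below is written by hand for a hypothetical `Level` enum with prefix-trimmed, lowercased names, which is the kind of output the `-transform` option exists for. Real enumer output differs in detail; for example, it also generates helpers such as a string-to-value lookup.

```go
package main

import "strconv"

// Level is a hypothetical enum used only for illustration.
type Level int

const (
	LevelDebug Level = iota
	LevelInfo
	LevelWarn
)

// All names live in one string. _LevelIndex records where each name starts,
// so String() is a bounds check plus a slice expression.
const _LevelName = "debuginfowarn"

var _LevelIndex = [...]uint8{0, 5, 9, 13}

func (i Level) String() string {
	if i < 0 || int(i) >= len(_LevelIndex)-1 {
		// Out-of-range values fall back to a numeric form, as stringer does.
		return "Level(" + strconv.FormatInt(int64(i), 10) + ")"
	}
	return _LevelName[_LevelIndex[i]:_LevelIndex[i+1]]
}
```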
Christopher Swenson
a65d9133a1 database: Avoid race condition in connection creation (#26147)
When creating database connections, there is a race
condition when multiple goroutines try to create the
connection at the same time. This happens, for
example, on leadership changes in a cluster.

Normally, the extra database connections are cleaned
up when this is detected. However, some database
implementations, notably Postgres, do not seem to
clean up in a timely manner, and can leak in these
scenarios.

To fix this, we create a global lock when creating
database connections to prevent multiple connections
from being created at the same time.

We also clean up the logic at the end so that
if (somehow) we ended up creating an additional
connection, we use the existing one rather than
the new one. This by itself would solve our
problem long-term; however, it would still involve
many transient database connections being created
and immediately killed on leadership changes.

It's not ideal to have a single global lock for
database connection creation. Some potential
alternatives:

* a map of locks from the connection name to the lock.
  The biggest downside is that we will probably want to
  garbage collect this map so that we don't have an
  unbounded number of locks.
* a small pool of locks, where we hash the connection
  names to pick the lock. Using such a pool generally
  is a good way to introduce deadlock, but since we
  will only use it in a specific case, and the purpose
  is to improve performance for concurrent connection
  creation, this is probably acceptable.

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
2024-03-26 16:58:07 +00:00
Josh Black
012c3422f8 Add acme clients internal data structures and adjust tests (#26020)
* add acme clients internal data structures and adjust tests

* fix another acme test

* replace manual list with ActivityClientTypes

* add changelog
2024-03-19 09:24:54 -07:00
Steven Clark
94d42235cf Address OCSP client caching issue (#25986)
* Address OCSP client caching issue

 - The OCSP cache built into the client that is used by cert-auth
   would cache the responses but when pulling out a cached value the
   response wasn't validating properly and was then thrown away.

 - The issue was around a confusion of the client's internal status
   vs the Go SDK OCSP status integer values.

 - Add a test that validates the cache is now used

* Add cl

* Fix PKI test failing now due to the OCSP cache working

 - Remove the lookup previously performed before revocation: now that
   the OCSP cache works, that lookup's cached response hides the new
   revocation, since we are actually leveraging the cache
2024-03-18 19:11:14 +00:00
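The class of bug described above can be illustrated with a small sketch. It assumes the "Go SDK" values are those of `golang.org/x/crypto/ocsp`. The internal `certStatus` enum is hypothetical and was chosen so its numbers disagree with the SDK's, which shows why statuses should be converted explicitly rather than cast.

```go
package main

import "fmt"

// Status values from golang.org/x/crypto/ocsp (ocsp.Good, ocsp.Revoked,
// ocsp.Unknown), copied so this sketch needs only the standard library.
const (
	sdkGood    = 0
	sdkRevoked = 1
	sdkUnknown = 2
)

// certStatus is a hypothetical client-internal status whose numeric values
// do not line up with the SDK's.
type certStatus int

const (
	statusUnknown certStatus = iota
	statusGood
	statusRevoked
)

// fromSDKStatus maps SDK values to internal ones by meaning, not by number.
// A raw conversion such as certStatus(sdkGood) would yield statusUnknown,
// making a perfectly good cached response look invalid.
func fromSDKStatus(s int) (certStatus, error) {
	switch s {
	case sdkGood:
		return statusGood, nil
	case sdkRevoked:
		return statusRevoked, nil
	case sdkUnknown:
		return statusUnknown, nil
	default:
		return statusUnknown, fmt.Errorf("unexpected OCSP status %d", s)
	}
}
```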
Austin Gebauer
df57ff46ff Add stubs for plugin WIF (#25657)
* Add stubs for plugin wif

* add header to sdk file

* drop changelog to move it

* fix test
2024-02-27 12:10:43 -08:00
Josh Black
fa13dbd381 add gosimport to make fmt and run it (#25383)
* add gosimport to make fmt and run it

* move installation to tools.sh

* correct weird spacing issue

* Update Makefile

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

* fix a weird issue

---------

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2024-02-13 14:07:02 -08:00
Steven Clark
7463055f07 Transit: Release locks using defer statements (#25336)
* Transit: Release locks using defer statements

 - Leverage defer statements to Unlock the fetched policy
   to avoid issues with forgetting to manually Unlock during
   each return statement

* Add cl
2024-02-09 14:06:23 -05:00
Kit Haines
9536129091 Fix test to be less flaky (#25243)
* Fix test to be less flaky

* Fix duration to be asymmetrical, and try a more obvious diff calculation.
2024-02-08 14:44:05 -05:00
Christopher Swenson
55d2dfb3d0 database: Emit event notifications (#24718)
Including events for failures to write credentials and for failures to rotate.
2024-02-05 10:30:00 -08:00
Kit Haines
99c74f5c80 Base Binary Cert and CSR Parse functions. (#24958)
* Base Binary Cert and CSR Parse functions.

* Add otherSANS parsing.

* Notate what doesn't exist on a CSR.

* Fix otherSans call err-checking and add basic-constraints to CSR

* Move BasicConstraint parsing to be optionally set.

* Refactored to use existing ParseBasicConstraintsExtension.

* Add handling for the ChangeSubjectName ext on CSR that is needed for EST

* Remove ChangeSubjectName - it's an attribute, not an extension, and there is no clean way to parse it, so pare down for now.

* Make these public methods, so they can be used in vault.

* Add unit tests for certutil.ParseCertificateToCreationParameters.

Also add unit tests for certutil.ParseCertificateToFields.

* Cleanup TestParseCertificate.

* Add unit tests for certutil.ParseCsrToCreationParameters and ParseCsrToFields.

* Fix return values for "add_basic_constraints" in certutil.ParseCsrToFields.

Add a test for parsing CSRs where "add_basic_constraints" is false.

* Clear up some todos.

* Add a test for certutil.ParseCertificateToCreationParameters for non-CA cert.

* Tweak TestParseCertificate/full_non_CA_cert.

* Basics of three remaining fields - keyUsage; extKeyUsage; PolicyIdentifiers

* Fix tests and err handling

* Add unit tests for policy_identifiers; ext_key_usage_oids; key_usage

* Add test on ext_key_usage_oids

* Remove duplicate usages elsewhere.

* Add error handling to csr-checks.

* Remove extranames on returned types.

* Remove useless function.

---------

Co-authored-by: Victor Rodriguez <vrizo@hashicorp.com>
2024-02-01 10:03:43 -05:00
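The following trimmed-down sketch shows what parsing a CSR back into fields involves, using only `crypto/x509`. The `csrFields` type and function names are hypothetical and much smaller than what `certutil.ParseCsrToFields` returns. Fields such as the validity period are absent because a CSR does not carry them.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
)

// csrFields is a hypothetical subset of the values recoverable from a CSR.
type csrFields struct {
	CommonName string
	DNSNames   []string
	KeyType    string
}

// parseCSRFields decodes a DER CSR, verifies its self-signature and
// extracts a few fields.
func parseCSRFields(der []byte) (*csrFields, error) {
	csr, err := x509.ParseCertificateRequest(der)
	if err != nil {
		return nil, err
	}
	if err := csr.CheckSignature(); err != nil {
		return nil, err
	}
	return &csrFields{
		CommonName: csr.Subject.CommonName,
		DNSNames:   csr.DNSNames,
		KeyType:    csr.PublicKeyAlgorithm.String(),
	}, nil
}

// newTestCSR builds a DER-encoded CSR so the sketch can be exercised.
func newTestCSR(cn string, dns ...string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject:  pkix.Name{CommonName: cn},
		DNSNames: dns,
	}
	return x509.CreateCertificateRequest(rand.Reader, tmpl, key)
}
```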
Tom Proctor
78ef25e70c HTTP API for pinning plugin versions (#25105) 2024-01-30 10:24:33 +00:00
vinay-gopalan
fcf7cf6c22 WIF support for AWS secrets engine (#24987)
* add new plugin wif fields to AWS Secrets Engine

* add changelog

* go get awsutil v0.3.0

* fix up changelog

* fix test and field parsing helper

* godoc on new test

* require role arn when audience set

* make fmt

---------

Co-authored-by: Austin Gebauer <agebauer@hashicorp.com>
Co-authored-by: Austin Gebauer <34121980+austingebauer@users.noreply.github.com>
2024-01-29 11:34:57 -08:00
Mike Palmiotto
43be9fc18a Request Limiter (#25093)
This commit introduces two new adaptive concurrency limiters in Vault,
which should handle overloading of the server during periods of
untenable request rate. The limiter adjusts the number of allowable
in-flight requests based on latency measurements performed across the
request duration. This approach allows us to reject entire requests
prior to doing any work and prevents clients from exceeding server
capacity.

The limiters intentionally target two separate vectors that have been
proven to lead to server over-utilization.

- Back pressure from the storage backend, resulting in bufferbloat in
  the WAL system. (enterprise)
- Back pressure from CPU over-utilization via PKI issue requests
  (specifically for RSA keys), resulting in failed heartbeats.

Storage constraints can be accounted for by limiting logical requests
according to their http.Method. We only limit requests with write-based
methods, since these will result in storage Puts and exhibit the
aforementioned bufferbloat.

CPU constraints are accounted for using the same underlying library and
technique; however, they require special treatment. The maximum number
of concurrent pki/issue requests found in testing (again, specifically
for RSA keys) is far lower than the minimum tolerable write request
rate. Without separate limiting, we would artificially impose limits on
tolerable request rates for non-PKI requests. To specifically target PKI
issue requests, we add a new PathsSpecial field, called limited,
allowing backends to specify a list of paths which should get
special-case request limiting.

For the sake of code cleanliness and future extensibility, we introduce
the concept of a LimiterRegistry. The registry proposed in this PR has
two entries, corresponding to the two vectors above. Each Limiter
entry has its own corresponding maximum and minimum concurrency,
allowing them to react to latency deviation independently and handle
high volumes of requests to targeted bottlenecks (CPU and storage).

In both cases, utilization will be effectively throttled before Vault
reaches any degraded state. The resulting 503 - Service Unavailable is a
retryable HTTP response code, which can be handled to gracefully retry
and eventually succeed. Clients should handle this by retrying with
jitter and exponential backoff; Vault's Go API client already does this
using the go-retryablehttp library.

Limiter testing was performed via benchmarks of mixed workloads and
across a deployment of agent pods with great success.
2024-01-26 14:26:21 -05:00
Kuba Wieczorek
71afc5bdb4 Swap calls to t.Log to a corehelpers test logger in ACME tests (#25096) 2024-01-26 18:20:44 +00:00
Tom Proctor
af27ab3524 Add version pinning to plugin catalog (#24960)
Adds the ability to pin a version for a specific plugin type + name to enable an easier plugin upgrade UX. After pinning and reloading, that version should be the only version in use.

No HTTP API implementation yet for managing pins, so no user-facing effects yet.
2024-01-26 17:21:43 +00:00
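A minimal sketch of the pinning rule described above. Pins are keyed by plugin type and name, and a pinned version takes precedence when a plugin is resolved, so every mount converges on it after a reload. The catalog type and its methods are hypothetical and ignore storage. Whether an explicitly requested mount version is overridden is an assumption based on the "only version in use" wording.

```go
package main

import "fmt"

// pluginKey identifies a plugin by type and name, which is what a pin is keyed on.
type pluginKey struct {
	Type string // e.g. "auth", "database", "secret"
	Name string
}

// catalog is a hypothetical in-memory plugin catalog.
type catalog struct {
	versions map[pluginKey][]string // registered versions per plugin
	pins     map[pluginKey]string   // pinned version, if any
}

// pin records a pin, but only for a version that is registered.
func (c *catalog) pin(k pluginKey, version string) error {
	for _, v := range c.versions[k] {
		if v == version {
			c.pins[k] = version
			return nil
		}
	}
	return fmt.Errorf("%s plugin %q has no registered version %q", k.Type, k.Name, version)
}

// resolve returns the version to run: the pin if one exists, otherwise the
// version the mount requested.
func (c *catalog) resolve(k pluginKey, requested string) string {
	if v, ok := c.pins[k]; ok {
		return v
	}
	return requested
}
```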
Rachel Culpepper
ec404c0d30 add changes for EST tests (#25089) 2024-01-26 08:22:53 -06:00
Kit Haines
ab8887c875 Migration of OtherSANs Parsing Call to SDK helper from pki-issuer (#24946)
* Migration of OtherSANs Parsing Call to SDK helper from pki-issuer

* Based on PR feedback from Steve, remove internal variable, reference certutil directly.
2024-01-19 09:21:51 -05:00
Kit Haines
fb71d7f3c8 make-fmt (#24940) 2024-01-18 20:00:00 +00:00
Steven Clark
6f5a7a9e8c Add WriteRaw to client api and new PKI test helper (#24818)
- This is to support the EST test cases within Vault Enterprise
2024-01-11 13:51:42 -05:00
Tom Proctor
6e537bb376 Support reloading database plugins across multiple mounts (#24512)
* Support reloading database plugins across multiple mounts
* Add clarifying comment to MountEntry.Path field
* Tests: Replace non-parallelisable t.Setenv with plugin env settings
2024-01-08 12:21:13 +00:00
Steven Clark
ade75bcf00 Update licensing across various source files (#24672) 2024-01-04 12:59:46 -05:00
Violet Hynes
75d0581464 VAULT-8790 Ensure time.NewTicker never gets called with a negative value (#24402)
* Ensure time.NewTicker never gets called with a negative value

* Remove naughty newline

* VAULT-8790 review feedback
2024-01-03 15:34:41 -05:00
Austin Gebauer
43c282f15a tools: upgrades gofumpt to v0.5.0 (#24637) 2023-12-22 14:36:44 -08:00
Tom Proctor
dc5c3e8d97 New database plugin API to reload by plugin name (#24472) 2023-12-13 10:23:34 +00:00
Steven Clark
8963ae495d PKI: Refactor storage of certificates into a common method (#24415)
- Move the copy/pasted code to store certificates into a
   common method within the PKI plugin
2023-12-07 11:51:51 -05:00
Steven Clark
cbf6dc2c4f PKI refactoring to start breaking apart monolith into sub-packages (#24406)
* PKI refactoring to start breaking apart monolith into sub-packages

 - This was broken down by commit within enterprise for ease of review,
   but it would be too difficult to bring the individual commits back
   to the CE repository (they would be squashed anyway).
 - This change was created by exporting a patch of the enterprise PR
   and applying it to CE repository

* Fix TestBackend_OID_SANs to not rely on map ordering
2023-12-07 09:22:53 -05:00
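Go deliberately randomizes map iteration order, so a test that builds its expected output by ranging over a map can fail intermittently. The usual fix is to sort the keys first. In this sketch, the function and its `oid=value` rendering are hypothetical and not taken from the PKI test.

```go
package main

import (
	"sort"
	"strings"
)

// formatOtherSANs renders OID -> value pairs in a deterministic order by
// sorting the OIDs before joining them.
func formatOtherSANs(sans map[string]string) string {
	oids := make([]string, 0, len(sans))
	for oid := range sans {
		oids = append(oids, oid)
	}
	sort.Strings(oids)
	parts := make([]string, 0, len(oids))
	for _, oid := range oids {
		parts = append(parts, oid+"="+sans[oid])
	}
	return strings.Join(parts, ",")
}
```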
Tom Proctor
a4180c193b Refactor plugin catalog and plugin runtime catalog into their own package (#24403)
* Refactor plugin catalog into its own package
* Fix some unnecessarily slow tests due to accidentally running multiple plugin processes
* Clean up MakeTestPluginDir helper
* Move getBackendVersion tests to plugin catalog package
* Use corehelpers.MakeTestPlugin consistently
* Fix semgrep failure: check for nil value from logical.Storage
2023-12-07 12:36:17 +00:00
Steven Clark
a41852379b Document and augment tests that PKI accepts 8192 bit RSA keys (#24364)
 - Noticed that our documentation was out of date: we allow 8192
   bit RSA keys to be used as an argument to the various PKI
   issuer/key creation APIs.
 - Augment some unit tests to verify this continues to work
2023-12-05 15:26:03 -05:00
Steven Clark
5781891292 PKI: Address some errors that were not wrapped properly (#24118) 2023-11-27 15:50:54 -05:00
Steven Clark
53040690a2 PKI: Do not set NextUpdate OCSP field when ocsp_expiry is 0 (#24192)
* Do not set NextUpdate OCSP field when ocsp_expiry is 0

* Add cl
2023-11-20 10:32:05 -05:00
Robert Hanzlík
28e3507680 allow to skip TLS check in acme http-01 challenge (#22521)
* allow to skip TLS check in acme http-01 challenge

* remove configurable logic, just ignore TLS

* add changelog

* Add test case

---------

Co-authored-by: Steve Clark <steven.clark@hashicorp.com>
2023-11-15 11:10:29 -05:00
Steven Clark
92682f33ce Address a panic when exporting RSA public keys in transit (#24054)
* Address a panic when exporting RSA public keys in transit

 - When attempting to export the public key for an RSA key that
   we only have a private key for, the export panics with a nil
   dereference.
 - Add additional tests around Transit key exporting

* Add cl
2023-11-14 09:40:37 -05:00
Robert
54bf0807c1 secrets/aws: add support for STS Session Tokens with TOTP (#23690)
* Add test coverage

* Add session_token field, deprecate security_token

* Undo auth docs

* Update api docs

* Add MFA code support

---------

Co-authored-by: Graham Christensen <graham@grahamc.com>
Co-authored-by: Sarah Chavis <62406755+schavis@users.noreply.github.com>
Co-authored-by: Austin Gebauer <34121980+austingebauer@users.noreply.github.com>
2023-11-08 17:06:28 -06:00
Steven Clark
a10685c521 Pin curl docker image to a specific docker version instead of latest (#23763)
- Try to avoid these build failures as our proxy does seem to have
   issues around pulling images with the 'latest' tag at times.

```
acme_test.go:206:
	Error Trace:	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/builtin/logical/pkiext/pkiext_binary/acme_test.go:206
          	        /home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/builtin/logical/pkiext/pkiext_binary/acme_test.go:75
	Error:      	Received unexpected error:
				container create failed: Error response from daemon: No such image: docker.mirror.hashicorp.services/curlimages/curl:latest
	Test:       	Test_ACME/group/caddy_http_eab
	Messages:   	could not start cURL container
```
2023-10-24 10:04:23 -04:00
Steven Clark
3623dfc227 Add support for plugins to specify binary request paths (#23729)
* wip

* more pruning

* Integrate OCSP into binary paths PoC

 - Simplify some of the changes to the router
 - Remove the binary test PKI endpoint
 - Switch OCSP to use the new binary paths backend variable

* Fix proto generation and test compilation

* Add unit test for binary request handling

---------

Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
2023-10-23 17:04:42 -04:00
davidadeleon
72d66e2813 Fix consul token revocation with namespace and admin partition specific policies (#23010)
* fix lease revocation when config token exists in one namespace but can create tokens in another

* add test

* Add similar check for admin partition

* Add admin partition test

---------

Co-authored-by: robmonte <17119716+robmonte@users.noreply.github.com>
2023-10-20 13:06:20 -05:00
modrake
eca4b4d801 Relplat 897 copywrite fixes for multiple licenses (#23722) 2023-10-20 08:40:43 -07:00
Steven Clark
d0501db90f Forbid setting auto_rotate_period on transit managed keys (#23723)
* Forbid setting auto_rotate_period on transit managed keys

 - Prevent and guard against auto-rotating managed keys, as we
   generate an invalid key version without the uuid field set.
 - Hook the datakey generation API into managed key encryption.

* Add cl
2023-10-19 15:29:01 -04:00
kpcraig
6aabb22b7c AWS Static Secrets: Requeue credential for rotation if initial attempt fails (#23673) 2023-10-17 14:12:33 -04:00
Steven Clark
bc4be73a1c Fix Transit managed key fixes - OSS (#23676)
- This is the OSS parts of the greater enterprise PR to address some
   issues with signing and encryption within Transit using managed keys.
2023-10-16 15:52:59 -04:00
kpcraig
30f19b383f VAULT-18307: update rotation period for aws static roles on update (#23528) 2023-10-11 17:06:58 +00:00
Steven Clark
b0fef53184 Do not attempt to shutdown ACME thread on non-active nodes (#23293) 2023-09-26 16:32:52 -04:00
Steven Clark
dbfaa6f81a Stop processing ACME verifications when active node is stepped down (#23278)
- Do not load existing ACME challenges persisted within storage on non-active nodes. This was the main culprit: secondary nodes would load persisted challenges and try to resolve them, but their writes would fail, leading to the excessive logging.
    - We now handle this by not starting the ACME background thread on non-active nodes, while also checking within the scheduling loop and breaking out. That forces a re-read of the Closing channel, which should have been closed by the PKI plugin's Cleanup method.

- If a node is stepped down from being the active node while it is actively processing a verification, we could get into an infinite loop due to an ErrReadOnly error when attempting to clean up a challenge entry.

- Add a maximum number of retries for errors when attempting to decode or fetch challenge/authorization entries from disk. For these errors we allow double the "normal" ACME max attempts, to avoid collision issues. Note that these additional retry attempts are not persisted to disk and restart whenever a node starts.

- Add a 1 second backoff to any disk-related error so we do not immediately spin on disk/IO errors for challenges.
2023-09-26 13:59:13 -04:00
vinay-gopalan
8924f9592d Remove SA Credentials from DB Connection Details on Read (#23256) 2023-09-22 10:49:46 -07:00
John-Michael Faircloth
9569b16114 secrets/db: add rotation error path test (#23182)
* secrets/db: add rotation error path test

We add a test to verify that failed rotations can successfully recover
and that they do not occur outside of a rotation window. Additionally,
we remove the registration of some external plugins in getCluster(),
which shaves about 5 minutes off the database package tests.

* remove dead code and add test comment

* revert to original container helper after refactor
2023-09-20 14:07:17 -05:00
John-Michael Faircloth
1e76ad42ef secrets/db: add tests for static role config updates (#23153) 2023-09-19 10:12:09 -05:00
Steven Clark
293e8b8ac5 Fix enterprise failure of TestCRLIssuerRemoval (#23038)
This fixes the enterprise failure of the test
 ```
  === FAIL: builtin/logical/pki TestCRLIssuerRemoval (0.00s)
     crl_test.go:1456:
         	Error Trace:	/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/builtin/logical/pki/crl_test.go:1456
         	Error:      	Received unexpected error:
         	            	Global, cross-cluster revocation queue cannot be enabled when auto rebuilding is disabled as the local cluster may not have the certificate entry!
         	Test:       	TestCRLIssuerRemoval
         	Messages:   	failed enabling unified CRLs on enterprise

 ```
2023-09-13 08:11:52 -04:00