Commit Graph

668 Commits

Author SHA1 Message Date
Josh Black
fa13dbd381 add gosimport to make fmt and run it (#25383)
* add gosimport to make fmt and run it

* move installation to tools.sh

* correct weird spacing issue

* Update Makefile

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

* fix a weird issue

---------

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2024-02-13 14:07:02 -08:00
Steven Clark
3cd74cef9c Forward EST .well-known requests on performance replicas (#25304)
- CE fix for properly forwarding the EST .well-known requests to
   performance replicas internally instead of redirecting through
   http headers
2024-02-08 16:33:53 -05:00
Mike Palmiotto
e4a11ae7cd Request Limiter Reload tests (#25126)
This PR introduces a new testonly endpoint for introspecting the
RequestLimiter state. It makes use of the endpoint to verify that changes to
the request_limiter config are honored across reload.

In the future, we may choose to make the sys/internal/request-limiter/status
endpoint available in normal binaries, but this is an expedient way to expose
the status for testing without having to rush the design.

In order to re-use as much of the existing command package utility funcionality
as possible without introducing sprawling code changes, I introduced a new
server_util.go and exported some fields via accessors.

The tests shook out a couple of bugs (including a deadlock and lack of
locking around the core limiterRegistry state).
2024-02-01 09:11:08 -05:00
Tom Proctor
6e111d92fe Support setting plugin TMPDIR in config as well as env (#24978) 2024-01-30 13:10:23 +00:00
Mike Palmiotto
12f69a8ce5 Request Limiter listener config opt-out (#25098)
This commit introduces a new listener config option to allow disabling the request limiter per-listener.
2024-01-26 15:24:32 -05:00
Mike Palmiotto
43be9fc18a Request Limiter (#25093)
This commit introduces two new adaptive concurrency limiters in Vault,
which should handle overloading of the server during periods of
untenable request rate. The limiter adjusts the number of allowable
in-flight requests based on latency measurements performed across the
request duration. This approach allows us to reject entire requests
prior to doing any work and prevents clients from exceeding server
capacity.

The limiters intentionally target two separate vectors that have been
proven to lead to server over-utilization.

- Back pressure from the storage backend, resulting in bufferbloat in
  the WAL system. (enterprise)
- Back pressure from CPU over-utilization via PKI issue requests
  (specifically for RSA keys), resulting in failed heartbeats.

Storage constraints can be accounted for by limiting logical requests
according to their http.Method. We only limit requests with write-based
methods, since these will result in storage Puts and exhibit the
aforementioned bufferbloat.

CPU constraints are accounted for using the same underlying library and
technique; however, they require special treatment. The maximum number
of concurrent pki/issue requests found in testing (again, specifically
for RSA keys) is far lower than the minimum tolerable write request
rate. Without separate limiting, we would artificially impose limits on
tolerable request rates for non-PKI requests. To specifically target PKI
issue requests, we add a new PathsSpecial field, called limited,
allowing backends to specify a list of paths which should get
special-case request limiting.

For the sake of code cleanliness and future extensibility, we introduce
the concept of a LimiterRegistry. The registry proposed in this PR has
two entries, corresponding with the two vectors above. Each Limiter
entry has its own corresponding maximum and minimum concurrency,
allowing them to react to latency deviation independently and handle
high volumes of requests to targeted bottlenecks (CPU and storage).

In both cases, utilization will be effectively throttled before Vault
reaches any degraded state. The resulting 503 - Service Unavailable is a
retryable HTTP response code, which can be handled to gracefully retry
and eventually succeed. Clients should handle this by retrying with
jitter and exponential backoff. This is done within Vault's API, using
the go-retryablehttp library.

Limiter testing was performed via benchmarks of mixed workloads and
across a deployment of agent pods with great success.
2024-01-26 14:26:21 -05:00
Scott Miller
9bb4f9e996 Re-process .well-known redirects with a recursive handler call rather than a 302 redirect (#24890)
* Re-process .well-known redirects with a recursive handler call rather than a 302 redirect

* Track when the RequestURI mismatches path (in a redirect) and add it to the audit log

* call cancelFunc
2024-01-19 09:59:58 -06:00
Peter Wilson
ebf627ceed VAULT-23050: Remove undocumented feature flag to disable audit eventlogger (#24764)
* Work towards removing the feature flag that disabled eventlogger for audit events

* Removed audited headers from LogRequest and LogResponse and clean up

* make clear we don't use a method param, and comment tweak

* Moved BenchmarkAuditFile_request to audit_broker_test and renamed. Clean up

* fixed calls from tests to Factory's

* waffling godoc for a ported and tweaked test

* Remove duplicate code from previous merges, remove uneeded code

* Refactor file audit backend tests

---------

Co-authored-by: Kuba Wieczorek <kuba.wieczorek@hashicorp.com>
2024-01-11 11:30:36 +00:00
Tom Proctor
6e537bb376 Support reloading database plugins across multiple mounts (#24512)
* Support reloading database plugins across multiple mounts
* Add clarifying comment to MountEntry.Path field
* Tests: Replace non-parallelisable t.Setenv with plugin env settings
2024-01-08 12:21:13 +00:00
Kuba Wieczorek
17ffe62d0d [VAULT-22481] Add audit filtering feature (#24558)
* VAULT-22481: Audit filter node (#24465)

* Initial commit on adding filter nodes for audit

* tests for audit filter

* test: longer filter - more conditions

* copywrite headers

* Check interface for the right type

* Add audit filtering feature (#24554)

* Support filter nodes in backend factories and add some tests

* More tests and cleanup

* Attempt to move control of registration for nodes and pipelines to the audit broker (#24505)

* invert control of the pipelines/nodes to the audit broker vs. within each backend

* update noop audit test code to implement the pipeliner interface

* noop mount path has trailing slash

* attempting to make NoopAudit more friendly

* NoopAudit uses known salt

* Refactor audit.ProcessManual to support filter nodes

* HasFiltering

* rename the pipeliner

* use exported AuditEvent in Filter

* Add tests for registering and deregistering backends on the audit broker

* Add missing licence header to one file, fix a typo in two tests

---------

Co-authored-by: Peter Wilson <peter.wilson@hashicorp.com>

* Add changelog file

* update bexpr datum to use a strong type

* go docs updates

* test path

* PR review comments

* handle scenarios/outcomes from broker.send

* don't need to re-check the complete sinks

* add extra check to deregister to ensure that re-registering non-filtered device sets sink threshold

* Ensure that the multierror is appended before attempting to return it

---------

Co-authored-by: Peter Wilson <peter.wilson@hashicorp.com>
2023-12-18 18:01:49 +00:00
Peter Wilson
24c6e82a84 Remove old audit behavior from test code (#24540)
* Export audit event

* Move older tests away from audit behavior that didn't use eventlogger

* spelling--;

* no more struct initialization of NoopAudit outside of NewNoopAudit

* locking since we're accessing the shared backend
2023-12-15 09:26:34 +00:00
Tom Proctor
a4180c193b Refactor plugin catalog and plugin runtime catalog into their own package (#24403)
* Refactor plugin catalog into its own package
* Fix some unnecessarily slow tests due to accidentally running multiple plugin processes
* Clean up MakeTestPluginDir helper
* Move getBackendVersion tests to plugin catalog package
* Use corehelpers.MakeTestPlugin consistently
* Fix semgrep failure: check for nil value from logical.Storage
2023-12-07 12:36:17 +00:00
miagilepner
959d548ac6 Add PATCH to CORS allowed request methods (#24373)
* add PATCH to cors request methods

* changelog
2023-12-07 11:27:35 +01:00
divyaac
6e020e38e0 Add_Chroot_Namespace_In_Response (#24355) 2023-12-04 14:51:44 -08:00
Hamid Ghaf
aeb817dfba Buffer body read up to MaxRequestSize (#24354) 2023-12-04 13:22:22 -08:00
Nick Cabatoff
b8f531142b Use our heartbeat echo RPCs to estimate clock skew, expose it in status APIs (#24343) 2023-12-04 12:04:38 -05:00
Nick Cabatoff
85b3dba310 Rework sys/health tests to use structs and cmp (#24324) 2023-12-04 08:34:25 -05:00
Kuba Wieczorek
8f064b90ec [VAULT-22270] API: add enterprise field to the response from /sys/health/ endpoint (#24270) 2023-11-28 14:22:33 +00:00
Scott Miller
7a8ced4d36 Implement RFC 5785 (.well-known) Redirects (#23973)
* Re-implementation of API redirects with more deterministic matching

* add missing file

* Handle query params properly

* licensing

* Add single src deregister

* Implement specifically RFC 5785 (.well-known) redirects.

Also implement a unit test for HA setups, making sure the standby node redirects to the active (as usual), and that then the active redirects the .well-known request to a backend, and that that is subsequently satisfied.

* Remove test code

* Rename well known redirect logic

* comments/cleanup

* PR feedback

* Remove wip typo

* Update http/handler.go

Co-authored-by: Steven Clark <steven.clark@hashicorp.com>

* Fix registrations with trailing slashes

---------

Co-authored-by: Steven Clark <steven.clark@hashicorp.com>
2023-11-15 15:21:52 -06:00
divyaac
3e94f2fcb5 Added OSS changes (#23951) 2023-11-01 23:12:51 +00:00
Marc Boudreau
6af8bc7ce0 replace nytimes/gziphandler with klauspost/compress/gzhttp (#23898) 2023-10-31 12:38:07 -04:00
Jason O'Donnell
29d8929824 api/seal-status: fix deadlock when namespace is set on seal-status calls (#23861)
* api/seal-status: fix deadlock when namespace is set on seal-status calls

* changelog
2023-10-27 09:59:50 -04:00
Steven Clark
3623dfc227 Add support for plugins to specify binary request paths (#23729)
* wip

* more pruning

* Integrate OCSP into binary paths PoC

 - Simplify some of the changes to the router
 - Remove the binary test PKI endpoint
 - Switch OCSP to use the new binary paths backend variable

* Fix proto generation and test compilation

* Add unit test for binary request handling

---------

Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
2023-10-23 17:04:42 -04:00
Violet Hynes
aeb6d14ebd Update the default kv factory to kv.Factory (#23584)
* Update the default kv mount to kv.Factory

* Imports

* Set some tests that care about leaseapssthroughbackend to use it

* extra newline

* More test updates

* Test updates

* Refactor KV mounting in tests

* Re-add comment
2023-10-23 11:20:22 -04:00
modrake
eca4b4d801 Relplat 897 copywrite fixes for mutliple licenses (#23722) 2023-10-20 08:40:43 -07:00
davidadeleon
8b15e7d216 Revert "Implement user lockout log (#23140)" (#23741)
This reverts commit 92fcfda8ad.
2023-10-20 11:21:18 -04:00
Marc Boudreau
1ebbf449b4 Improve Robustness of Custom Context Values Types (#23697) 2023-10-18 09:30:00 -04:00
Nick Cabatoff
0df48451f4 Remove some no longer used ent init hooks. (#23704) 2023-10-17 17:21:37 -04:00
Marc Boudreau
4e22153987 VAULT-19869: Use Custom Types for Context Keys (#23649)
* create custom type for disable-replication-status-endpoints context key
make use of custom context key type in middleware function

* clean up code to remove various compiler warnings
unnecessary return statement
if condition that is always true
fix use of deprecated ioutil.NopCloser
empty if block

* remove unused unexported function

* clean up code
remove unnecessary nil check around a range expression

* clean up code
removed redundant return statement

* use http.StatusTemporaryRedirect constant instead of literal integer

* create custom type for context key for max_request_size parameter

* create custom type for context key for original request path
2023-10-13 14:04:26 -04:00
Nick Cabatoff
67d743e273 Step 3 of part 3 of removing ent init hooks: call stubs instead of var func hooks. (#23646) 2023-10-13 13:36:15 -04:00
Nick Cabatoff
e232da5ffa Teach stubmaker how to work with methods, not just funcs. (#23634)
Teach stubmaker how to work with methods, not just funcs.  Fix some stubs defined in #23557 which either had the wrong signature or needed to be public.
2023-10-12 14:38:28 -04:00
Marc Boudreau
01cd9d37bb Add Ability to Disable Replication Status Endpoints in Listener Configuration (#23547)
* CI: Pre-emptively delete logs dir after cache restore in test-collect-reports (#23600)

* Fix OktaNumberChallenge (#23565)

* remove arg

* changelog

* exclude changelog in verifying doc/ui PRs (#23601)

* Audit: eventlogger sink node reopen on SIGHUP (#23598)

* ensure nodes are asked to reload audit files on SIGHUP

* added changelog

* Capture errors emitted from all nodes during proccessing of audit pipelines (#23582)

* Update security-scan.yml

* Listeners: Redaction only for TCP (#23592)

* redaction should only work for TCP listeners, also fix bug that allowed custom response headers for unix listeners

* fix failing test

* updates from PR feedback

* fix panic when unlocking unlocked user (#23611)

* VAULT-18307: update rotation period for aws static roles on update (#23528)

* add disable_replication_status_endpoints tcp listener config parameter

* add wrapping handler for disabled replication status endpoints setting

* adapt disable_replication_status_endpoints configuration parsing code to refactored parsing code

* refactor configuration parsing code to facilitate testing

* fix a panic when parsing configuration

* update refactored configuration parsing code

* fix merge corruption

* add changelog file

* document new TCP listener configuration parameter

* make sure disable_replication_status_endpoints only has effect on TCP listeners

* use active voice for explanation of disable_replication_status_endpoints

* fix minor merge issue

---------

Co-authored-by: Kuba Wieczorek <kuba.wieczorek@hashicorp.com>
Co-authored-by: Angel Garbarino <Monkeychip@users.noreply.github.com>
Co-authored-by: Hamid Ghaf <83242695+hghaf099@users.noreply.github.com>
Co-authored-by: Peter Wilson <peter.wilson@hashicorp.com>
Co-authored-by: Mark Collao <106274486+mcollao-hc@users.noreply.github.com>
Co-authored-by: davidadeleon <56207066+davidadeleon@users.noreply.github.com>
Co-authored-by: kpcraig <3031348+kpcraig@users.noreply.github.com>
2023-10-11 14:23:21 -04:00
davidadeleon
92fcfda8ad Implement user lockout log (#23140)
* implement user lockout logger

* formatting

* make user lockout log interval configurable

* create func to get locked user count, and fix potential deadlock

* fix test

* fix test

* add changelog
2023-10-06 15:58:42 -04:00
Nick Cabatoff
2b460a3a21 Define a bunch of stubs that will replace existing init hooks. (#23557) 2023-10-06 15:17:18 -04:00
Peter Wilson
e5432b0577 VAULT-19863: Per-listener redaction settings (#23534)
* add redaction config settings to listener

* sys seal redaction + test modification for default handler properties

* build date should be redacted by 'redact_version' too

* sys-health redaction + test fiddling

* sys-leader redaction

* added changelog

* Lots of places need ListenerConfig

* Renamed options to something more specific for now

* tests for listener config options

* changelog updated

* updates based on PR comments

* updates based on PR comments - removed unrequired test case field

* fixes for docker tests and potentially server dev mode related flags
2023-10-06 17:39:02 +01:00
Violet Hynes
f943c37a83 VAULT-19237 Add mount_type to secret response (#23047)
* VAULT-19237 Add mount_type to secret response

* VAULT-19237 changelog

* VAULT-19237 make MountType generic

* VAULT-19237 clean up comment

* VAULT-19237 update changelog

* VAULT-19237 update test, remove mounttype from wrapped responses

* VAULT-19237 fix a lot of tests

* VAULT-19237 standby test
2023-09-20 09:28:52 -04:00
Christopher Swenson
82e9b610df events: Don't accept websocket connection until subscription is active (#23024)
The WebSocket tests have been very flaky because we weren't able to tell when a WebSocket was fully connected and subscribed to events.

We reworked the websocket subscription code to accept the websocket only after subscribing.

This should eliminate all flakiness in these tests. 🤞 (We can follow-up in an enterprise PR to simplify some of the tests after this fix is merged.)

I ran this locally a bunch of times and with data race detection enabled, and did not see any failures.

Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
2023-09-13 14:28:17 -07:00
Christopher Swenson
e3d00597c6 Safer check in http/events validateSubscribeAccessLoop (#22913)
We grab the state lock and check that the core is not shutting down.

This panic mostly seems to happen if Vault is shutting down, usually
in a test.

Also, we try clean up the go-bexpr test by sending duplicates, and
deduplicating in the receive loop.
2023-09-08 18:26:01 +00:00
Christopher Swenson
b6e8cb3a4c events: Avoid data race by checking for root token (#22916)
before starting access loop checker.

We were seeing rare data races because the `core.IsRoot()` check was
being called after the subscriber access loop checker was starter.
The subscriber access loop calls `core.CheckToken()`, which can modify
the values that `IsRoot()` is reading.
2023-09-08 10:15:01 -07:00
Tom Proctor
55a901d423 Add timeout for http events tests (#22892)
Does not address the test flake in TestBexprFilters, but makes it fail fast(er) when it does flake.
2023-09-08 14:16:01 +01:00
Christopher Swenson
707fab113e Fix auto-squash events experiments (#22876)
When #22835 was merged, it was auto-squashed, so the `experiments`
import was removed, but the test still referenced it.

This removes the (now unnecessary) experiment from the test.
2023-09-07 21:10:02 +00:00
Christopher Swenson
022469da45 events: WebSocket subscriptions support go-bexpr expressions (#22835)
Subscribing to events through a WebSocket now support boolean
expressions to filter only the events wanted based on the fields

* `event_type`
* `operation`
* `source_plugin_mount`
* `data_path`
* `namespace`

Example expressions:

These can be passed to `vault events subscribe`, e.g.,:
* `event_type == abc`
* `source_plugin_mount == secret/`
* `event_type != def and operation != write`

```sh
vault events subscribe -filter='source_plugin_mount == secret/' 'kv*'
```

The docs for the `vault events subscribe` command and API endpoint
will be coming shortly in a different PR, and will include a better
specification for these expressions, similar to (or linking to)
https://developer.hashicorp.com/boundary/docs/concepts/filtering
2023-09-07 20:11:53 +00:00
Christopher Swenson
7f7907d3a0 events: Enable by default, disable flag (#22815)
The flag `events.alpha1` will no longer do anything, but we keep it
to prevent breaking users who have it in their configurations or
startup flags, or if it is referenced in other code.
2023-09-07 18:27:14 +00:00
Christopher Swenson
81f30d26e4 events: remove flaky test (#22808)
The more I looked at this test, the more I realized it wasn't testing
anything except that the namespace parameter was being parsed by the
websocket.

So, I moved that parameter parsing check to `TestEventsSubscribe()`,
which is not flaky, and removed the flaky test altogether.

There is a similar set of tests in the enterprise repo that I will
try to simplify and make less flaky, though.
2023-09-07 11:03:41 -07:00
Christopher Swenson
f0a23e117f events: Continuously verify policies (#22705)
Previously, when a user initiated a websocket subscription,
the access to the `sys/events/subscribe` endpoint was checked then,
and only once.

Now, perform continuous policy checks:

* We check access to the `sys/events/subscribe` endpoint every five
  minutes. If this check fails, then the websocket is terminated.
* Upon receiving any message, we verify that the `subscribe`
  capability is present for that namespace, data path, and event type.
  If it is not, then the message is not delivered. If the message is
  allowed, we cache that result for five minutes.

Tests for this are in a separate enterprise PR.

Documentation will be updated in another PR.

Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2023-09-05 16:28:09 -07:00
Peter Wilson
3eba73892b Eventbus related refactoring (#22732)
* refactored test to try and see if we can solve flakey test errors

* refactored code for readability

* don't defer in a for loop
2023-09-01 17:00:37 +00:00
Ellie
bd36e66ea6 Add config value that gives users options to skip calculating role for each lease (#22651)
* Add config value that gives users options to skip calculating role for each lease

* add changelog

* change name

* add config for testing

* Update changelog/22651.txt

Co-authored-by: Violet Hynes <violet.hynes@hashicorp.com>

* update tests, docs and reorder logic in conditional

* fix comment

* update comment

* fix comment again

* Update comments and change if order

* change comment again

* add other comment

* fix tests

* add documentation

* edit docs

* Update http/util.go

Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>

* Update vault/core.go

* Update vault/core.go

* update var name

* udpate docs

* Update vault/request_handling.go

Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>

* 1 more docs change

---------

Co-authored-by: Violet Hynes <violet.hynes@hashicorp.com>
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
2023-09-01 07:01:41 -05:00
Mike Palmiotto
c4a8b23d93 Only resolve roles for role quotas and leases (#22597) 2023-08-30 10:13:30 -04:00
Christopher Swenson
917a5863fe Allow more time for CI events test (#22589)
CI is sometimes slow, so 100ms was not enough time for all events
to be sent and processed in `http/events_test.go`.

We bumped that timeout to a full 1 second, but also added a trick at
the end to shorten the timeout once the expected number of events
have been receieved. This way, once the test has passed, we only
wait 100ms for any "extra" events to make the test fail, instead
of waiting for the full 1 second before we let the test pass. This
should keep the test relatively fast, while still allowing for it to
be slow sometimes.
2023-08-29 09:32:18 -07:00
Ellie
cccfdb088f reduce calls to DetermineRoleFromLoginRequest from 3 to 1 for aws auth method (#22583)
* reduce calls to DetermineRoleFromLoginRequest from 3 to 1 for aws auth method

* change ordering of LoginCreateToken args

* replace another determineRoleFromLoginRequest function with role from context

* add changelog

* Check for role in context if not there make call to DeteremineRoleFromLoginRequest

* move context role check below nanmespace check

* Update changelog/22583.txt

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

* revert signature to same order

* make sure resp is last argument

* retrieve role from context closer to where role variable is needed

* remove failsafe for role in mfa login

* Update changelog/22583.txt

---------

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2023-08-28 16:01:07 -05:00