* Pulls in github.com/hashicorp/go-secure-stdlib/plugincontainer@v0.3.0, which exposes a new `Config.Rootless` option to opt in to extra container configuration options that allow establishing communication with a non-root plugin within a rootless container runtime.
* Adds a new "rootless" option for plugin runtimes, so Vault needs to be explicitly told whether the container runtime on the machine is rootless or not. It defaults to false as rootless installs are not the default.
* Updates `run_config.go` to use the new option when the plugin runtime is rootless.
* Adds new `-rootless` flag to `vault plugin runtime register`, and `rootless` API option to the register API.
* Adds rootless Docker installation to CI to support tests for the new functionality.
* Minor test refactor to minimise the number of test Vault cores that need to be made for the external plugin container tests.
* Documentation for the new rootless configuration and the new (reduced) set of restrictions for plugin containers.
* As well as adding rootless support, we've decided to drop explicit support for podman for now. There's no barrier other than support burden to adding it back in future, so it will depend on demand.
Add support for testing Vault Enterprise with HA seal support by adding
a new `seal_ha` scenario that configures more than one seal type for a
Vault cluster. We also extend existing scenarios to support testing
with or without the Seal HA code path enabled.
* Extract starting vault into a separate enos module to allow for better
handling of complex clusters that need to be started more than once.
* Extract seal key creation into a separate module and provide it to
target modules. This allows us to create more than one seal key and
associate it with instances. This also allows us to forgo creating
keys when using shamir seals.
* [QT-615] Add support for configuring more than one seal type to the
`vault_cluster` module.
* [QT-616] Add `seal_ha` scenario
* [QT-625] Add `seal_ha_beta` variant to existing scenarios to test with
both code paths.
* Unpin action-setup-terraform
* Add `kms:TagResource` to service user IAM profile
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-20487 update build failure slack output
* VAULT-20487 add new needs
* VAULT-20487 make it run on my branch
* VAULT-20487 make it run
* VAULT-20487 finalize?
Fix missing log files: we need to use an absolute path, since go test chdirs into the test package dir before running tests. Move the cleanup-on-success behaviour from NewTestCluster into NewTestLogger so it applies more broadly.
* Stop running fips tests on PRs: we expect fips-specific failures to be rare enough that it's not worth the cost.
* Allow PRs with the label "fips" to run fips tests.
Terraform 1.6.x seems to have some incompatibility with the current
version of enos and its usage of tfjson. Pin to 1.5.x until it has been
resolved.
```
│ Error: json: cannot unmarshal array into Go struct field rawState.checks of type tfjson.CheckResultStatic
│
```
Signed-off-by: Ryan Cragun <me@ryan.ec>
Sometimes destroying resources in AWS will fail because of unexpected
dependency violations or other such nonsense. When this happens the
behavior of Vault that we wanted to verify has already been successfully
accomplished, however the required workflow will fail. This change
allows us to succeed if `enos scenario launch` completes but allows
`enos scenario destroy` to fail. We still notify our slack channel on
destroy failures so that we can investigate issues, however it won't
require a PR author to retry.
* Execute `enos scenario launch` instead of `enos scenario run` to allow
for very occasional issues when tearing down test infrastructure.
* Improve an error message when getting secondary cluster IP addresses.
* Don't race to get secondary cluster IP addresses.
* Add secondary token to replication scenario outputs.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Remove old initial versions from the upgrade scenario as they're
unreliable.
* Ensure that shellcheck is available on runners for linting job.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Update our `proxy` and `agent` scenarios to support new variants and
perform both baseline verification and their scenario-specific verification.
We integrate these updated scenarios into the pipeline by adding them
to artifact samples.
We've also improved the reliability of the `autopilot` and `replication`
scenarios by refactoring our IP address gathering. Previously, we'd ask
vault for the primary IP address and use some Terraform logic to determine
followers. The leader IP address gathering script was also implicitly
responsible for ensuring that a found leader was within a given group of
hosts, and thus waiting for a given cluster to have a leader, and also for
doing some arithmetic and outputting `replication` specific output data.
We've broken these responsibilities into individual modules, improved their
error messages, and fixed various races and bugs, including:
* Fix a race between creating the file audit device and installing and starting
vault in the `replication` scenario.
* Fix how we determine our leader and follower IP addresses. We now query
vault for them. The prior implementation inferred the followers and sometimes
did not allow all nodes to be an expected leader.
* Fix a bug where we'd always fail on the first wrong condition
in the `vault_verify_performance_replication` module.
We also performed some maintenance tasks on Enos scenarios by updating our
references from `oss` to `ce` to handle the naming and license changes. We
also enabled `shellcheck` linting for enos module scripts.
* Rename `oss` to `ce` for license and naming changes.
* Convert template enos scripts to scripts that take environment
variables.
* Add `shellcheck` linting for enos module scripts.
* Add additional `backend` and `seal` support to `proxy` and `agent`
scenarios.
* Update scenarios to include all baseline verification.
* Add `proxy` and `agent` scenarios to artifact samples.
* Remove IP address verification from the `vault_get_cluster_ips`
modules and implement a new `vault_wait_for_leader` module.
* Determine follower IP addresses by querying vault in the
`vault_get_cluster_ips` module.
* Move replication specific behavior out of the `vault_get_cluster_ips`
module and into its own `replication_data` module.
* Extend initial version support for the `upgrade` and `autopilot`
scenarios.
We also discovered an issue with undo_logs that has been described in
VAULT-20259. As such, we've disabled the undo_logs check until
it has been fixed.
Signed-off-by: Ryan Cragun <me@ryan.ec>
This adds edition handling to the test-run-enos-scenario-matrix
workflow. Previously we'd pass the version and edition from the caller,
but that isn't an option in the release testing workflow, which only
passes the metadata version without the edition.
Signed-off-by: Ryan Cragun <me@ryan.ec>
The CRT orchestrator triggers the release testing workflows for all
release versions using the same main ref. Therefore, if we have
concurrency controls in place, one release branch's workflows could be
cancelled whenever another release branch is executing workflows.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Replace our prior implementation of Enos test groups with the new Enos
sampling feature. With this feature we're able to describe which
scenarios and variant combinations are valid for a given artifact and
allow enos to create a valid sample field (a matrix of all compatible
scenarios) and take an observation (select some to run) for us. This
ensures that every valid scenario and variant combination will
now be a candidate for testing in the pipeline. See QT-504[0] for further
details on the Enos sampling capabilities.
Our prior implementation only tested the amd64 and arm64 zip artifacts,
as well as the Docker container. We now include the following new artifacts
in the test matrix:
* CE Amd64 Debian package
* CE Amd64 RPM package
* CE Arm64 Debian package
* CE Arm64 RPM package
Each artifact includes a sample definition for both pre-merge/post-merge
(build) and release testing.
Changes:
* Remove the hand crafted `enos-run-matrices` ci matrix targets and replace
them with per-artifact samples.
* Use enos sampling to generate different sample groups on all pull
requests.
* Update the enos scenario matrices to handle HSM and FIPS packages.
* Simplify enos scenarios by using shared globals instead of
cargo-culted locals.
Note: This will require coordination with vault-enterprise to ensure a
smooth migration to the new system. Integrating new scenarios or
modifying existing scenarios/variants should be much smoother after this
initial migration.
[0] https://github.com/hashicorp/enos/pull/102
Signed-off-by: Ryan Cragun <me@ryan.ec>
We can't use `sudo` on our self-hosted runners at the moment to do
the install and Docker reload.
So, we'll disable this for now, which should automatically cause
the gVisor-related tests to be skipped.
* Also makes plugin directory optional when registering container plugins
* And threads plugin runtime settings through to plugin execution config
* Add runsc to github runner for plugin container tests
* adding new version bump refactoring
* address comments
* remove changes used for testing
* add the version bump event!
* fix local enos scenarios
* remove unnecessary local get_local_metadata steps from scenarios
* add version base, pre, and meta to the get_local_metadata module
* use the get_local_metadata module in the local builder for version
metadata
* update the version verifier to always require a build date
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Update to embed the base version from the VERSION file directly into version.go.
This ensures that any go tests can use the same (valid) version as CI and so can local builds and local enos runs.
We still want to be able to set a default metadata value in version_base.go as this is not something that we set in the VERSION file - we pass this in as an ldflag in CI (matters more for ENT but we want to keep these files in sync across repos).
* update comment
* fixing bad merge
* removing actions-go-build as it won't work with the latest go caching changes
* fix logic for getting version in enos-lint.yml
* fix version number
* removing unneeded module
---------
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Claire <claire@hashicorp.com>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* Attempt to new-line/emojify test output
* Update emoji
* Make it always run, for testing
* Put the emojis first
* Add a space
* OSS -> CE
* Update enterprise tests also
* Test failure
* Test failures but better
* Print it even if not main :)
* Fix the comparison
* Finalize changes
Includes everything after the 3rd position as the PLUGIN_SERVICE, so
that plugins like "vault-plugin-database-redis-elasticache" end up
with the full name in the changelog entry.
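A minimal sketch of that parsing rule, assuming the name is split on `-` and that the first three fields are `vault`, `plugin`, and the plugin type. The function name `pluginService` is hypothetical and the actual change may be implemented differently:

```go
package main

import (
	"fmt"
	"strings"
)

// pluginService returns everything after the third dash-separated field
// of a plugin repository name, or "" if there are fewer than four fields.
// Hypothetical helper for illustration.
func pluginService(name string) string {
	// SplitN with a limit of 4 keeps any remaining dashes in the last part.
	parts := strings.SplitN(name, "-", 4)
	if len(parts) < 4 {
		return ""
	}
	return parts[3]
}

func main() {
	fmt.Println(pluginService("vault-plugin-database-redis-elasticache"))
	fmt.Println(pluginService("vault-plugin-secrets-kv"))
}
```

Taking only the fourth field would truncate the first name to `redis`, which is the case this change avoids.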
* Remove diff-oss-ci
* Eliminate another inconsistency
* Fix logic: we want to only apply the fork check on the CE repo. On ent we want to always run the job.
---------
Co-authored-by: hc-github-team-secure-vault-core <github-team-secure-vault-core@hashicorp.com>
* adding testonly CI test job
* small instance for testonly tests
* feedback
* shopt
* disable glob expansion
* revert back to a large instance
* fix a mistake