* VAULT-33693: actions: fix push event PR labels
Fix pull request label metadata when triggered by `push` event types.
We now use GitHub's `associatedPullRequests()` connection on the
`Commit` associated with the SHA to resolve the labels.
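As a rough illustration of the lookup described above (a Python sketch only, not the workflow's actual code; the helper name `labels_for_commit` and the page sizes are assumptions), the GraphQL query and label extraction could look like this:

```python
# Illustrative sketch; the real workflow may be structured differently.
# The query uses the Commit.associatedPullRequests connection from the
# GitHub GraphQL API. The page sizes (10 PRs, 100 labels) are assumptions.
QUERY = """
query($owner: String!, $name: String!, $sha: GitObjectID!) {
  repository(owner: $owner, name: $name) {
    object(oid: $sha) {
      ... on Commit {
        associatedPullRequests(first: 10) {
          nodes {
            number
            labels(first: 100) { nodes { name } }
          }
        }
      }
    }
  }
}
"""


def labels_for_commit(response: dict) -> list:
    """Collect unique label names from a decoded response (hypothetical helper)."""
    data = response.get("data") or {}
    repo = data.get("repository") or {}
    commit = repo.get("object") or {}
    prs = (commit.get("associatedPullRequests") or {}).get("nodes") or []
    labels = []
    for pr in prs:
        for label in (pr.get("labels") or {}).get("nodes") or []:
            # Preserve first-seen order while dropping duplicates.
            if label["name"] not in labels:
                labels.append(label["name"])
    return labels
```

The helper tolerates a missing commit object, which is what the API returns when the SHA is unknown to the repository.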
Signed-off-by: Ryan Cragun <me@ryan.ec>
Verify Vault secret integrity in unauthenticated I/O streams (audit log, STDOUT/STDERR via the systemd journal) by scanning the text with Vault Radar. We search for both known and unknown secrets using an index of KVV2 values and Radar's built-in heuristics for credentials, secrets, and keys.
The verification has been added to many scenarios where a slight time increase is allowed, as we now have to install Vault Radar and scan the text. In practice this adds less than 10 seconds to the overall duration of a scenario.
In the in-place upgrade scenario we explicitly exclude this verification when upgrading from a version that we know will fail the check. We also make the verification opt-in so as to not require a Vault Radar license to run Enos scenarios, though it will always be enabled in CI.
As part of this we also update our enos workflow to utilize secret values from our self-hosted Vault when executing in the vault-enterprise repo context.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Cache scopes allow other branches to inherit default branch scopes,
which means that release branches can restore a key from main. Instead,
we now include the vault version as part of the cache key to ensure
we don't include versions that are incompatible with our version.
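A minimal sketch of the idea, in Python for illustration only (the key layout, the `example-cache` name, and the version numbers are assumptions, not the actual workflow values):

```python
def cache_key(name: str, vault_version: str, content_hash: str) -> str:
    """Build a cache key scoped to one Vault version (hypothetical layout)."""
    return f"{name}-{vault_version}-{content_hash}"


def restore(cache: dict, key: str):
    """Exact-match lookup; entries written for another version never match."""
    return cache.get(key)


# The default branch wrote an entry for 1.19.0. A release branch on 1.18.0
# can still see that cache scope, but its key differs, so it gets a miss.
cache = {cache_key("example-cache", "1.19.0", "abc123"): "built-for-1.19"}
hit = restore(cache, cache_key("example-cache", "1.19.0", "abc123"))
miss = restore(cache, cache_key("example-cache", "1.18.0", "abc123"))
```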
Signed-off-by: Ryan Cragun <me@ryan.ec>
As the Vault pipeline and release processes evolve over time, so too must the tooling that drives them. Historically we've utilized a combination of CI features and shell scripts that are wrapped into make targets to drive our CI. While this approach has worked, it requires careful consideration of what features to use (bash in CI almost never matches bash on developer machines, etc.) and often requires a deep understanding of several CLI tools (jq, etc.). `make` itself also has limitations in user experience, e.g. passing flags.
As we're all in on GitHub Actions as our pipeline coordinator, continuing to utilize and build CLI tools to perform our pipeline tasks makes sense. This PR adds a new CLI tool called `pipeline` which we can use to build new isolated tasks that we can string together in GitHub Actions. We intend to use this utility as the interface for future release automation work, see VAULT-27514.
For the first task in this new `pipeline` tool, I've chosen to build two small sub-commands:
* `pipeline releases list-versions` - Allows us to list Vault versions within a range. The range is configurable either by setting `--upper` and/or `--lower` bounds, or by using the `--nminus` flag to set how many versions (N-X) to go back from the current branch's version. As CE and ENT do not have version parity we also consider the `--edition`, as well as none-to-many `--skip` flags to exclude specific versions.
* `pipeline generate enos-dynamic-config` - Creates dynamic enos configuration based on the branch and the current list of release versions. It takes largely the same flags as the `releases list-versions` command, however it also expects a `--dir` for the enos directory and a `--file` where the dynamic configuration will be written. This allows us to dynamically update and feed the latest versions into our sampling algorithm to get coverage over all supported prior versions.
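To make the N-minus range concrete, here is a small Python sketch of the selection logic (illustrative only; the `pipeline` tool's real implementation and exact bound semantics may differ). It assumes plain `major.minor.patch` versions, a lower bound of the first patch of the minor release N back, and an exclusive upper bound at the current version:

```python
def parse(version: str) -> tuple:
    """Split 'major.minor.patch' into a comparable tuple of ints."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def list_versions(released, current, nminus, skip=()):
    """Return released versions within N minor releases below `current`.

    Hypothetical helper: mirrors the idea of --nminus and --skip, not the
    tool's actual code. Assumes the minor version does not underflow.
    """
    cur = parse(current)
    lower = (cur[0], cur[1] - nminus, 0)
    skipped = set(skip)
    picked = [
        v for v in released
        if v not in skipped and lower <= parse(v) < cur
    ]
    return sorted(picked, key=parse)


# Example version numbers are made up for illustration.
released = ["1.15.6", "1.16.0", "1.16.3", "1.17.0", "1.17.2", "1.18.0"]
selected = list_versions(released, "1.18.0", nminus=2, skip=["1.16.0"])
```

Sorting by the parsed tuple rather than the raw string keeps `1.9.x` ahead of `1.10.x`, which a lexical sort would get wrong.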
We then integrate these new tools into the pipeline itself and cache the dynamic config on a weekly basis. We also cache the pipeline tool itself as it will likely become a repository for pipeline specific tooling. The caching strategy for the `pipeline` tool itself will make most workflows that require it super fast.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-31402: Add verification for all container images
Add verification for all container images that are generated as part of
the build. Before this change we only ever tested a limited subset of
"default" containers based on Alpine Linux that we publish via the
Docker hub and AWS ECR.
Now we support testing all Alpine and UBI based container images. We
also verify the repository and tag information embedded in each by
deploying them and verifying the repo and tag metadata match our
expectations.
This does change the k8s scenario interface quite a bit. We now take in
an archive image and set image/repo/tag information based on the
scenario variants.
To enable this I also needed to add `tar` to the UBI base image. It was
already available in the Alpine image and is used to copy utilities to
the image when deploying and configuring the cluster via Enos.
Since some images contain multiple tags we also add samples for each
image and randomly select which variant to test on a given PR.
Signed-off-by: Ryan Cragun <me@ryan.ec>
- If we encounter a deadlock or long-running test it is better to have
go test time out. As we've noticed, if we hit the GitHub step timeout we
lose all information about what was running at the time of the timeout,
making things harder to diagnose.
- When go test itself times out on a long-running test, it outputs which
test was running along with a full panic output in the logs, which is
quite useful for diagnosis.
In order for our enterprise nightlies to run the same test-go job but
across a matrix of different base references we need to consider the
checkout ref in our failure and summary uploads in order to prevent
an upload race.
We also configure Git with our token before setting up Go so that
enterprise CI workflows can execute without downloading a module cache.
Signed-off-by: Ryan Cragun <me@ryan.ec>
It appears that with the latest runner image[0] we occasionally see
a flaky test with an error related to our fontconfig cache:
```
Error: Browser timeout exceeded: 10s
Error while executing test: Acceptance | wrapped_token query param functionality: it authenticates when used with the with=token query param
Stderr:
Fontconfig error: No writable cache directories
[0822/180212.113587:WARNING:sandbox_linux.cc(430)] InitializeSandbox() called with multiple threads in process gpu-process.
```
This change rebuilds the fontconfig cache on GitHub-hosted runners.
Hopefully we can remove this at some point when a new runner image is
released.
[0] https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240818.1
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Add LTS explanation and clarify other label explanations
* Link to doc containing LTS calendar
* Change order for simpler cognitive load
* A bit simpler based on feedback
Optimize the cost of the Security `scan` workflow by utilizing a
different runner. Previously this workflow would use the
`custom-linux-xl` in `vault` vs. the `c6a.4xlarge` on-demand runner in
`vault-enterprise`. This resulted in the `vault` workflow costing an
order of magnitude more each month.
I tested with the following instance sizes to compare cost to execution
time (cost score is the estimated time in minutes multiplied by the cost
factor):
| Runner | Estimated Time | Cost Factor | Cost Score |
|---------|-----------------|-------------|-------------|
|ubuntu-latest|19m|1|19|
|custom-linux-small|21.5m|2|43|
|custom-linux-medium|11.5m|4|46|
|custom-linux-xl|8.5m|16|136|
Currently the `CI` and `build` required workflows take anywhere from
16-20 minutes on `vault`. Our goal is to not exceed that.
At this time we're going to try out `ubuntu-latest` as it gives us ~85%
savings and by far the best bang for our buck. If it ends up being a
burden we can switch to `custom-linux-medium` for ~66% cost savings but
still a reasonable runtime.
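The arithmetic behind the table and the savings estimates can be reproduced with a short Python sketch (illustrative only; the numbers are copied from the table above and the helper names are made up):

```python
# (estimated minutes, cost factor) per runner, taken from the table above.
runners = {
    "ubuntu-latest": (19.0, 1),
    "custom-linux-small": (21.5, 2),
    "custom-linux-medium": (11.5, 4),
    "custom-linux-xl": (8.5, 16),
}


def cost_score(minutes: float, factor: int) -> float:
    """Relative cost of one run: runtime multiplied by the cost factor."""
    return minutes * factor


def savings_vs(baseline: str, candidate: str) -> float:
    """Fraction of cost saved by moving from `baseline` to `candidate`."""
    base = cost_score(*runners[baseline])
    cand = cost_score(*runners[candidate])
    return 1 - cand / base


for name in ("ubuntu-latest", "custom-linux-medium"):
    pct = savings_vs("custom-linux-xl", name) * 100
    print(f"{name}: {pct:.0f}% cheaper than custom-linux-xl")
```

Measured against `custom-linux-xl`, this yields roughly 86% for `ubuntu-latest` and 66% for `custom-linux-medium`, in line with the approximate figures quoted above.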
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-29583: Modernize default distributions in enos scenarios
Our scenarios have been running the last gen of distributions in CI.
This updates our default distributions as follows:
- Amazon: 2023
- Leap: 15.6
- RHEL: 8.10, 9.4
- SLES: 15.6
- Ubuntu: 20.04, 24.04
With these changes we also unlock a few new variant combinations:
- `distro:amzn seal:pkcs11`
- `arch:arm64 distro:leap`
We also normalize our distro key for Amazon Linux to `amzn`, which
matches the uname output on both versions that we've supported.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Pin protoc-gen-go-grpc to 1.4.0
They introduced a replace statement within the go.mod file which
causes failures when running `go install protoc-gen-go-grpc@latest`.
The workaround for now is to pin to the previous version.
See https://github.com/grpc/grpc-go/issues/7448
* Add the missing `v` prefix: use v1.4.0 instead of 1.4.0 within tools/tools.sh