* VAULT-33074: add `github` sub-command to `pipeline`
Investigating test workflow failures is a common task that engineers on the
sustaining rotation perform. This task often requires a lot of manual
labor: inspecting every failed or cancelled workflow in the GitHub UI on a
per-repo/branch/workflow basis and performing root cause analysis.
As we work to improve our pipeline discoverability, this PR adds a new `github`
sub-command to the `pipeline` utility that allows querying for such workflows
and returning either machine-readable or human-readable summaries in a single
place. Eventually we plan to automate sending a summary of this data to
an OTEL collector, but for now sustaining engineers can use it to query
for workflows using a wide variety of criteria.
A common pattern for investigating build/enos test failure workflows would be:
```shell
export GITHUB_TOKEN="YOUR_TOKEN"
go run -race ./tools/pipeline/... github list-workflow-runs -o hashicorp -r vault -d '2025-01-13..2025-01-23' --branch main --status failure build
```
This will list `build` workflow runs in the `hashicorp/vault` repository for the
`main` branch with a `status` or `conclusion` of `failure` within the date
range `2025-01-13..2025-01-23`.
A sustaining engineer will likely do this for both the `vault` and
`vault-enterprise` repositories, and for the `enos-release-testing-oss` and
`enos-release-testing-ent` workflows in addition to `build`, in order to
get a full picture of the last week's failures.
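A rough sketch of that weekly sweep might look like the following (the nested loop and the repo/workflow pairing are illustrative; trim the workflow list per repository as needed):
```shell
export GITHUB_TOKEN="YOUR_TOKEN"
for repo in vault vault-enterprise; do
  for workflow in build enos-release-testing-oss enos-release-testing-ent; do
    # Reuse the same date range, branch, and status filters as above.
    go run -race ./tools/pipeline/... github list-workflow-runs \
      -o hashicorp -r "$repo" -d '2025-01-13..2025-01-23' \
      --branch main --status failure "$workflow"
  done
done
```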
You can also use this utility to summarize workflows based on other
statuses, branches, HEAD SHAs, event triggers, GitHub actors, etc. For
a full list of filter arguments, pass `-h` to the sub-command.
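For example:
```shell
go run -race ./tools/pipeline/... github list-workflow-runs -h
```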
> [!CAUTION]
> Be careful not to run this without setting strict filter arguments.
> Failing to do so could result in trying to summarize far too many
> workflows and your API token being disabled for an hour.
Signed-off-by: Ryan Cragun <me@ryan.ec>
As the Vault pipeline and release processes evolve over time, so too must the tooling that drives them. Historically we've used a combination of CI features and shell scripts wrapped in make targets to drive our CI. While this
approach has worked, it requires careful consideration of which features to use (bash in CI almost never matches bash on developer machines, etc.) and often requires a deep understanding of several CLI tools (jq, etc.). `make` itself also has limitations in user experience, e.g. passing flags.
As we're all in on GitHub Actions as our pipeline coordinator, continuing to utilize and build CLI tools to perform our pipeline tasks makes sense. This PR adds a new CLI tool called `pipeline` which we can use to build new isolated tasks that we can string together in GitHub Actions. We intend to use this utility as the interface for future release automation work; see VAULT-27514.
For the first task in this new `pipeline` tool, I've chosen to build two small sub-commands:
* `pipeline releases list-versions` - Allows us to list Vault versions between a range. The range is configurable either by setting `--upper` and/or `--lower` bounds, or by using `--nminus` to go back N-X versions from the current branch's version. As CE and ENT do not have version parity we also consider the `--edition` flag, as well as zero or more `--skip` flags to exclude specific versions.
* `pipeline generate enos-dynamic-config` - Creates dynamic Enos configuration based on the branch and the current list of release versions. It takes largely the same flags as the `releases list-versions` command, but it also expects a `--dir` for the Enos directory and a `--file` where the dynamic configuration will be written. This allows us to dynamically update and feed the latest versions into our sampling algorithm to get coverage over all supported prior versions.
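As a rough sketch of how these might be invoked locally (the flag values below, e.g. `--nminus 3`, `--edition ce`, the skipped version, and the output file name, are illustrative assumptions rather than prescribed defaults):
```shell
# List CE versions from the current branch's version back to N-3,
# skipping a hypothetical known-bad release.
go run ./tools/pipeline/... releases list-versions \
  --edition ce --nminus 3 --skip 1.17.3

# Generate the dynamic Enos configuration for the same version range.
go run ./tools/pipeline/... generate enos-dynamic-config \
  --edition ce --nminus 3 --dir ./enos --file dynamic_config.hcl
```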
We then integrate these new tools into the pipeline itself and cache the dynamic config, refreshing it weekly. We also cache the `pipeline` tool itself, as it will likely become a home for more pipeline-specific tooling. The caching strategy for the `pipeline` tool makes most workflows that require it very fast.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Move command/config + command/token to api/cliconfig + api/tokenhelper
* Remove unused functions and unused import
* Simplify and inline function copied from SDK
* Delete unused duplicated/forwarding config implementation from command package
* Delete unused code, unexport API surface that's only used internally to the package
* Fix up license headers
* Add changelog
* Tweak .gitignore to track hcl files in testdata/ folders
* combine into one checker
* combine and simplify ci checks
* add to test package list
* remove testing test
* only run deprecations check
* only run deprecations check
* remove unneeded repo check
* fix bash options
Add an updated `target_ec2_instances` module that is capable of
dynamically splitting target instances across subnets/AZs that are
compatible with the AMI architecture and the associated instance type
for that architecture. Use the `target_ec2_instances` module where
necessary. Ensure that `raft` storage scenarios don't provision
unnecessary infrastructure with a new `target_ec2_shim` module.
After a lot of trial, the state of EC2 spot instance capacity, the
associated APIs, and the current support for different fleet types in the
AWS Terraform provider have proven to make using spot instances for
scenario targets too unreliable.
The current state of each method:
* `target_ec2_fleet`: unusable because the `instant` type does not
guarantee fulfillment of either `spot` or `on-demand` instance request
types. The module does support both `on-demand` and `spot` request types
and is capable of bidding across a maximum of four availability zones,
which makes it an attractive choice if the `instant` type would always
fulfill requests. Perhaps a `request` type with a `wait_for_fulfillment`
option, like `aws_spot_fleet_request`, would make it more viable for
future consideration.
* `target_ec2_spot_fleet`: more reliable if bidding for target instances
that have capacity in the chosen zone. Issues in the AWS provider
prevent us from bidding across multiple zones successfully. Over the
last 2-3 months target capacity for the instance types we'd prefer to
use has dropped dramatically and the price is near or at on-demand.
That volatility, for nearly no cost savings, means we should shelve
this option for now.
* `target_ec2_instances`: the most reliable method we've got. It is now
capable of automatically determining which subnets and availability
zones to provision targets in and has been updated to be usable for
both Vault and Consul targets. By default we use the cheapest medium
instance types that we've found to be reliable for testing Vault.
* Update .gitignore
* enos/modules/create_vpc: create a subnet for every availability zone
* enos/modules/target_ec2_fleet: bid across the maximum of four
availability zones for targets
* enos/modules/target_ec2_spot_fleet: attempt to make the spot fleet bid
across more availability zones for targets
* enos/modules/target_ec2_instances: create module to use
ec2:RunInstances for scenario targets
* enos/modules/target_ec2_shim: create shim module to satisfy the
target module interface
* enos/scenarios: use target_ec2_shim for backend targets on raft
storage scenarios
* enos/modules/az_finder: remove unused module
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-15385 Add GHA that checks for nil, nil returns on functions that return an error
* VAULT-15385 add failing function, for sanity
* VAULT-15385 fix makefile
* VAULT-15385 remove test dir
* VAULT-15385 Fix typo
* VAULT-15385 fix job name
* VAULT-15385 Add test to packages
* VAULT-15385 add opt-out
* VAULT-15385 Wrong file for comment
* VAULT-15385 remove failing function
* VAULT-15385 return not nil-nil :)
* VAULT-15385 Restrict to two-result functions
The previous strategy for provisioning infrastructure targets was to use
the cheapest instances that could reliably perform as Vault cluster
nodes. With this change we introduce a new model for target node
infrastructure: we've replaced on-demand instances with a spot
fleet. While the spot price fluctuates based on dynamic pricing,
capacity, region, instance type, and platform, cost savings for our
most common combinations range from 20% to 70%.
This change only includes spot fleet targets for Vault clusters.
We'll be updating our Consul backend bidding in another PR.
* Create a new `vault_cluster` module that handles installation,
configuration, initializing, and unsealing Vault clusters.
* Create a `target_ec2_instances` module that can provision a group of
instances on-demand.
* Create a `target_ec2_spot_fleet` module that can bid on a fleet of
spot instances.
* Extend every Enos scenario to utilize the spot fleet target acquisition
strategy and the `vault_cluster` module.
* Update our Enos CI modules to handle both the `aws-nuke` permissions
and the privileges to provision spot fleets.
* Only use us-east-1 and us-west-2 in our scenario matrices as costs are
lower than us-west-1.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* example for checking go doc tests
* add analyzer test and action
* get metadata step
* install revgrep
* fix for ci
* add revgrep to go.mod
* clarify how analysistest works
This was not copied over when this code was
copied in https://github.com/hashicorp/vault/pull/12340.
Also adds a stub for the `.copywrite.hcl` file (for when
Vault is onboarded to Copywrite) and adds `pkcs7` and
`ui/node_modules` to the ignore pattern.
* Started work on adding log-file support to Agent
* Allow log file to be picked up and appended
* Use NewLogFile everywhere
* Tried to pull out the config aggregation from Agent.Run
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
Add our initial Enos integration tests to Vault. The Enos scenario
workflow will automatically be run on branches that are created from the
`hashicorp/vault` repository. See the README.md in ./enos for a full
description of how to compose and execute scenarios locally.
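For example, a minimal local run might look like the following sketch (scenario and variant names here are illustrative; the README is the authoritative reference):
```shell
cd enos
# See which scenarios and variant combinations are available.
enos scenario list
# Launch, verify, and destroy a single variant combination.
enos scenario run smoke backend:raft distro:ubuntu
```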
* Simplify the metadata build workflow jobs
* Automatically determine the Go version from go.mod
* Add formatting check for Enos integration scenarios
* Add Enos smoke and upgrade integration scenarios
* Add Consul backend matrix support
* Add Ubuntu and RHEL distro support
* Add Vault edition support
* Add Vault architecture support
* Add Vault builder support
* Add Vault Shamir and awskms auto-unseal support
* Add Raft storage support
* Add Raft auto-join voter verification
* Add Vault version verification
* Add Vault seal verification
* Add in-place upgrade support for all variants
* Add four scenario variants to CI. These test a maximal distribution of
the aforementioned variants with the `linux/amd64` Vault install
bundle.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Rebecca Willett <rwillett@hashicorp.com>
Co-authored-by: Jaymala <jaymalasinha@gmail.com>
* adding CRT to main branch
* cleanup
* I don't know how that got removed, but here's the fix
* add vault.service
Co-authored-by: Kyle Penfound <kpenfound11@gmail.com>
* copy over the webui
move web_ui to http
remove web ui files, add .gitkeep
updates, messing with gitkeep and ignoring web_ui
update ui scripts
gitkeep
ignore http/web_ui
Remove debugging
remove the jwt reference, that was from something else
restore old jwt plugin
move things around
Revert "move things around"
This reverts commit 2a35121850f5b6b82064ecf78ebee5246601c04f.
Update ui path handling to not need the web_ui name part
add desc
move the http.FS conversion internal to assetFS
update gitignore
remove bindata dep
clean up some comments
remove asset check script that's no longer needed
Update readme
remove more bindata things
restore asset check
update packagespec
update stub
stub the assetFS method and set uiBuiltIn to false for non-ui builds
update packagespec to build ui
* fail if assets aren't found
* tidy up vendor
* go mod tidy
* updating .circleci
* restore tools.go
* re-re-re-run make packages
* re-enable arm64
* Adding change log
* Removing a file
Co-authored-by: hamid ghaf <hamid@hashicorp.com>
* Simplify Run(): the function that was being sent over a channel doesn't
need to close over anything except latestToken, and we don't need to
create a new one each iteration. Instead just pass the relevant items,
namely the token and sink to work on.
* Disallow the following config combinations:
1. auto_auth.method.wrap_ttl > 0 and multiple file sinks
2. auto_auth.method.wrap_ttl > 0 and single file sink with wrap_ttl > 0
3. auto_auth.method.wrap_ttl > 0 and cache.use_auto_auth_token = true
* Expose errors that occur when APIProxy is forwarding requests to Vault.
* Fix merge issues.
* Since we want to use the Agent listener for #6384, move the listener config
from the top-level 'cache' block to a new top-level 'listeners' block.
* Make agent config allow cache and listener blocks without auto-auth
configured.
* vault-agent-cache: squashed 250+ commits
* Add proper token revocation validations to the tests
* Add more test cases
* Avoid leaking by not closing request/response bodies; add comments
* Fix revoke orphan use case; update tests
* Add CLI test for making request over unix socket
* agent/cache: remove namespace-related tests
* Strip off the auto-auth token from the lookup response
* Output listener details along with configuration
* Add scheme to API address output
* leasecache: use IndexNameLease for prefix lease revocations
* Make CLI accept the fully qualified unix address
* export VAULT_AGENT_ADDR=unix://path/to/socket
* unix:/ to unix://
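Putting the listener and unix-address pieces above together, a minimal sketch of exercising the Agent over a unix socket might look like this (paths are illustrative, and the config relies on the change that allows cache and listener blocks without auto-auth):
```shell
# Write an Agent config with cache and listener blocks but no auto_auth.
cat > /tmp/agent.hcl <<'EOF'
cache {}

listener "unix" {
  address = "/tmp/vault-agent.sock"
}
EOF

# Assumes VAULT_ADDR points at a running Vault server for the proxy to forward to.
vault agent -config=/tmp/agent.hcl &

# Point the CLI at the Agent via the fully qualified unix address.
export VAULT_AGENT_ADDR="unix:///tmp/vault-agent.sock"
vault status
```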
* conversion stage 1
* correct image paths
* add sidebar title to frontmatter
* docs/concepts and docs/internals
* configuration docs and multi-level nav corrections
* commands docs, index file corrections, small item nav correction
* secrets converted
* auth
* add enterprise and agent docs
* add extra dividers
* secret section, wip
* correct sidebar nav title in front matter for api section, start working on api items
* auth and backend, a couple directory structure fixes
* remove old docs
* intro side nav converted
* reset sidebar styles, add hashi-global-styles
* basic styling for nav sidebar
* folder collapse functionality
* patch up border length on last list item
* wip restructure for content component
* taking middleman hacking to the extreme, but it's working
* small css fix
* add new mega nav
* fix a small mistake from the rebase
* fix a content resolution issue with middleman
* title a couple missing docs pages
* update deps, remove temporary markup
* community page
* footer to layout, community page css adjustments
* wip downloads page
* deps updated, downloads page ready
* fix community page
* homepage progress
* add components, adjust spacing
* docs and api landing pages
* a bunch of fixes, add docs and api landing pages
* update deps, add deploy scripts
* add readme note
* update deploy command
* overview page, index title
* Update doc fields
Note this still requires the link fields to be populated -- this is solely related to copy on the description fields
* Update api_basic_categories.yml
Updated API category descriptions. Like the document descriptions you'll still need to update the link headers to the proper target pages.
* Add bottom hero, adjust CSS, responsive friendly
* Add mega nav title
* homepage adjustments, asset boosts
* small fixes
* docs page styling fixes
* meganav title
* some category link corrections
* Update API categories page
updated to reflect the second level headings for api categories
* Update docs_detailed_categories.yml
Updated to represent the existing docs structure
* Update docs_detailed_categories.yml
* docs page data fix, extra operator page remove
* api data fix
* fix makefile
* update deps, add product subnav to docs and api landing pages
* Rearrange non-hands-on guides to _docs_
Since there is no place for these on learn.hashicorp, we'll put them
under _docs_.
* WIP Redirects for guides to docs
* content and component updates
* font weight hotfix, redirects
* fix guides and intro sidenavs
* fix some redirects
* small style tweaks
* Redirects to learn and internally to docs
* Remove redirect to `/vault`
* Remove `.html` from destination on redirects
* fix incorrect index redirect
* final touchups
* address feedback from michell for makefile and product downloads
* adding UI handlers and UI header configuration
* forcing specific static headers
* properly getting UI config value from config/environment
* fixing formatting in stub UI text
* use http.Header
* case-insensitive X-Vault header check
* fixing var name
* wrap both stubbed and real UI in header handler
* adding test for >1 keys