Commit Graph

117 Commits

Author SHA1 Message Date
Tin Vo
ac3bb7b2d4 VAULT-32188: Enos test for PKI certificates (#29007)
* updating pki test

* updating pki test

* updating pki test

* updating pki script

* resolving conflicts

* adding pki cert verifications

* resolving conflicts

* updating test

* removing comments

* addressing bash formatting

* updating test

* adding description

* fixing lint error

* fixing lint error

* fixing lint issue

* removing unneeded scenario

* resolving conflicts

* debugging pipeline error

* fixing pipeline tests'

* fixing pipeline tests'

* testing smoke test

* fixing pipeline error

* debugging pipeline error

* debugging pipeline error

* debugging pipeline error

* debugging agent test ci failure

* fixing ci errors

* uncomment token

* updating script

* updating hosts

* fixing lint

* fixing lint

* fixing lint

* adding revoked certificate

* undo kv.tf change

* updating cert issuing

* updating issuing certs to include issuer

* updating pki cert verification

* addressing comments

* fixing lint

* fixing lint

* fixing lint

* fixing lint

* updating verify_secrets_engine_read module

* fixing lint

* fixing lint

* fixing lint

* debugging lint

* testing pipeline

* adding verify variables for autopilot

* adding pki read variable for autopilot

* updating vault engine read variables

* addressing comments

* fixing lint

* update test for enterprise

* update pki tests to adapt to enterprise
2025-01-23 11:30:20 -08:00
Rebecca Willett
8cee664204 Add 'how to run' instructions to each Enos scenario (#29299)
* Add 'how to run' instructions for each scenario
2025-01-10 21:17:09 +00:00
Tin Vo
d5a7ac2680 adding logic to handle cloud-init error code 2 (#28598)
* adding logic to print failures and retry if there is an cloud-init error

* adding logic to print failures and retry if there is an cloud-init error

* fixing timeout error

* fixing timeout error

* fixing timeout error

* fixing timeout error

* fixing timeout error

* updating retry to 2

* updating cloud init status logic

* updating cloud init status logic

* addressing comments

* addressing comments

* fixing error from sync scriot
2024-11-22 12:06:32 -08:00
Ryan Cragun
3b31b3e939 VAULT-32206: verify audit log and systemd journal secret integrity (#28932)
Verify vault secret integrity in unauthenticated I/O streams (audit log, STDOUT/STDERR via the systemd journal) by scanning the text with Vault Radar. We search for both known and unknown secrets by using an index of KVV2 values and also by radar's built-in heuristics for credentials, secrets, and keys.

The verification has been added to many scenarios where a slight time increase is allowed, as we now have to install Vault Radar and scan the text. In practice this adds less than 10 seconds to the overall duration of a scenario.

In the in-place upgrade scenario we explicitly exclude this verification when upgrading from a version that we know will fail the check. We also make the verification opt-in so as to not require a Vault Radar license to run Enos scenarios, though it will always be enabled in CI.

As part of this we also update our enos workflow to utilize secret values from our self-hosted Vault when executing in the vault-enterprise repo context.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-11-22 11:14:01 -07:00
Ryan Cragun
ce5885279b VAULT-31181: Add pipeline tool to Vault (#28536)
As the Vault pipeline and release processes evolve over time, so too must the tooling that drives them. Historically we've utilized a combination of CI features and shell scripts that are wrapped into make targets to drive our CI. While this 
approach has worked, it requires careful consideration of what features to use (bash in CI almost never matches bash in developer machines, etc.) and often requires a deep understanding of several CLI tools (jq, etc). `make` itself also has limitations in user experience, e.g. passing flags.

As we're all in on Github Actions as our pipeline coordinator, continuing to utilize and build CLI tools to perform our pipeline tasks makes sense. This PR adds a new CLI tool called `pipeline` which we can use to build new isolated tasks that we can string together in Github Actions. We intend to use this utility as the interface for future release automation work, see VAULT-27514.

For the first task in this new `pipeline` tool, I've chosen to build two small sub-commands:

* `pipeline releases list-versions` - Allows us to list Vault versions between a range. The range is configurable either by setting `--upper` and/or `--lower` bounds, or by using the `--nminus` to set the N-X to go back from the current branches version. As CE and ENT do not have version parity we also consider the `--edition`, as well as none-to-many `--skip` flags to exclude specific versions.

* `pipeline generate enos-dynamic-config` - Which creates dynamic enos configuration based on the branch and the current list of release versions. It takes largely the same flags as the `release list-versions` command, however it also expects a `--dir` for the enos directory and a `--file` where the dynamic configuration will be written. This allows us to dynamically update and feed the latest versions into our sampling algorithm to get coverage over all supported prior versions.

We then integrate these new tools into the pipeline itself and cache the dynamic config on a weekly basis. We also cache the pipeline tool itself as it will likely become a repository for pipeline specific tooling. The caching strategy for the `pipeline` tool itself will make most workflows that require it super fast.


Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-10-23 15:31:24 -06:00
Ryan Cragun
c8e6169d5d VAULT-31402: Add verification for all container images (#28605)
* VAULT-31402: Add verification for all container images

Add verification for all container images that are generated as part of
the build. Before this change we only ever tested a limited subset of
"default" containers based on Alpine Linux that we publish via the
Docker hub and AWS ECR.

Now we support testing all Alpine and UBI based container images. We
also verify the repository and tag information embedded in each by
deploying them and verifying the repo and tag metadata match our
expectations.

This does change the k8s scenario interface quite a bit. We now take in
an archive image and set image/repo/tag information based on the
scenario variants.

To enable this I also needed to add `tar` to the UBI base image. It was
already available in the Alpine image and is used to copy utilities to
the image when deploying and configuring the cluster via Enos.

Since some images contain multiple tags we also add samples for each
image and randomly select which variant to test on a given PR.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-10-07 10:16:22 -06:00
Tin Vo
4836c83e5a removing unused variable (#28537)
* removing unused variable

* testing pipeline

* testing pipeline
2024-10-02 12:06:26 -07:00
Ryan Cragun
c8c51b1b9d VAULT-30819: verify DR secondary leader before unsealing followers (#28459)
* VAULT-30819: verify DR secondary leader before unsealing followers

After we've enabled DR replication on the secondary leader the existing
cluster followers will be resealed with the primary clusters encryption
keys. We have to unseal the followers to make them available. To ensure
that we absolutely take every precaution before attempting to unseal the
followers we now verify that the secondary leader is the cluster leader,
has a valid merkle tree, and is streaming wals from the primary cluster
before we attempt to unseal the secondary followers.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-09-24 09:13:40 -06:00
Ryan Cragun
b977fac936 VAULT-30819: DR replicatio: wait for seal rewrap before enabling DR (#28425)
Ensure that both clusters have completed their seal rewrap before
enabling DR on the secondary. We don't want the secondary to come back
up in an in-between state.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-09-18 10:29:03 -06:00
Ryan Cragun
1082629d1f VAULT-30819: Fix two potential flakes in DR replication (#28409)
Fix two occasional flakes in the DR replication scenario:
* Always verify that all nodes in the cluster are unsealed before
  verifying test data. Previously we only verified seal status on
  followers.
* Fix an occasional timeout when waiting for the cluster to unseal by
  rewriting the module to retry for a set duration instead of
  exponential backoff.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-09-17 12:32:15 -06:00
Ryan Cragun
392412829b [VAULT-30189] enos: verify identity and OIDC tokens (#28274)
* [VAULT-30189] enos: verify identity and OIDC tokens

Expand our baseline API and data verification by including the identity
and identity OIDC tokens secrets engines. We now create a test entity,
entity-alias, identity group, various policies, and associate them with
the entity. For the OIDC side, we now configure the OIDC issuer, create
and rotate named keys, create and associate roles with the named key,
and issue and introspect tokens.

During a second phase we also verify that the those some entities,
groups, keys, roles, config, etc all exist with the expected values.
This is useful to test durability after upgrades, migrations, etc.

This change also includes new updates our prior `auth/userpass` and `kv`
verification. We had two modules that were loosely coupled and
interdependent. This restructures those both into a singular module with
child modules and fixes the assumed values by requiring the read module
to verify against the created state.

Going forward we can continue to extend this secrets engine verification
module with additional create and read checks for new secrets engines.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-09-09 14:29:11 -06:00
Ryan Cragun
0764d7d177 enos: poweroff and terminate instances when shutting them down (#28316)
Previously our `shutdown_nodes` modules would halt the machine. While
this is useful for simulating a failure it makes cleaning up the halted
machines very slow in AWS.

Instead, we now poweroff the machines and utilize EC2's instance
poweroff handling to immediately terminate the instances.

I've test both scenarios locally utilizing the change and both still
work as expected. I also timed before and after and this change saves 5
MINUTES in total runtime (~40%) for the PR replication scenario. I assume
it yields similar results for autopilot.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-09-09 13:22:41 -06:00
Luis (LT) Carbonell
cdf3da4066 Add DR failover scenario to Enos (#28256)
* Add DR failover scenario to Enos

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-qualities.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-qualities.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-pr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* remove superuser

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

* Update enos/enos-scenario-dr-replication.hcl

Co-authored-by: Ryan Cragun <me@ryan.ec>

---------

Co-authored-by: Ryan Cragun <me@ryan.ec>
2024-09-05 21:33:53 +00:00
Ryan Cragun
b5d32b7bec enos: add shfmt formatting to enos module scripts (#28142)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-23 13:45:30 -06:00
Ryan Cragun
10430fecba enos: don't exit in verify-billing-start retry loop (#28132)
Previously we'd fail in the verify-billing-start.sh retry loop instead
of returning a 1. This fixes that and normalizes the script.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-20 17:36:09 -06:00
akshya96
76a49a5700 Auto-roll billing start enos test (#27981)
* auto-roll billing start enos test

* enos: don't expect curl available in docker image (#27984)

Signed-off-by: Ryan Cragun <me@ryan.ec>

* Update interoperability-matrix.mdx (#27977)

Updating the existing Vault/YubiHSM integration with a newer version of Vault as well as now supporting Managed Keys.

* Update hana db pkg (#27950)

* database/hana: use go-hdb v1.10.1

* docs/hana: quotes around password so dashes don't break it

* Clarify audit log failure telemetry docs. (#27969)

* Clarify audit log failure telemetry docs.

* Add the note about the misleading counts

* Auto-rolling billing start docs PR  (#27926)

* auto-roll docs changes

* addressing comments

* address comments

* Update website/content/api-docs/system/internal-counters.mdx

Co-authored-by: Sarah Chavis <62406755+schavis@users.noreply.github.com>

* addressing some changes

* update docs

* update docs with common explanation file

* updated note info

* fix 1.18 upgrade doc

* fix content-check error

* Update website/content/partials/auto-roll-billing-start-example.mdx

Co-authored-by: miagilepner <mia.epner@hashicorp.com>

---------

Co-authored-by: Sarah Chavis <62406755+schavis@users.noreply.github.com>
Co-authored-by: miagilepner <mia.epner@hashicorp.com>

* docker: add upgrade notes for curl removal (#27995)

Signed-off-by: Ryan Cragun <me@ryan.ec>

* Update vault-plugin-auth-jwt to v0.21.1 (#27992)

* docs: fix upgrade 1.16.x (#27999)

Signed-off-by: Ryan Cragun <me@ryan.ec>

* UI: Add unsupportedCriticalCertExtensions to jwt config expected payload (#27996)

* Client Count Docs Updates/Cleanup (#27862)

* Docs changes

* More condensation of docs

* Added some clarity on date ranges

* Edited wording'

* Added estimation client count info

* Update website/content/api-docs/system/internal-counters.mdx

Co-authored-by: miagilepner <mia.epner@hashicorp.com>

---------

Co-authored-by: miagilepner <mia.epner@hashicorp.com>

* update(kubernetes.mdx): k8s-tokenreview URL (#27993)

Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>

* Update programmatic-management.mdx to clarify Terraform prereqs (#27548)

* UI: Replace getNewModel with hydrateModel when model exists (#27978)

* Replace getNewModel with hydrateModel when model exists

* Update getNewModel to only handle nonexistant model types

* Update test

* clarify test

* Fix auth-config models which need hydration not generation

* rename file to match service name

* cleanup + tests

* Add comment about helpUrl method

* Changelog for 1.17.3, 1.16.7 enterprise, 1.15.13 enterprise (#28018)

* changelog for 1.17.3, 1.16.7 enterprise, 1.15.13 enterprise

* Add spacing to match older changelogs

* Fix typo in variables.tf (#27693)

intialize -> initialize

Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>

* Update 1_15-auto-upgrade.mdx (#27675)

* Update 1_15-auto-upgrade.mdx

* Update known issue version numbers for AP issue

---------

Co-authored-by: Sarah Chavis <62406755+schavis@users.noreply.github.com>

* Update 1_16-default-policy-needs-to-be-updated.mdx (#27157)

Made a few grammar changes plus updating term from Vault IU to Vault UI

* change instances variable to hosts

* for each hosts

* add cluster addr port

* Add ENVs using NewTestDockerCluster (#27457)

* Add ENVs using NewTestDockerCluster

Currently NewTestDockerCluster had no means for setting any
environment variables. This makes it tricky to create test
for functionality that require thems, like having to set
AWS environment variables.

DockerClusterOptions now exposes an option to pass extra
enviroment variables to the containers, which are appended
to the existing ones.

* adding changelog

* added test case for setting env variables to containers

* fix changelog typo; env name

* Update changelog/27457.txt

Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>

* adding the missing copyright

---------

Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>

* UI: Build KV v2 overview page (#28106)

* move date-from-now helper to addon

* make overview cards consistent across engines

* make kv-paths-card component

* remove overview margin all together

* small styling changes for paths card

* small selector additions

* add overview card test

* add overview page and test

* add default timestamp format

* cleanup paths test

* fix dateFromNow import

* fix selectors, cleanup pki selectors

* and more selector cleanup

* make deactivated state single arg

* fix template and remove @isDeleted and @isDestroyed

* add test and hide badge unless deactivated

* address failings from changing selectors

* oops, not ready to show overview tab just yet!

* add deletionTime to currentSecret metadata getter

* Bump actions/download-artifact from 4.1.7 to 4.1.8 (#27704)

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.1.7 to 4.1.8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](65a9edc588...fa0a91b85d)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>

* Bump actions/setup-node from 4.0.2 to 4.0.3 (#27738)

Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4.0.2 to 4.0.3.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](60edb5dd54...1e60f620b9)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>

* Add valid IP callout (#28112)

Co-authored-by: Yoko Hyakuna <yoko@hashicorp.com>

* Refactor SSH Configuration workflow (#28122)

* initial copy from other #28004

* pr feedback

* grr

* Bump browser-actions/setup-chrome from 1.7.1 to 1.7.2 (#28101)

Bumps [browser-actions/setup-chrome](https://github.com/browser-actions/setup-chrome) from 1.7.1 to 1.7.2.
- [Release notes](https://github.com/browser-actions/setup-chrome/releases)
- [Changelog](https://github.com/browser-actions/setup-chrome/blob/master/CHANGELOG.md)
- [Commits](db1b524c26...facf10a55b)

---
updated-dependencies:
- dependency-name: browser-actions/setup-chrome
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: divyaac <divya.chandrasekaran@hashicorp.com>

* Bump vault-gcp-secrets-plugin (#28089)

Co-authored-by: divyaac <divya.chandrasekaran@hashicorp.com>

* docs: correct list syntax (#28119)

Co-authored-by: divyaac <divya.chandrasekaran@hashicorp.com>

* add semgrepconstraint check in skip step

---------

Signed-off-by: Ryan Cragun <me@ryan.ec>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Adam Rowan <92474478+bear359@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Paul Banks <pbanks@hashicorp.com>
Co-authored-by: Sarah Chavis <62406755+schavis@users.noreply.github.com>
Co-authored-by: miagilepner <mia.epner@hashicorp.com>
Co-authored-by: Scott Miller <smiller@hashicorp.com>
Co-authored-by: Chelsea Shaw <82459713+hashishaw@users.noreply.github.com>
Co-authored-by: divyaac <divya.chandrasekaran@hashicorp.com>
Co-authored-by: Roman O'Brien <58272664+romanobrien@users.noreply.github.com>
Co-authored-by: Adrian Todorov <adrian.todorov@hashicorp.com>
Co-authored-by: VAL <val@hashicorp.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Owen Zhang <86668876+owenzorrin@users.noreply.github.com>
Co-authored-by: gkoutsou <gkoutsou@users.noreply.github.com>
Co-authored-by: claire bontempo <68122737+hellobontempo@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Frappier <92055993+jonathanfrappier@users.noreply.github.com>
Co-authored-by: Yoko Hyakuna <yoko@hashicorp.com>
Co-authored-by: Angel Garbarino <Monkeychip@users.noreply.github.com>
Co-authored-by: Max Levine <max@maxlevine.co.uk>
Co-authored-by: Steffy Fort <steffyfort@gmail.com>
2024-08-20 13:47:20 -07:00
Ryan Cragun
339721e953 enos: renable undo logs verification (#27206)
After VAULT-20259 we did not enable the undo logs verification. This
reenables the check but modified to check the status of the primary and
follower nodes, as they should have different values.

While testing this I accidentally flubbed my version input and found the
diagnostic a bit confusing to read so I updated the error message on
version mismatch to be a bit easier to read.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-14 13:45:50 -06:00
Ryan Cragun
bf0e156496 enos: wait for both clusters to be healthy before configuring replication (#28049)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-09 16:28:28 -06:00
Ryan Cragun
74b6cc799a VAULT-29583: Modernize default distributions in enos scenarios (#28012)
* VAULT-29583: Modernize default distributions in enos scenarios

Our scenarios have been running the last gen of distributions in CI.
This updates our default distributions as follows:
  - Amazon: 2023
  - Leap:   15.6
  - RHEL:   8.10, 9.4
  - SLES:   15.6
  - Ubuntu: 20.04, 24.04

With these changes we also unlock a few new variants combinations:
  - `distro:amzn seal:pkcs11`
  - `arch:arm64 distro:leap`

We also normalize our distro key for Amazon Linux to `amzn`, which
matches the uname output on both versions that we've supported.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-09 13:43:28 -06:00
Ryan Cragun
8c2548f8be VAULT-29739: Wait for cluster unseal before checking version history (#28022)
Sometimes the replication scenario will race with other steps and
attempt to check the `v1/sys/version-history` API before the cluster is
ready. Eventually when it gets retried some of the original nodes are
down so it will fail. This makes the verification happen later, only
after we've ensured the cluster is unsealed and have gotten leader and
cluster IP addresses. We also make dependent steps require the version
verification so that if it does fail for some reason it will retry
before doing the rest of the scenario.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-09 13:43:11 -06:00
Ikko Eltociear Ashimine
b29f52d56e Fix typo in variables.tf (#27693)
intialize -> initialize

Co-authored-by: akshya96 <87045294+akshya96@users.noreply.github.com>
2024-08-07 14:13:00 -07:00
Ryan Cragun
6366455922 enos: don't expect curl available in docker image (#27984)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-05 15:25:39 -06:00
Ryan Cragun
e246b2652c enos: support ancient systemd in vault_upgrade (#27960)
Amazon Linux 2 uses an ancient version of Systemd/systemctl so instead
of using -P when determining the unit file we use the less convenient
-p.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-02 20:36:46 +00:00
Ryan Cragun
720e942662 [VAULT-2937] Verify the /sys/version-history in enos scenarios (#27947)
When verifying the Vault version, in addition to verifying the CLI
version we also check that the `/sys/version-history` contains the
expected version.

As part of this we also fix a bug where when doing an in-place upgrade
with a Debian or Redhat package we also remove the self-managed
`vault.service` systemd unit to ensure that correctly start up using the
new version of Vault.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-02 13:26:39 -06:00
Ryan Cragun
174da88b9d VAULT-28146: Add IPV6 support to enos scenarios (#27884)
* VAULT-28146: Add IPV6 support to enos scenarios

Add support for testing all raft storage scenarios and variants when
running Vault with IPV6 networking. We retain our previous support for
IPV4 and create a new variant `ip_version` which can be used to
configure the IP version that we wish to test with.

It's important to note that the VPC in IPV6 mode is technically mixed
and that target machines still associate public IPV6 addresses. That
allows us to execute our resources against them from IPV4 networks like
developer machines and CI runners. Despite that, we've taken care to
ensure that only IPV6 addresses are used in IPV6 mode.

Because we previously had assumed the IP Version, Vault address, and
listener ports in so many places, this PR is essentially a rewrite and
removal of those assumptions. There are also a few places where
improvements to scenarios have been included as I encountered them while
working on the IPV6 changes.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-07-30 11:00:27 -06:00
Ryan Cragun
89e9e0f2cd VAULT-28307 enos: allow arm64 fips1402 and hsm editions (#27571)
In preperation for arm64 builds of hsm, fips1402, and hsm.fips1402
editions of Vault Enterprise we'll allow them in our test scenarios.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-06-21 15:24:46 -06:00
Ryan Cragun
456f180fb9 enos: use the correct install during upgrade (#27496)
Fix the upgrade scenario by utilizing the correct Vault install location
for the initial install and by using different initial upgrade versions.
Before we had upgrade versions that were only available for Enterprise
which would fail on CE.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-06-13 22:21:02 +00:00
Ryan Cragun
84935e4416 [QT-697] enos: add descriptions and quality verification (#27311)
In order to take advantage of enos' ability to outline scenarios and to
inventory what verification they perform we needed to retrofit all of
that information to our existing scenarios and steps.

This change introduces an initial set of descriptions and verification
declarations that we can continue to refine over time.

As doing this required that I re-read every scenanario in its entirety I
also updated and fixed a few things along the way that I noticed,
including adding a few small features to enos that we utilize to make
handling initial versions programtic between versions instead of having a
delta between our globals in each branch.

* Update autopilot and in-place upgrade initial versions
* Programatically determine which initial versions to use based on Vault
  version
* Partially normalize steps between scenarios to make comparisons easier
* Update the MOTD to explain that VAULT_ADDR and VAULT_TOKEN have been
  set
* Add scenario and step descriptions to scenarios
* Add initial scenario quality verification declarations to scenarios
* Unpin Terraform in scenarios as >= 1.8.4 should work fine
2024-06-13 11:16:33 -06:00
Rebecca Willett
1f0639a79c Remove Leap 15.4 from testing matrices and AMI data sources; remove vestiges of Ubuntu 18.04 testing (#27416) 2024-06-10 11:44:32 -04:00
Ryan Cragun
0513545dd8 [VAULT-27917] fix(enos): handle SLES guestregister.service unreliability (#27380)
* [VAULT-27917] fix(enos): handle SLES guestregister.service unreliability

The SLES provided `guestregister.service` systemd unit is unreliable
enough that it will fail ~ 1/9 times when provisioning SLES instances.
When this happens the machine will never successfully exec SUSEConnect
to enroll and we'll get no access to the SLES repositories and
subsequently break our scenarios.

I resolved this by restructuring our `install_packages` module to to
separate repository synchronization, repository addition, and package
installation into different scripts and resources and by adding special
case handling for SLES and the `guestregister.service`.

I also make a distinction between `dnf` and `yum` because while they are
sort of the same thing on RHEL, it is not the case with Amazon2. I also
shimmed out the rest of the support for Apt in case we ever need to add
repos there.

* Revert "Temporarily remove SLES from samples (#27378)"

This reverts commit 490cdd9066.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-06-06 17:37:50 -06:00
Rebecca Willett
490cdd9066 Temporarily remove SLES from samples (#27378) 2024-06-05 15:53:03 -07:00
Rebecca Willett
79cd3238d5 Add exclude block (#27369) 2024-06-05 18:12:00 +00:00
Rebecca Willett
c28739512a Add Amazon Linux, openSUSE Leap, and SUSE SLES support to Enos scenarios and modules (#25983)
Add Consul edition support to Enos scenarios and modules
Add Linux distros and Consul edition to Enos samples
Bump RHEL versions to 9.3 and 8.9
2024-06-05 12:58:35 -04:00
Ryan Cragun
c5ba06557f enos: fix typo in dev_single_cluster (#27207)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-23 20:22:44 +00:00
Ryan Cragun
fea81ab8bc enos: improve artifact:local dev scenario experience (#27095)
* Better handle symlinks in artifact paths.
* Fix a race condition in the local builder where Terraform wouldn't
  wait for local builds to finish before attempting to install vault on
  target nodes.
* Make building the web ui configurable in the dev scenario.
* Rename `vault_artifactory_artifact` to `build_artifactory_artifact` to
  better align with existing "build" modules.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-17 10:22:08 -06:00
Ryan Cragun
aa2aa0b627 enos: correctly set up profile (#27055)
Handle cases where `vault_cluster` is used more than once on a host.
This includes cases where we aren't initing.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-15 20:07:35 +00:00
Ryan Cragun
dced3ad3f0 [VAULT-26888] Create developer scenarios (#27028)
* [VAULT-26888] Create developer scenarios

Create developer scenarios that have simplified inputs designed for
provisioning clusters and limited verification.

* Migrate Artifactory installation module from support team focused
  scenarios to the vault repository.
* Migrate support focused scenarios to the repo and update them to use
  the latest in-repo modules.
* Fully document and comment scenarios to help users outline, configure,
  and use the scenarios.
* Remove outdated references to the private registry that is not needed.
* Automatically configure the login shell profile to include the path to
  the vault binary and the VAULT_ADDR/VAULT_TOKEN environment variables.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-15 12:10:27 -06:00
Ryan Cragun
27ab988205 [QT-695] Add config_mode variant to some scenarios (#26380)
Add `config_mode` variant to some scenarios so we can dynamically change
how we primarily configure the Vault cluster, either by a configuration
file or with environment variables.

As part of this change we also:
* Start consuming the Enos terraform provider from public Terraform
  registry.
* Remove the old `seal_ha_beta` variant as it is no longer required.
* Add a module that performs a `vault operator step-down` so that we can
  force leader elections in scenarios.
* Wire up an operator step-down into some scenarios to test both the old
  and new multiseal code paths during leader elections.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-04-22 12:34:47 -06:00
Ryan Cragun
89c75d3d7c [QT-637] Streamline our build pipeline (#24892)
Context
-------
Building and testing Vault artifacts on pull requests and merges is
responsible for about 1/3rd of our overall spend on Vault CI. Of the
artifacts that we ship as part of a release, we do Enos testing scenarios
on the `linux/amd64` and `linux/arm64` binaries and their derivative
artifacts. The extended build artifacts for non-Linux platforms or less
common machine architectures are not tested at this time. They are built,
notarized, and signed as part of every pull request update and merge. As
we don't actually test these artifacts, the only gain we get from this
rather expensive behavior is that we wont merge a change that would prevent
Vault from building on one of the extended targets. Extended platform or
architecture changes are quite rare, so performing this work as frequently
as we do is costly in both monetary and developer time for little relative
safety benefit.

Goals
-----
Rethink and implement how and when we build binaries and artifacts of Vault
so that we can spend less money on repetitive work and while also reducing
the time it takes for the build and test pipelines to complete.

Solution
--------
Instead of building all release artifacts on every push, we'll opt to build
only our testable (core) artifacts. With this change we are introducing a
bit of risk. We could merge a change that breaks an extended platform and
only find out after the fact when we trigger a complete build for a release.
We'll hedge against that risk by building all of the release targets on a
scheduled cadence to ensure that they are still buildable.

We'll make building all of the targets optional on any pull request by
use of a `build/all` label on the pull request.

Further considerations
----------------------
* We want to reduce the total number of workflows and runners for all of our
  pipelines if possible. As each workflow runner has infrastructure cost and
  runner time penalties, using a single runner over many is often preferred.
* Many of our jobs runners have been optimized for cost and performance. We
  should simplify the choices of which runners to use.
* CRT requires us to use the same build workflow in both CE and Ent.
  Historically that meant that modifying `build.yml` in CE would result in a
  merge conflict with `build.yml` in Ent, and break our merge workflows.
* Workflow flow control in both `build.yml` and `ci.yml` can be quite
  complicated, as each needs to maintain compatibility whether executed as CE
  or Ent, and when triggered with various Github events like pull_request,
  push, and workflow_call, each with their own requirements.
* Many jobs utilize similar patterns of flow control and metadata but are not
  reusable.
* Workflow call depth has a maximum of four, so we need to be quite
  considerate when calling other workflows.
* Called workflows can only have 10 inputs.

Implementation
--------------
* Refactor the `build.yml` workflow to be agnostic to whether or not it is
  executing in CE or Ent. That makes future updates to the build much easier
  as we won't have to worry about merge conflicts when the change is merged
  downstream.
* Extract common steps in workflows into composite actions that we can reuse.
* Fix bugs where some but not all workflows would use different Git
  references when building and testing a pull request.
* We rewrite the application, docs, and UI change helpers as a composite
  action. This allows us to re-use this logic to make consistent behavior
  choices across build and CI.
* We combine several `build.yml` and `ci.yml` jobs into our final job.
  This reduces the number of workflows required for the same behavior while
  saving time overall.
* Update most of our action pins.

Results
-------

| Metric            | Before   | After   | Diff  |
|-------------------|----------|---------|-------|
| Duration:         | ~14-18m  | ~15-18m | ~ =   |
| Workflows:        | 43       | 18      | - 58% |
| Billable time:    | ~1h15m   | 16m     | - 79% |
| Saved artifacts:  | 34       | 12      | - 65% |

Infra costs should map closely to billable time.
Network I/O costs should map closely to the workflow count.
Storage costs should map directly with saved artifacts.

We could probably get parity with duration by getting more clever with
our UBI container build, as that's where we're seeing the increase. I'm
not yet concerned as it takes roughly the same time for this job to
complete as it did before.

While the CI workflow was not the focus on the PR, some shared
refactoring does show some marginal improvements there.

| Metric            | Before   | After    | Diff   |
|-------------------|----------|----------|--------|
| Duration:         | ~24m     | ~12.75m  | - 15%  |
| Workflows:        | 55       | 47       | - 8%   |
| Billable time:    | ~4h20m   | ~3h36m   | - 7%   |

Further focus on streamlining the CI workflows would likely result in a
few more marginal improvements, but nothing on the order like we've seen
with the build workflow.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-02-06 21:11:33 +00:00
Mike Palmiotto
3389a572b9 enos: Add Default LCQ validation to autopilot upgrade scenario (#24602)
* enos: Add default lcq validation to autopilot upgrade scenario

* Add timeout/retries to default lcq autopilot test
2023-12-20 15:25:20 -07:00
Ryan Cragun
d6bfe428f3 enos: don't include consul_version in autopilot (#24461)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-12-11 14:26:19 -07:00
Ryan Cragun
a087f7b267 [QT-627] enos: add pkcs11 seal testing with softhsm (#24349)
Add support for testing `+ent.hsm` and `+ent.hsm.fips1402` Vault editions
with `pkcs11` seal types utilizing a shared `softhsm` token. Softhsm2 is
a software HSM that will load seal keys from a local disk via pkcs11.
The pkcs11 seal implementation is fairly complex as we have to create a
one or more shared tokens with various keys and distribute them to all
nodes in the cluster before starting Vault. We also have to ensure that
each sets labels are unique.

We also make a few quality of life updates by utilizing globals for
variants that don't often change and update base versions for various
scenarios.

* Add `seal_pkcs11` module for creating a `pkcs11` seal key using
  `softhsm2` as our backing implementation.
* Require the latest enos provider to gain access to the `enos_user`
  resource to ensure correct ownership and permissions of the
  `softhsm2` data directory and files.
* Add `pkcs11` seal to all scenarios that support configuring a seal
  type.
* Extract system package installation out of the `vault_cluster` module
  and into its own `install_package` module that we can reuse.
* Fix a bug when using the local builder variant that mangled the path.
  This likely slipped in during the migration to auto-version bumping.
* Fix an issue where restarting Vault nodes with a socket seal would
  fail because a seal socket sync wasn't available on all nodes. Now we
  start the socket listener on all nodes to ensure any node can become
  primary and "audit" to the socket listner.
* Remove unused attributes from some verify modules.
* Go back to using cheaper AWS regions.
* Use globals for variants.
* Update initial vault version for `upgrade` and `autopilot` scenarios.
* Update the consul versions for all scenarios that support a consul
  storage backend.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-12-08 14:00:45 -07:00
Ryan Cragun
30a8435499 [QT-617] Add seal migration to seal_ha scenario (#23919)
Test HA seal migration in the `seal_ha` by removing the primary seal,
ensuring seal rewrap has completed, and verifying that data written
through the primary seal is available in the new primary seal.
We also add a verification for the seal type at various stages of the scenario.

* Allow configuring the seal alias and priority in the `start_vault`
  module.
* Add seal migration to `seal_ha` scenario.
* Verify the data written through the original primary seal after the
  seal migration.
* [QT-629] Verify the seal type at various stages in `seal_ha`.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-10-31 19:42:26 +00:00
Ryan Cragun
a46def288f [QT-616] Add seal_ha enos scenario (#23812)
Add support for testing Vault Enterprise with HA seal support by adding
a new `seal_ha` scenario that configures more than one seal type for a
Vault cluster. We also extend existing scenarios to support testing
with or without the Seal HA code path enabled.

* Extract starting vault into a separate enos module to allow for better
  handling of complex clusters that need to be started more than once.
* Extract seal key creation into a separate module and provide it to
  target modules. This allows us to create more than one seal key and
  associate it with instances. This also allows us to forego creating
  keys when using shamir seals.
* [QT-615] Add support for configuring more that one seal type to
  `vault_cluster` module.
* [QT-616] Add `seal_ha` scenario
* [QT-625] Add `seal_ha_beta` variant to existing scenarios to test with
  both code paths.
* Unpin action-setup-terraform
* Add `kms:TagResource` to service user IAM profile

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-10-26 15:13:30 -06:00
Ryan Cragun
9afd5e52ae [QT-602] Don't fail if scenarios cannot completely destroy infra (#23473)
Sometimes destroying resources in AWS will fail because of unexpected
dependency violations or other such nonsense. When this happens the
behavior of Vault that we wanted to verify has already been successfully
accomplished, however the required workflow will fail. This change
allows us to succeed if `enos scenario launch` completes but allows
`enos scenario destroy` to fail. We still notify our slack channel on
destroy failures so that we can investigate issues, however it won't
require a PR author to retry.

* Execute `enos scenario launch` instead of `enos scenario run` to allow
  for very occasional issues when tearing down test infrastructure.
* Improve an error message when getting secondary cluster IP addresses.
* Don't race to get secondary cluster IP addresses.
* Add secondary token to replication scenario outputs.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-10-03 13:04:55 -06:00
Ryan Cragun
1b321e3e7e test: restart socket sink if it's not listening (#23397)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-28 22:20:24 +00:00
Ryan Cragun
807bacbc9c test: don't use us-east-1 during an outage (#23396)
An ongoing incident in us-east-1 is impacting CI. We'll temporarily use
Ohio as it's cheaper than California.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-28 22:20:08 +00:00
Ryan Cragun
460b5de47b test: increase wait timers in new modules (#23355)
Increase default retries for modules used in replication.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-27 17:19:57 -06:00
Ryan Cragun
5cdce48a6a replication: wait longer for replication to resync (#23336)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-27 20:50:28 +00:00
Ryan Cragun
7725117846 enos: remove old initial version from upgrades (#23323)
* Remove old initial versions from the upgrade scenario as they're
  unreliable.
* Ensure that shellcheck is available on runners for linting job.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-27 12:24:08 -06:00