We further optimize the CI workflow for better cost and speed.
We tested the Go CI workflows across several instance classes
and updated our compute choices accordingly. We achieve an average
execution speed improvement of 2-2.5 minutes per test workflow while
reducing the infrastructure cost by about 20%. We also save
another ~2 minutes by installing `gotestsum` from its GitHub release
instead of downloading the Go modules and compiling it every time.
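A minimal sketch of the release-based install, assuming a Linux amd64
runner (the pinned version is illustrative, not the exact one we use):
```yaml
- name: Install gotestsum from the GitHub release
  run: |
    # Download a prebuilt binary instead of resolving Go modules and
    # compiling from source on every run.
    VERSION=1.10.1
    curl -fsSL "https://github.com/gotestyourself/gotestsum/releases/download/v${VERSION}/gotestsum_${VERSION}_linux_amd64.tar.gz" \
      | sudo tar -xz -C /usr/local/bin gotestsum
```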
In addition to the speed improvements, we further reduced our cache
usage by updating the `security-scan` workflow to not cache Go modules.
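A sketch of what that looks like, assuming setup-go v4's built-in cache
toggle (the step name is illustrative):
```yaml
- name: Set up Go without module caching
  uses: actions/setup-go@v4
  with:
    go-version-file: go.mod
    # security-scan doesn't benefit enough from the module cache
    # to justify the cache space it consumes.
    cache: false
```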
We also use the `cache/save` and `cache/restore` actions for the timing
caches, which halves the number of cache entries we save for timing
data.
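A sketch of the split save/restore pattern (paths and key names here
are illustrative): restore at the start with a prefix match, then save
once from a single job, instead of letting the combined `actions/cache`
action save a new entry from every matrix job.
```yaml
- name: Restore Go test timing data
  uses: actions/cache/restore@v3
  with:
    path: test-times.json
    key: go-test-timing-${{ github.ref_name }}-${{ github.run_id }}
    restore-keys: |
      go-test-timing-${{ github.ref_name }}-
      go-test-timing-

# In a single follow-up job, rather than once per matrix entry:
- name: Save Go test timing data
  uses: actions/cache/save@v3
  with:
    path: test-times.json
    key: go-test-timing-${{ github.ref_name }}-${{ github.run_id }}
```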
*UI test results*
Results for two runs per instance type:
* c6a.2xlarge (12m54s, 11m55s)
* c6a.4xlarge (10m47s, 11m6s)
* c6a.8xlarge (11m32s, 10m51s)
* m5.2xlarge (15m23s, 14m16s)
* m5.4xlarge (14m48s, 12m54s)
* m5.8xlarge (12m27s, 12m24s)
* m6a.2xlarge (11m55s, 12m20s)
* m6a.4xlarge (10m54s, 10m43s)
* m6a.8xlarge (10m33s, 10m51s)
Current runner:
m5.2xlarge (15m23s, 14m16s, avg 14m50s) @ 0.448/hr = $0.11
Faster candidates:
* c6a.2xlarge (12m54s, 11m55s, avg 12m24s) @ 0.3816/hr = $0.078
* m6a.2xlarge (11m55s, 12m20s, avg 12m8s) @ 0.4032/hr = $0.081
* c6a.4xlarge (10m47s, 11m6s, avg 10m56s) @ 0.7632/hr = $0.139
* m6a.4xlarge (10m54s, 10m43s, avg 10m48s) @ 0.8064/hr = $0.140
Best bang for the buck for test-ui:
m6a.2xlarge: >25% cost savings over the current runner, and we save
~2.5 minutes per run.
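For reference, the per-run cost above is the hourly rate times the
average duration: for m6a.2xlarge, 12m8s is ~0.202 hours, and
0.202 x $0.4032/hr ≈ $0.081.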
*Go test results*
During testing, the external replication tests, when not broken up,
always take the longest, so our original analysis focuses on that job.
Most other test groups finish ~3m faster, so we subtract that time when
estimating the cost of the whole job.
External replication job results:
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s)
* c6a.4xlarge (19m1s, 19m38s, avg 19m20s)
* c6a.8xlarge (19m51s, 18m54s, avg 19m23s)
* m5.2xlarge (22m12s, 20m29s, avg 21m20s)
* m5.4xlarge (20m7s, 19m3s, avg 19m35s)
* m5.8xlarge (20m24s, 19m42s, avg 20m3s)
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s)
* m6a.4xlarge (18m58s, 19m51s, avg 19m24s)
* m6a.8xlarge (19m27s, 18m47s, avg 19m7s)
There is little separation in time as we increase the class size. In
the best case a class size increase yields about a 5% performance
increase and doubles the cost. For test-go our best bang for the buck
is certainly going to be in the 2xlarge class.
Current runner:
m5.2xlarge (22m12s, 20m29s, avg 21m20s) @ 0.448/hr (16@avg-3m + 1@avg) = $2.35
Candidates in the same class:
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s) @ 0.3816/hr (16@avg-3m + 1@avg) = $1.86
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s) @ 0.4032/hr (16@avg-3m + 1@avg) = $2.00
Best bang for the buck for test-go:
c6a.2xlarge: ~20% cost savings, and we save about 2.25 minutes.
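For reference, (16@avg-3m + 1@avg) reads as 16 test groups that finish
~3m faster than the slowest group, plus one group at the full average.
For c6a.2xlarge that is 16 x 17m5s + 20m5s = 293m25s, or ~4.89
instance-hours, and 4.89 x $0.3816/hr ≈ $1.86.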
We ran the data race detection tests on similar instances and saw
similar execution times as with test-go, so we can use the same
recommended instance sizes there.
After breaking up test-go's external replication tests, the longest
group was shorter on average. I chose to look at group 3, as it was
usually the longest grouping:
* c6a.2xlarge: (14m51s, 14m48s)
* c6a.4xlarge: (14m14s, 14m15s)
* c6a.8xlarge: (14m0s, 13m54s)
* m5.2xlarge: (15m36s, 15m35s)
* m5.4xlarge: (14m46s, 14m49s)
* m5.8xlarge: (14m25s, 14m25s)
* m6a.2xlarge: (14m51s, 14m53s)
* m6a.4xlarge: (14m16s, 14m16s)
* m6a.8xlarge: (14m2s, 13m57s)
Again, we see ~5% performance gains between the 2x and 8x instance classes
at quadruple the cost. The c6a and m6a families are almost identical, with
the c6a class being cheaper.
*Notes*
* UI and Go Test timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5556957460/jobs/10150759959
* Go Test with data race detection timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5558013192
* Go Test with replication broken up: https://github.com/hashicorp/vault-enterprise/actions/runs/5558490899
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* Sync missing scenarios and modules
* Clean up variables and examples vars
* Add a `lint` make target for enos
* Update enos `fmt` workflow to run the `lint` target.
* Always use ipv4 addresses in target security groups.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* backport of commit dc104898f7 (#21853)
* fix multiline
* shellcheck, and success message for builds
* add full path
* cat the summary
* fix and faster
* fix if condition
* base64 in a separate step
* echo
* check against empty string
* add echo
* only use matrix ids
* only id
* echo matrix
* remove wrapping array
* tojson
* try echo again
* use jq to get packages
* don't quote
* only run binary tests once
* only run binary tests once
* test what's wrong with the binary
* separate file
* use matrix file
* failed test
* update comment on success
* correct variable name
* base64 fix
* output to file
* use multiline
* fix
* fix formatting
* fix newline
* fix whitespace
* correct body, remove comma
* small fixes
* shellcheck
* another shellcheck fix
* fix deprecation checker
* only run comments for prs
* Update .github/workflows/test-go.yml
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
* Update .github/workflows/test-go.yml
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
* fixes
---------
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
* backport of commit 3b00dde1ba (#21936)
* limit test comments
* remove unnecessary tee
* fix go test condition
* fix
* fail test
* remove always() entirely
* fix columns
* make a bunch of tests fail
* separate line
* include Failures:
* remove test fails
* fix whitespace
* backport of commit 245430215c (#21973)
* only add binary tests if they exist
* shellcheck
---------
Co-authored-by: miagilepner <mia.epner@hashicorp.com>
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
In order to reliably store Go test times in the GitHub Actions cache we
need to reduce our cache thrashing by not using more than 10GB across
all of our caches. This change reduces our cache usage significantly by
sharing the Go module cache between our Go CI workflows and our build
workflows. We lose our per-builder cache, which will result in a bit of
a performance hit, but we'll enable better automatic rebalancing of our
CI workflows. Overall we should see a per-branch reduction in cache
sizes from ~17GB to ~850MB.
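A sketch of the shared-cache idea (the key name is illustrative): both
the Go CI workflows and the build workflows compute the same key from
go.sum, so they hit one shared module cache instead of each saving its
own variant. Only the module cache is shared; the per-builder Go build
cache goes away, which is why a cache hit below still pays the full
~40s build.
```yaml
- name: Restore shared Go module cache
  uses: actions/cache@v3
  with:
    path: ~/go/pkg/mod
    # The same key in every Go CI and build workflow means they all
    # share one cache entry per go.sum instead of one per builder.
    key: go-modules-${{ hashFiles('**/go.sum') }}
```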
Some preliminary investigation into this new strategy:
Prior build workflow strategy on a cache miss:
Download modules: ~20s
Build Vault: ~40s
Upload cache: ~30s
Total: ~1m30s
Prior build workflow strategy on a cache hit:
Download and decompress modules and build cache: ~12s
Build Vault: ~15s
Total: ~28s
New build workflow strategy on a cache miss:
Download modules: ~20s
Build Vault: ~40s
Upload cache: ~6s
Total: ~1m6s
New build workflow strategy on a cache hit:
Download and decompress modules: ~3s
Build Vault: ~40s
Total: ~43s
Expected time if we used no Go caching:
Download modules: ~20s
Build Vault: ~40s
Total: ~1m
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* update github.com/protonmail/go-crypto
Updates the transitive dependency github.com/cloudflare/circl, which helps address scanning noise related to CVE-2023-1732.
Vault is not affected by this issue as it does not use the vulnerable functionality.
* go mod tidy
---------
Co-authored-by: mickael e <mickael@hashicorp.com>
* combine into one checker
* combine and simplify ci checks
* add to test package list
* remove testing test
* only run deprecations check
* only run deprecations check
* remove unneeded repo check
* fix bash options
Co-authored-by: miagilepner <mia.epner@hashicorp.com>