[QT-590] Optimize the CI testing workflow (#21959)

We further optimize the CI workflow for better costs and speed.
We tested the Go CI workflows across several instance classes
and update our compute choices. We achieve an average execution
speed improvement of 2-2.5 minutes per test workflow while
reducing the infrastructure cost by about 20%. We also also save
another ~2 minutes by installing `gotestsum` from the Github release
instead of downloading the Go modules and compiling it every time.

In addition to the speed improvements, we also further reduced our cache
usage by updating the `security-scan` workflow to not cache Go modules.
We also use the `cache/save` and `cache/restore` actions for timing
caches. This results is saving half as many cache results for timing
data.

*UI test results*
results for 2x runs:
* c6a.2xlarge (12m54s, 11m55s)
* c6a.4xlarge (10m47s, 11m6s)
* c6a.8xlarge (11m32s, 10m51s)
* m5.2xlarge (15m23s, 14m16s)
* m5.4xlarge (14m48s, 12m54s)
* m5.8xlarge (12m27s, 12m24s)
* m6a.2xlarge (11m55s, 12m20s)
* m6a.4xlarge (10m54s, 10m43s)
* m6a.8xlarge (10m33s, 10m51s)

Current runner:
m5.2xlarge (15m23s, 14m16s, avg 14m50s) @ 0.448/hr = $0.11

Faster candidates
* c6a.2xlarge (12m54s, 11m55s, avg 12m24s) @ 0.3816/hr = $0.078
* m6a.2xlarge (11m55s, 12m20s, avg 12m8s) @ 0.4032/hr = $0.081
* c6a.4xlarge (10m47s, 11m6s, avg 10m56s) @ 0.7632/hr = $0.139
* m6a.4xlarge (10m54s, 10m43s, avg 10m48s) @ 0.8064/hr = $0.140

Best bang for the buck for test-ui:
  m6a.2xlarge, > 25% cost savings from current and we save ~2.5 minutes.

*Go test results*
During testing the external replication tests, when not broken up, will
always take the longest. Our original analysis focuses on this job.
Most other tests groups will finish ~3m faster so we'll use subtract
that time when estimating the cost for the whole job.

external replication job results:
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s)
* c6a.4xlarge (19m1s, 19m38s, avg 19m20s)
* c6a.8xlarge (19m51s, 18m54s, avg 19m23s)
* m5.2xlarge (22m12s, 20m29s, avg 21m20s)
* m5.4xlarge (20m7s, 19m3s, avg 20m35s)
* m5.8xlarge (20m24s, 19m42s, avg 20m3s)
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s)
* m6a.4xlarge (18m58s, 19m51s, avg 19m24s)
* m6a.8xlarge (19m27s, 18m47s, avg 19m7s)

There is little separation in time when we increase class size. In the
best case a class size increase yields about a ~5% performance increase
and doubles the cost. For test-go our best bang for the buck is
certainly going to be in the 2xlarge class.

Current runner:
m5.2xlarge (22m12s, 20m29s, avg 21m20s) @ 0.448/hr (16@avg-3m + 1@avg) = $2.35

Candidates in the same class
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s) @ 0.3816/hr (16@avg-3m + 1@avg) = $1.86
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s) @ 0.4032/hr (16@avg-3m + 1@avg) = $2.00

Best bang for the buck for test-go:
  c6a.2xlarge: 20% cost savings and save about ~2.25 minutes.

We ran the tests with similar instances and saw similar execution times as
with test-go. Therefore we can use the same recommended instance sizes.

After breaking up test-go's external replication tests, the longest group
was shorter on average. I choose to look at group 3 as it was usually the
longest grouping:

* c6a.2xlarge: (14m51s, 14m48s)
* c6a.4xlarge: (14m14s, 14m15)
* c6a.8xlarge: (14m0s, 13m54s)
* m5.2xlarge: (15m36s, 15m35s)
* m5.4xlarge: (14m46s, 14m49s)
* m5.8xlarge: (14m25s, 14m25s)
* m6a.2xlarge: 14m51s, 14m53s)
* m6a.4xlarge: 14m16s, 14m16s)
* m6a.8xlarge: (14m2s, 13m57s)

Again, we see ~5% performance gains between the 2x and 8x instance classes
at quadruple the cost. The c6a and m6a families are almost identical, with
the c6a class being cheaper.

*Notes*
* UI and Go Test timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5556957460/jobs/10150759959
* Go Test with data race detection timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5558013192
* Go Test with replication broken up: https://github.com/hashicorp/vault-enterprise/actions/runs/5558490899

Signed-off-by: Ryan Cragun <me@ryan.ec>
This commit is contained in:
Ryan Cragun
2023-07-20 14:10:08 -06:00
committed by GitHub
parent 6b21994d76
commit 1a46088afb
6 changed files with 106 additions and 58 deletions

View File

@@ -0,0 +1,52 @@
---
name: Set up gotestsum from Github releases
description: Set up gotestsum from Github releases
inputs:
destination:
description: "Where to install the gotestsum binary (default: $HOME/bin/gotestsum)"
type: boolean
default: "$HOME/bin"
version:
description: "The version to install (default: latest)"
type: string
default: Latest
outputs:
destination:
description: Where the installed gotestsum binary is
value: ${{ steps.install.outputs.destination }}
destination-dir:
description: The directory where the installed gotestsum binary is
value: ${{ steps.install.outputs.destination-dir }}
version:
description: The installed version of gotestsum
value: ${{ steps.install.outputs.version }}
runs:
using: composite
steps:
- id: install
shell: bash
env:
GH_TOKEN: ${{ github.token }}
run: |
VERSION=$(gh release list -R gotestyourself/gotestsum --exclude-drafts --exclude-pre-releases | grep Latest | cut -f1)
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
mkdir -p "$HOME/bin"
DESTINATION="$(readlink -f "$HOME/bin")"
echo "destination=$DESTINATION" >> "GITHUB_OUTPUT"
DESTINATION_DIR="$(dirname "$DESTINATION")"
echo "$DESTINATION_DIR" >> "$GITHUB_PATH"
echo "destination-dir=$DESTINATION_DIR" >> "GITHUB_OUTPUT"
OS="$(echo "$RUNNER_OS" | tr '[:upper:]' '[:lower:]')"
ARCH="$(echo "$RUNNER_ARCH" | tr '[:upper:]' '[:lower:]')"
if [ "$ARCH" = "x64" ]; then
export ARCH="amd64"
fi
gh release download "$VERSION" -p "*${OS}_${ARCH}.tar.gz" -O gotestsum.tgz -R gotestyourself/gotestsum
tar -xvf gotestsum.tgz
mv gotestsum "${DESTINATION_DIR}/gotestsum"

View File

@@ -94,7 +94,7 @@ jobs:
key: ui-${{ steps.ui-hash.outputs.ui-hash }}
- if: steps.cache-ui-assets.outputs.cache-hit != 'true'
name: Set up node and yarn
uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
with:
node-version-file: ui/package.json
cache: yarn

View File

@@ -21,10 +21,11 @@ jobs:
name: Setup
runs-on: ubuntu-latest
outputs:
compute-tiny: ${{ steps.setup-outputs.outputs.compute-tiny }}
compute-standard: ${{ steps.setup-outputs.outputs.compute-standard }}
compute-larger: ${{ steps.setup-outputs.outputs.compute-larger }}
compute-huge: ${{ steps.setup-outputs.outputs.compute-huge }}
compute-small: ${{ steps.setup-outputs.outputs.compute-small }}
compute-medium: ${{ steps.setup-outputs.outputs.compute-medium }}
compute-large: ${{ steps.setup-outputs.outputs.compute-large }}
compute-largem: ${{ steps.setup-outputs.outputs.compute-largem }}
compute-xlarge: ${{ steps.setup-outputs.outputs.compute-xlarge }}
enterprise: ${{ steps.setup-outputs.outputs.enterprise }}
go-tags: ${{ steps.setup-outputs.outputs.go-tags }}
steps:
@@ -36,18 +37,20 @@ jobs:
if [ "${github_repository##*/}" == "vault-enterprise" ] ; then
# shellcheck disable=SC2129
echo 'compute-tiny=["self-hosted","ondemand","linux","type=m5.large"]' >> "$GITHUB_OUTPUT"
echo 'compute-standard=["self-hosted","ondemand","linux","type=m5.xlarge"]' >> "$GITHUB_OUTPUT"
echo 'compute-larger=["self-hosted","ondemand","linux","type=m5.2xlarge"]' >> "$GITHUB_OUTPUT"
echo 'compute-huge=["self-hosted","ondemand","linux","type=m5.4xlarge"]' >> "$GITHUB_OUTPUT"
echo 'compute-small=["self-hosted","ondemand","linux","type=c6a.large"]' >> "$GITHUB_OUTPUT" # 2x vCPUs, 4 GiB RAM,
echo 'compute-medium=["self-hosted","ondemand","linux","type=c6a.xlarge"]' >> "$GITHUB_OUTPUT" # 4x vCPUs, 8 GiB RAM,
echo 'compute-large=["self-hosted","ondemand","linux","type=c6a.2xlarge"]' >> "$GITHUB_OUTPUT" # 8x vCPUs, 16 GiB RAM,
echo 'compute-largem=["self-hosted","ondemand","linux","type=m6a.2xlarge"]' >> "$GITHUB_OUTPUT" # 8x vCPUs, 32 GiB RAM,
echo 'compute-xlarge=["self-hosted","ondemand","linux","type=c6a.4xlarge"]' >> "$GITHUB_OUTPUT" # 16x vCPUs, 32 GiB RAM,
echo 'enterprise=1' >> "$GITHUB_OUTPUT"
echo 'go-tags=ent,enterprise' >> "$GITHUB_OUTPUT"
else
# shellcheck disable=SC2129
echo 'compute-tiny="ubuntu-latest"' >> "$GITHUB_OUTPUT" # 2 cores, 7 GB RAM, 14 GB SSD
echo 'compute-standard="custom-linux-small-vault-latest"' >> "$GITHUB_OUTPUT" # 8 cores, 32 GB RAM, 300 GB SSD
echo 'compute-larger="custom-linux-medium-vault-latest"' >> "$GITHUB_OUTPUT" # 16 cores, 64 GB RAM, 600 GB SSD
echo 'compute-huge="custom-linux-xl-vault-latest"' >> "$GITHUB_OUTPUT" # 32-cores, 128 GB RAM, 1200 GB SSD
echo 'compute-small="ubuntu-latest"' >> "$GITHUB_OUTPUT" # 2x vCPUs, 7 GiB RAM, 14 GB SSD
echo 'compute-medium="custom-linux-small-vault-latest"' >> "$GITHUB_OUTPUT" # 8x vCPUs, 32 GiB RAM, 300 GB SSD
echo 'compute-large="custom-linux-medium-vault-latest"' >> "$GITHUB_OUTPUT" # 16x vCPUs, 64 GiB RAM, 600 GB SSD
echo 'compute-largem="custom-linux-medium-vault-latest"' >> "$GITHUB_OUTPUT" # 16x vCPUs, 64 GiB RAM, 600 GB SSD
echo 'compute-xlarge="custom-linux-xl-vault-latest"' >> "$GITHUB_OUTPUT" # 32x vCPUs, 128 GiB RAM, 1200 GB SSD
echo 'enterprise=' >> "$GITHUB_OUTPUT"
echo 'go-tags=' >> "$GITHUB_OUTPUT"
fi
@@ -62,7 +65,7 @@ jobs:
needs:
- setup
if: ${{ needs.setup.outputs.enterprise != '' && github.base_ref != '' }}
runs-on: ${{ fromJSON(needs.setup.outputs.compute-tiny) }}
runs-on: ${{ fromJSON(needs.setup.outputs.compute-small) }}
steps:
- uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
with:
@@ -113,7 +116,7 @@ jobs:
total-runners: 16
go-arch: amd64
go-tags: '${{ needs.setup.outputs.go-tags }},deadlock'
runs-on: ${{ needs.setup.outputs.compute-larger }}
runs-on: ${{ needs.setup.outputs.compute-large }}
enterprise: ${{ needs.setup.outputs.enterprise }}
secrets: inherit
@@ -137,7 +140,7 @@ jobs:
extra-flags: '-race'
go-arch: amd64
go-tags: ${{ needs.setup.outputs.go-tags }}
runs-on: ${{ needs.setup.outputs.compute-huge }}
runs-on: ${{ needs.setup.outputs.compute-large }}
enterprise: ${{ needs.setup.outputs.enterprise }}
name: "race"
secrets: inherit
@@ -162,7 +165,7 @@ jobs:
}
go-arch: amd64
go-tags: '${{ needs.setup.outputs.go-tags }},deadlock,cgo,fips,fips_140_2'
runs-on: ${{ needs.setup.outputs.compute-larger }}
runs-on: ${{ needs.setup.outputs.compute-large }}
enterprise: ${{ needs.setup.outputs.enterprise }}
name: "fips"
secrets: inherit
@@ -185,21 +188,21 @@ jobs:
permissions:
id-token: write
contents: read
runs-on: ${{ fromJSON(needs.setup.outputs.compute-larger) }}
runs-on: ${{ fromJSON(needs.setup.outputs.compute-largem) }}
steps:
- uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
- uses: ./.github/actions/set-up-go
with:
github-token: ${{ secrets.ELEVATED_GITHUB_TOKEN }}
# Setup node.js without caching to allow running npm install -g yarn (next step)
- uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
- uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
with:
node-version-file: './ui/package.json'
- id: install-yarn
run: |
npm install -g yarn
# Setup node.js with caching using the yarn.lock file
- uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
- uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
with:
node-version-file: './ui/package.json'
cache: yarn
@@ -271,7 +274,7 @@ jobs:
- test-go
- test-ui
if: always()
runs-on: ${{ fromJSON(needs.setup.outputs.compute-tiny) }}
runs-on: ${{ fromJSON(needs.setup.outputs.compute-small) }}
steps:
- run: |
tr -d '\n' <<< '${{ toJSON(needs.*.result) }}' | grep -q -v -E '(failure|cancelled)'

View File

@@ -19,7 +19,7 @@ jobs:
uses: actions/setup-go@fac708d6674e30b6ba41289acaab6d4b75aa0753 # v4.0.1
with:
cache: false # save cache space for vault builds: https://github.com/hashicorp/vault/pull/21764
go-version: 1.18
go-version-file: .go-version
- name: Set up Python
uses: actions/setup-python@bd6b4b6205c4dbad673328db7b31b7fab9e241c0 # v4.6.1
@@ -32,7 +32,7 @@ jobs:
repository: hashicorp/security-scanner
token: ${{ secrets.HASHIBOT_PRODSEC_GITHUB_TOKEN }}
path: security-scanner
ref: 5a491479f4131d343afe0a4f18f6fcd36639f3fa
ref: 52d94588851f38a416f11c1e727131b3c8b0dd4d
- name: Install dependencies
shell: bash
@@ -69,6 +69,8 @@ jobs:
#SEMGREP_BASELINE_REF: ${{ github.base_ref }}
with:
repository: "$PWD"
cache-build: true
cache-go-modules: false
- name: SARIF Output
shell: bash
@@ -78,6 +80,6 @@ jobs:
cat results.sarif
- name: Upload SARIF file
uses: github/codeql-action/upload-sarif@cdcdbb579706841c47f7063dda365e292e5cad7a # codeql-bundle-v2.13.4
uses: github/codeql-action/upload-sarif@46a6823b81f2d7c67ddf123851eea88365bc8a67 # codeql-bundle-v2.13.5
with:
sarif_file: results.sarif

View File

@@ -77,7 +77,7 @@ jobs:
- name: Set Up Git
run: git config --global url."https://${{ secrets.elevated_github_token }}:@github.com".insteadOf "https://github.com"
- name: Set Up Node
uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
with:
node-version-file: './ui/package.json'
- name: Set Up Terraform

View File

@@ -66,7 +66,6 @@ jobs:
- uses: ./.github/actions/set-up-go
with:
github-token: ${{ secrets.ELEVATED_GITHUB_TOKEN }}
no-restore: true # We don't need the vault Go modules when generating indices
- name: Authenticate to Vault
id: vault-auth
if: github.repository == 'hashicorp/vault-enterprise'
@@ -99,20 +98,12 @@ jobs:
if: github.repository != 'hashicorp/vault-enterprise'
run: |
git config --global url."https://${{ secrets.ELEVATED_GITHUB_TOKEN}}@github.com".insteadOf https://github.com
- run: go install gotest.tools/gotestsum@v1.9.0
- uses: ./.github/actions/set-up-gotestsum
- run: mkdir -p test-results/go-test
# We use a unique "read-" prefix to guarantee that we're not scribbling on
# the aggregated test data in the event of test failure. This key is
# unique for every test run and just used to restore the previous
# aggregated data. We persist all test data after a successful run and
# store that in the go-test-reports- cache.
- id: restore-from-cache
uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
- uses: actions/cache/restore@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
with:
path: test-results/go-test
key: read-go-test-reports-${{ github.run_number }}
key: go-test-reports-${{ github.run_number }}
restore-keys: go-test-reports-
- name: List cached results
id: list-cached-results
@@ -221,6 +212,7 @@ jobs:
env:
GOPRIVATE: github.com/hashicorp/*
run: time make ci-bootstrap dev
- uses: ./.github/actions/set-up-gotestsum
- id: run-go-tests
name: Run Go tests
timeout-minutes: ${{ fromJSON(env.TIMEOUT_IN_MINUTES) }}
@@ -275,7 +267,7 @@ jobs:
# shellcheck disable=SC2086 # can't quote RERUN_FAILS
GOARCH=${{ inputs.go-arch }} \
go run gotest.tools/gotestsum --format=short-verbose \
gotestsum --format=short-verbose \
--junitfile test-results/go-test/results-${{ matrix.id }}.xml \
--jsonfile test-results/go-test/results-${{ matrix.id }}.json \
--jsonfile-timing-events failure-summary-${{ matrix.id }}${{ inputs.name != '' && '-' || '' }}${{ inputs.name }}.json \
@@ -383,11 +375,6 @@ jobs:
needs: test-go
runs-on: ${{ fromJSON(inputs.runs-on) }}
steps:
- uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
with:
path: test-results/go-test
key: go-test-reports-${{ github.run_number }}
restore-keys: go-test-reports-
- uses: actions/download-artifact@9bc31d5ccc31df68ecc42ccf4149144866c47d8a # v3.0.2
with:
name: test-results
@@ -396,3 +383,7 @@ jobs:
ls -lhR test-results/go-test
find test-results/go-test -mindepth 1 -mtime +3 -delete
ls -lhR test-results/go-test
- uses: actions/cache/save@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
with:
path: test-results/go-test
key: go-test-reports-${{ github.run_number }}