[QT-590] Optimize the CI testing workflow (#21959)

We further optimize the CI workflow for better cost and speed. We tested the Go CI workflows across several instance classes and updated our compute choices. We achieve an average execution speed improvement of 2-2.5 minutes per test workflow while reducing the infrastructure cost by about 20%. We also save another ~2 minutes by installing `gotestsum` from the GitHub release instead of downloading the Go modules and compiling it every time.

In addition to the speed improvements, we further reduced our cache usage by updating the `security-scan` workflow to not cache Go modules. We also use the `cache/save` and `cache/restore` actions for timing caches. This results in saving half as many cache entries for timing data.

*UI test results*

Results for 2x runs:
* c6a.2xlarge (12m54s, 11m55s)
* c6a.4xlarge (10m47s, 11m6s)
* c6a.8xlarge (11m32s, 10m51s)
* m5.2xlarge (15m23s, 14m16s)
* m5.4xlarge (14m48s, 12m54s)
* m5.8xlarge (12m27s, 12m24s)
* m6a.2xlarge (11m55s, 12m20s)
* m6a.4xlarge (10m54s, 10m43s)
* m6a.8xlarge (10m33s, 10m51s)

Current runner: m5.2xlarge (15m23s, 14m16s, avg 14m50s) @ 0.448/hr = $0.11

Faster candidates:
* c6a.2xlarge (12m54s, 11m55s, avg 12m24s) @ 0.3816/hr = $0.078
* m6a.2xlarge (11m55s, 12m20s, avg 12m8s) @ 0.4032/hr = $0.081
* c6a.4xlarge (10m47s, 11m6s, avg 10m56s) @ 0.7632/hr = $0.139
* m6a.4xlarge (10m54s, 10m43s, avg 10m48s) @ 0.8064/hr = $0.140

Best bang for the buck for test-ui: m6a.2xlarge, > 25% cost savings from current and we save ~2.5 minutes.

*Go test results*

During testing, the external replication tests, when not broken up, always take the longest. Our original analysis focuses on this job. Most other test groups finish ~3m faster, so we subtract that time when estimating the cost for the whole job.

External replication job results:
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s)
* c6a.4xlarge (19m1s, 19m38s, avg 19m20s)
* c6a.8xlarge (19m51s, 18m54s, avg 19m23s)
* m5.2xlarge (22m12s, 20m29s, avg 21m20s)
* m5.4xlarge (20m7s, 19m3s, avg 19m35s)
* m5.8xlarge (20m24s, 19m42s, avg 20m3s)
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s)
* m6a.4xlarge (18m58s, 19m51s, avg 19m24s)
* m6a.8xlarge (19m27s, 18m47s, avg 19m7s)

There is little separation in time when we increase class size. In the best case, a class size increase yields about a 5% performance increase and doubles the cost. For test-go our best bang for the buck is certainly going to be in the 2xlarge class.

Current runner: m5.2xlarge (22m12s, 20m29s, avg 21m20s) @ 0.448/hr (16@avg-3m + 1@avg) = $2.35

Candidates in the same class:
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s) @ 0.3816/hr (16@avg-3m + 1@avg) = $1.86
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s) @ 0.4032/hr (16@avg-3m + 1@avg) = $2.00

Best bang for the buck for test-go: c6a.2xlarge, 20% cost savings and we save about 2.25 minutes.

We ran the tests with similar instances and saw similar execution times as with test-go. Therefore we can use the same recommended instance sizes.

After breaking up test-go's external replication tests, the longest group was shorter on average. I chose to look at group 3 as it was usually the longest grouping:
* c6a.2xlarge: (14m51s, 14m48s)
* c6a.4xlarge: (14m14s, 14m15s)
* c6a.8xlarge: (14m0s, 13m54s)
* m5.2xlarge: (15m36s, 15m35s)
* m5.4xlarge: (14m46s, 14m49s)
* m5.8xlarge: (14m25s, 14m25s)
* m6a.2xlarge: (14m51s, 14m53s)
* m6a.4xlarge: (14m16s, 14m16s)
* m6a.8xlarge: (14m2s, 13m57s)

Again, we see ~5% performance gains between the 2x and 8x instance classes at quadruple the cost. The c6a and m6a families are almost identical, with the c6a class being cheaper.

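The `(16@avg-3m + 1@avg)` cost estimate for test-go can be reproduced with a few lines of shell. This is only a sketch under stated assumptions: the function name `estimate_cost` is hypothetical, the weighting (16 runners finishing 3 minutes before the slowest group, plus 1 runner at the full average) is taken from the figures above, and the result is rounded to the nearest cent, so it may differ from the quoted numbers by a cent.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the workflow: estimate the cost of one
# test-go run from an hourly instance rate and the average runtime of the
# slowest group.
# Assumption (from the commit message): 16 groups finish ~3 minutes faster
# than the slowest group, and 1 group takes the full average.
estimate_cost() {
  local rate_per_hour="$1" avg_minutes="$2" avg_seconds="$3"
  awk -v rate="$rate_per_hour" -v m="$avg_minutes" -v s="$avg_seconds" 'BEGIN {
    avg = m + s / 60                # slowest group runtime in minutes
    total = 16 * (avg - 3) + avg    # runner-minutes across all 17 groups
    printf "%.2f\n", rate * total / 60
  }'
}

# Current runner: m5.2xlarge at 0.448/hr with an average of 21m20s.
estimate_cost 0.448 21 20   # prints 2.35
```

Calling the function with each candidate's rate and average gives a quick way to compare instance types before changing the `compute-*` outputs.
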
*Notes*
* UI and Go Test timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5556957460/jobs/10150759959
* Go Test with data race detection timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5558013192
* Go Test with replication broken up: https://github.com/hashicorp/vault-enterprise/actions/runs/5558490899

Signed-off-by: Ryan Cragun <me@ryan.ec>
2025-11-02 19:47:54 +00:00 · 2023-07-20 14:10:08 -06:00
parent 6b21994d76
commit 1a46088afb
6 changed files with 106 additions and 58 deletions
--- a/.github/actions/set-up-gotestsum/action.yml
+++ b/.github/actions/set-up-gotestsum/action.yml
@@ -0,0 +1,52 @@
+---
+name: Set up gotestsum from Github releases
+description: Set up gotestsum from Github releases
+
+inputs:
+  destination:
+    description: "Where to install the gotestsum binary (default: $HOME/bin/gotestsum)"
+    type: string
+    default: "$HOME/bin"
+  version:
+    description: "The version to install (default: latest)"
+    type: string
+    default: Latest
+
+outputs:
+  destination:
+    description: Where the installed gotestsum binary is
+    value: ${{ steps.install.outputs.destination }}
+  destination-dir:
+    description: The directory where the installed gotestsum binary is
+    value: ${{ steps.install.outputs.destination-dir }}
+  version:
+    description: The installed version of gotestsum
+    value: ${{ steps.install.outputs.version }}
+
+runs:
+  using: composite
+  steps:
+    - id: install
+      shell: bash
+      env:
+        GH_TOKEN: ${{ github.token }}
+      run: |
+        VERSION=$(gh release list -R gotestyourself/gotestsum --exclude-drafts --exclude-pre-releases | grep Latest | cut -f1)
+        echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+
+        mkdir -p "$HOME/bin"
+        DESTINATION="$(readlink -f "$HOME/bin")/gotestsum"
+        echo "destination=$DESTINATION" >> "$GITHUB_OUTPUT"
+        DESTINATION_DIR="$(dirname "$DESTINATION")"
+        echo "$DESTINATION_DIR" >> "$GITHUB_PATH"
+        echo "destination-dir=$DESTINATION_DIR" >> "$GITHUB_OUTPUT"
+
+        OS="$(echo "$RUNNER_OS" | tr '[:upper:]' '[:lower:]')"
+        ARCH="$(echo "$RUNNER_ARCH" | tr '[:upper:]' '[:lower:]')"
+        if [ "$ARCH" = "x64" ]; then
+          export ARCH="amd64"
+        fi
+
+        gh release download "$VERSION" -p "*${OS}_${ARCH}.tar.gz" -O gotestsum.tgz -R gotestyourself/gotestsum
+        tar -xvf gotestsum.tgz
+        mv gotestsum "${DESTINATION_DIR}/gotestsum"
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -94,7 +94,7 @@ jobs:
          key: ui-${{ steps.ui-hash.outputs.ui-hash }}
      - if: steps.cache-ui-assets.outputs.cache-hit != 'true'
        name: Set up node and yarn
-        uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
+        uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
        with:
          node-version-file: ui/package.json
          cache: yarn
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -21,10 +21,11 @@ jobs:
    name: Setup
    runs-on: ubuntu-latest
    outputs:
-      compute-tiny: ${{ steps.setup-outputs.outputs.compute-tiny }}
-      compute-standard: ${{ steps.setup-outputs.outputs.compute-standard }}
-      compute-larger: ${{ steps.setup-outputs.outputs.compute-larger }}
-      compute-huge: ${{ steps.setup-outputs.outputs.compute-huge }}
+      compute-small: ${{ steps.setup-outputs.outputs.compute-small }}
+      compute-medium: ${{ steps.setup-outputs.outputs.compute-medium }}
+      compute-large: ${{ steps.setup-outputs.outputs.compute-large }}
+      compute-largem: ${{ steps.setup-outputs.outputs.compute-largem }}
+      compute-xlarge: ${{ steps.setup-outputs.outputs.compute-xlarge }}
      enterprise: ${{ steps.setup-outputs.outputs.enterprise }}
      go-tags: ${{ steps.setup-outputs.outputs.go-tags }}
    steps:
@@ -36,18 +37,20 @@ jobs:

          if [ "${github_repository##*/}" == "vault-enterprise" ] ; then
            # shellcheck disable=SC2129
-            echo 'compute-tiny=["self-hosted","ondemand","linux","type=m5.large"]' >> "$GITHUB_OUTPUT"
-            echo 'compute-standard=["self-hosted","ondemand","linux","type=m5.xlarge"]' >> "$GITHUB_OUTPUT"
-            echo 'compute-larger=["self-hosted","ondemand","linux","type=m5.2xlarge"]' >> "$GITHUB_OUTPUT"
-            echo 'compute-huge=["self-hosted","ondemand","linux","type=m5.4xlarge"]' >> "$GITHUB_OUTPUT"
+            echo 'compute-small=["self-hosted","ondemand","linux","type=c6a.large"]' >> "$GITHUB_OUTPUT"     #  2x vCPUs,  4 GiB RAM,
+            echo 'compute-medium=["self-hosted","ondemand","linux","type=c6a.xlarge"]' >> "$GITHUB_OUTPUT"   #  4x vCPUs,  8 GiB RAM,
+            echo 'compute-large=["self-hosted","ondemand","linux","type=c6a.2xlarge"]' >> "$GITHUB_OUTPUT"   #  8x vCPUs, 16 GiB RAM,
+            echo 'compute-largem=["self-hosted","ondemand","linux","type=m6a.2xlarge"]' >> "$GITHUB_OUTPUT"  #  8x vCPUs, 32 GiB RAM,
+            echo 'compute-xlarge=["self-hosted","ondemand","linux","type=c6a.4xlarge"]' >> "$GITHUB_OUTPUT"  # 16x vCPUs, 32 GiB RAM,
            echo 'enterprise=1' >> "$GITHUB_OUTPUT"
            echo 'go-tags=ent,enterprise' >> "$GITHUB_OUTPUT"
          else
            # shellcheck disable=SC2129
-            echo 'compute-tiny="ubuntu-latest"' >> "$GITHUB_OUTPUT"                         #  2 cores,   7 GB RAM,   14 GB SSD
-            echo 'compute-standard="custom-linux-small-vault-latest"' >> "$GITHUB_OUTPUT"   #  8 cores,  32 GB RAM,  300 GB SSD
-            echo 'compute-larger="custom-linux-medium-vault-latest"' >> "$GITHUB_OUTPUT"    # 16 cores,  64 GB RAM,  600 GB SSD
-            echo 'compute-huge="custom-linux-xl-vault-latest"' >> "$GITHUB_OUTPUT"          # 32-cores, 128 GB RAM, 1200 GB SSD
+            echo 'compute-small="ubuntu-latest"' >> "$GITHUB_OUTPUT"                      #  2x vCPUs,   7 GiB RAM,   14 GB SSD
+            echo 'compute-medium="custom-linux-small-vault-latest"' >> "$GITHUB_OUTPUT"   #  8x vCPUs,  32 GiB RAM,  300 GB SSD
+            echo 'compute-large="custom-linux-medium-vault-latest"' >> "$GITHUB_OUTPUT"   # 16x vCPUs,  64 GiB RAM,  600 GB SSD
+            echo 'compute-largem="custom-linux-medium-vault-latest"' >> "$GITHUB_OUTPUT"  # 16x vCPUs,  64 GiB RAM,  600 GB SSD
+            echo 'compute-xlarge="custom-linux-xl-vault-latest"' >> "$GITHUB_OUTPUT"      # 32x vCPUs, 128 GiB RAM, 1200 GB SSD
            echo 'enterprise=' >> "$GITHUB_OUTPUT"
            echo 'go-tags=' >> "$GITHUB_OUTPUT"
          fi
@@ -62,7 +65,7 @@ jobs:
    needs:
      - setup
    if: ${{ needs.setup.outputs.enterprise != '' && github.base_ref != '' }}
-    runs-on: ${{ fromJSON(needs.setup.outputs.compute-tiny) }}
+    runs-on: ${{ fromJSON(needs.setup.outputs.compute-small) }}
    steps:
      - uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
        with:
@@ -113,7 +116,7 @@ jobs:
      total-runners: 16
      go-arch: amd64
      go-tags: '${{ needs.setup.outputs.go-tags }},deadlock'
-      runs-on: ${{ needs.setup.outputs.compute-larger }}
+      runs-on: ${{ needs.setup.outputs.compute-large }}
      enterprise: ${{ needs.setup.outputs.enterprise }}
    secrets: inherit

@@ -137,7 +140,7 @@ jobs:
      extra-flags: '-race'
      go-arch: amd64
      go-tags: ${{ needs.setup.outputs.go-tags }}
-      runs-on: ${{ needs.setup.outputs.compute-huge }}
+      runs-on: ${{ needs.setup.outputs.compute-large }}
      enterprise: ${{ needs.setup.outputs.enterprise }}
      name: "race"
    secrets: inherit
@@ -162,7 +165,7 @@ jobs:
        }
      go-arch: amd64
      go-tags: '${{ needs.setup.outputs.go-tags }},deadlock,cgo,fips,fips_140_2'
-      runs-on: ${{ needs.setup.outputs.compute-larger }}
+      runs-on: ${{ needs.setup.outputs.compute-large }}
      enterprise: ${{ needs.setup.outputs.enterprise }}
      name: "fips"
    secrets: inherit
@@ -185,21 +188,21 @@ jobs:
    permissions:
      id-token: write
      contents: read
-    runs-on: ${{ fromJSON(needs.setup.outputs.compute-larger) }}
+    runs-on: ${{ fromJSON(needs.setup.outputs.compute-largem) }}
    steps:
      - uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
      - uses: ./.github/actions/set-up-go
        with:
          github-token: ${{ secrets.ELEVATED_GITHUB_TOKEN }}
      # Setup node.js without caching to allow running npm install -g yarn (next step)
-      - uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
+      - uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
        with:
          node-version-file: './ui/package.json'
      - id: install-yarn
        run: |
          npm install -g yarn
      # Setup node.js with caching using the yarn.lock file
-      - uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
+      - uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
        with:
          node-version-file: './ui/package.json'
          cache: yarn
@@ -271,7 +274,7 @@ jobs:
      - test-go
      - test-ui
    if: always()
-    runs-on: ${{ fromJSON(needs.setup.outputs.compute-tiny) }}
+    runs-on: ${{ fromJSON(needs.setup.outputs.compute-small) }}
    steps:
      - run: |
          tr -d '\n' <<< '${{ toJSON(needs.*.result) }}' | grep -q -v -E '(failure|cancelled)'
--- a/.github/workflows/security-scan.yml
+++ b/.github/workflows/security-scan.yml
@@ -19,7 +19,7 @@ jobs:
      uses: actions/setup-go@fac708d6674e30b6ba41289acaab6d4b75aa0753 # v4.0.1
      with:
        cache: false # save cache space for vault builds: https://github.com/hashicorp/vault/pull/21764
-        go-version: 1.18
+        go-version-file: .go-version

    - name: Set up Python
      uses: actions/setup-python@bd6b4b6205c4dbad673328db7b31b7fab9e241c0 # v4.6.1
@@ -32,7 +32,7 @@ jobs:
        repository: hashicorp/security-scanner
        token: ${{ secrets.HASHIBOT_PRODSEC_GITHUB_TOKEN }}
        path: security-scanner
-        ref: 5a491479f4131d343afe0a4f18f6fcd36639f3fa
+        ref: 52d94588851f38a416f11c1e727131b3c8b0dd4d

    - name: Install dependencies
      shell: bash
@@ -69,6 +69,8 @@ jobs:
        #SEMGREP_BASELINE_REF: ${{ github.base_ref }}
      with:
        repository: "$PWD"
+        cache-build: true
+        cache-go-modules: false

    - name: SARIF Output
      shell: bash
@@ -78,6 +80,6 @@ jobs:
        cat results.sarif

    - name: Upload SARIF file
-      uses: github/codeql-action/upload-sarif@cdcdbb579706841c47f7063dda365e292e5cad7a # codeql-bundle-v2.13.4
+      uses: github/codeql-action/upload-sarif@46a6823b81f2d7c67ddf123851eea88365bc8a67 # codeql-bundle-v2.13.5
      with:
        sarif_file: results.sarif
--- a/.github/workflows/test-enos-scenario-ui.yml
+++ b/.github/workflows/test-enos-scenario-ui.yml
@@ -77,7 +77,7 @@ jobs:
      - name: Set Up Git
        run: git config --global url."https://${{ secrets.elevated_github_token }}:@github.com".insteadOf "https://github.com"
      - name: Set Up Node
-        uses: actions/setup-node@64ed1c7eab4cce3362f8c340dee64e5eaeef8f7c # v3.6.0
+        uses: actions/setup-node@e33196f7422957bea03ed53f6fbb155025ffc7b8 # v3.7.0
        with:
          node-version-file: './ui/package.json'
      - name: Set Up Terraform
--- a/.github/workflows/test-go.yml
+++ b/.github/workflows/test-go.yml
@@ -66,7 +66,6 @@ jobs:
      - uses: ./.github/actions/set-up-go
        with:
          github-token: ${{ secrets.ELEVATED_GITHUB_TOKEN }}
-          no-restore: true # We don't need the vault Go modules when generating indices
      - name: Authenticate to Vault
        id: vault-auth
        if: github.repository == 'hashicorp/vault-enterprise'
@@ -99,20 +98,12 @@ jobs:
        if: github.repository != 'hashicorp/vault-enterprise'
        run: |
          git config --global url."https://${{ secrets.ELEVATED_GITHUB_TOKEN}}@github.com".insteadOf https://github.com
-      - run: go install gotest.tools/gotestsum@v1.9.0
-
+      - uses: ./.github/actions/set-up-gotestsum
      - run: mkdir -p test-results/go-test
-
-      # We use a unique "read-" prefix to guarantee that we're not scribbling on
-      # the aggregated test data in the event of test failure. This key is
-      # unique for every test run and just used to restore the previous
-      # aggregated data. We persist all test data after a successful run and
-      # store that in the go-test-reports- cache.
-      - id: restore-from-cache
-        uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
+      - uses: actions/cache/restore@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
        with:
          path: test-results/go-test
-          key: read-go-test-reports-${{ github.run_number }}
+          key: go-test-reports-${{ github.run_number }}
          restore-keys: go-test-reports-
      - name: List cached results
        id: list-cached-results
@@ -221,6 +212,7 @@ jobs:
        env:
          GOPRIVATE: github.com/hashicorp/*
        run: time make ci-bootstrap dev
+      - uses: ./.github/actions/set-up-gotestsum
      - id: run-go-tests
        name: Run Go tests
        timeout-minutes: ${{ fromJSON(env.TIMEOUT_IN_MINUTES) }}
@@ -275,7 +267,7 @@ jobs:

          # shellcheck disable=SC2086 # can't quote RERUN_FAILS
          GOARCH=${{ inputs.go-arch }} \
-            go run gotest.tools/gotestsum --format=short-verbose \
+            gotestsum --format=short-verbose \
              --junitfile test-results/go-test/results-${{ matrix.id }}.xml \
              --jsonfile test-results/go-test/results-${{ matrix.id }}.json \
              --jsonfile-timing-events failure-summary-${{ matrix.id }}${{ inputs.name != '' && '-' || '' }}${{ inputs.name }}.json \
@@ -383,11 +375,6 @@ jobs:
    needs: test-go
    runs-on: ${{ fromJSON(inputs.runs-on) }}
    steps:
-      - uses: actions/cache@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
-        with:
-          path: test-results/go-test
-          key: go-test-reports-${{ github.run_number }}
-          restore-keys: go-test-reports-
      - uses: actions/download-artifact@9bc31d5ccc31df68ecc42ccf4149144866c47d8a # v3.0.2
        with:
          name: test-results
@@ -396,3 +383,7 @@ jobs:
          ls -lhR test-results/go-test
          find test-results/go-test -mindepth 1 -mtime +3 -delete
          ls -lhR test-results/go-test
+      - uses: actions/cache/save@88522ab9f39a2ea568f7027eddc7d8d8bc9d59c8 # v3.3.1
+        with:
+          path: test-results/go-test
+          key: go-test-reports-${{ github.run_number }}