firezone/.github/workflows/_integration_tests.yml
Thomas Eizinger b11adfcfe4 feat(connlib): create flow on ICMP error "prohibited" (#10462)
In Firezone, a Client requests an "access authorization" for a Resource
on the fly when it sees the first packet for said Resource going through
the tunnel. If we don't have a connection to the Gateway yet, this is
also where we will establish a connection and create the WireGuard
tunnel.

In order for this to work, the access authorization state between the
Client and the Gateway MUST NOT get out of sync. If the Client thinks it
has access to a Resource, it will just route the traffic to the Gateway.
If the access authorization on the Gateway has expired or otherwise
vanished, the packets will be black-holed.

Starting with #9816, the Gateway sends ICMP errors back to the
application whenever it filters a packet. This can happen either because
the access authorization is gone or because the traffic wasn't allowed
by the specific filter rules on the Resource.

With this patch, the Client will attempt to create a new flow (i.e.
re-authorize traffic) for this Resource whenever it sees such an ICMP
error. This acts as a way of synchronizing the view of the world between
Client and Gateway should they ever get out of sync.
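The decision described above can be sketched as: if an ICMP error for a Resource says the traffic is administratively prohibited, treat the access authorization as stale and request a new one. The shell sketch below models only that decision. The ICMP type/code pairs are taken from RFC 1122 and RFC 1812 (ICMPv4) and RFC 4443 (ICMPv6); whether connlib matches exactly this set is an assumption, and `is_icmp_prohibited` / `on_icmp_error` are hypothetical names, not connlib functions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the decision; this is not connlib code.

# Returns 0 if (IP version, ICMP type, ICMP code) is an
# "administratively prohibited" Destination Unreachable error:
#   ICMPv4 type 3, codes 9 and 10 (RFC 1122) and 13 (RFC 1812)
#   ICMPv6 type 1, code 1 (RFC 4443)
is_icmp_prohibited() {
  local ip_version="$1" icmp_type="$2" icmp_code="$3"
  case "${ip_version}:${icmp_type}:${icmp_code}" in
    4:3:9 | 4:3:10 | 4:3:13 | 6:1:1) return 0 ;;
    *) return 1 ;;
  esac
}

# Decides what the Client does with an ICMP error seen for a Resource.
on_icmp_error() {
  local resource="$1"
  shift
  if is_icmp_prohibited "$@"; then
    echo "re-authorize ${resource}"  # i.e. create a new flow
  else
    echo "pass through"              # no new flow in this sketch
  fi
}

on_icmp_error "resource-a" 4 3 13  # prints: re-authorize resource-a
on_icmp_error "resource-a" 4 3 1   # prints: pass through (code 1 = host unreachable)
```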

Testing turned out to be a bit tricky. If we let the authorization on
the Gateway lapse naturally, the portal will also toggle the Resource off
and on on the Client, which "flushes" the current authorizations.
Additionally, if the Client only had access to one Resource, the Gateway
will gracefully close the connection, which also results in the Client
creating a new flow for the next packet.

To actually trigger this new behaviour we need to:

- Access at least two Resources via the same Gateway
- Directly send `reject_access` to the Gateway for one of these
  Resources

To achieve this, we dynamically eval some code on the API node and
instruct the Gateway channel to send `reject_access`. The connection
stays intact because there is still another active access authorization,
but packets for the rejected Resource are answered with ICMP errors.

To achieve a safe roll-out, the new behaviour is feature-flagged. In
order to still test it, we now also allow feature flags to be set via
env variables.

Resolves: #10074

---------

Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
2025-09-30 08:23:39 +00:00

261 lines
9.3 KiB
YAML

name: Integration Tests
run-name: Triggered from ${{ github.event_name }} by ${{ github.actor }}
on:
  workflow_call:
    inputs:
      domain_image:
        required: false
        type: string
        default: "ghcr.io/firezone/domain"
      domain_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      api_image:
        required: false
        type: string
        default: "ghcr.io/firezone/api"
      api_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      web_image:
        required: false
        type: string
        default: "ghcr.io/firezone/web"
      web_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      elixir_image:
        required: false
        type: string
        default: "ghcr.io/firezone/elixir"
      elixir_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      relay_image:
        required: false
        type: string
        default: "ghcr.io/firezone/debug/relay"
      relay_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      gateway_image:
        required: false
        type: string
        default: "ghcr.io/firezone/debug/gateway"
      gateway_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      client_image:
        required: false
        type: string
        default: "ghcr.io/firezone/debug/client"
      client_tag:
        required: false
        type: string
        default: ${{ github.sha }}
      http_test_server_image:
        required: false
        type: string
        default: "ghcr.io/firezone/debug/http-test-server"
      http_test_server_tag:
        required: false
        type: string
        default: ${{ github.sha }}
env:
  COMPOSE_PARALLEL_LIMIT: 1 # Temporary fix for https://github.com/docker/compose/pull/12752 until compose v2.36.0 lands on GitHub actions runners.
jobs:
  integration-tests:
    name: ${{ matrix.test.name || matrix.test.script }}
    runs-on: ubuntu-24.04
    permissions:
      contents: read
      id-token: write
      pull-requests: write
    env:
      DOMAIN_IMAGE: ${{ inputs.domain_image }}
      DOMAIN_TAG: ${{ inputs.domain_tag }}
      API_IMAGE: ${{ inputs.api_image }}
      API_TAG: ${{ inputs.api_tag }}
      WEB_IMAGE: ${{ inputs.web_image }}
      WEB_TAG: ${{ inputs.web_tag }}
      RELAY_IMAGE: ${{ inputs.relay_image }}
      RELAY_TAG: ${{ inputs.relay_tag }}
      GATEWAY_IMAGE: ${{ inputs.gateway_image }}
      GATEWAY_TAG: ${{ inputs.gateway_tag }}
      CLIENT_IMAGE: ${{ inputs.client_image }}
      CLIENT_TAG: ${{ inputs.client_tag }}
      ELIXIR_IMAGE: ${{ inputs.elixir_image }}
      ELIXIR_TAG: ${{ inputs.elixir_tag }}
      HTTP_TEST_SERVER_IMAGE: ${{ inputs.http_test_server_image }}
      HTTP_TEST_SERVER_TAG: ${{ inputs.http_test_server_tag }}
      FIREZONE_INC_BUF: true
    strategy:
      fail-fast: false
      matrix:
        test:
          - script: create-flow-from-icmp-error
            min_client_version: 1.5.4
          - script: curl-api-down
          - script: curl-api-restart
          - script: curl-ecn
          - script: dns
          - script: dns-api-down
          - script: dns-nm
          - script: dns-two-resources
          - name: dns-systemd-resolved
            script: systemd/dns-systemd-resolved
          - script: tcp-dns
          # Setting both client and gateway to random masquerade will force relay-relay candidate pair
          - name: download-double-symmetric-nat
            script: download
            client_masquerade: random
            gateway_masquerade: random
            rust_log: debug
            single_relay: true # Force single relay
          - script: download-packet-loss
            rust_log: debug
          - script: download-roaming-network
            # Too noisy can cause flaky tests due to the amount of data
            rust_log: debug
    steps:
      - uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
      - uses: ./.github/actions/ghcr-docker-login
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Check minimum client version
        id: version_check
        if: ${{ matrix.test.min_client_version }}
        continue-on-error: true
        run: |
          ACTUAL_VERSION=$(docker run ${{ inputs.client_image }}:${{ inputs.client_tag }} firezone-headless-client --version | awk '{print $2}')
          MIN_VERSION="${{ matrix.test.min_client_version }}"
          [ "$(printf '%s\n' "$MIN_VERSION" "$ACTUAL_VERSION" | sort --version-sort | head -n1)" == "$MIN_VERSION" ]
      # We need at least Docker v28.1 which is not yet available on GitHub actions runners
      - uses: docker/setup-docker-action@b60f85385d03ac8acfca6d9996982511d8620a19 # v4.3.0
      - name: Seed database
        run: docker compose run elixir /bin/sh -c 'cd apps/domain && mix ecto.migrate --migrations-path priv/repo/migrations --migrations-path priv/repo/manual_migrations && mix ecto.seed'
      - name: Start docker compose in the background
        run: |
          set -xe
          if [[ -n "${{ matrix.test.rust_log }}" ]]; then
            export RUST_LOG="${{ matrix.test.rust_log }}"
          fi
          if [[ -n "${{ matrix.test.client_masquerade }}" ]]; then
            export CLIENT_MASQUERADE="${{ matrix.test.client_masquerade }}"
          fi
          if [[ -n "${{ matrix.test.gateway_masquerade }}" ]]; then
            export GATEWAY_MASQUERADE="${{ matrix.test.gateway_masquerade }}"
          fi
          docker compose build client-router gateway-router relay-1-router relay-2-router api-router
          # Start one-by-one to avoid variability in service startup order
          docker compose up -d dns.httpbin.search.test --no-build
          docker compose up -d httpbin --no-build
          docker compose up -d download.httpbin --no-build
          docker compose up -d api web domain --no-build
          docker compose up -d otel --no-build
          docker compose up -d relay-1 --no-build
          docker compose up -d relay-2 --no-build
          docker compose up -d gateway --no-build
          docker compose up -d client --no-build
          docker compose up -d network-config
          docker compose exec -d relay-1 /bin/sh -c 'xdpdump -i eth0 -w /tmp/packets.pcap --rx-capture entry,exit'
          docker compose exec -d relay-2 /bin/sh -c 'xdpdump -i eth0 -w /tmp/packets.pcap --rx-capture entry,exit'
          if [[ -n "${{ matrix.test.single_relay }}" ]]; then
            docker compose stop relay-2
          fi
          sleep 3 # Let everything settle for a bit
      - name: Disable checksum offloading
        run: |
          # Force checksum calculation on the host since some tests run on the host
          sudo ethtool -K eth0 tx off
          sudo ethtool -K docker0 tx off
      - run: ./scripts/tests/${{ matrix.test.script }}.sh
        if: ${{ steps.version_check.outcome != 'failure' }} # Run the script if version check succeeds or is skipped
      - name: Ensure Client emitted no warnings
        if: "!cancelled()"
        run: |
          # Disabling checksum offloading causes one or two "I/O error (os error 5)" warnings
          docker compose logs client | \
            grep --invert "I/O error (os error 5)" | \
            grep "WARN" && exit 1 || exit 0
      - name: Show Client logs
        if: "!cancelled()"
        run: docker compose logs client
      - name: Show Relay-1 logs
        if: "!cancelled()"
        run: docker compose logs relay-1
      - name: Show Relay-2 logs
        if: "!cancelled()"
        run: docker compose logs relay-2
      - name: Ensure Gateway emitted no warnings
        if: "!cancelled()"
        run: |
          # Disabling checksum offloading causes one or two "I/O error (os error 5)" warnings
          docker compose logs gateway | \
            grep --invert "I/O error (os error 5)" | \
            grep "WARN" && exit 1 || exit 0
      - name: Show Gateway logs
        if: "!cancelled()"
        run: docker compose logs gateway
      - name: Show API logs
        if: "!cancelled()"
        run: docker compose logs api
      - name: Ensure no eBPF checksum errors on relay-1
        if: "!cancelled()"
        run: |
          set -xe
          docker compose exec relay-1 pkill xdpdump
          docker compose cp relay-1:/tmp/packets.pcap ./relay-1-packets.pcap
          ! tcpdump -nnnr ./relay-1-packets.pcap -v | grep "bad \w* cksum"
      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        if: "!success()"
        with:
          overwrite: true
          name: ${{ matrix.test.name || matrix.test.script }}-relay-1-xdpdump
          path: ./relay-1-packets.pcap
      - name: Ensure no eBPF checksum errors on relay-2
        if: "!cancelled() && !matrix.test.single_relay"
        run: |
          set -xe
          docker compose exec relay-2 pkill xdpdump
          docker compose cp relay-2:/tmp/packets.pcap ./relay-2-packets.pcap
          ! tcpdump -nnnr ./relay-2-packets.pcap -v | grep "bad \w* cksum"
      - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        if: "!success() && !matrix.test.single_relay"
        with:
          overwrite: true
          name: ${{ matrix.test.name || matrix.test.script }}-relay-2-xdpdump
          path: ./relay-2-packets.pcap
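The workflow's pass/fail gates can be tried outside CI. The sketch below restates three pieces of logic from the steps above as shell functions: the version extraction and `sort --version-sort` comparison from "Check minimum client version", and the warning filter from "Ensure Client/Gateway emitted no warnings". The function names are hypothetical, the sample versions and log lines are made up, and `parse_client_version` assumes `--version` prints `<binary-name> <version>`, as the workflow's `awk '{print $2}'` implies.

```shell
#!/usr/bin/env bash
# Sketch of the workflow's gates as plain functions (hypothetical names).

# Same as `awk '{print $2}'` in "Check minimum client version": assumes the
# `--version` output has the form "<binary-name> <version>".
parse_client_version() {
  awk '{print $2}'
}

# Succeeds when ACTUAL >= MIN. `sort --version-sort` orders both strings as
# versions, so if MIN sorts first (or they are equal), ACTUAL is at least MIN.
version_at_least() {
  local min="$1" actual="$2"
  [ "$(printf '%s\n' "$min" "$actual" | sort --version-sort | head -n1)" = "$min" ]
}

# Reads logs on stdin and succeeds (printing the offending lines) if any WARN
# line remains after dropping the known checksum-offloading noise. The CI step
# fails exactly when this succeeds (`&& exit 1 || exit 0`).
has_unexpected_warnings() {
  grep --invert-match --fixed-strings "I/O error (os error 5)" | grep "WARN"
}

# Usage with a made-up version string:
v="$(echo "firezone-headless-client 1.5.10" | parse_client_version)"
if version_at_least 1.5.4 "$v"; then
  echo "run test for $v"
else
  echo "skip: $v < 1.5.4"
fi
```

Comparing with `sort --version-sort` rather than plain string comparison matters here: lexically `1.5.10` sorts before `1.5.4`, which would wrongly skip the test.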