mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
In CI, eBPF in driver mode actually functions just fine with no changes to our existing tests, given we apply a few workarounds and bugfixes: - The interface learning mechanism had two flaws: (1) it only learned per-CPU, which meant the risk for a missing entry grew as the core count of the relay host grew, and (2) it did not filter for unicast IPs, so it picked up broadcast and link-local addresses, causing cross-relay paths to fail occasionally - The `relay-relay` candidate where the two relays are the same relay causes packet drops / loops in the Docker bridge setup, and possibly in GCP too. I'm not sure this is a valid path that solves a real connectivity issue in the wild. I can understand relay-relay paths where two relays are different hosts, and the client and gateway both talk over their TURN channel to each other (i.e. WireGuard is blocked in each of their networks), but I can't think of an advantage for a relay-relay candidate where the traffic simply hairpins (or is dropped) off the nearest switch. This has been now detected with a new `PacketLoop` error that triggers whenever source_ip == dest_ip. - The relays in CI need a common next-hop to talk to for the MAC address swapping to work. A simple router service is added which functions as a basic L3 router (no NAT) that allows the MAC swapping to work. - The `veth` driver has some peculiar requirements to allow it to function with XDP_TX. If you send a packet out of one interface of a veth pair with XDP_TX, you need to either make sure both interfaces have GRO enabled, or you need to attach a dummy XDP program that simply does XDP_PASS to the other interface so that the sk_buff is allocated before going up the stack to the Docker bridge. The GRO method was unreliable and didn't work in our case, causing massive packet delays and unpredictable bursts that prevented ICE from working, so we use the XDP_PASS method instead. A simple docker image is built and lives at https://github.com/firezone/xdp-pass to handle this. Related: #10138 Related: #10260
196 lines
6.7 KiB
YAML
196 lines
6.7 KiB
YAML
name: Integration Tests
|
|
run-name: Triggered from ${{ github.event_name }} by ${{ github.actor }}
|
|
on:
|
|
workflow_call:
|
|
inputs:
|
|
domain_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/domain"
|
|
domain_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
api_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/api"
|
|
api_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
web_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/web"
|
|
web_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
elixir_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/elixir"
|
|
elixir_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
relay_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/debug/relay"
|
|
relay_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
gateway_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/debug/gateway"
|
|
gateway_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
client_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/debug/client"
|
|
client_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
http_test_server_image:
|
|
required: false
|
|
type: string
|
|
default: "ghcr.io/firezone/debug/http-test-server"
|
|
http_test_server_tag:
|
|
required: false
|
|
type: string
|
|
default: ${{ github.sha }}
|
|
|
|
env:
|
|
COMPOSE_PARALLEL_LIMIT: 1 # Temporary fix for https://github.com/docker/compose/pull/12752 until compose v2.36.0 lands on GitHub actions runners.
|
|
|
|
jobs:
|
|
integration-tests:
|
|
name: ${{ matrix.test.name }}
|
|
runs-on: ubuntu-22.04
|
|
permissions:
|
|
contents: read
|
|
id-token: write
|
|
pull-requests: write
|
|
env:
|
|
DOMAIN_IMAGE: ${{ inputs.domain_image }}
|
|
DOMAIN_TAG: ${{ inputs.domain_tag }}
|
|
API_IMAGE: ${{ inputs.api_image }}
|
|
API_TAG: ${{ inputs.api_tag }}
|
|
WEB_IMAGE: ${{ inputs.web_image }}
|
|
WEB_TAG: ${{ inputs.web_tag }}
|
|
RELAY_IMAGE: ${{ inputs.relay_image }}
|
|
RELAY_TAG: ${{ inputs.relay_tag }}
|
|
GATEWAY_IMAGE: ${{ inputs.gateway_image }}
|
|
GATEWAY_TAG: ${{ inputs.gateway_tag }}
|
|
CLIENT_IMAGE: ${{ inputs.client_image }}
|
|
CLIENT_TAG: ${{ inputs.client_tag }}
|
|
ELIXIR_IMAGE: ${{ inputs.elixir_image }}
|
|
ELIXIR_TAG: ${{ inputs.elixir_tag }}
|
|
HTTP_TEST_SERVER_IMAGE: ${{ inputs.http_test_server_image }}
|
|
HTTP_TEST_SERVER_TAG: ${{ inputs.http_test_server_tag }}
|
|
strategy:
|
|
fail-fast: false
|
|
matrix:
|
|
test:
|
|
- name: direct-curl-api-down
|
|
- name: direct-curl-api-restart
|
|
- name: direct-curl-ecn
|
|
- name: direct-download-packet-loss
|
|
- name: direct-dns-api-down
|
|
- name: direct-dns-two-resources
|
|
- name: direct-dns
|
|
- name: direct-download-roaming-network
|
|
# Too noisy can cause flaky tests due to the amount of data
|
|
rust_log: debug
|
|
- name: dns-nm
|
|
- name: tcp-dns
|
|
- name: relay-graceful-shutdown
|
|
- name: systemd/dns-systemd-resolved
|
|
steps:
|
|
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
|
|
- uses: ./.github/actions/ghcr-docker-login
|
|
with:
|
|
github_token: ${{ secrets.GITHUB_TOKEN }}
|
|
- name: Seed database
|
|
run: docker compose run elixir /bin/sh -c 'cd apps/domain && mix ecto.migrate --migrations-path priv/repo/migrations --migrations-path priv/repo/manual_migrations && mix ecto.seed'
|
|
- name: Start docker compose in the background
|
|
run: |
|
|
set -xe
|
|
|
|
if [[ -n "${{ matrix.test.rust_log }}" ]]; then
|
|
export RUST_LOG="${{ matrix.test.rust_log }}"
|
|
fi
|
|
|
|
# Start one-by-one to avoid variability in service startup order
|
|
docker compose up -d dns.httpbin.search.test --no-build
|
|
docker compose up -d httpbin --no-build
|
|
docker compose up -d download.httpbin --no-build
|
|
docker compose up -d api web domain --no-build
|
|
docker compose up -d otel --no-build
|
|
docker compose up -d relay-1 --no-build
|
|
docker compose up -d relay-2 --no-build
|
|
docker compose up -d gateway --no-build
|
|
docker compose up -d client --no-build
|
|
docker compose up veth-config
|
|
|
|
# Wait a few seconds for the services to fully start. GH runners are
|
|
# slow, so this gives the Client enough time to initialize its tun interface,
|
|
# for example.
|
|
# Intended to mitigate <https://github.com/firezone/firezone/issues/5830>
|
|
sleep 3
|
|
|
|
- name: Add 50ms simulated API latency
|
|
run: |
|
|
docker compose exec -T -u root api sh -c 'apk add --update --no-cache iproute2-tc'
|
|
docker compose exec -T -u root api sh -c 'tc qdisc add dev eth0 root netem delay 50ms'
|
|
|
|
- name: Add 10ms simulated gateway latency
|
|
run: |
|
|
# compatibility test images won't have the `tc` command
|
|
docker compose exec -T gateway sh -c 'apk add --update --no-cache iproute2-tc'
|
|
docker compose exec -T gateway sh -c 'tc qdisc add dev eth0 root netem delay 10ms'
|
|
|
|
- run: ./scripts/tests/${{ matrix.test.name }}.sh
|
|
|
|
- name: Ensure Client emitted no warnings
|
|
if: "!cancelled()"
|
|
run: |
|
|
# Disabling checksum offloading causes one or two "I/O error (os error 5)" warnings
|
|
docker compose logs client | \
|
|
grep --invert "I/O error (os error 5)" | \
|
|
grep "WARN" && exit 1 || exit 0
|
|
- name: Show Client logs
|
|
if: "!cancelled()"
|
|
run: docker compose logs client
|
|
|
|
- name: Show Relay-1 logs
|
|
if: "!cancelled()"
|
|
run: docker compose logs relay-1
|
|
|
|
- name: Show Relay-2 logs
|
|
if: "!cancelled()"
|
|
run: docker compose logs relay-2
|
|
|
|
- name: Ensure Gateway emitted no warnings
|
|
if: "!cancelled()"
|
|
run: |
|
|
# Disabling checksum offloading causes one or two "I/O error (os error 5)" warnings
|
|
docker compose logs gateway | \
|
|
grep --invert "I/O error (os error 5)" | \
|
|
grep "WARN" && exit 1 || exit 0
|
|
- name: Show Gateway logs
|
|
if: "!cancelled()"
|
|
run: docker compose logs gateway
|
|
|
|
- name: Show API logs
|
|
if: "!cancelled()"
|
|
run: docker compose logs api
|