Files
firezone/.github/workflows/_rust.yml
Thomas Eizinger 3c7ac084c0 feat(relay): MVP for routing channel data message in eBPF kernel (#8496)
## Abstract

This pull-request implements the first stage of off-loading routing of
TURN data channel messages to the kernel via an eBPF XDP program. In
particular, the eBPF kernel implemented here **only** handles the
decapsulation of IPv4 data channel messages into their embedded UDP
payload. Implementation of other data paths, such as the receiving of
UDP traffic on an allocation and wrapping it in a TURN channel data
message is deferred to a later point for reasons explained further down.
As it stands, this PR implements the bare minimum for us to start
experimenting and benefiting from eBPF. It is already massive as it is
due to the infrastructure required for actually doing this. Let's dive
into it!

## A refresher on TURN channel-data messages

TURN specifies a channel-data message for relaying data between two
peers. A channel data message has a fixed 4-byte header:

- The first two bytes specify the channel number
- The second two bytes specify the length of the encapsulated payload

Like all TURN traffic, channel data messages run over UDP by default,
meaning this header sits at the very front of the UDP payload. This will
be important later.

After making an allocation with a TURN server (i.e. reserving a port on
the TURN server's interfaces), a TURN client can bind channels on that
allocation. As such, channel numbers are scoped to a client's
allocation. Channel numbers are allocated by the client within a given
range (0x4000 - 0x4FFF). When binding a channel, the client specifies
the remote's peer address that they'd like the data sent on the channel
to be sent to.

Given this setup, when a TURN server receives a channel data message, it
first looks at the sender's IP + port to infer the allocation (a client
can only ever have 1 allocation at a time). Within that allocation, the
server then looks for the channel number and retrieves the target socket
address from that. The allocation itself is a port on the relay's
interface. With that, we can now "unpack" the payload of the channel
data message and rewrite it to the new receiver:

- The new source IP can be set from the old dst IP (when operating in
user-space mode this is irrelevant because we are working with the
socket API).
- The new source port is the client's allocation.
- The new destination IP is retrieved from the mapping retrieved via the
channel number.
- The new destination port is retrieved from the mapping retrieved via
the channel number.

Last but not least, all that is left is removing the channel data header
from the UDP payload and we can send out the packet. In other words, we
need to cut off the first 4 bytes of the UDP payload.

## User-space relaying

At present, we implement the above flow in user-space. This is tricky to
do because we need to bind _many_ sockets, one for each possible
allocation port (of which there can be 16383). The actual work to be
done on these packets is also extremely minimal. All we do is cut off
(or add on) the data-channel header. Benchmarks show that we spend
pretty much all of our time copying data between user-space and
kernel-space. Cutting this out should give us a massive increase in
performance.

## Implementing an eBPF XDP TURN router

eBPF has been shown to be a very efficient way of speeding up a TURN
server [0]. After many failed experiments (e.g. using TC instead of XDP)
and countless rabbit-holes, we have also arrived at the design
documented within the paper. Most notably:

- The eBPF program is entirely optional. We try to load it on startup,
but if that fails, we will simply use the user-space mode.
- Retaining the user-space mode is also important because under certain
circumstances, the eBPF kernel needs to pass on the packet, for example,
when receiving IPv4 packets with options. Those make the header
dynamically-sized which makes further processing difficult because the
eBPF verifier disallows indexing into the packet with data derived from
the packet itself.
- In order to add/remove the channel-data header, we shift the packet
headers backwards / forwards and leave the payload in place as the
packet headers are constant in size and can thus easily and cheaply be
copied out.

In order to perform the relaying flow explained above, we introduce maps
that are shared with user-space. These maps go from a tuple of
(client-socket, channel-number) to a tuple of (allocation-port,
peer-socket) and thus give us all the data necessary to rewrite the
packet.

## Integration with our relay

Last but not least, to actually integrate the eBPF kernel with our
relay, we need to extend the `Server` with two more events so we can
learn, when channel bindings are created and when they expire. Using
these events, we can then update the eBPF maps accordingly and therefore
influence the routing behaviour in the kernel.

## Scope

What is implemented here is only one of several possible data paths.
Implementing the others isn't conceptually difficult but it does
increase the scope. Landing something that already works allows us to
gain experience running it in staging (and possibly production).
Additionally, I've hit some issues with the eBPF verifier when adding
more codepaths to the kernel. I expect those to be possible to resolve
given sufficient debugging but I'd like to do so after merging this.

---

Depends-On: #8506
Depends-On: #8507
Depends-On: #8500
Resolves: #8501

[0]: https://dl.acm.org/doi/pdf/10.1145/3609021.3609296
2025-03-27 10:59:40 +00:00

205 lines
7.5 KiB
YAML

---
name: Rust
"on":
workflow_call:
defaults:
run:
working-directory: ./rust
permissions:
contents: "read"
id-token: "write"
# Never tolerate warnings. Duplicated in `_tauri.yml`
env:
RUSTFLAGS: "-Dwarnings --cfg tokio_unstable"
RUSTDOCFLAGS: "-D warnings"
jobs:
bench:
name: bench-${{ matrix.runs-on }}
strategy:
fail-fast: false
matrix:
runs-on: [
windows-2019, # Only platform with a benchmark right now
]
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/setup-rust
id: setup-rust
- run: cargo bench ${{ steps.setup-rust.outputs.bench-packages }}
env:
RUST_LOG: "debug"
name: "cargo bench"
shell: bash
static-analysis:
name: static-analysis-${{ matrix.runs-on }}
strategy:
fail-fast: false
matrix:
# TODO: https://github.com/rust-lang/cargo/issues/5220
runs-on: [ubuntu-22.04, macos-14, windows-2022]
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/setup-rust
id: setup-rust
- uses: ./.github/actions/setup-tauri-v2
timeout-minutes: 5
- uses: taiki-e/install-action@0b63bc859f7224657cf7e39426848cabaa36f456 # v2.49.9
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tool: cargo-udeps,cargo-deny
- uses: taiki-e/install-action@0b63bc859f7224657cf7e39426848cabaa36f456 # v2.49.9
if: ${{ runner.os == 'Linux' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tool: bpf-linker
- run: |
cargo +${{ steps.setup-rust.outputs.nightly_version }} udeps --all-targets --all-features ${{ steps.setup-rust.outputs.packages }}
name: Check for unused dependencies
- run: cargo fmt -- --check
- run: cargo doc --all-features --no-deps --document-private-items ${{ steps.setup-rust.outputs.packages }}
name: "cargo doc"
shell: bash
- run: cargo clippy --all-targets --all-features ${{ steps.setup-rust.outputs.packages }}
name: "cargo clippy"
shell: bash
- run: cargo deny check --hide-inclusion-graph
shell: bash
test:
name: test-${{ matrix.runs-on }}
strategy:
fail-fast: false
matrix:
# TODO: https://github.com/rust-lang/cargo/issues/5220
runs-on:
[
ubuntu-22.04,
ubuntu-24.04,
macos-13,
macos-14,
macos-15,
windows-2019,
windows-2022,
]
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/setup-rust
id: setup-rust
- uses: ./.github/actions/setup-tauri-v2
- uses: taiki-e/install-action@0b63bc859f7224657cf7e39426848cabaa36f456 # v2.49.9
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tool: ripgrep
- uses: taiki-e/install-action@0b63bc859f7224657cf7e39426848cabaa36f456 # v2.49.9
if: ${{ runner.os == 'Linux' }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tool: bpf-linker
- name: "cargo test"
shell: bash
run: |
set -x
# First, run all tests.
cargo test --all-features ${{ steps.setup-rust.outputs.packages }} -- --include-ignored --nocapture
# Poor man's test coverage testing: Grep the generated logs for specific patterns / lines.
rg --count --no-ignore SendIcmpPacket $TESTCASES_DIR
rg --count --no-ignore SendUdpPacket $TESTCASES_DIR
rg --count --no-ignore SendTcpPayload $TESTCASES_DIR
rg --count --no-ignore SendDnsQueries $TESTCASES_DIR
rg --count --no-ignore "Packet for DNS resource" $TESTCASES_DIR
rg --count --no-ignore "Packet for CIDR resource" $TESTCASES_DIR
rg --count --no-ignore "Packet for Internet resource" $TESTCASES_DIR
rg --count --no-ignore "Performed IP-NAT46" $TESTCASES_DIR
rg --count --no-ignore "Performed IP-NAT64" $TESTCASES_DIR
rg --count --no-ignore "Truncating DNS response" $TESTCASES_DIR
rg --count --no-ignore "Destination is unreachable" $TESTCASES_DIR
rg --count --no-ignore "Forwarding query for DNS resource to corresponding site" $TESTCASES_DIR
env:
# <https://github.com/rust-lang/cargo/issues/5999>
# Needed to create tunnel interfaces in unit tests
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUNNER: "sudo --preserve-env"
PROPTEST_VERBOSE: 0 # Otherwise the output is very long.
PROPTEST_CASES: 2000 # Default is only 256.
CARGO_PROFILE_TEST_OPT_LEVEL: 1 # Otherwise the tests take forever.
TESTCASES_DIR: "connlib/tunnel/testcases"
# Runs the Tauri client smoke test, built in debug mode. We can't run it in release
# mode because of a known issue: <https://github.com/firezone/firezone/blob/456e044f882c2bb314e19cc44c0d19c5ad817b7c/rust/windows-client/src-tauri/src/client.rs#L162-L164>
gui-smoke-test:
name: gui-smoke-test-${{ matrix.runs-on }}
strategy:
fail-fast: false
matrix:
runs-on: [ubuntu-22.04, ubuntu-24.04, windows-2019, windows-2022]
runs-on: ${{ matrix.runs-on }}
defaults:
run:
# Must be in this dir for `pnpm` to work
working-directory: ./rust/gui-client
# The Windows client ignores RUST_LOG because it uses a settings file instead
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/setup-node
- uses: ./.github/actions/setup-rust
- uses: ./.github/actions/setup-tauri-v2
timeout-minutes: 5
with:
runtime: true
# These steps must be synchronized with build.sh and build.bat in `rust/gui-client`
- name: pnpm install
run: |
pnpm install
cp "node_modules/flowbite/dist/flowbite.min.js" "src/"
- name: Compile Tailwind
run: pnpm tailwindcss -i src/input.css -o src/output.css
- name: Run Vite bundler
run: pnpm vite build
- name: Build client
run: cargo build -p firezone-gui-client --all-targets
- uses: taiki-e/install-action@0b63bc859f7224657cf7e39426848cabaa36f456 # v2.49.9
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tool: dump_syms
- name: Run smoke test
working-directory: ./rust
run: cargo run -p gui-smoke-test
headless-client:
name: headless-client-${{ matrix.test }}-${{ matrix.runs-on }}
strategy:
fail-fast: false
matrix:
include:
- { runs-on: windows-2019, test: token-path-windows.ps1 }
- { runs-on: windows-2022, test: token-path-windows.ps1 }
- { runs-on: ubuntu-22.04, test: linux-group.sh }
- { runs-on: ubuntu-24.04, test: linux-group.sh }
- { runs-on: ubuntu-22.04, test: token-path-linux.sh }
- { runs-on: ubuntu-24.04, test: token-path-linux.sh }
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/setup-rust
- uses: ./.github/actions/setup-tauri-v2
timeout-minutes: 5
- run: scripts/tests/${{ matrix.test }}
name: "test script"
working-directory: ./