ci: fix double symmetric NAT test failure (#10410)

As it turns out, the flaky test was caused by a bug in the eBPF kernel where we read the old channel data header from the wrong offset. This made us essentially read garbage data for the channel number, causing us to:

a. Compute a bad checksum
b. Send the packet on a completely wrong channel

The reason this caused a flaky test is that it requires on side to pick IPv4 to talk to the relay and the other side IPv6. The happy-eyeballs approach of the `allocation` module made that non-deterministic, only exposing this bug occasionally.

To ensure these kind of things are detected earlier in the future, I am adding an additional CI step that checks all packets emitted by the eBPF kernel for checksum errors.

Fixes: #10404

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
This commit is contained in:
Thomas Eizinger
2025-09-25 07:53:17 +00:00
committed by GitHub
parent 6147110198
commit 9865e03343
5 changed files with 80 additions and 18 deletions

View File

@@ -25,6 +25,9 @@ docker network connect firezone_client-internal firezone-client-1 --ip 172.30.0.
client ip -4 route add 203.0.113.0/24 via 172.30.0.254
client ip -6 route add 203:0:113::/64 via 172:30:0::254
# Disable checksum offload again to calculate checksums in software so that checksum verification passes
client ethtool -K eth0 tx off
# Send SIGHUP, triggering `reconnect` internally
sudo kill -s HUP "$(ps -C firezone-headless-client -o pid=)"