Currently, the eBPF module can translate from channel data messages to
UDP packets and vice versa. It can even do that across IP stacks, i.e.
translate from an IPv6 UDP packet to an IPv4 channel data message.
What it cannot do is handle packets addressed to itself. This can happen
if both the Client and the Gateway pick the same relay to make an
allocation. When
exchanging candidates, ICE will then form pairs between both relay
candidates, essentially requiring the relay to loop packets back to
itself.
In eBPF, we cannot do that. When sending a packet back out with
`XDP_TX`, it actually goes out on the wire; there is no additional check
whether it is destined for our own IP.
Properly handling this in eBPF (by comparing the destination IP to our
public IP) adds more cases we need to handle. The current module
structure where everything is one file makes this quite hard to
understand, which is why I opted to create four sub-modules:
- `from_ipv4_channel`
- `from_ipv4_udp`
- `from_ipv6_channel`
- `from_ipv6_udp`
For traffic arriving via a data-channel, it is possible that we also
need to send it back out via a data-channel if the peer address we are
sending to is the relay itself. Therefore, the `from_ipX_channel`
modules have four sub-modules:
- `to_ipv4_channel`
- `to_ipv4_udp`
- `to_ipv6_channel`
- `to_ipv6_udp`
For the traffic arriving on an allocation port (`from_ipX_udp`), we
always map to a data-channel and can therefore never get into a routing
loop, resulting in only two sub-modules:
- `to_ipv4_channel`
- `to_ipv6_channel`
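Putting those lists together, the resulting module tree looks roughly like this (an illustrative sketch of the hierarchy, not the actual file layout):

```rust
// Illustrative sketch of the module hierarchy only; the real code splits
// these into separate files.
mod from_ipv4_channel {
    mod to_ipv4_channel {}
    mod to_ipv4_udp {}
    mod to_ipv6_channel {}
    mod to_ipv6_udp {}
}
mod from_ipv6_channel {
    mod to_ipv4_channel {}
    mod to_ipv4_udp {}
    mod to_ipv6_channel {}
    mod to_ipv6_udp {}
}
mod from_ipv4_udp {
    mod to_ipv4_channel {}
    mod to_ipv6_channel {}
}
mod from_ipv6_udp {
    mod to_ipv4_channel {}
    mod to_ipv6_channel {}
}
```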
The actual implementation of the new code paths is rather simple and
mostly copied from the existing ones. For half of them, we don't need to
make any adjustments to the buffer size (e.g. IPv4 channel to IPv4
channel). For the other half, we need to adjust for the difference in
the IP header size.
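As a rough sketch (constants and helper are illustrative, not the actual code), the adjustment for the cross-stack channel-to-channel paths boils down to the difference between the fixed IPv4 and IPv6 header sizes:

```rust
// Illustrative constants and helper; the real code derives these from its
// header types.
const IPV4_HDR_LEN: i32 = 20; // fixed IPv4 header, no options
const IPV6_HDR_LEN: i32 = 40; // fixed IPv6 header

/// How much the packet needs to grow (positive) or shrink (negative) when
/// relaying a channel-data message from one IP stack to the other.
/// Same-stack paths need no adjustment at all.
fn channel_to_channel_delta(from_ipv6: bool, to_ipv6: bool) -> i32 {
    match (from_ipv6, to_ipv6) {
        (false, true) => IPV6_HDR_LEN - IPV4_HDR_LEN, // IPv4 -> IPv6: +20 bytes
        (true, false) => IPV4_HDR_LEN - IPV6_HDR_LEN, // IPv6 -> IPv4: -20 bytes
        _ => 0,                                       // same stack: no change
    }
}
```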
To test these changes, we add a new integration test that makes use of
the new docker-compose setup added in #10301 and configures masquerading
for both Client and Gateway. To make this more useful, we also remove
the `direct-` prefix from all tests as the test script itself no longer
makes any decisions as to whether it is operating over a direct or
relayed connection.
Resolves: #7518
In CI, eBPF in driver mode actually functions just fine with no changes
to our existing tests, given we apply a few workarounds and bugfixes:
- The interface learning mechanism had two flaws: (1) it only learned
per-CPU, which meant the risk of a missing entry grew as the core count
of the relay host grew, and (2) it did not filter for unicast IPs, so it
picked up broadcast and link-local addresses, causing cross-relay paths
to fail occasionally
- The `relay-relay` candidate where the two relays are the same relay
causes packet drops / loops in the Docker bridge setup, and possibly in
GCP too. I'm not sure this is a valid path that solves a real
connectivity issue in the wild. I can understand relay-relay paths where
two relays are different hosts, and the client and gateway both talk
over their TURN channel to each other (i.e. WireGuard is blocked in each
of their networks), but I can't think of an advantage for a relay-relay
candidate where the traffic simply hairpins (or is dropped) off the
nearest switch. This case is now detected via a new `PacketLoop` error
that triggers whenever source_ip == dest_ip.
- The relays in CI need a common next-hop for the MAC address swapping to
work. A simple router service, which functions as a basic L3 router (no
NAT), is added to provide that next-hop.
- The `veth` driver has some peculiar requirements to allow it to
function with XDP_TX. If you send a packet out of one interface of a
veth pair with XDP_TX, you need to either make sure both interfaces have
GRO enabled, or you need to attach a dummy XDP program that simply does
XDP_PASS to the other interface so that the sk_buff is allocated before
going up the stack to the Docker bridge. The GRO method was unreliable
and didn't work in our case, causing massive packet delays and
unpredictable bursts that prevented ICE from working, so we use the
XDP_PASS method instead. A simple Docker image that handles this is built
and lives at https://github.com/firezone/xdp-pass.
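Such a dummy program is only a few lines with aya; the following is a minimal sketch, not necessarily the exact contents of that image:

```rust
#![no_std]
#![no_main]

use aya_ebpf::{bindings::xdp_action, macros::xdp, programs::XdpContext};

// Attached to the peer interface of the veth pair so that packets sent with
// XDP_TX get an sk_buff allocated before travelling up to the Docker bridge.
#[xdp]
pub fn xdp_pass(_ctx: XdpContext) -> u32 {
    xdp_action::XDP_PASS
}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
```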
Related: #10138
Related: #10260
In order to support cross-stack relaying, we need to know which source IP
to write outgoing packets from. We can learn this by simply recording the
destination IP address of incoming packets to our XDP program.
A separate cache is used per IP stack in order to be a bit more cache-line
friendly and to avoid contention when only one IP stack's lookup is needed.
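A minimal sketch of what such per-stack caches could look like with aya (map names, key types and sizes are illustrative):

```rust
use aya_ebpf::{macros::map, maps::HashMap};

// Illustrative only: one map per IP stack so that a lookup for one stack
// never touches the other stack's cache lines. Keys are the interface IPs
// learned from the destination address of incoming packets; the value is
// an unused marker.
#[map]
static LEARNED_IPS_V4: HashMap<[u8; 4], u8> = HashMap::with_max_entries(16, 0);

#[map]
static LEARNED_IPS_V6: HashMap<[u8; 16], u8> = HashMap::with_max_entries(16, 0);
```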
Related: #10192
Whilst developing the eBPF module for the relay, I needed to manually
add padding within the key and value structs used in the maps in order
for the kernel to be able to correctly retrieve the data.
For some reason, this no longer seems to be necessary: the integration
test now passes without the padding.
Being able to remove the padding drastically reduces the size of these
maps for the current number of entries that we allow. This brings the
overall memory usage of the relay down.
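As a purely hypothetical illustration (struct and field names are made up), the manual padding looked roughly like this; removing the trailing padding field shrinks every map entry accordingly:

```rust
// Hypothetical example of a map key with manual padding; dropping
// `_padding` halves the size of this particular entry.
#[repr(C)]
struct ChannelKeyV4 {
    client_ip: [u8; 4],
    client_port: [u8; 2],
    channel_number: [u8; 2],
    _padding: [u8; 8], // previously required for the kernel to read the key correctly
}
```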
Resolves: #8682
Any communication between user-space and the eBPF kernel happens via
maps. The keys and values in these maps are serialised to bytes, meaning
the endianness of how these values are encoded matters!
When debugging why the eBPF kernels were not relaying as much traffic as we
expected, I noticed that only very small packets were getting relayed. In
particular, only packets encoded as channel-data messages were getting
unwrapped correctly; the reverse direction didn't happen at all.
Turning the log-level up to TRACE did reveal that we do in fact see
these packets but they don't get handled.
Here is the relevant section that handles these packets:
74ccf8e0b2/rust/relay/ebpf-turn-router/src/main.rs (L127-L151)
We can see the `trace!` log in the logs and we know that it should be
handled by the first `if`. But for some reason, it isn't.
x86 systems like the machines running in GCP are typically
little-endian. Network-byte ordering is big-endian. My current theory is
that we are comparing the port range with the wrong endianness and
therefore this branch never gets hit, causing the relaying to fall back
to user space.
By storing the fields within `Config` in byte-arrays, we can take
explicit control over which endianness is used to store these fields.
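A minimal sketch of the idea, with made-up field names, using the allocation port range mentioned above as an example:

```rust
// Illustrative sketch: the fields are stored as big-endian byte arrays so
// there is no ambiguity about their layout when user-space writes the
// struct into the map and the eBPF program reads it back.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Config {
    lowest_allocation_port: [u8; 2],
    highest_allocation_port: [u8; 2],
}

impl Config {
    pub fn new(lowest: u16, highest: u16) -> Self {
        Self {
            lowest_allocation_port: lowest.to_be_bytes(),
            highest_allocation_port: highest.to_be_bytes(),
        }
    }

    pub fn lowest_allocation_port(&self) -> u16 {
        u16::from_be_bytes(self.lowest_allocation_port)
    }

    pub fn highest_allocation_port(&self) -> u16 {
        u16::from_be_bytes(self.highest_allocation_port)
    }
}
```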
The original idea of this feature flag was that we could easily disable
the eBPF router in case it causes issues in production. However, toggling
it on and off does not seem to work reliably: without an explicit toggle of
the feature-flag, the eBPF program doesn't seem to be loaded correctly.
This uncertainty makes me not trust the metrics that we are seeing because
we don't know whether all relays really are using the eBPF router to relay
TURN traffic.
In order to draw truthful conclusions as to how much traffic we are
relaying via eBPF, this patch removes the feature flag again. As of
#8656, we can disable the eBPF program by not setting the
`EBPF_OFFLOADING` env variable. This requires a re-deploy / restart of
relays to take effect, which isn't quite as fast as toggling a feature
flag but is much more reliable and easier to maintain.
This PR implements a feature-flag in PostHog that we can use to toggle
the use of the eBPF data plane at runtime. At every tick of the
event-loop, the relay will compare the (cached) configuration of the
eBPF program with the (cached) value of the feature-flag. If they
differ, the eBPF configuration is updated and, upon the next packet, the
eBPF program will act accordingly.
Feature-flags are re-evaluated every 5 minutes, meaning there is some
delay until this gets applied.
The default value of all our feature-flags is `false`, meaning if
there is some problem with evaluating them, we'd turn the eBPF data
plane off. Performing routing in userspace is slower but it is a safer
default.
Resolves: #8548
Perf events are designed to be an extremely efficient way of
transferring data from an eBPF kernel to the user-space program. In
order to monitor how much traffic we are actually relaying via eBPF, we
introduce a dedicated `STATS` map that is a `PerfEventArray`.
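With aya, the map and the emitting side could look roughly like this (the event struct and names are illustrative):

```rust
use aya_ebpf::{macros::map, maps::PerfEventArray, programs::XdpContext};

/// Illustrative event payload; the real struct may carry more fields.
#[repr(C)]
#[derive(Clone, Copy)]
struct RelayedBytes {
    bytes: u64,
}

#[map]
static STATS: PerfEventArray<RelayedBytes> = PerfEventArray::new(0);

fn report_relayed(ctx: &XdpContext, bytes: u64) {
    // One event per relayed packet; user-space aggregates these into the
    // `data_relayed_ebpf_bytes` OTEL counter.
    STATS.output(ctx, &RelayedBytes { bytes }, 0);
}
```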
The events from that array are read asynchronously in user-space and fed
into our OTEL metrics. They will show up in our Google Cloud metrics as
`data_relayed_ebpf_bytes`. We already have a metric for the total
relayed bytes. That counter is renamed to `data_relayed_userspace_bytes`
so we can clearly differentiate the two.
This fills in the boilerplate for handling IPv6 packets in the eBPF
code. Unfortunately, we cannot add an integration test for this because
the IPv6 header doesn't have a checksum and therefore doesn't allow the
UDP checksum to be set to 0. Linux (and I'd assume other OSs too) offloads
UDP checksumming to the NIC, yet packets on the loopback interface never
reach the NIC, so our eBPF code sees only a partial checksum and thus
updates the checksum incorrectly.
Related: #7518
Related: #8502
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR implements the "reverse path" of handling TURN traffic, i.e. UDP
datagrams that arrive on an allocation port and need to be wrapped in a
channel-data message to be sent to the TURN client.
In order to achieve that, I had to rewrite most of the TURN code to not
use the `etherparse` crate. I couldn't quite figure out the details but
the eBPF verifier rejected my code in mysterious ways that I didn't
understand. Commenting out random code-paths seemed to make it happy but
all code-paths combined caused an error. Eventually, I decided that we
simply have to use fewer abstractions to implement the same logic.
All the "parsing" code is now using types inspired by `network-types`.
The only modification here is that we use byte-arrays within our structs
in order to directly receive them in big-endian ordering.
`network-types` uses `u16`s and `u32`s which get interpreted as
little-endian on x86. Instead of converting back and forth between
endiannesses, it is much simpler to construct those values with the right
endianness where we need them. I opened an issue with upstream which, if
accepted, will allow us to remove our own structs and instead depend on
upstream again.
I also had to aggressively add `#[inline(always)]` to several functions,
otherwise the compiler would not optimise away our function calls,
causing the linker and / or eBPF verifier to fail.
This PR also fixes numerous bugs that I've found in the already existing
eBPF code. The number of bugs makes me question how this has been
working so far at all!
- We did not swap the Ethernet source and destination MAC addresses when
re-routing the packet. The integration-test didn't catch this because it
only operates on the loopback interface. Further testing on staging
should allow us to confirm that this is indeed working now.
- The UDP checksum update did not incorporate the new src and dst port.
The integration-test didn't catch that because it has UDP checksumming
disabled. We need to have that disabled in the test because UDP
checksumming is typically offloaded to the NIC and packets on the
loopback interface never leave the device.
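The fix for the first bug essentially amounts to swapping the two MAC addresses in the Ethernet header before transmitting the packet; a minimal sketch, assuming a byte-array based header struct as described above:

```rust
/// Simplified Ethernet header with byte-array fields (illustrative).
#[repr(C)]
struct EthHdr {
    dst_addr: [u8; 6],
    src_addr: [u8; 6],
    ether_type: [u8; 2],
}

fn swap_mac_addresses(eth: &mut EthHdr) {
    // When bouncing the packet back out with XDP_TX, the old source becomes
    // the new destination and vice versa.
    core::mem::swap(&mut eth.dst_addr, &mut eth.src_addr);
}
```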
Related: https://github.com/vadorovsky/network-types/issues/32.
Related: #7518
## Abstract
This pull-request implements the first stage of off-loading routing of
TURN data channel messages to the kernel via an eBPF XDP program. In
particular, the eBPF kernel implemented here **only** handles the
decapsulation of IPv4 data channel messages into their embedded UDP
payload. Implementation of other data paths, such as the receiving of
UDP traffic on an allocation and wrapping it in a TURN channel data
message, is deferred to a later point for reasons explained further down.
As it stands, this PR implements the bare minimum for us to start
experimenting and benefiting from eBPF. It is already massive as it is
due to the infrastructure required for actually doing this. Let's dive
into it!
## A refresher on TURN channel-data messages
TURN specifies a channel-data message for relaying data between two
peers. A channel data message has a fixed 4-byte header:
- The first two bytes specify the channel number
- The next two bytes specify the length of the encapsulated payload
Like all TURN traffic, channel data messages run over UDP by default,
meaning this header sits at the very front of the UDP payload. This will
be important later.
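As a sketch, the header can be modelled as a plain struct with big-endian byte arrays (field names are illustrative):

```rust
/// The 4-byte TURN channel-data header, stored as big-endian byte arrays
/// so the in-memory layout matches the wire format.
#[repr(C)]
struct ChannelDataHeader {
    number: [u8; 2], // channel number
    length: [u8; 2], // length of the encapsulated payload in bytes
}

impl ChannelDataHeader {
    fn number(&self) -> u16 {
        u16::from_be_bytes(self.number)
    }

    fn length(&self) -> u16 {
        u16::from_be_bytes(self.length)
    }
}
```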
After making an allocation with a TURN server (i.e. reserving a port on
the TURN server's interfaces), a TURN client can bind channels on that
allocation. As such, channel numbers are scoped to a client's
allocation. Channel numbers are allocated by the client within a given
range (0x4000 - 0x4FFF). When binding a channel, the client specifies
the peer address that they'd like the data sent on the channel to be
forwarded to.
Given this setup, when a TURN server receives a channel data message, it
first looks at the sender's IP + port to infer the allocation (a client
can only ever have 1 allocation at a time). Within that allocation, the
server then looks for the channel number and retrieves the target socket
address from that. The allocation itself is a port on the relay's
interface. With that, we can now "unpack" the payload of the channel
data message and rewrite it to the new receiver:
- The new source IP can be set from the old dst IP (when operating in
user-space mode this is irrelevant because we are working with the
socket API).
- The new source port is the client's allocation.
- The new destination IP is taken from the mapping looked up via the
channel number.
- The new destination port is taken from the same mapping.
Last but not least, all that is left is removing the channel data header
from the UDP payload and we can send out the packet. In other words, we
need to cut off the first 4 bytes of the UDP payload.
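Putting that together, the rewrite boils down to something like the following sketch (simplified header structs and hypothetical names, not the actual implementation):

```rust
// Simplified header structs for illustration only; real headers have more fields.
#[repr(C)]
struct Ipv4Hdr {
    src_addr: [u8; 4],
    dst_addr: [u8; 4],
}

#[repr(C)]
struct UdpHdr {
    source: [u8; 2],
    dest: [u8; 2],
}

/// Hypothetical mapping value looked up via (client socket, channel number).
struct ChannelBinding {
    allocation_port: u16,
    peer_ip: [u8; 4],
    peer_port: u16,
}

fn rewrite(binding: &ChannelBinding, ip: &mut Ipv4Hdr, udp: &mut UdpHdr) {
    ip.src_addr = ip.dst_addr;                           // new src = old dst (the relay)
    ip.dst_addr = binding.peer_ip;                       // from the channel mapping
    udp.source = binding.allocation_port.to_be_bytes();  // the client's allocation port
    udp.dest = binding.peer_port.to_be_bytes();          // from the channel mapping
    // Finally, the 4-byte channel-data header is cut off the front of the
    // UDP payload (and the length/checksum fields are updated accordingly).
}
```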
## User-space relaying
At present, we implement the above flow in user-space. This is tricky to
do because we need to bind _many_ sockets, one for each possible
allocation port (of which there can be 16383). The actual work to be
done on these packets is also extremely minimal. All we do is cut off
(or add on) the data-channel header. Benchmarks show that we spend
pretty much all of our time copying data between user-space and
kernel-space. Cutting this out should give us a massive increase in
performance.
## Implementing an eBPF XDP TURN router
eBPF has been shown to be a very efficient way of speeding up a TURN
server [0]. After many failed experiments (e.g. using TC instead of XDP)
and countless rabbit-holes, we have also arrived at the design
documented within the paper. Most notably:
- The eBPF program is entirely optional. We try to load it on startup,
but if that fails, we will simply use the user-space mode.
- Retaining the user-space mode is also important because under certain
circumstances, the eBPF kernel needs to pass on the packet, for example,
when receiving IPv4 packets with options. Those make the header
dynamically-sized which makes further processing difficult because the
eBPF verifier disallows indexing into the packet with data derived from
the packet itself.
- In order to add/remove the channel-data header, we shift the packet
headers backwards / forwards and leave the payload in place as the
packet headers are constant in size and can thus easily and cheaply be
copied out.
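For the decapsulation path, that shifting amounts to moving the packet start forward by the 4-byte channel-data header; a sketch using the `bpf_xdp_adjust_head` helper as exposed by aya (error handling and the actual header rewriting are elided):

```rust
use aya_ebpf::{helpers::bpf_xdp_adjust_head, programs::XdpContext};

/// Sketch only: the Ethernet/IP/UDP headers are copied out beforehand,
/// the packet start is moved forward by 4 bytes, and the headers are then
/// written back at the new start. The payload itself never moves.
fn strip_channel_data_header(ctx: &XdpContext) -> Result<(), ()> {
    // A positive delta moves the packet start forward, i.e. drops 4 bytes
    // at the head of the packet.
    if unsafe { bpf_xdp_adjust_head(ctx.ctx, 4) } != 0 {
        return Err(());
    }
    // ...write the previously copied headers back at the new packet start...
    Ok(())
}
```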
In order to perform the relaying flow explained above, we introduce maps
that are shared with user-space. These maps go from a tuple of
(client-socket, channel-number) to a tuple of (allocation-port,
peer-socket) and thus give us all the data necessary to rewrite the
packet.
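A sketch of what such a map could look like on the eBPF side (struct layout, names and sizes are illustrative):

```rust
use aya_ebpf::{macros::map, maps::HashMap};

/// Illustrative key: identifies a channel on a client's allocation.
#[repr(C)]
#[derive(Clone, Copy)]
struct ClientAndChannel {
    client_ip: [u8; 4],
    client_port: [u8; 2],
    channel_number: [u8; 2],
}

/// Illustrative value: where to send the unwrapped payload from.
#[repr(C)]
#[derive(Clone, Copy)]
struct PortAndPeer {
    allocation_port: [u8; 2],
    peer_ip: [u8; 4],
    peer_port: [u8; 2],
}

// User-space keeps this map in sync with the relay's channel bindings.
#[map]
static CHANNEL_TO_PEER_V4: HashMap<ClientAndChannel, PortAndPeer> =
    HashMap::with_max_entries(65536, 0);
```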
## Integration with our relay
Last but not least, to actually integrate the eBPF kernel with our
relay, we need to extend the `Server` with two more events so we can
learn when channel bindings are created and when they expire. Using
these events, we can then update the eBPF maps accordingly and therefore
influence the routing behaviour in the kernel.
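As an illustration (the names are made up), the two events could look something like this:

```rust
use std::net::SocketAddr;

/// Hypothetical events emitted by the `Server` so that the event-loop can
/// keep the eBPF maps in sync with the relay's channel bindings.
enum ChannelEvent {
    Bound {
        client: SocketAddr,
        channel: u16,
        allocation_port: u16,
        peer: SocketAddr,
    },
    Expired {
        client: SocketAddr,
        channel: u16,
    },
}
```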
## Scope
What is implemented here is only one of several possible data paths.
Implementing the others isn't conceptually difficult but it does
increase the scope. Landing something that already works allows us to
gain experience running it in staging (and possibly production).
Additionally, I've hit some issues with the eBPF verifier when adding
more codepaths to the kernel. I expect those to be possible to resolve
given sufficient debugging but I'd like to do so after merging this.
---
Depends-On: #8506
Depends-On: #8507
Depends-On: #8500
Resolves: #8501
[0]: https://dl.acm.org/doi/pdf/10.1145/3609021.3609296