The current Rust workspace isn't as consistent as it could be. To make
navigation a bit easier, we move a few crates around. Generally, we
follow the idea that entry-points should be at the top-level. `rust/`
now looks like this (directories only):
```
.
├── cli # Firezone CLI
├── client-ffi # Entry point for Apple & Android
├── gateway # Gateway
├── gui-client # GUI client
├── headless-client # Headless client
├── libs # Library crates
├── relay # Relay
├── target # Compile artifacts
├── tests # Crates for testing
└── tools # Local tools
```
To further enforce this structure, we also drop the `firezone-` prefix
from all crates that are not top-level binary crates.
The downcasting abilities of `anyhow` are pretty powerful.
Unfortunately, they can also be a bit tricky to get right. Whilst `is`
and `downcast` work fine for any errors that are within the `anyhow`
error chain, they don't check the source chain of an error that was
wrapped before entering it. In other words, if we already have a nested
`std::error::Error` with several causes, `anyhow` cannot downcast to
these causes directly.
In order to avoid this footgun, we create a thin layer on top of the
`anyhow` crate with some downcasting functions that always try to do the
right thing.
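A sketch of what such a chain-aware helper can look like (the real
wrapper's API may well differ):
```rust
use anyhow::Error;

/// Attempt the downcast against every error in the chain, not just the outermost one.
fn downcast_ref_in_chain<E>(error: &Error) -> Option<&E>
where
    E: std::error::Error + 'static,
{
    // `Error::chain` walks the `source()` chain of the whole error tree.
    error.chain().find_map(|cause| cause.downcast_ref::<E>())
}
```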
The health-check tests for the relay use `Instant::elapsed` which
implicitly uses `Instant::now`. On a freshly booted Windows machine,
these tests might easily fail because we are subtracting 15 minutes from
`Instant::now` which might result in an underflow as Windows cannot
represent `Instant`s prior to the boot time.
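One defensive way to express such lookbacks is `Instant::checked_sub`,
which surfaces the underflow as `None` instead of panicking (a sketch,
not necessarily the exact fix applied):
```rust
use std::time::{Duration, Instant};

// On a freshly booted machine, `Instant::now()` can be less than 15 minutes
// after the earliest representable `Instant`, so plain subtraction would panic.
fn fifteen_minutes_ago() -> Option<Instant> {
    Instant::now().checked_sub(Duration::from_secs(15 * 60))
}
```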
Related: #10927
Rust 1.91 has been released and brings with it a few new lints that we
need to tidy up. In addition, it also stabilizes `BTreeMap::extract_if`:
A really nifty std-lib function that allows us to conditionally take
elements from a map. We need that in a bunch of places.
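For instance, draining all expired entries in one pass (a sketch; the
actual call sites differ):
```rust
use std::collections::BTreeMap;

// `extract_if` removes exactly the entries the predicate matches and yields them.
fn remove_expired(map: &mut BTreeMap<u64, String>, now: u64) -> Vec<(u64, String)> {
    map.extract_if(.., |expires_at, _value| *expires_at <= now)
        .collect()
}
```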
Bumps [secrecy](https://github.com/iqlusioninc/crates) from 0.8.0 to
0.10.3.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
This code appears to be cfg'd out in CI and thus we don't run clippy on
it. My IDE pointed these out, however, so it seems fair enough to fix
them. They are just unnecessary references and don't actually have an
impact on functionality.
In order to allow the portal to more easily classify what kind of
component is connecting, we extend the user agent returned by
`get_user_agent` to include a component type instead of the generic
`connlib/`.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
As it turns out, the flaky test was caused by a bug in the eBPF kernel where we read the old channel data header from the wrong offset. This made us essentially read garbage data for the channel number, causing us to:
a. Compute a bad checksum
b. Send the packet on a completely wrong channel
The reason this caused a flaky test is that it requires one side to pick IPv4 to talk to the relay and the other side IPv6. The happy-eyeballs approach of the `allocation` module made that non-deterministic, only exposing this bug occasionally.
To ensure these kinds of issues are detected earlier in the future, I am adding an additional CI step that checks all packets emitted by the eBPF kernel for checksum errors.
Fixes: #10404
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
We haven't updated `aya` in a while. Unfortunately, the update is not without problems. First, the logging infrastructure changed, requiring us to drop the error details from `xdp_adjust_head`. See https://github.com/aya-rs/aya/issues/1348. Second, the `tokio` feature flag got removed, but luckily that can be worked around quite easily.
Resolves: #10344
In earlier versions of Firezone, the WebSocket protocol with the portal
was using the request-response semantics built into Phoenix. This
however is quite cumbersome to work with due to the polymorphic nature
of the protocol design.
We ended up moving away from it and instead only use one-way messages
where each event directly corresponds to a message type. However, we
never removed the capability for reply messages from the
`phoenix-channel` module; instead, all usages just set it to `()`.
We can simplify the code here by always setting this to `()`.
Resolves: #7091
DNS replies are UDP packets that often arrive in our ephemeral port
range. As such, these get dropped because we attempt a channel-map
lookup for them and fail to find anything.
To fix this, we assume all UDP packets arriving with a source port of 53
are DNS packets, and pass them up the stack.
There are likely other types of UDP traffic this could be problematic
for (QUIC comes to mind), but this fixes the immediate issue at hand for
now, as detecting STUN probes is somewhat complex.
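A minimal sketch of that early exit, with a hand-rolled header struct
standing in for the real one:
```rust
/// Minimal UDP header view; fields are stored in network byte order.
#[repr(C)]
struct UdpHdr {
    source: u16,
    dest: u16,
    len: u16,
    check: u16,
}

const DNS_PORT: u16 = 53;

/// If the source port is 53, treat the packet as a DNS reply and `XDP_PASS`
/// it up the stack instead of attempting a channel lookup.
fn is_likely_dns_reply(udp: &UdpHdr) -> bool {
    u16::from_be(udp.source) == DNS_PORT
}
```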
Fixes #10329
Currently, the eBPF module can translate from channel data messages to
UDP packets and vice versa. It can even do that across IP stacks, i.e.
translate from an IPv6 UDP packet to an IPv4 channel data message.
What it cannot do is handle packets addressed to itself. This can happen
if both Client and Gateway pick the same relay to make an allocation. When
exchanging candidates, ICE will then form pairs between both relay
candidates, essentially requiring the relay to loop packets back to
itself.
In eBPF, we cannot do that. When sending a packet back out with
`XDP_TX`, it will actually go out on the wire without any additional
check of whether it is destined for our own IP.
Properly handling this in eBPF (by comparing the destination IP to our
public IP) adds more cases we need to handle. The current module
structure where everything is one file makes this quite hard to
understand, which is why I opted to create four sub-modules:
- `from_ipv4_channel`
- `from_ipv4_udp`
- `from_ipv6_channel`
- `from_ipv6_udp`
For traffic arriving via a data-channel, it is possible that we also
need to send it back out via a data-channel if the peer address we are
sending to is the relay itself. Therefore, the `from_ipX_channel`
modules have four sub-modules:
- `to_ipv4_channel`
- `to_ipv4_udp`
- `to_ipv6_channel`
- `to_ipv6_udp`
For the traffic arriving on an allocation port (`from_ipX_udp`), we
always map to a data-channel and therefore can never get into a routing
loop, resulting in only two modules:
- `to_ipv4_channel`
- `to_ipv6_channel`
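Put together, the resulting module tree looks roughly like this (a
sketch of the shape, not the literal source):
```rust
mod from_ipv4_channel {
    mod to_ipv4_channel {} // same stack: no buffer resize needed
    mod to_ipv4_udp {}
    mod to_ipv6_channel {} // cross stack: adjust for the IP header size delta
    mod to_ipv6_udp {}
}

mod from_ipv4_udp {
    // Allocation traffic always maps to a data-channel, so only two targets.
    mod to_ipv4_channel {}
    mod to_ipv6_channel {}
}

// `from_ipv6_channel` and `from_ipv6_udp` mirror the two modules above.
```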
The actual implementation of the new code paths is rather simple and
mostly copied from the existing ones. For half of them, we don't need to
make any adjustments to the buffer size (i.e. IPv4 channel to IPv4
channel). For the other half, we need to adjust for the difference in
the IP header size.
To test these changes, we add a new integration test that makes use of
the new docker-compose setup added in #10301 and configures masquerading
for both Client and Gateway. To make this more useful, we also remove
the `direct-` prefix from all tests as the test script itself no longer
makes any decisions as to whether it is operating over a direct or
relayed connection.
Resolves: #7518
Initially, we added the graceful shutdown functionality to the relay to
better deal with deploys and achieve as little downtime as possible.
With the split of app and infrastructure that we now have, this
functionality is no longer necessary as portal deploys don't touch the
relay infra at all.
Thus, we can remove this functionality, which will actually speed up
deploys of the relays as systemd no longer has to time out after sending
the SIGTERM to the binary.
We want to control which traces are collected and sent to OTEL with the
log filter. To do that, we need to also apply the supplied log filter to
the tracer.
To prevent userspace relaying, we would `XDP_DROP` all traffic that
looked like STUN/TURN but that we couldn't handle via the eBPF codepath.
This turned out to be too heavy-handed of an approach since it ended up
matching DNS query responses as well due to them arriving within the
TURN ephemeral port range.
To fix this, we `XDP_PASS` the traffic up the stack so that the kernel
is able to match it to existing conntrack entries.
We've identified a minor race condition where the first few channel data
packets might be dropped when a channel is first being bound, but fixing
this will be saved for a later PR.
Related: https://github.com/firezone/infra/pull/132
TURN channels have a 5 minute cooldown period after they expire where
they cannot be rebound to another peer but can be refreshed and thus
"reactivated".
To stop routing packets when the channel expires, we remove it from the
channel map of the eBPF code. The client however knows that it still has
a channel that it can reactivate for another 5min. In case it chooses to
do so, we refresh the channel in userspace but, until now, forgot to
re-populate the eBPF map. This effectively blocks this communication
path from working because the relay reports the channel as being
refreshed successfully, yet the new eBPF kernel drops all packets
without a map entry.
Some follow-up polish for the eBPF module:
- Changes the `cfg`s to also include Linux, allowing rust-analyzer to
assist with auto-completion etc.
- Moves code to sub-modules of `try_handle_turn`, removing the need for
making them conditional.
- Moves all maps to sub-modules to allow for a single place to put
comments: in the module documentation at the top.
- Removes interface IP learning; these are now configured via env
variables.
With the introduction of the pre-resolved Sentry host, all Firezone
clients now require Internet on startup. That is a significant usability
hit that we can easily fix by simply falling back to resolving the host
on-demand.
In CI, eBPF in driver mode actually functions just fine with no changes
to our existing tests, given we apply a few workarounds and bugfixes:
- The interface learning mechanism had two flaws: (1) it only learned
per-CPU, which meant the risk for a missing entry grew as the core count
of the relay host grew, and (2) it did not filter for unicast IPs, so it
picked up broadcast and link-local addresses, causing cross-relay paths
to fail occasionally
- The `relay-relay` candidate where the two relays are the same relay
causes packet drops / loops in the Docker bridge setup, and possibly in
GCP too. I'm not sure this is a valid path that solves a real
connectivity issue in the wild. I can understand relay-relay paths where
two relays are different hosts, and the client and gateway both talk
over their TURN channel to each other (i.e. WireGuard is blocked in each
of their networks), but I can't think of an advantage for a relay-relay
candidate where the traffic simply hairpins (or is dropped) off the
nearest switch. This is now detected with a new `PacketLoop` error that
triggers whenever source_ip == dest_ip.
- The relays in CI need a common next-hop for the MAC address swapping
to work, so a simple router service that functions as a basic L3 router
(no NAT) is added.
- The `veth` driver has some peculiar requirements to allow it to
function with XDP_TX. If you send a packet out of one interface of a
veth pair with XDP_TX, you need to either make sure both interfaces have
GRO enabled, or you need to attach a dummy XDP program that simply does
XDP_PASS to the other interface so that the sk_buff is allocated before
going up the stack to the Docker bridge. The GRO method was unreliable
and didn't work in our case, causing massive packet delays and
unpredictable bursts that prevented ICE from working, so we use the
XDP_PASS method instead. A simple docker image is built and lives at
https://github.com/firezone/xdp-pass to handle this.
Related: #10138
Related: #10260
Data has shown that we are doing a significant amount of relaying in
userspace because the latency with which candidates establish matters:
if an IPv6 to IPv4 path establishes first, we would often pick it,
bypassing the eBPF relaying altogether.
To address this, we now perform address translation when relaying so
these paths are covered. Preliminary benchmarking on Azure has shown
this performs at around 1.5 Gbps for a single client-gateway path,
scaling linearly with the number of clients up to the core count.
On GCP, performance will be a fraction of that because we need to attach
the program in SKB_MODE (generic), as the `gve` driver there does not
support the needed `bpf_xdp_adjust_head` call.
To keep the verifier happy (and make the verifier error trace log
usable) throughout this large refactor, we unfortunately had to drop
down to pointer arithmetic in this process. This however means that we
have full control (and visibility) over how the bytes are loaded,
stored, and copied. Each struct / abstraction adds a little bit of
overhead on the stack, which pushed us over the 512-byte limit. Since we
are generally loading only one set of packet headers onto the stack to
then copy into their new locations, our actual stack usage should be
well under the 512-byte limit.
Further performance analysis is required to push past the current
per-core 1.5 Gbps limit. This, along with CI support for integration
testing these codepaths, is left for a later date as this PR is already
quite large and needs to soak test for a bit in a live environment
before we push to prod.
Fixes #10192
Our Sentry client needs to resolve DNS before being able to send logs or
errors to the backend. Currently, this DNS resolution happens on-demand
as we don't take any control of the underlying HTTP client.
In addition, this will use HTTP/1.1 by default which isn't as efficient
as it could be, especially with concurrent requests.
Finally, if we ever decide to proxy all Sentry traffic through our own
domain, we have to take control of the underlying client anyway.
To resolve all of the above, we create a custom `TransportFactory` where
we reuse the existing `ReqwestHttpTransport` but provide an already
configured `reqwest::Client` that always uses HTTP/2 with a
pre-configured set of DNS records for the given ingest host.
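A minimal sketch of such a client (the host name here is illustrative,
not the real configuration):
```rust
use std::net::SocketAddr;

// Build a client that always speaks HTTP/2 and never hits DNS for the ingest host.
fn build_ingest_client(ingest_addr: SocketAddr) -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .http2_prior_knowledge() // always use HTTP/2
        .resolve("o0.ingest.sentry.io", ingest_addr) // pre-resolved DNS record
        .build()
}
```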
In order to support cross-stack relaying, we need to know which source
IP to write the outgoing packets from. To learn this, we can simply
record the destination IP address of incoming packets to our XDP
program.
A separate cache is used per IP stack in order to be a bit more
cache-line friendly and to prevent contention when only one IP stack's
lookup is needed.
Related: #10192
This updates our eBPF module to use DRV_MODE for less CPU overhead and
better performance for all same-stack TURN relaying.
Notably, gVNIC does not seem to support the `bpf_xdp_adjust_head`
helper, so unfortunately we need to extend / shrink the packet tail and
move the payload instead.
Comprehensive benchmarks have not been performed, but early results show
that we can saturate about 1 Gbps per E2 core on GCP:
```
[SUM] 0.00-30.04 sec 3.16 GBytes 904 Mbits/sec 12088 sender
[SUM] 0.00-30.00 sec 3.12 GBytes 894 Mbits/sec receiver
```
This is with 64 TCP streams. More streams will better utilize all
available RX queues, and lead to better performance.
Related: #10138
Fixes: #8633
In nearly all environments, we can safely assume that we will always use
the same network gateway for forwarding relayed packets as the one we
received them from.
By leveraging this assumption, we can simply swap the SRC and DST MAC
addresses, removing the need to keep a HashMap for these, which
eliminates the need to worry about thread-safety for this particular
functionality.
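The swap itself is trivial; a sketch with a stand-in header type:
```rust
/// Minimal Ethernet header view; a stand-in for the real type.
#[repr(C)]
struct EthHdr {
    dst_addr: [u8; 6],
    src_addr: [u8; 6],
    ether_type: u16,
}

/// Reply out of the interface the packet came in on by swapping the MACs in place.
fn swap_macs(eth: &mut EthHdr) {
    core::mem::swap(&mut eth.dst_addr, &mut eth.src_addr);
}
```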
Related: #10138
Inlining large(ish) functions that are on the hot path creates a much
longer program for the eBPF verifier to validate, since the verifier
works through all packet sizes and types. We're hitting an
issue on GCP (in the 8-core dev VM, XDP-generic) where verification
fails on `main` due to the inlining of some hot-path functions.
This PR is the smallest possible change that gets the program to load,
highlighting the issue.
In practice, I'm not sure there is a detectable performance difference
between having these inlined vs not (especially in DRV_MODE), so I'm not
sure it's worth the potential debugging headaches later on.
The relay uses `mio` to react to readiness events from multiple sockets
at once. Including the control port 3478, the relay needs to send and
receive traffic on up to 16384 sockets (one for each possible
allocation).
We need to process readiness events from these sockets as fairly as
possible. Under high-load, it may otherwise happen that we don't read
packets from an allocation socket, resulting in ICE timeouts of the
connection being relayed.
To achieve this fairness, we collect all readiness tokens into a set and
store, for each socket, the number of packets we have read from it so
far. Then, we always read next from the socket that we have read the
fewest packets from.
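A minimal sketch of that bookkeeping, with plain `usize`s standing in
for `mio` tokens:
```rust
use std::collections::HashMap;

struct FairScheduler {
    packets_read: HashMap<usize, u64>, // token -> packets read so far
}

impl FairScheduler {
    /// Of all currently ready sockets, pick the one we have read the fewest packets from.
    fn next_socket(&self, ready: &[usize]) -> Option<usize> {
        ready
            .iter()
            .copied()
            .min_by_key(|token| self.packets_read.get(token).copied().unwrap_or(0))
    }

    fn record_read(&mut self, token: usize) {
        *self.packets_read.entry(token).or_default() += 1;
    }
}
```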
On a Gateway with busy connections, only being able to use a nonce 100
times causes unnecessary churn. We increase this to 10000 to better
handle bursts of messages such as channel bindings.
These are flooding our monitoring infra and don't really add that much
value. Pretty much all of the processing the relay does is a request
coming in and a response going out, and none of the spans are nested.
We can therefore almost 1-to-1 replicate the logging we do with spans by
adding the fields to each log message.
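For example (field names assumed), the context that previously lived on
a span moves onto the event itself:
```rust
use std::net::SocketAddr;

// The fields that used to sit on a surrounding span are attached per event.
fn log_relayed(peer: SocketAddr, channel: u16, bytes: usize) {
    tracing::debug!(%peer, channel, bytes, "Relayed channel data message");
}
```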
Resolves: #9954
Whilst looking through the auth module of the relay, I noticed that we
unnecessarily convert back and forth between expiry timestamps and
username formats when we could just be using the already parsed version.
Applying a filter globally to the entire subscriber means it filters
events for all layers. This prevents the Sentry layer from uploading
DEBUG logs if configured.
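The fix is to attach the filter per layer instead; a minimal sketch with
`tracing-subscriber`:
```rust
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter, Layer};

fn init() {
    // The filter only applies to the fmt layer ...
    let fmt = tracing_subscriber::fmt::layer().with_filter(EnvFilter::new("info"));

    // ... so another layer (e.g. Sentry) registered here without a filter would
    // still see DEBUG events, unlike with a subscriber-global filter.
    tracing_subscriber::registry().with(fmt).init();
}
```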
I am no longer able to compile `jemalloc` on my system in a debug build.
It fails with the following error:
```
src/malloc_io.c: In function ‘buferror’:
src/malloc_io.c:107:16: error: returning ‘char *’ from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion]
107 | return strerror_r(err, buf, buflen);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
This appears to be a problem with modern versions of clang/gcc. I
believe this started happening when I recently upgraded my system. The
upstream [`jemalloc`](https://github.com/jemalloc/jemalloc) repository
is now archived and thus unmaintained. I am not sure if we ever measured
a significant benefit in using `jemalloc`.
Related: https://github.com/servo/servo/issues/31059
Rust 1.88 has been released and brings with it a quite exciting feature:
let-chains! They allow us to mix-and-match `if` and `let` expressions,
often reducing the "rightward drift" of the relevant code and making it
easier to read.
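For example, on edition 2024:
```rust
// Let-chain: a `let` binding and a boolean condition in a single `if`,
// instead of an `if let` nested inside another `if`.
fn first_even(values: &[i64]) -> Option<i64> {
    if let Some(first) = values.first()
        && first % 2 == 0
    {
        return Some(*first);
    }

    None
}
```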
Rust 1.88 also comes with a new clippy lint that warns when creating a
mutable reference from an immutable pointer. Attempting to fix this
revealed that this is exactly what we are doing in the eBPF kernel.
Unfortunately, it doesn't seem to be possible to design this in a way
that is accepted both by the borrow-checker AND by the eBPF verifier.
Hence, we simply make the function `unsafe` and document for the
programmer what needs to be upheld.
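The pattern ends up looking something like this sketch (the real helper
differs):
```rust
/// # Safety
///
/// The caller must ensure that `ptr` points to memory that is valid for writes
/// and that no other reference to it exists for the duration of `'a`.
unsafe fn mut_ref_from_ptr<'a, T>(ptr: *const T) -> &'a mut T {
    // SAFETY: upheld by the caller per the contract documented above.
    unsafe { &mut *ptr.cast_mut() }
}
```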
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.
In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:
- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and every
minute thereafter. The feature-flag context itself is only checked every
5 minutes though, so it might take up to 5 minutes before this takes
effect.
The default value, like all feature flags, is `false`. Therefore, if
there is any issue with the PostHog service, we will fall back to the
previous behaviour where logs are simply stored locally.
Resolves: #9600
When profiling the relay, certain syscalls may get interrupted by the
kernel. At present, this crashes the relay, which makes profiling
impossible.
Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>
The latest release now also sorts workspace dependencies, as well as
different dependency sections. Keeping these things sorted reduces the
chances of merge conflicts when multiple PRs edit these files.
A mass upgrade of our Rust dependencies. Most crucially, these remove
several duplicated dependencies from our tree.
- The Tauri plugins have been stuck on `windows v0.60` for a while. They
are now updated to use `windows v0.61` which is what the rest of our
dependency tree uses.
- By bumping `axum`, we can also bump `reqwest`, which removes a few
more duplicated dependencies.
- By removing `env_logger`, we can get rid of a few dependencies.
When working on the Rust code of Firezone from a macOS computer, it is
useful to have pretty much all of the code at least compile to ensure we
detect problems early. Eventually, once we target features like a
headless macOS client, some of these stubs will actually be filled in
and be functional.
It turns out that the Rust compiler doesn't always say that it is adding
debug information to a binary even when it does! The build output only
displays `[optimized]` when in fact it does emit debug information.
Adding an additional linker flag configures `bpf-linker` to include the
necessary BTF information in our kernel.
This makes debugging verifier errors much easier as the program output
contains source code annotations. It should also make it easier to
debug issues using `xdpdump`, which relies on BTF information.
Resolves: #8503