Commit Graph

349 Commits

Author SHA1 Message Date
Thomas Eizinger
9016ffc9dc build(rust): bump to Rust 1.91.0 (#10767)
Rust 1.91 has been released and brings with it a few new lints that we
need to tidy up. In addition, it also stabilizes `BTreeMap::extract_if`:
A really nifty std-lib function that allows us to conditionally take
elements from a map. We need that in a bunch of places.
2025-11-03 01:56:12 +00:00
dependabot[bot]
941f6f3d1c build(deps): bump secrecy from 0.8.0 to 0.10.3 in /rust (#10631)
Bumps [secrecy](https://github.com/iqlusioninc/crates) from 0.8.0 to
0.10.3.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/iqlusioninc/crates/commits">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=secrecy&package-manager=cargo&previous-version=0.8.0&new-version=0.10.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2025-10-30 01:17:10 +00:00
dependabot[bot]
7a0a3a050f build(deps): bump network-types from 0.0.8 to 0.1.0 in /rust (#10644)
Bumps [network-types](https://github.com/vadorovsky/network-types) from
0.0.8 to 0.1.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/vadorovsky/network-types/blob/main/CHANGELOG.md">network-types's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>The format is based on <a
href="https://keepachangelog.com/en/1.0.0/">Keep a Changelog</a>,
and this project adheres to <a
href="https://semver.org/spec/v2.0.0.html">Semantic Versioning</a>.</p>
<h2>[Unreleased]</h2>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f0f60f40a9"><code>f0f60f4</code></a>
Release 0.1.0</li>
<li><a
href="ed8976abfd"><code>ed8976a</code></a>
perf: Reduce number of instructions in setters and getters</li>
<li><a
href="8db6af28ea"><code>8db6af2</code></a>
feat: full arp support (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/66">#66</a>)</li>
<li><a
href="62bfb9da66"><code>62bfb9d</code></a>
test: Multi-target CI Stages (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/62">#62</a>)</li>
<li><a
href="976026462f"><code>9760264</code></a>
chore: Fix rustfmt errors (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/70">#70</a>)</li>
<li><a
href="993395c4b1"><code>993395c</code></a>
Adds support for GENEVE (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/67">#67</a>)</li>
<li><a
href="1f1a75dbc3"><code>1f1a75d</code></a>
Update vxlan.rs (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/58">#58</a>)</li>
<li><a
href="c77073b396"><code>c77073b</code></a>
feat: MPLS header structure support (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/51">#51</a>)</li>
<li><a
href="e6d7c50c37"><code>e6d7c50</code></a>
feat: Add Logical Link Control (LLC) header support (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/49">#49</a>)</li>
<li><a
href="b82ea45981"><code>b82ea45</code></a>
feat: icmp v4 and icmp v6 support (<a
href="https://redirect.github.com/vadorovsky/network-types/issues/48">#48</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vadorovsky/network-types/compare/v0.0.8...v0.1.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=network-types&package-manager=cargo&previous-version=0.0.8&new-version=0.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2025-10-29 23:30:14 +00:00
Thomas Eizinger
b854b7407c chore: fix clippy warnings in eBPF code (#10660)
This code appears to be configured out in CI and thus we don't run
clippy there. My IDE pointed these out however so it seems fair enough
to fix them. It is just unnecessary references, doesn't actually have an
impact on the functionality.
2025-10-21 05:19:07 +00:00
Thomas Eizinger
20d0298a8a chore: fix clippy warnings about HashMap iteration (#10661)
Not quite sure how these didn't get picked up by CI but they showed in
my local IDE.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-21 02:54:20 +00:00
Thomas Eizinger
685acdac3a feat: add more specific component type to user-agent header (#10457)
In order to allow the portal to more easily classify, what kind of
component is connecting, we extend the `get_user_agent` header to
include a component type instead of the generic `connlib/`.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-09-26 00:18:36 +00:00
Thomas Eizinger
9865e03343 ci: fix double symmetric NAT test failure (#10410)
As it turns out, the flaky test was caused by a bug in the eBPF kernel where we read the old channel data header from the wrong offset. This made us essentially read garbage data for the channel number, causing us to:

a. Compute a bad checksum
b. Send the packet on a completely wrong channel

The reason this caused a flaky test is that it requires on side to pick IPv4 to talk to the relay and the other side IPv6. The happy-eyeballs approach of the `allocation` module made that non-deterministic, only exposing this bug occasionally.

To ensure these kind of things are detected earlier in the future, I am adding an additional CI step that checks all packets emitted by the eBPF kernel for checksum errors.

Fixes: #10404

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2025-09-25 17:53:17 +10:00
Thomas Eizinger
94a56fc6bc build(deps): update aya to latest main (#10424)
We haven't updated `aya` in a while. Unfortunately, the update is not without problems. For one, the logging infrastructure changed, requiring us to drop the error details from `xdp_adjust_head`. See https://github.com/aya-rs/aya/issues/1348. Two, the `tokio` feature flag got removed but luckily that can be worked around quite easily.

Resolves: #10344
2025-09-23 17:45:59 +10:00
Thomas Eizinger
69afe71215 refactor(connlib): remove concept of "ReplyMessages" (#10361)
In earlier versions of Firezone, the WebSocket protocol with the portal
was using the request-response semantics built into Phoenix. This
however is quite cumbersome to work with to due to the polymorphic
nature of the protocol design.

We ended up moving away from it and instead only use one-way messages
where each event directly corresponds to a message type. However, we
have never removed the capability reply messages from the
`phoenix-channel` module, instead all usages just set it to `()`.

We can simplify the code here by always setting this to `()`.

Resolves: #7091
2025-09-17 04:10:56 +00:00
Jamil
1f130ad562 fix(relay): XDP_PASS DNS replies (#10330)
DNS replies are UDP packets often arriving to our ephemeral range. As
such, these get dropped because we attempt to look up a channel map for
them and fail to find anything.

To fix this, we assume all UDP packets arriving with a source port of 53
are DNS packets, and pass them up the stack.

There are likely other types of UDP traffic this could be problematic
for (QUIC comes to mind), but this fixes the immediate issue at hand for
now, as detecting STUN probes is somewhat complex.

Fixes #10329
2025-09-11 21:52:53 +00:00
Thomas Eizinger
0b89959354 fix(relay): handle relay-relay candidate pairs in eBPF (#10286)
Currently, the eBPF module can translate from channel data messages to
UDP packets and vice versa. It can even do that across IP stacks, i.e.
translate from an IPv6 UDP packet to an IPv4 channel data messages.

What it cannot do is handle packets to itself. This can happen if both -
Client and Gateway - pick the same relay to make an allocation. When
exchanging candidates, ICE will then form pairs between both relay
candidates, essentially requiring the relay to loop packets back to
itself.

In eBPF, we cannot do that. When sending a packet back out with
`XDP_TX`, it will actually go out on the wire without an additional
check whether they are for our own IP.

Properly handling this in eBPF (by comparing the destination IP to our
public IP) adds more cases we need to handle. The current module
structure where everything is one file makes this quite hard to
understand, which is why I opted to create four sub-modules:

- `from_ipv4_channel`
- `from_ipv4_udp`
- `from_ipv6_channel`
- `from_ipv6_udp`

For traffic arriving via a data-channel, it is possible that we also
need to send it back out via a data-channel if the peer address we are
sending to is the relay itself. Therefore, the `from_ipX_channel`
modules have four sub-modules:

- `to_ipv4_channel`
- `to_ipv4_udp`
- `to_ipv6_channel`
- `to_ipv6_udp`

For the traffic arriving on an allocation port (`from_ipX_udp`), we
always map to a data-channel and therefore can never get into a routing
loop, resulting in only two modules:

- `to_ipv4_channel`
- `to_ipv6_channel`

The actual implementation of the new code paths is rather simple and
mostly copied from the existing ones. For half of them, we don't need to
make any adjustments to the buffer size (i.e. IPv4 channel to IPv4
channel). For the other half, we need to adjust for the difference in
the IP header size.

To test these changes, we add a new integration test that makes use of
the new docker-compose setup added in #10301 and configures masquerading
for both Client and Gateway. To make this more useful, we also remove
the `direct-` prefix from all tests as the test script itself no longer
makes any decisions as to whether it is operating over a direct or
relayed connection.

Resolves: #7518
2025-09-11 07:19:23 +00:00
Thomas Eizinger
f96cc3d583 feat(relay): remove graceful shutdown (#10322)
Initially, we added the graceful shutdown functionality to the relay to
better deal with deploys and achieve as minimal downtime as possible.
With the split of app and infrastructure that we now have, this
functionality is no longer necessary as portal deploys don't touch the
relay infra at all.

Thus, we can remove this functionality which will actually speed-up
deploys of the relays as systemd no longer has to time-out after sending
the SIGTERM to the binary.
2025-09-10 07:00:20 +00:00
Thomas Eizinger
4a612da189 fix(relay): filter traces by log filter (#10317)
We want to control which traces are collected and sent to OTEL with the
log filter. To do that, we need to also apply the supplied log filter to
the tracer.
2025-09-09 23:32:57 +00:00
Jamil
5e0ca45c67 fix(relay): XDP_PASS non-STUN UDP traffic (#10292)
To prevent userspace relaying, all traffic that seemingly looked like
STUN/TURN but we couldn't handle via the eBPF codepath we would
`XDP_DROP`.

This turned out to be too heavy-handed of an approach since it end up
matching DNS query responses as well due to them arriving within the
TURN ephemeral port range.

To fix this, we `XDP_PASS` the traffic up the stack so that the kernel
is able to match it to existing conntrack entries.

We've identified a minor race condition where the first few channel data
packets might be dropped when a channel is first being bound, but fixing
this will be saved for a later PR.

Related: https://github.com/firezone/infra/pull/132
2025-09-05 13:24:02 -07:00
Thomas Eizinger
c891d9c864 fix(relay): re-add eBPF channel map entry on refresh (#10291)
TURN channels have a 5 minute cooldown period after they expire where
they cannot be rebound to another peer but can be refreshed and thus
"reactivated".

To stop routing packets when the channel expires, we remove it from the
channel map of the eBPF code. The client however knows that it still has
a channel that it can reactivate for another 5min. In case it chooses to
do so, we refresh the channel in userspace but until now, forget to
re-populate the eBPF map. This effectively blocks this communication
path from working because the relay reports the channel from being
refreshed successfully, yet the new eBPF kernel drops all packets
without a map entry.
2025-09-05 01:29:50 +00:00
Thomas Eizinger
ec0c7c148b chore(eBPF): minor polish (#10282)
Some follow-up polish for the eBPF module:

- Changes the cfg's to also include Linux, allowing rust-analyzer to
assist with auto-complete etc.
- Moves code to sub-modules of `try_handle_turn`, removing the need for
making them conditional.
- Move all maps to sub-modules to allow for a single place to put
comments: In the module documentation at the top.
- Removes interface IP learning, these are now configured via env
variables.
2025-09-03 03:18:46 +00:00
Mariusz Klochowicz
d07e32d91f chore: Build whole workspace on macos (#10228)
- Add some macos stubs to gui-smoke-test.
- Hide `ebpf-turn-router` binary functionality behind `#[cfg(target_arch
= "bpf")]`

Signed-off-by: Mariusz Klochowicz <mariusz@klochowicz.com>
2025-09-02 06:55:53 +00:00
Thomas Eizinger
9cddfe59fa fix(rust): don't require Internet on startup (#10264)
With the introduction of the pre-resolved Sentry host, all Firezone
clients now require Internet on startup. That is a signficant usability
hit that we can easily fix by simply falling back to resolving the host
on-demand.
2025-09-01 01:31:05 +00:00
Jamil
0ccd4bbf24 feat(ci): enable relay eBPF offloading (#10160)
In CI, eBPF in driver mode actually functions just fine with no changes
to our existing tests, given we apply a few workarounds and bugfixes:

- The interface learning mechanism had two flaws: (1) it only learned
per-CPU, which meant the risk for a missing entry grew as the core count
of the relay host grew, and (2) it did not filter for unicast IPs, so it
picked up broadcast and link-local addresses, causing cross-relay paths
to fail occasionally
- The `relay-relay` candidate where the two relays are the same relay
causes packet drops / loops in the Docker bridge setup, and possibly in
GCP too. I'm not sure this is a valid path that solves a real
connectivity issue in the wild. I can understand relay-relay paths where
two relays are different hosts, and the client and gateway both talk
over their TURN channel to each other (i.e. WireGuard is blocked in each
of their networks), but I can't think of an advantage for a relay-relay
candidate where the traffic simply hairpins (or is dropped) off the
nearest switch. This has been now detected with a new `PacketLoop` error
that triggers whenever source_ip == dest_ip.
- The relays in CI need a common next-hop to talk to for the MAC address
swapping to work. A simple router service is added which functions as a
basic L3 router (no NAT) that allows the MAC swapping to work.
- The `veth` driver has some peculiar requirements to allow it to
function with XDP_TX. If you send a packet out of one interface of a
veth pair with XDP_TX, you need to either make sure both interfaces have
GRO enabled, or you need to attach a dummy XDP program that simply does
XDP_PASS to the other interface so that the sk_buff is allocated before
going up the stack to the Docker bridge. The GRO method was unreliable
and didn't work in our case, causing massive packet delays and
unpredictable bursts that prevented ICE from working, so we use the
XDP_PASS method instead. A simple docker image is built and lives at
https://github.com/firezone/xdp-pass to handle this.

Related: #10138 
Related: #10260
2025-08-31 23:37:03 +00:00
Jamil
d9f1b42595 feat(relay): handle ipv4-ipv6 relaying in eBPF (#10226)
Data has shown that we are doing a significant amount of relaying in
userspace because the latency of which candidates establish first
matters - if an IPv6 to IPv4 path establishes first, we could often pick
that, which would bypass the eBPF relaying altogether.

To address this, we now perform address translation when relaying so
these paths are covered. Preliminary benchmarking on Azure has shown
this performs around ~1.5 Gbps for a single client - gateway path,
scaling linearly with the number clients up to the core count.

On GCP, performance will be a fraction of that because we need to attach
the program in SKB_MODE (generic) based on the fact the `gve` driver
there does not support the needed `bpf_xdp_adjust_head` call.

To keep the verifier happy (and make the verifier error trace log
usable) throughout this large refactor, we unfortunately had to drop
down to pointer arithmetic in this process. This however means that we
have full control (and visibility) over how the bytes are loaded,
stored, and copied. Each struct / abstraction adds a little bit of
overhead on the stack which pushed us over the 512-byte limit. Since we
are generally loading only one set of packet headers onto the stack to
then copy into their new locations, our actual stack usage should be
well the 512-byte limit.

Further performance analysis is required to push past the current
per-core 1.5 Gbps limit. This, along with CI support for integration
testing these codepaths is left for a later date as this PR is already
quite large and needs to soak test for a bit in a live environment
before we push to prod.

Fixes #10192
2025-08-22 06:18:49 +00:00
Thomas Eizinger
c70c88c856 build(deps): upgrade to opentelemetry 0.30 (#10239) 2025-08-21 22:47:39 +00:00
Thomas Eizinger
46afa52f78 feat(telemetry): pre-resolve Sentry ingest host (#10206)
Our Sentry client needs to resolve DNS before being able to send logs or
errors to the backend. Currently, this DNS resolution happens on-demand
as we don't take any control of the underlying HTTP client.

In addition, this will use HTTP/1.1 by default which isn't as efficient
as it could be, especially with concurrent requests.

Finally, if we decide to ever proxy all Sentry for traffic through our
own domain, we have to take control of the underlying client anyway.

To resolve all of the above, we create a custom `TransportFactory` where
we reuse the existing `ReqwestHttpTransport` but provide an already
configured `reqwest::Client` that always uses HTTP/2 with a
pre-configured set of DNS records for the given ingest host.
2025-08-21 03:28:05 +00:00
Jamil
618254cdfc refactor(relay): use zero check for is_learned (#10209)
Simplifies the interface map we store to use a zero-check instead of
explicit bool.

Related:
https://github.com/firezone/firezone/pull/10200#discussion_r2281117072
2025-08-18 21:05:45 +00:00
Thomas Eizinger
da00848549 build(deps): bump to Rust 1.89 (#10208)
Rust 1.89 comes with a new lint that wants us to use explicitly refer to
lifetimes, even if they are elided.
2025-08-18 05:04:55 +00:00
Jamil
f47fb46cc7 feat(relay): learn interface addresses (#10200)
In order to support cross-stack relaying, we need to know what the
source IP is going to be to write the packets from. To know this, we can
simply learn the destination IP address for incoming packets to our XDP
program.

A separate cache is used per IP stack in order be a bit more cache line
friendly and prevent contention when only IP stack lookup is needed.

Related: #10192
2025-08-18 00:33:58 +00:00
Thomas Eizinger
70a930e45d chore(relay): use existing ebpf module import (#10202) 2025-08-17 23:45:36 +00:00
Jamil
b07fa341cf feat(relay): XDP driver (native) mode for gVNIC (#10177)
This updates our eBPF module to use DRV_MODE for less CPU overhead and
better performance for all same-stack TURN relaying.

Notably, gVNIC does not seem to support the `bpf_xdp_adjust_head`
helper, so unfortunately we need to extend / shrink the packet tail and
move the payload instead.

Comprehensive benchmarks have not been performed, but early results show
that we can saturate about 1 Gbps per E2 core on GCP:

```
[SUM]   0.00-30.04  sec  3.16 GBytes   904 Mbits/sec  12088             sender
[SUM]   0.00-30.00  sec  3.12 GBytes   894 Mbits/sec                  receiver
```

This is with 64 TCP streams. More streams will better utilize all
available RX queues, and lead to better performance.

Related: #10138
Fixes: #8633
2025-08-17 15:04:19 +00:00
Jamil
46ffe8fe45 docs(relay): add note on channel map safety (#10194)
A fair bit of time was spent validating these map accesses are
thread-safe, so just documenting that for the next reader to find.

Related:
https://github.com/firezone/firezone/issues/10138#issuecomment-3186074350
2025-08-14 17:27:12 -07:00
Jamil
3e3f555c1e fix(relay): swap MACs for relayed traffic (#10193)
In nearly all environments, we can safely assume that we will always use
the same network gateway for forwarding relayed packets as the one we
received them from.

By leveraging this assumption, we can simply swap the SRC and DST MAC
addresses, removing the need to keep a HaspMap for these, which
eliminates the need to worry about thread-safety for this particular
functionality.

Related: #10138
2025-08-13 14:40:26 -07:00
Jamil
92137ee76b fix(relay): don't inline hotpath loop calls (#10185)
When inlining large(ish) functions that are on the hot-path, it creates
a much longer program for the eBPF verifier to validate since the
verifier is working through all packet sizes and types. We're hitting an
issue on GCP (in the 8-core dev VM, XDP-generic) where verification
fails on `main` due to the inlining of some hot-path functions.

This PR is the smallest possible change that gets the program to load,
highlighting the issue.

In practice, I'm not there is a detectable performance difference
between having these inlined vs not (especially in DRV_MODE) so I'm not
sure it's worth the potential debugging headaches later on.
2025-08-11 22:08:12 +00:00
Thomas Eizinger
2dde3b8573 fix(relay): read from most-recently-ready socket first (#10148)
The relay uses `mio` to react to readiness events from multiple sockets
at once. Including the control port 3478, the relay needs to also send
and receive traffic from up to 16384 sockets (one for each possible
allocation).

We need to process readiness events from these sockets as fairly as
possible. Under high-load, it may otherwise happen that we don't read
packets from an allocation socket, resulting in ICE timeouts of the
connection being relayed.

To achieve this fairness, we collect all readiness tokens into a set and
store it with the number of packets we have read so far from this
socket. Then, we always read from the socket next that we have so far
read the least amount of packets from.
2025-08-06 09:13:05 +00:00
Thomas Eizinger
f27683760a fix(relay): check for ANSI support on stdout (#10149) 2025-08-06 07:42:54 +00:00
Thomas Eizinger
0e32f1c4f2 fix(relay): increase nonce usage to 10000 (#10128)
On a Gateway with a busy connections, only being able to use a nonce 100
times causes unnecessary churn. We increase this to 10000 to be able to
handle bursts of messages such as channel bindings better.
2025-08-05 02:00:57 +00:00
Thomas Eizinger
fbf96c261e chore(relay): remove spans (#9962)
These are flooding our monitoring infra and don't really add that much
value. Pretty much all of the processing the relay does is request in
and out and none of the spans are nested.

We can therefore almost 1-to-1 replicate the logging we do with spans by
adding the fields to each log message.

Resolves: #9954
2025-07-22 13:24:58 +00:00
Thomas Eizinger
0f1c5f2818 refactor(relay): simplify auth module (#9873)
Whilst looking through the auth module of the relay, I noticed that we
unnecessarily convert back and forth between expiry timestamps and
username formats when we could just be using the already parsed version.
2025-07-15 14:14:51 +00:00
Thomas Eizinger
2b70596636 fix(rust): only apply filter to select tracing layers (#9872)
Applying a filter globally to the entire subscriber means it filters
events for all layers. This prevents the Sentry layer from uploading
DEBUG logs if configured.
2025-07-15 13:44:53 +00:00
Thomas Eizinger
d01701148b fix(rust): remove jemalloc (#9849)
I am no longer able to compile `jemalloc` on my system in a debug build.
It fails with the following error:

```
src/malloc_io.c: In function ‘buferror’:
src/malloc_io.c:107:16: error: returning ‘char *’ from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion]
  107 |         return strerror_r(err, buf, buflen);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This appears to be a problem with modern versions of clang/gcc. I
believe this started happening when I recently upgraded my system. The
upstream [`jemalloc`](https://github.com/jemalloc/jemalloc) repository
is now archived and thus unmaintained. I am not sure if we ever measured
a significant benefit in using `jemalloc`.

Related: https://github.com/servo/servo/issues/31059
2025-07-12 19:22:06 +00:00
Thomas Eizinger
d6805d7e48 chore(rust): bump to Rust 1.88 (#9714)
Rust 1.88 has been released and brings with it a quite exciting feature:
let-chains! It allows us to mix-and-match `if` and `let` expressions,
therefore often reducing the "right-drift" of the relevant code, making
it easier to read.

Rust.188 also comes with a new clippy lint that warns when creating a
mutable reference from an immutable pointer. Attempting to fix this
revealed that this is exactly what we are doing in the eBPF kernel.
Unfortunately, it doesn't seem to be possible to design this in a way
that is both accepted by the borrow-checker AND by the eBPF verifier.
Hence, we simply make the function `unsafe` and document for the
programmer, what needs to be upheld.
2025-07-12 06:42:50 +00:00
Thomas Eizinger
3b972643b1 feat(rust): stream logs to Sentry when enabled in PostHog (#9635)
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.

In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:

- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and
afterwards, every minute. The feature-flag context itself is only
checked every 5 minutes though so it might take up to 5 minutes before
this takes effect.

The default value - like all feature flags - is `false`. Therefore, if
there is any issue with the PostHog service, we will fallback to the
previous behaviour where logs are simply stored locally.

Resolves: #9600
2025-06-25 16:14:14 +00:00
Thomas Eizinger
fccf5021e6 fix(relay): don't fail event-loop on interrupt (#9592)
When profiling the relay, certain syscalls may get interrupted by the
kernel. At present, this crashes the relay which makes profiling
impossible.

Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>
2025-06-20 18:42:57 +00:00
Thomas Eizinger
e05c98bfca ci: update to new cargo sort release (#9354)
The latest release now also sorts workspace dependencies, as well as
different dependency sections. Keeping these things sorted reduces the
chances of merge conflicts when multiple PRs edit these files.
2025-06-02 02:01:09 +00:00
Thomas Eizinger
cee4be9e24 build(deps): bump Rust dependencies (#9192)
A mass upgrade of our Rust dependencies. Most crucially, these remove
several duplicated dependencies from our tree.

- The Tauri plugins have been stuck on `windows v0.60` for a while. They
are now updated to use `windows v0.61` which is what the rest of our
dependency tree uses.
- By bumping `axum`, can also bump `reqwest` which reduces a few more
duplicated dependencies.
- By removing `env_logger`, we can get rid of a few dependencies.
2025-05-22 13:15:01 +00:00
Thomas Eizinger
37529803ce build(rust): bump otel ecosystem crates to 0.29 (#9029) 2025-05-05 12:33:07 +00:00
Thomas Eizinger
6114bb274f chore(rust): make most of the Rust code compile on MacOS (#8924)
When working on the Rust code of Firezone from a MacOS computer, it is
useful to have pretty much all of the code at least compile to ensure
detect problems early. Eventually, once we target features like a
headless MacOS client, some of these stubs will actually be filled in an
be functional.
2025-04-29 11:20:09 +00:00
Thomas Eizinger
bcbc8cd212 build(rust): bump aya to include BTF information feature (#8883)
The latest version of `aya-build` automatically builds our eBPF program
with BTF information enabled.

Related: https://github.com/aya-rs/aya/pull/1250
2025-04-22 00:36:41 +00:00
Thomas Eizinger
1af7f4f8c1 fix(rust): don't use jemalloc on ARMv7 (#8859)
Doesn't compile on ARMv7 so we just fallback to the default allocator
there.
2025-04-19 22:20:05 +00:00
Thomas Eizinger
a41395a165 feat(eBPF): embed BTF information in eBPF kernel (#8842)
It turns out that the Rust compiler doesn't always say that it is adding
debug information to a binary even when it does! The build output only
displays `[optimized]` when in fact it does actually emit debug
information. Adding an additional linker flag configures `bpf-linker` to
include the necessary BTF information in our kernel.

This makes debugging verifier errors much easier as the program output
contains source code annotiations. It also should make it easier to
debug issues using `xdpdump` which relies on BTF information.

Resolves: #8503
2025-04-19 12:38:59 +00:00
Thomas Eizinger
34f28e2ae6 feat(rust): use jemalloc for Gateway and Relay (#8846)
`jemalloc` is a modern allocator that is designed for multi-threaded
systems and can better handle smaller allocations that may otherwise
fragment the heap. Firezone's components, especially Relays and Gateways
are intended to operate with a long uptime and therefore need to handle
memory efficiently.
2025-04-19 12:25:46 +00:00
Thomas Eizinger
b4afd0bffb refactor(eBPF): reduce size of maps (#8849)
Whilst developing the eBPF module for the relay, I needed to manually
add padding within the key and value structs used in the maps in order
for the kernel to be able to correctly retrieve the data.

For some reason, this seems no longer necessary as the integration test
now passes without this as well.

Being able to remove the padding drastically reduces the size of these
maps for the current number of entries that we allow. This brings the
overall memory usage of the relay down.

Resolves: #8682
2025-04-19 11:46:58 +00:00
Thomas Eizinger
f51fd53708 chore(eBPF): use RangeInclusive::contains again (#8812)
Now that we have figured out what the problem was with the eBPF kernel
not routing certain packets, we can undo the manual implementation of
the allocation range checking again and use the more concise
`RangeInclusive::contains`.

Related: #8809
Related: #8807
2025-04-18 15:49:23 +00:00