Commit Graph

1217 Commits

Author SHA1 Message Date
Thomas Eizinger
685acdac3a feat: add more specific component type to user-agent header (#10457)
In order to allow the portal to more easily classify what kind of
component is connecting, we extend the `get_user_agent` header to
include a component type instead of the generic `connlib/`.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-09-26 00:18:36 +00:00
Thomas Eizinger
0310bafbcd feat(clients): gracefully close connections on shutdown (#10400)
In #10076, connlib gained the ability to gracefully close connections
between peers. The Gateway already uses this when it is being gracefully
shut down, such as during an upgrade. This allows Clients to immediately
fail over to a different Gateway instead of waiting for an ICE timeout.

When a Client signs out, we currently just drop all the state, resulting
in an ICE timeout on the Gateway ~15 seconds later. This makes it
difficult for us to analyze whether an ICE timeout in the logs represents
an actual problem where a network connection got cut or whether the
Client simply signed out.

Whilst not water-tight, attempting to gracefully close our connections
when the Client signs out is better than nothing, so we implement this
here.

All Clients use the `Session` abstraction from `client-shared` which
spawns the event-loop into a dedicated task.

- For the Linux and Windows GUI client, the already present tokio
runtime instance of the tunnel service is used for this.
- For Android and Apple, we create a dedicated, single-threaded runtime
instance for connlib.
- For the headless client, we also reuse the already existing tokio
runtime instance of the binary.

In the case of Android, Apple and the headless client, this means we need to
ensure the tokio runtime instance stays alive long enough to actually
complete the graceful shutdown task. We achieve this by draining the
`EventStream` returned from `Session`. The `EventStream` is a wrapper
around a channel connected to the event-loop. This stream only finishes
once the event-loop is entirely dropped (and therefore completed the
graceful shutdown) as it holds the sender-end of the channel.
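
As a rough sketch, draining looks like the following (the helper name and the generic bound are made up for illustration; the real code drains the concrete `EventStream`):

```rust
use futures::{Stream, StreamExt};

/// Waits until the event-loop behind `events` has fully shut down.
///
/// The stream holds the receiving end of a channel whose sender lives inside the
/// event-loop, so it only terminates once the event-loop has been dropped, i.e.
/// once the graceful shutdown has actually completed.
async fn drain_until_shutdown<S>(mut events: S)
where
    S: Stream + Unpin,
{
    while events.next().await.is_some() {
        // Discard events emitted while the shutdown is in progress.
    }
}
```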

In the case of the Linux and Windows GUI clients, the runtime outlives the
`Session` because it is scoped to the entire tunnel process. Therefore,
no additional measures are necessary there to ensure the graceful
shutdown task completes.
2025-09-23 03:40:52 +00:00
Thomas Eizinger
8e00870942 refactor(gateway): close connections on error (#10401)
Previously, the Gateway would only proactively close connections to its
peers when it was shutdown gracefully via a SIGTERM or SIGINT signal. By
copying the same design for the event-loop as I've implemented in
#10400, we can now also initiate the graceful shutdown in case the
event-loop exits with an error.
2025-09-20 20:55:48 +00:00
Thomas Eizinger
e20929ad73 build(deps): bump Rust version to 1.90 (#10380)
One of the quieter Rust releases, with no new clippy lints that would
require code updates.
2025-09-20 04:28:03 +00:00
Thomas Eizinger
9c8101a3ee chore: render contextual information more Sentry-friendly (#10386)
Sentry can group issues together that have unique identifiers in their
message. Unfortunately, it only does that well for integers and UUIDs
and not so much for hex values. To avoid alert fatigue, we render the
public key as a u256, which hopefully allows Sentry to group these
together.
2025-09-20 12:08:03 +10:00
Thomas Eizinger
90d10a8634 refactor(connlib): improve fairness of event-loop (#10347)
The event-loop inside `Tunnel` processes input according to a certain
priority. We only take input from lower priority sources when the higher
priority sources are not ready. The current priorities are:

- Flush all buffers
- Read from UDP sockets
- Read from TUN device
- Read from DNS servers
- Process recursive DNS queries
- Check timeout

The idea of this priority ordering is to keep all kinds of processing
bounded and "finish" any kind of work that is on-going before taking on
new work. Anything that sits in a buffer is basically done with
processing and just needs to be written out to the network / device.
Arriving UDP packets have already traversed the network and been
encrypted on the other end, meaning they are higher priority than
reading from the TUN device. Packets from the TUN device still need to
be encrypted and sent to the remote.

Whilst there is merit in this design, it also bears the potential of
starving input sources further down if the top ones are extremely busy.
To prevent this, we refactor `Io` to read from all input sources and
present them to the event-loop as a batch, allowing all sources to make
progress before looping around. Since this event-loop was first
conceived, we have refactored `Io` to use background threads for the UDP
sockets and TUN device, meaning they will make progress by themselves
anyway until the channels to the main-thread fill up. As such, there
shouldn't be any latency increase in processing packets even though we
are performing slightly more work per event-loop tick.

This kind of batch-processing highlights a problem: Bailing out with an
error midway through processing a batch leaves the remainder of the
batch unprocessed, essentially dropping packets. To fix this, we
introduce a new `TunnelError` type that represents a collection of errors
that we encountered while processing the batch. This might actually also
be a problem with what is currently in `main` because we are already
batch-processing packets there but possibly are bailing out midway
through the batch.
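
A minimal sketch of the idea behind such an error type (only the name `TunnelError` comes from this change; the methods are hypothetical): errors are accumulated per item so the rest of the batch still gets processed.

```rust
use std::fmt;

/// Collects all failures from one batch instead of aborting on the first one.
#[derive(Debug, Default)]
struct TunnelError {
    errors: Vec<anyhow::Error>,
}

impl TunnelError {
    fn push(&mut self, e: anyhow::Error) {
        self.errors.push(e);
    }

    /// Returns `Ok(())` if the whole batch was processed without errors.
    fn into_result(self) -> Result<(), Self> {
        if self.errors.is_empty() { Ok(()) } else { Err(self) }
    }
}

impl fmt::Display for TunnelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} error(s) while processing batch", self.errors.len())
    }
}
```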

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
2025-09-17 23:28:36 +00:00
Thomas Eizinger
3e6094af8d feat(linux): try to set rmem_max and wmem_max on startup (#10349)
The default send and receive buffer sizes on Linux are too small (only
~200 KB). Checking `nstat` after an iperf run revealed that the number
of dropped packets in the first interval directly correlates with the
number of receive buffer errors reported by `nstat`.

We already try to increase the send and receive buffer sizes for our UDP
socket but unfortunately, we cannot increase them beyond what the system
limits them to. To work around this, we try to set `rmem_max` and
`wmem_max` during startup of the Linux headless client and Gateway. This
behaviour can be disabled by setting `FIREZONE_NO_INC_BUF=true`.
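
A minimal sketch of what this best-effort bump could look like (the function name, target value and error handling are made up; only the sysctl paths and the `FIREZONE_NO_INC_BUF` opt-out come from this change):

```rust
use std::fs;

/// Best-effort bump of the kernel's UDP buffer limits.
fn try_increase_buffer_limits() {
    if matches!(std::env::var("FIREZONE_NO_INC_BUF").as_deref(), Ok("true")) {
        return; // Opt-out documented above.
    }

    const TARGET: &str = "8388608"; // 8 MiB; assumed value, not taken from the actual code.

    for path in ["/proc/sys/net/core/rmem_max", "/proc/sys/net/core/wmem_max"] {
        // Requires sufficient privileges; failures (e.g. inside Docker) are ignored.
        if let Err(e) = fs::write(path, TARGET) {
            eprintln!("Failed to write {path}: {e}");
        }
    }
}
```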

This doesn't work in Docker unfortunately, so we set the values manually
in the CI perf tests and verify after the test that we didn't encounter
any send and receive buffer errors.

It is yet to be determined how we should deal with this problem for all
the GUI clients. See #10350 as an issue tracking that.

Unfortunately, this doesn't fix all packet drops during the first iperf
interval. With this PR, we now see packet drops on the interface itself.
2025-09-17 23:05:01 +00:00
Thomas Eizinger
7222167b13 fix(connlib): limit the number of optimistic candidates (#10367)
To facilitate direct connections, `connlib` generates "optimistic"
candidates that combine the port of the host candidate with the IP of
the server-reflexive candidate. This allows sysadmins to port-forward
the Firezone port 52625 on the Gateway, allowing for direct connections
to happen behind symmetric NAT.

This feature is only really useful for IPv4 as IPv6 doesn't need
symmetric NAT due to the larger address space. It is also quite common
that users have multiple IPv6 addresses on a single interface. The
combination of the two can result in CPU spikes on the Gateway if a
client connects and sends over e.g. 10 IPv6 host candidates and various
IPv6 server-reflexive candidates. The Gateway then ends up in a loop
where it creates an NxM matrix of all these candidates.

To mitigate this, we disable optimistic candidates for IPv6 altogether
and limit the number of IPv4 optimistic candidates to 2.
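
Roughly, the policy looks like this (a hedged sketch; the function and parameter names are made up for illustration):

```rust
use std::net::{IpAddr, SocketAddr};

/// Pair the host candidate's port with server-reflexive IPs, IPv4 only,
/// and cap the result at two candidates.
fn optimistic_candidates(
    host_port: u16,
    server_reflexive_ips: impl IntoIterator<Item = IpAddr>,
) -> Vec<SocketAddr> {
    server_reflexive_ips
        .into_iter()
        .filter(|ip| ip.is_ipv4()) // No optimistic candidates for IPv6.
        .map(|ip| SocketAddr::new(ip, host_port))
        .take(2) // At most two IPv4 optimistic candidates.
        .collect()
}
```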
2025-09-17 19:52:29 +00:00
Thomas Eizinger
69afe71215 refactor(connlib): remove concept of "ReplyMessages" (#10361)
In earlier versions of Firezone, the WebSocket protocol with the portal
was using the request-response semantics built into Phoenix. This,
however, is quite cumbersome to work with due to the polymorphic
nature of the protocol design.

We ended up moving away from it and instead only use one-way messages
where each event directly corresponds to a message type. However, we
have never removed the capability for reply messages from the
`phoenix-channel` module; instead, all usages just set it to `()`.

We can simplify the code here by always setting this to `()`.

Resolves: #7091
2025-09-17 04:10:56 +00:00
Thomas Eizinger
a66a18782e chore(connlib): add context to IP packet parse errors (#10337)
We are seeing some very strange IP packet parse errors coming from macOS
devices. To better understand these, we extend the error messages with
the src and dst IP as well as the L4 header.

Related: #10335
2025-09-12 14:11:12 +00:00
Thomas Eizinger
33a75f6fee chore(headless-client): don't make failures look like crashes (#10290)
Returning an error from `main` by default prints a backtrace. This may
lead users to believe that the program is crashing when in fact it is
exiting in a controlled way but with an error (such as when we don't
have Internet during startup).

Printing the chain of errors ourselves resolves this.
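
A minimal sketch of the general approach, not the headless client's exact code:

```rust
use anyhow::Result;

fn main() {
    if let Err(e) = run() {
        // Print the error chain ourselves instead of returning the error from `main`,
        // which renders a `Debug` dump that users mistake for a crash.
        eprintln!("Error: {e}");
        for cause in e.chain().skip(1) {
            eprintln!("Caused by: {cause}");
        }
        std::process::exit(1);
    }
}

fn run() -> Result<()> {
    // Stand-in for the actual program logic.
    Ok(())
}
```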
2025-09-10 01:08:32 +00:00
Thomas Eizinger
03ac73ac00 fix(gateway): reset DNS resource NAT if proxy IPs change (#10310)
In #10040, we decided to persist a peer's routing state on the Gateway
across ICE sessions. This routing state also includes the DNS resource
NAT.

Prior to #10104 (which is not released yet), when a Client signs out and
back in, it resets the proxy IP mapping for DNS resources and will start
numbering them again from the front, i.e. starting from 100.96.0.1. With
the state still being preserved on the Gateway, this represents a
problem: We keep existing mappings around if there is still a NAT
session for this proxy IP. However, if the proxy IP is actually for a
different domain, this NAT session is meaningless. In fact, not
replacing the IP is problematic as we will now route packets for the new
proxy IP to the wrong destination.

The persistent DNS resource mapping from #10104 fixes this. In this PR,
we add an additional check to the Gateway where we detect whether the
Client has started to re-assign proxy IPs and if so, we completely reset
the DNS resource NAT state including all existing NAT sessions.

Fixes #10268
2025-09-09 02:08:26 +00:00
Thomas Eizinger
ead1f40101 chore(gateway): only log skipped NAT entry if IP differs (#10285)
When we resolve a DNS resource domain name on the Gateway, we establish
the mapping between proxy IPs and resolved IPs in order to correctly NAT
traffic. These domains are re-resolved every time the Client sees a DNS
query for them. Thus, established connections could be interrupted if the
IPs returned by consecutive DNS queries are different.

Many SaaS products (GitHub for example) use DNS to load balance between
different IPs. In order to not interrupt those connections, we check
whether we have an open NAT session for an existing mapping every time
we re-resolve DNS.

This log is currently printed too often though because it doesn't take
into account whether the IPs actually changed. If the IP is the same, we
don't need to print this because the update is a no-op.
2025-09-04 21:12:46 +00:00
Thomas Eizinger
fb7b001cbf chore(rust): fix unused variable warning (#10283) 2025-09-03 01:17:11 +00:00
Thomas Eizinger
d718c5de8e fix(connlib): retry packets on IO error 5 (#10279)
Unfortunately, it isn't very easy to detect whether a socket supports
GSO on Linux. Hence, `quinn-udp` simply probes for its support by trying
to send GSO batches and effectively disables GSO by setting the
`max-gso-segments` state variable to 1 if it encounters either EINVAL
(-22) or EIO (-5).

For EINVAL, `quinn-udp` has an internal retry mechanism. For EIO, the
`Transmit` which is passed to `quinn-udp` needs to be re-chunked and
thus cannot be automatically retried.

In order to avoid dropping packets, we therefore add a once-off retry
step to sending a datagram whenever we hit EIO on Linux or Android. If
the error was due to GSO not being supported, the second attempt should
succeed and, going forward, so should the first one, until we roam
the socket (where this state variable gets reset).
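
Schematically, the retry looks like this (a sketch; the real code re-chunks the `Transmit` before the second attempt):

```rust
use std::io;

/// Illustrative once-off retry on EIO (os error 5).
fn send_with_retry(send: impl Fn() -> io::Result<usize>) -> io::Result<usize> {
    match send() {
        // EIO: quinn-udp has just disabled GSO for this socket; the re-chunked
        // transmit should go through on the second attempt.
        Err(e) if e.raw_os_error() == Some(5) => send(),
        other => other,
    }
}
```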

These packet drops have been causing flakiness in CI ever since we
merged the eBPF tests. Those disable checksum offloading which appears
to trigger these errors.
2025-09-02 21:31:57 +00:00
Thomas Eizinger
e84bdc5566 refactor(connlib): periodically record queue depths (#10242)
Instead of recording the queue depths on every event-loop tick, we now
record them once a second by setting a Gauge. Not only is that a simpler
instrument to work with, but it is also significantly more performant. The
current version - when metrics are enabled - takes up quite a bit of CPU
time.

Resolves: #10237
2025-09-02 02:57:36 +00:00
Thomas Eizinger
a9e1b0fbfb chore(connlib): print full error when failing to read IP packet (#10275)
The error returned from `IpPacket::new` is an `anyhow::Error` but in
order to return it from `async_io`, we need to wrap it in an
`io::Error`. Printing an `io::Error` only prints the top-level error. To
fix this, we re-wrap the `io::Error` in an `anyhow::Error` again and
toggle "alternate" printing mode to see the full error chain.
2025-09-01 13:39:26 +00:00
Thomas Eizinger
0c2e54f54c feat(connlib): persistent DNS resource records across sessions (#10104)
When we receive a DNS query for a DNS resource in Firezone, we take the
next available 4 IPs from the CG-NAT range and assign them to the domain
name. For example, if `example.com` is a DNS resource and it is the
first resource being queried in a Firezone session, we will assign the
IPs `100.96.0.1` - `100.96.0.4` to it. If the user now restarts Firezone
or signs out and back in, this state is lost and we assign those same
IPs to the next DNS query coming in.

This creates a problem for applications that re-query DNS rarely or
never. They expect these IPs to not change. Restarting software
or signing out and back in is a common approach to fixing software
problems, yet in this specific case, doing so may create even more
problems for the user.

To mitigate this, `ClientState` introduces a new event
`DnsRecordsChanged` that gets emitted to the event-loop every time we
assign new records. The event-loop then caches this in memory and reuses
it in case a new session is initiated. The records are only stored
in-memory and not on disk. Most likely, the tunnel process will be alive
for the entire OS session.

To verify this behaviour, we add a new `RestartClient` transition to our
proptests. In the proptests, we already keep a mapping of all DNS names
we ever resolved, including DNS resources. When generating IP traffic,
we sample from this list of IPs and then expect the packet to be routed.
By replacing the `ClientState` as part of this transition and re-seeding
it with the previously exported DNS records, we can verify that packets
to IPs resolved from a previous session still get successfully routed to
the resource.

Related: #5498
2025-09-01 07:29:28 +00:00
Thomas Eizinger
533f4c319b feat(connlib): gracefully shutdown connections (#10076)
Right now, connections cannot be actively closed in Firezone. The
WireGuard tunnel and the ICE agent are coupled together, meaning only if
either one of them fails will we clean up the connection. One exception
here is when the Client roams. In that case, the Client simply clears
its local memory completely and then re-establishes all necessary
connections by re-requesting access.

There are three cases where gracefully closing a connection is useful:

1. If an access authorization is revoked or expires and this was the
last resource authorisation for that peer, we don't currently remove the
connection on the Gateway. Instead, the Client is still able to send
packets but they'll be dropped because we don't have a peer state
anymore.
1. If a Gateway gets restarted due to e.g. an upgrade or other
maintenance work, it loses all its connections and every Client needs to
wait for the ICE timeout (~15 seconds) before it can establish a new
one.
1. If a Client has its access revoked for all resources it has access to
in a particular site, we also don't remove this connection, even though
it has become practically useless.

All of these cases are fixed with this PR. Here we introduce a way to
gracefully shutdown a connection without forcing the other side into an
ICE timeout. The graceful connection shutdown works by introducing a new
"goodbye" p2p control protocol message. Like all our p2p control
protocol messages, this is based on IP and therefore delivery is not
guaranteed. In other words, this "goodbye" message is sent on a
best-effort basis.

In the case of shutdown, the Gateway will wait for all UDP packets to be
flushed but will not resend them or wait for an ACK.

If either end receives such a "goodbye" message, they simply remove the
local peer and connection state just as if the connection had
failed due to either ICE or WireGuard. For the Client, this means that
the next packet for a resource will trigger a new access authorization
request.
2025-09-01 06:30:13 +00:00
Thomas Eizinger
544ba11f21 chore(rust): allow too_many_arguments repo-wide (#10236)
We always end up allowing this lint when it pops up, so we can also just
allow it for the whole repo in general. Most of the time, the reason for
too many arguments is borrow-checker limitations of Rust where mutable
references need to be tracked explicitly.
2025-08-22 13:21:07 +00:00
Thomas Eizinger
c70c88c856 build(deps): upgrade to opentelemetry 0.30 (#10239) 2025-08-21 22:47:39 +00:00
Thomas Eizinger
99155490c5 chore(connlib): make UDP buffer sizes tunable at runtime (#10234)
For easier benchmarking, we make the UDP socket send and receive buffers
runtime-tunable.

Related: #7452
2025-08-21 18:18:14 +00:00
Thomas Eizinger
f85ae75ae0 refactor(connlib): increase UDP queues on desktop platforms (#10235)
On desktop platforms, we can easily afford to have larger queues here
despite each item in there being 65k. Benchmarking showed that we do
sometimes fill these up.

Related: #7452
2025-08-21 08:56:14 +00:00
Thomas Eizinger
a109c1a2ef feat(connlib): discard intermediate resource and TUN updates (#10223)
Right now, the Client event-loops have a channel with 1000 items for
sending new resource lists and updates to the TUN device to the host
app. This is kind of unnecessary as we always only care about the last
version of these. Intermediate updates that the host app doesn't process
are effectively irrelevant.

We've had an issue before where a bug in the portal caused us to receive
many updates to resources which ended up crashing Client apps because
this channel filled up.

To be more resilient on this front, we refactor the Client event loop to
use a `watch` channel for this. Watch channels only retain the last
value that got sent into them.
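
For illustration, this is the semantics a `tokio::sync::watch` channel gives us (toy data, not the actual resource types):

```rust
use tokio::sync::watch;

#[tokio::main]
async fn main() {
    // A watch channel only keeps the most recent value, so a slow reader never
    // builds up a backlog of intermediate updates.
    let (tx, mut rx) = watch::channel(Vec::<String>::new());

    tx.send(vec!["resource-1".into()]).unwrap();
    tx.send(vec!["resource-1".into(), "resource-2".into()]).unwrap();

    rx.changed().await.unwrap();
    println!("{:?}", *rx.borrow_and_update()); // Only the latest list is observed.
}
```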
2025-08-21 05:42:54 +00:00
Thomas Eizinger
4e11112d9b feat(connlib): improve throughput on higher latencies (#10231)
It turns out that multi-threaded access of the TUN device on the Gateway
causes packet reordering, which makes the TCP congestion controller
throttle the connection. Additionally, the default TX queue length of a
TUN device on Linux is only 500 packets.

With just a single thread and an increased TX queue length, we get a
throughput performance of just over 1 GBit/s for a 20ms link between
Client and Gateway with basically no packet drops:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.79.130.70 port 49546 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   116 MBytes   977 Mbits/sec    0   6.40 MBytes       
[  5]   1.00-2.00   sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   2.00-3.00   sec   134 MBytes  1.13 Gbits/sec    0   6.40 MBytes       
[  5]   3.00-4.00   sec   136 MBytes  1.14 Gbits/sec   47   6.40 MBytes       
[  5]   4.00-5.00   sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   5.00-6.00   sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]   6.00-7.00   sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   7.00-8.00   sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   8.00-9.00   sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]   9.00-10.00  sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]  10.00-11.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  11.00-12.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  12.00-13.00  sec   136 MBytes  1.14 Gbits/sec    0   6.40 MBytes       
[  5]  13.00-14.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  14.00-15.00  sec   140 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  15.00-16.00  sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]  16.00-17.00  sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]  17.00-18.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  18.00-19.00  sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]  19.00-20.00  sec   136 MBytes  1.14 Gbits/sec    0   6.40 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  2.67 GBytes  1.15 Gbits/sec   47             sender
[  5]   0.00-20.02  sec  2.67 GBytes  1.15 Gbits/sec                  receiver

iperf Done.

```

For further debugging in the future, we are now recording the send and
receive queue depths of both the TUN device and the UDP sockets. Neither
of those turned out to be full in my testing, which leads me to conclude
that it isn't any buffer inside Firezone that is too small here.

Related: #7452

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2025-08-20 23:08:56 +00:00
Thomas Eizinger
da00848549 build(deps): bump to Rust 1.89 (#10208)
Rust 1.89 comes with a new lint that wants us to explicitly refer to
lifetimes, even if they are elided.
2025-08-18 05:04:55 +00:00
Thomas Eizinger
507a8957c2 chore(connlib): only debug-assert non-retransmitted DNS queries (#10136)
When we receive the same TCP DNS query twice, we currently wrongly hit a
debug assert.
2025-08-06 11:26:51 +00:00
Thomas Eizinger
2841fd0017 chore(connlib): spawn dedicated tasks for UDP send/recv (#10147)
At the moment, `connlib`'s UDP thread spawns a single task for reading
and writing to the UDP socket. It will always first try to write data
before reading new data. To avoid scheduling issues, we split this into
two dedicated tasks and insert

```rust
tokio::task::yield_now().await;
```

into each loop. This allows the `tokio` runtime to schedule each of the
tasks fairly even if one of them is very busy.

For example, if we are very busy writing data (because we are receiving
a lot of IP traffic), this ensures that we will occasionally also read
from our socket to receive STUN control messages from our peers.
2025-08-06 07:38:01 +00:00
Thomas Eizinger
3e46727362 chore(snownet): improve logging of boringtun session index (#10135)
Previously, boringtun's sender/receiver index of a session would just be
rendered as a full u32. In reality, this u32 contains two pieces of
information: The higher 24 bits identify the peer and the lower 8 bits
identify the session with that peer. With the update to boringtun in
https://github.com/firezone/boringtun/pull/112, we encode this logic in
a dedicated type that prints this information separately. Here is
what the logs now look like:

```
2025-08-05T07:38:37.742Z DEBUG boringtun::noise: Received handshake_response local_idx=(3428714|1) remote_idx=(1937676|1)
2025-08-05T07:38:37.743Z DEBUG boringtun::noise: New session idx=(3428714|1)
2025-08-05T07:38:37.743Z DEBUG boringtun::noise: Sending keepalive local_idx=(3428714|1)
```
2025-08-05 13:08:32 +00:00
Thomas Eizinger
96579483d8 fix(phoenix-channel): timeout room join after 5s (#10130)
If we fail to join a given room for longer than 5s, we fail the
WebSocket connection and reconnect.
2025-08-05 02:00:26 +00:00
Thomas Eizinger
d1cbf4f76d chore(snownet): fix relay sampling spam (#10127)
When we disconnect from a relay, we currently spam `Failed to sample new
relay for connection` until we connect to a new one.
2025-08-05 00:16:28 +00:00
Thomas Eizinger
27de29fee7 test(connlib): downgrade error (#10123)
We can run into this when multiple DNS queries all need to be sent to
the same Gateway and we don't have a connection yet. Hence, downgrade
this error to a debug log.
2025-08-04 13:30:24 +00:00
Thomas Eizinger
1222be8fc9 fix(snownet): de-multiplex packets based on WG session index (#10109)
Right now, `snownet` de-multiplexes WireGuard packets based on their
source tuple (IP + port) to the _first_ connection that would like to
handle this traffic. What appears to be happening, based on observations
from customer logs, is that we sometimes dispatch the traffic to the
wrong connection.

The WireGuard packet format uses session indices to declare which
session a packet is for. The local session index is selected during the
handshake for a particular session.

By associating the different session indices (we can have up to 8 in
parallel per peer) with our Firezone-specific connection ID, we can
change our de-multiplexing scheme to use these indices instead of the
source tuple. This is especially important for Gateways as those talk to
multiple different clients.

The session index is a 32-bit integer where the top 24 bits identify the
connection and the bottom 8 bits are used in a round-robin fashion to
identify individual sessions within the connection. Thus, to find the
correct connection, we right-shift the session index of an incoming
packet to arrive back at the 24-bit connection identifier.
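
In other words (a sketch of the bit layout described above):

```rust
/// The top 24 bits of a session index identify the connection; the bottom
/// 8 bits identify the individual session within that connection.
fn connection_index(session_index: u32) -> u32 {
    session_index >> 8
}

fn session_within_connection(session_index: u32) -> u32 {
    session_index & 0xFF
}
```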

In environments with a limited number of ports outside the NAT, a
connection from a new Client may come from a source tuple of a previous
Client. In such a case, we'd dispatch the packets to the wrong
connection, causing the Client to not be able to handshake a tunnel.
2025-08-04 23:35:48 +10:00
Thomas Eizinger
324edce754 feat(connlib): roll-over WireGuard session on connection upsert (#10102)
When a Client upserts a connection to a Gateway, we currently assume
that the connection is still intact. After all, it hasn't hit an ICE
timeout, otherwise the connection would not be present in memory. If
however the Gateway restarted or somehow lost its connection state and
the Client hasn't noticed yet, then the upsert will be an _insert_ for
the Gateway and ICE will create a new connection for us.

In order to ensure that the WireGuard tunnel state and ICE are
synchronized at all times, we also need to handshake a new session.

`boringtun` maintains up to 8 concurrent sessions for us. This allows
for a smooth roll-over where packets encrypted with the keys from
previous sessions can still be decrypted. Thus, we can easily roll over
the session on every connection upsert without any trouble.

To ensure that this doesn't happen _very_ rapidly, we debounce these
proactive session roll-overs to happen at most every 20s.

This follows the idea of MADR-0017.
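
A minimal sketch of such a debounce (names are illustrative):

```rust
use std::time::{Duration, Instant};

/// Only allow a proactive session roll-over if the last one is at least 20s in the past.
struct RolloverDebounce {
    last: Option<Instant>,
}

impl RolloverDebounce {
    fn should_rollover(&mut self, now: Instant) -> bool {
        match self.last {
            Some(last) if now.duration_since(last) < Duration::from_secs(20) => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}
```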

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-08-04 00:20:28 +00:00
Thomas Eizinger
cd177a6448 fix(gateway): don't remove peer state on disconnect (#10040)
When the connection to a Client disappears, the Gateway currently clears
all state related to this peer. Whilst eagerly cleaning up memory can be
good, in this case, it may lead to the Client thinking it has access to
a resource when in reality it doesn't.

Just because the connection to a Client failed doesn't mean their access
authorizations are invalid. In case the Client reconnects, it should be
able to just continue sending traffic.

At the moment, this only works if the connection also failed on the
Client and therefore its view of the world with regard to "which
resources do I have access to" was also reset.

What we are seeing in Sentry reports though is that Clients are
attempting to access these resources, thinking they have access but the
Gateway denies it because it has lost the access authorization state.
2025-08-02 08:27:49 +00:00
Thomas Eizinger
a4eb6509c6 chore(snownet): fix buffering log message (#10060)
What we are actually buffering here are unencrypted IP packets that are
waiting for the tunnel to be established.
2025-08-01 22:51:49 +00:00
Thomas Eizinger
17a18fdfbb feat(connlib): always use candidates in order of priority (#10063)
To make things easier to debug, we enforce the order that candidates are
processed in. We want candidates to be processed in order of descending
priority, as higher priorities are better. For example, a host
candidate has a higher priority than a relay candidate.

This will make our logs more consistent because a `0-0` candidate pair
is always a `host-host` pair.

We enforce this with our own `IceCandidate` type which implements
`PartialOrd` and `Ord`. This now moves the deserialisation for the
portal messages to a `Deserialise` impl on this type. In order to ensure
that a single faulty candidate doesn't invalidate the entire list, we
use `serde_with` to skip over those elements that cannot be
deserialised.
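
As a toy stand-in for the ordering (the field layout is made up; only the idea of ordering candidates by priority comes from this change):

```rust
use std::cmp::Ordering;

/// Ordering candidates by descending priority means iterating a sorted
/// collection processes host candidates first.
#[derive(Debug, PartialEq, Eq)]
struct IceCandidate {
    priority: u32,
    // ... address, kind, etc.
}

impl Ord for IceCandidate {
    fn cmp(&self, other: &Self) -> Ordering {
        other.priority.cmp(&self.priority) // Reversed: higher priority sorts first.
    }
}

impl PartialOrd for IceCandidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```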
2025-08-01 01:57:29 +00:00
Thomas Eizinger
52a9079d6a feat(snownet): use in-flight channels to relay data (#10062)
In #7548, we added a feature to Firezone where TURN channels get bound
on-demand as they are needed. To ensure many communication paths work,
we also proactively bind them as soon as we receive a candidate from a
remote.

When a new remote candidate gets added, str0m forms pairs with all the
existing local candidates and starts testing these candidate pairs. For
local relay candidates, this means sending a channel data message from
the allocation.

At the moment, this results in the following pattern in the logs:

```
Received candidate from remote cid=20af9d29-c973-4d77-909a-abed5d7a0234 candidate=Candidate(relay=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759/udp raddr=[59A533B0D4D3CB3717FD3D655E1D419E1C9C0772]:0 prio=37492735)
No channel to peer, binding new one active_socket=462A7A508E3C99875E69C2519CA020330A6004EC:3478 peer=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759
Already binding a channel to peer active_socket=Some(462A7A508E3C99875E69C2519CA020330A6004EC:3478) peer=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759
class=success response from=462A7A508E3C99875E69C2519CA020330A6004EC:3478 method=channel bind rtt=9.928424ms tid=042F52145848D6C1574BB997 
```

What happens here is:

1. We receive a new candidate and proactively bind a channel (this is a
silent operation and therefore not visible in the logs).
2. str0m forms new pairs for these candidates and starts testing them,
triggering a new channel binding because the previous one isn't
completed yet.
3. We refuse to make another channel binding because we see that we
already have one in-flight.
4. The channel binding succeeds.

What we do now is:

If we want to send data to a peer through a channel, we check whether we
have a connected OR an in-flight channel and send it in both cases. If
the channel binding is still in-flight, we simply pipeline the
channel data message right after it. Chances are that - assuming no
packet re-orderings on the network - by the time our channel data
message arrives at the relay that binding is active and can be relayed.

This allows the very first binding attempt from str0m to already succeed
instead of waiting for the timeout and sending another binding request.
In addition, it makes these logs less confusing.
2025-07-31 04:10:38 +00:00
Thomas Eizinger
e07e45ed29 chore(snownet): allow filtering TURN traffic in logs (#10061)
Our TURN traffic is minimal enough for this to be okay on DEBUG (instead
of TRACE). However, it can be quite noisy when one is just scanning
through the logs. Putting it on another target allows us to filter those
out later.

Note that these only concern the TURN control protocol. Channel data
messages are separate from this and **not** logged.
2025-07-31 03:46:59 +00:00
Thomas Eizinger
5753b72a5e chore(snownet): fix typo in PeerSocket formatting (#10049) 2025-07-30 22:58:22 +00:00
Thomas Eizinger
6c1c42ea22 chore(snownet): fix handle_timeout span (#10046)
Spans only attach to logs of lower severity, i.e. a DEBUG span is only
visible for DEBUG and TRACE statements. In order to see the connection
ID here with our INFO statements, we need to make it an INFO span.

Also, a span does nothing unless it is entered 🤦‍♂️
2025-07-30 03:12:11 +00:00
Thomas Eizinger
69f9a03ee8 refactor(connlib): simplify IpPacket struct (#9795)
With the removal of the NAT64/46 modules, we can now simplify the
internals of our `IpPacket` struct. The requirements for our `IpPacket`
struct are somewhat delicate.

On the one hand, we don't want to be overly restrictive in our parsing /
validation code because there is a lot of broken software out there that
doesn't necessarily follow RFCs. Hence, we want to be as lenient as
possible in what we accept.

On the other hand, we do need to verify certain aspects of the packet,
like the payload lengths. At the moment, we are somewhat too lenient
there which causes errors on the Gateway where we have to NAT or
otherwise manipulate the packets. See #9567 or #9552 for example.

To fix this, we make the parsing in the `IpPacket` constructor more
restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully
parse its headers and validate the payload lengths.

This parsing allows us to then rely on the integrity of the packet as
part of the implementation. This does create several code paths that can
in theory panic but in practice, should be impossible to hit. To ensure
that this does in fact not happen, we also tackle an issue that is long
overdue: Fuzzing.

Resolves: #6667 
Resolves: #9567
Resolves: #9552
2025-07-29 04:42:57 +00:00
Thomas Eizinger
879f68cf73 refactor(connlib): use extract_if to expire resources (#10039)
Rust 1.88 shipped a new std function on `HashMap` to conditionally
extract elements from it. This is handy for time-based expiry of
resources on the Gateway.
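
For illustration, expiry with `extract_if` looks roughly like this (types and names are made up; only the use of `HashMap::extract_if` comes from this change):

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Drop all entries whose expiry has passed and return them,
/// e.g. to log which authorizations were removed.
fn expire<R>(resources: &mut HashMap<R, Instant>, now: Instant) -> Vec<R>
where
    R: std::hash::Hash + Eq,
{
    resources
        .extract_if(|_, expires_at| *expires_at <= now)
        .map(|(id, _)| id)
        .collect()
}
```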
2025-07-29 03:33:47 +00:00
Thomas Eizinger
5c3b15c1a9 chore(connlib): harmonise naming of IDs (#10038)
When filtering through logs in Sentry, it is useful to narrow them down
by context of a client, gateway or resource. Currently, these fields are
sometimes called `client`, `cid`, `client_id` etc and the same for the
Gateway and Resources.

To make this filtering easier, we name all of them `cid` for Client IDs,
`gid` for Gateway IDs and `rid` for Resource IDs.
2025-07-29 03:33:09 +00:00
Thomas Eizinger
e81dc452f7 refactor(connlib): use a lock-free queue for the buffer pool (#9989)
We use several buffer pools across `connlib` that are all backed by the
same buffer-pool library. Within that library, we currently use another
object-pool library to provide the actual pooling functionality.

Benchmarking has shown that we spend quite a bit of time (a few % of total
CPU time) fighting for the lock to either add or remove a buffer from
the pool. This is unnecessary. By using a queue, we can remove buffers
from the front and add buffers at the back, both of which can be
implemented in a lock-free way such that they don't contend.

Using the well-known `crossbeam-queue` library, we have such a queue
directly available.
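
A minimal sketch of a lock-free pool on top of `crossbeam-queue` (this is not the actual buffer-pool library, just the idea):

```rust
use crossbeam_queue::ArrayQueue;

/// `pull` takes a buffer from the front, `put_back` returns one to the back;
/// neither operation takes a lock.
struct BufferPool {
    buffers: ArrayQueue<Vec<u8>>,
    buffer_size: usize,
}

impl BufferPool {
    fn new(capacity: usize, buffer_size: usize) -> Self {
        Self {
            buffers: ArrayQueue::new(capacity),
            buffer_size,
        }
    }

    fn pull(&self) -> Vec<u8> {
        // Fall back to allocating if the pool is empty.
        self.buffers.pop().unwrap_or_else(|| vec![0; self.buffer_size])
    }

    fn put_back(&self, buf: Vec<u8>) {
        // If the pool is full, simply drop the buffer.
        let _ = self.buffers.push(buf);
    }
}
```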

I wasn't able to directly measure a performance gain in terms of
throughput. What we can measure though, is how much time we spend
dealing with our buffer pool vs everything else. If we compare the
`perf` outputs that were recorded during an `iperf` run each, we can see
that we spend about 60% less time dealing with the buffer pool than we
did before.

|Before|After|
|---|---|
|<img width="1982" height="553" alt="Screenshot From 2025-07-24
20-27-50"
src="https://github.com/user-attachments/assets/1698f28b-5821-456f-95fa-d6f85d901920"
/>|<img width="1982" height="553" alt="Screenshot From 2025-07-24
20-27-53"
src="https://github.com/user-attachments/assets/4f26a2d1-03e3-4c0d-84da-82c53b9761dd"
/>|

The number in the thousands on the left is how often the respective
function was the currently executing function during the profiling run.

Resolves: #9972
2025-07-28 21:39:11 +00:00
Thomas Eizinger
55304b3d2a refactor(snownet): learn host candidates from TURN traffic (#9998)
Presently, for each UDP packet that we process in `snownet`, we check if
we have already seen this local address of ours and if not, add it to
our list of host candidates. This is a safe way of ensuring that we
consider all addresses that we receive data on as ones that we tell our
peers they should try to contact us on.

Performance profiling has shown that hashing the socket address of each
packet that is coming in is quite wasteful. We spend about 4-5% of our
main thread time doing this. For comparison, decrypting packets is only
about 30%.

Most of the time, we will already know about this address and therefore,
spending all this CPU time is completely pointless. At the same time
though, we need to be sure that we do discover our local address
correctly.

Inspired by STUN, we therefore move this responsibility to the
`allocation` module. The `allocation` module is responsible for
interacting with our TURN servers and will yield server-reflexive and
relay candidates as a result. It also knows what the local address is
that it received traffic on, so we simply extend it to yield host
candidates in addition to server-reflexive and relay candidates.

On my local machine, this bumps us across the 3.5 Gbits/sec mark:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.93.174.92 port 57890 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   319 MBytes  2.67 Gbits/sec   18    548 KBytes       
[  5]   1.00-2.00   sec   413 MBytes  3.46 Gbits/sec    4    884 KBytes       
[  5]   2.00-3.00   sec   417 MBytes  3.50 Gbits/sec    4   1.10 MBytes       
[  5]   3.00-4.00   sec   425 MBytes  3.56 Gbits/sec  415    785 KBytes       
[  5]   4.00-5.00   sec   430 MBytes  3.60 Gbits/sec  154    820 KBytes       
[  5]   5.00-6.00   sec   434 MBytes  3.64 Gbits/sec  251    793 KBytes       
[  5]   6.00-7.00   sec   436 MBytes  3.66 Gbits/sec  123    811 KBytes       
[  5]   7.00-8.00   sec   435 MBytes  3.65 Gbits/sec    2    788 KBytes       
[  5]   8.00-9.00   sec   423 MBytes  3.55 Gbits/sec    0   1.06 MBytes       
[  5]   9.00-10.00  sec   433 MBytes  3.63 Gbits/sec    8   1017 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  8.21 GBytes  3.53 Gbits/sec  1728             sender
[  5]   0.00-20.00  sec  8.21 GBytes  3.53 Gbits/sec                  receiver

iperf Done.
```
2025-07-28 21:38:39 +00:00
Thomas Eizinger
9c71026416 chore(connlib): gate more trace logs on debug_assertions (#10026)
These are otherwise hit pretty often in the hot-path and slow packet
routing down because tracing needs to evaluate whether it should log the
statement.
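
Schematically (a sketch; the actual statements and fields differ):

```rust
fn route_packet(packet: &[u8]) {
    if cfg!(debug_assertions) {
        // Only evaluated in debug builds; in release the branch is a constant `false`,
        // so the hot path never touches tracing's filter evaluation.
        tracing::trace!(len = packet.len(), "Routing packet");
    }

    forward(packet); // Stand-in for the actual routing logic.
}

fn forward(_packet: &[u8]) {}
```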
2025-07-28 21:38:23 +00:00
Thomas Eizinger
fb9a142a9e chore(snownet): add back span in handle_timeout (#10025)
Whilst entering and leaving a span for every packet is very expensive,
doing the same whenever we make timeout-related changes is just fine.
Thus, we re-introduce a span removed in #9949 but only for the
`handle_timeout` function.

This gives us the context of the connection ID for not just our own
logs, but also the ones from `boringtun`.
2025-07-28 04:14:39 +00:00
Thomas Eizinger
bfa77bf7fc chore(snownet): log connection ID in more places (#10023)
With the removal of the span in #9949, we now need to explicitly log the
connection ID in a few more places to have the necessary context.
2025-07-28 02:01:01 +00:00
Thomas Eizinger
ce5650b554 fix(snownet): compare preshared_key on connection upsert (#9999)
By chance, I've discovered in a CI failure that we won't be able to
handshake a new session if the `preshared_key` changes. This makes a lot
of sense. The `preshared_key` needs to be the same on both ends as it is
a shared secret that gets mixed into the Noise handshake.

In the following sequence of events, we would thus previously run into a
"failed to decrypt handshake packet" scenario:

1. Client requests a connection.
2. Gateway authorizes the connection.
3. Portal restarts / gets deployed. To my knowledge, this will rotate
the `preshared_key` to a new secret. Restarting the portal also cuts all
WebSockets and therefore, the Gateway's response never arrives.
4. Client reconnects to the WebSocket, requests a new connection.
5. Gateway reuses the local connection but this connection still uses
the old `preshared_key`!
6. Client needs to wait for the Gateway's ICE timeout before it can
establish a new connection.

How exactly (3) happens doesn't matter. There are probably other
conditions as to where the WebSocket connections get cut and we cannot
complete our connection handshake.
2025-07-25 21:14:58 +00:00