Commit Graph

922 Commits

Author SHA1 Message Date
Thomas Eizinger
bc2febed99 fix(connlib): use correct constant for truncating DNS responses (#7551)
In case an upstream DNS server responds with a payload that exceeds the
available buffer space of an IP packet, we need to truncate the
response. Currently, this truncation uses the **wrong** constant to
check for the maximum allowed length. Instead of `MAX_DATAGRAM_PAYLOAD`, we
need to check against a limit that is less than the MTU because the IP layer
and the UDP layer both add overhead.

To fix this, we introduce such a constant and provide additional
documentation on the remaining ones to hopefully avoid future errors.
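
Roughly, the arithmetic involved looks like this (the constant names and the
use of the IPv6 header size below are assumptions for illustration, not
connlib's actual definitions):

```
// Illustrative only: derive the largest DNS response that still fits into a
// single IP packet. Names and values are assumptions for this sketch.
const MTU: usize = 1280;
const IPV6_HEADER_SIZE: usize = 40; // IPv4 would be 20 bytes; use the larger one.
const UDP_HEADER_SIZE: usize = 8;

/// Upper bound for a DNS response we can emit without exceeding the MTU.
const MAX_DNS_RESPONSE_SIZE: usize = MTU - IPV6_HEADER_SIZE - UDP_HEADER_SIZE;

fn needs_truncation(response_len: usize) -> bool {
    // Checking against `MAX_DATAGRAM_PAYLOAD` here would be too permissive
    // because it ignores the IP and UDP headers.
    response_len > MAX_DNS_RESPONSE_SIZE
}
```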
2024-12-19 17:15:43 +00:00
Thomas Eizinger
a1cf409af3 fix(connlib): clear all in-flight upstream DNS queries on reset (#7552)
When a Firezone Client roams, we reset all network connections and
rebind our local sockets. Doing that enables us to start from a clean
state and establish new connections to Gateways. What we are currently
not clearing are in-flight DNS queries. Those are all very likely to
fail because our network connection is changing. There is no point in us
keeping those around. Additionally, as part of roaming, it may also be
that our upstream DNS server changes and thus, we may suddenly receive a
response from a DNS server that we no longer know about.

Clearing all in-flight DNS queries on reset solves this.
2024-12-18 20:35:30 +00:00
Thomas Eizinger
992b97e6a9 fix(connlib): bind new channel to peer if needed (#7548)
Initially, when we receive a new candidate from a remote peer, we bind a
channel for each remote address on the relay that we sampled. This
ensures that every possible communication path is actually functioning.
In ICE, all candidates are tried against each other, meaning the remote
will attempt to send from each of their candidates to every one of ours,
including our relay candidates. To allow this traffic, a channel needs
to be bound first.

For various reasons, an allocation might become stale or need to be
otherwise invalidated. In that case, all channel bindings are lost but there
might still be an active connection that wants to utilise them. When that
happens, we will see "No channel" warnings like
https://firezone-inc.sentry.io/issues/6036662614/events/f8375883fd3243a4afbb27c36f253e23/.

To fix this, we use the attempt to encode a message for a channel as an
intent to bind a new one. This is deemed safe because wanting to encode
a message to a peer as a channel data message means we want such a
channel to exist. The first message here is still dropped but that is
better than not establishing the channel at all.
2024-12-18 17:15:17 +00:00
Thomas Eizinger
a80abec4ff refactor(connlib): remove unused branch in match (#7550)
When deciding what to do with a certain DNS query, we check whether the
domain name in question corresponds to any of the (wildcard) DNS
resource addresses. If yes, we resolve it to the resource ID of that
resource. The source of those resource IDs is the `dns_resources` map.

If we have looked up a `ResourceId` in that map, it is impossible for it
to not be "known" which means the branch deleted in this PR is
completely redundant and already covered by the catch-all branch where
`maybe_resource` is `None`.
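
A minimal sketch of the shape of that `match` (the surrounding types are made
up for illustration):

```
// Hypothetical types for illustration only.
struct ResourceId(u64);

enum Action {
    ResolveToResource(ResourceId),
    ForwardUpstream,
}

fn classify(maybe_resource: Option<ResourceId>) -> Action {
    match maybe_resource {
        // `maybe_resource` came out of a lookup in `dns_resources`, so a
        // separate "unknown resource" arm can never match here ...
        Some(id) => Action::ResolveToResource(id),
        // ... and everything else is already covered by this catch-all arm.
        None => Action::ForwardUpstream,
    }
}
```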
2024-12-18 15:47:15 +00:00
Thomas Eizinger
62dfe65679 chore(connlib): improve error messages for failed translations (#7540) 2024-12-18 04:47:26 +00:00
Thomas Eizinger
8a1b6f26b4 fix(connlib): don't log warnings for unreachable errors (#7537)
When a Gateway or Client is running in an environment without IPv4 or
IPv6 connectivity, our initial probes for sending packets to the relays
will fail with network unreachable. That isn't a very big concern and
happens a lot in the wild. There is no need to report these as telemetry
events.

Resolves: #7514.
2024-12-17 17:59:20 +00:00
Thomas Eizinger
aa8c53a20d refactor(rust): use a buffer pool for network packets (#7489)
In order to achieve concurrency within `connlib`, we needed to create a
way for IP packets to own the piece of memory they are sitting in. This
allows us to concurrently read IP packets and then batch-process them
(as opposed to having a dedicated buffer and referencing it). At the moment,
those IP packets are defined on the stack. With a size of ~1300 bytes
that isn't very large but still causes _some_ amount of copying.

We can avoid this copying by relying on a buffer pool:

1. When reading a new IP packet, we request a new buffer from the pool.
2. When the IP packet gets dropped, the buffer gets returned to the
pool.

This allows us to reuse an allocation for a packet once it finished
processing, resulting in less CPU time spent on copying around memory.

This causes us to make more _individual_ heap-allocations in the
beginning: Each packet being processed by `connlib` is allocated on
the heap somewhere. At some point during the lifetime of the tunnel,
this will settle in an ideal state where we have allocated enough slots
to cover new packets whilst also reusing memory from packets that
finished processing already.

The actual `IpPacket` data type is now just a pointer. As a result, the
channels to and from the TUN thread (where we were holding multiple of
these packets) are now significantly smaller, leading to roughly the
same memory usage overall.

In my local testing on Linux, the client still only uses about 15 MB of
RAM even with multiple concurrent speedtests running.
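
A simplified sketch of the buffer-pool pattern (illustrative only; the actual
implementation likely differs in detail):

```
use std::sync::{Arc, Mutex};

// Packets own their memory; dropping a packet's buffer returns the allocation
// to the pool so it can be reused for the next packet.
#[derive(Clone, Default)]
struct BufferPool {
    free: Arc<Mutex<Vec<Vec<u8>>>>,
}

struct PooledBuffer {
    data: Vec<u8>,
    pool: BufferPool,
}

impl BufferPool {
    fn acquire(&self, capacity: usize) -> PooledBuffer {
        // Reuse a previous allocation if one is available, otherwise allocate.
        let mut data = self
            .free
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| Vec::with_capacity(capacity));
        data.clear();

        PooledBuffer {
            data,
            pool: self.clone(),
        }
    }
}

impl Drop for PooledBuffer {
    fn drop(&mut self) {
        // Hand the allocation back to the pool instead of freeing it.
        let data = std::mem::take(&mut self.data);
        self.pool.free.lock().unwrap().push(data);
    }
}
```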
2024-12-16 01:02:17 +00:00
Thomas Eizinger
0861ccaf06 chore(connlib): improve logging on missing flow (#7508)
Normally, there should always be exactly one pending flow per resource. It
appears though that it can sometimes happen that we first request a flow
for a resource but by the time it is authorised, we've already cleared
its local state.

Regardless, this isn't a concerning error and isn't worth logging at WARN
(which happens one layer up).
2024-12-13 18:03:53 +00:00
Thomas Eizinger
61d6eceb29 chore(connlib): downgrade warning about missing DNS servers (#7509)
There is nothing we can do if the user doesn't have any DNS servers
defined. The default log level is INFO so a user reading the logs will
still come across this message in case they are trying to debug what is
happening.

Long term, problems like these would probably warrant some kind of
notification channel from `connlib` to the GUI where we can display
messages to the user.
2024-12-13 17:53:36 +00:00
Thomas Eizinger
7a33146997 chore(connlib): downgrade warning when disconnecting from relay (#7512)
There are several reasons why we can disconnect from a relay at runtime:

- STUN is blocked
- We have invalid credentials
- The TURN server is not protocol-conformant

The first two are very much possible in production and there is nothing
we can do about them. When relays reboot, their credentials change and
if the Internet connection of a user / gateway gets cut, we may
disconnect from the relay because the messages get lost.

The last one should never happen if we are connected to our own relays.
Firezone can be self-hosted so ultimately, we don't have control over
what we are talking to. That error, however, is more of a safeguard for
`connlib` itself to disconnect from the server as soon as it detects
that it is behaving weirdly.

None of these reasons are worth reporting to Sentry as a problem because
they aren't really fixable as such. It is more important that the user
sees them in the logs if they decide to dig into them, which they still
can at INFO level.
2024-12-13 17:52:59 +00:00
Thomas Eizinger
f30cc3226d fix(gateway): don't return error when client disconnected (#7504)
When a client disconnects, we clear up the connection on the gateway.
There might still be packets arriving from resources that we then cannot
route. This isn't worth returning an error.
2024-12-13 04:54:07 +00:00
Thomas Eizinger
7a478634a8 feat(connlib): buffer packets during connection and NAT setup (#7477)
At present, `connlib` will always drop all IP packets until a connection
is established and the DNS resource NAT is created. This causes an
unnecessary delay until the connection is working because we need to
wait for retransmission timers of the host's network stack to resend
those packets.

With the new idempotent control protocol, it is now much easier to
buffer these packets and send them to the gateway once the connection is
established.

The buffer sizes are chosen somewhat conservatively to ensure we don't
consume a lot of memory. The hypothesis here is that every protocol -
even if the transport layer is unreliable like UDP - will start with a
handshake involving only one or at most a few packets, waiting for a
reply before sending more. Thus, as long as we can set up a connection
quicker than the re-transmit timer in the host's network stack,
buffering those packets should result in no packet loss. Typically,
setting up a new connection takes at most 500ms which should be fast
enough to not trigger any re-transmits.
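
A rough sketch of such a bounded buffer (capacity handling and types are made
up for this example):

```
use std::collections::VecDeque;

// Illustrative: packets destined for a resource are parked here until the
// connection and the DNS resource NAT are ready.
struct PendingPackets<P> {
    packets: VecDeque<P>,
    capacity: usize,
}

impl<P> PendingPackets<P> {
    fn new(capacity: usize) -> Self {
        Self {
            packets: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Buffer a packet; drop the oldest one if the buffer is full so memory
    /// stays bounded.
    fn push(&mut self, packet: P) {
        if self.packets.len() == self.capacity {
            self.packets.pop_front();
        }
        self.packets.push_back(packet);
    }

    /// Flush everything towards the gateway once the connection is up.
    fn drain(&mut self) -> impl Iterator<Item = P> + '_ {
        self.packets.drain(..)
    }
}
```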

Resolves: #3246.
2024-12-12 11:40:38 +00:00
Thomas Eizinger
7e38d3caee chore(connlib): downgrade warning about failed flow (#7480) 2024-12-11 19:01:37 +00:00
Thomas Eizinger
81f71cba62 fix(telemetry): use package@version notation for releases (#7466)
In order for Sentry to parse our releases as semver, they need to be in
the form of `package@version` [0]. Without this, the feature of "Mark
this issue as resolved in the _next_ version" doesn't work properly
because Sentry compares versions by when it first saw them instead of
parsing the semver string itself. We test versions prior to releasing
them, meaning Sentry learns about a 1.4.0 version before it is actually
released. This causes false-positive "regressions" even though they are
fixed in a later (as per semver) release.

This creates some redundancy with the different DSNs that we are already
using. I think it would make sense to consider merging the two projects
we have for the GUI client for example. That is really just one project
that happens to run as two binaries.

For all other projects, I think the separation still makes sense because
we e.g. may add Sentry to the "host" applications of Android and
MacOS/iOS as well. For those, we would reuse the DSN and thus funnel the
issues into the same Sentry project.

As per Sentry's docs, releases are organisation-wide and therefore need
a package identifier to be grouped correctly.

[0]:
https://docs.sentry.io/platforms/javascript/configuration/releases/#bind-the-version
2024-12-09 05:04:45 +00:00
Thomas Eizinger
ddce9312ea fix(android): apply new log-filter on repeated connect call (#7461)
Related: #7460.
Resolves: #5634.
2024-12-06 04:45:28 +00:00
Thomas Eizinger
6115f662cf fix(apple): only initialise global logger once (#7460)
From within the FFI code, we have no control over the lifecycle of the
host application and `connect` may be called multiple times from within
the same process. Therefore, we cannot rely on the global logger state
to **not** be set when `connect` gets called.

To fix this, we cache the handles for the file logger and a
reload-handle for the log filter in a `static` variable. This allows us
to apply the new log-filter of a repeated `connect` call to the existing
logger, even if `connect` is called multiple times from the same
process.
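
A minimal sketch of this pattern using `tracing-subscriber`'s reload handle
(simplified; the real code also caches the file-logger handle):

```
use std::sync::OnceLock;

use tracing_subscriber::layer::SubscriberExt as _;
use tracing_subscriber::util::SubscriberInitExt as _;
use tracing_subscriber::{reload, EnvFilter, Registry};

// Cache the reload handle so a repeated `connect` only swaps the log filter
// instead of trying to install the global logger a second time.
static FILTER_HANDLE: OnceLock<reload::Handle<EnvFilter, Registry>> = OnceLock::new();

fn init_or_update_logging(directives: &str) {
    let filter = EnvFilter::new(directives);

    if let Some(handle) = FILTER_HANDLE.get() {
        // Logger is already installed: only apply the new filter.
        let _ = handle.reload(filter);
        return;
    }

    let (filter_layer, handle) = reload::Layer::new(filter);

    tracing_subscriber::registry()
        .with(filter_layer)
        .with(tracing_subscriber::fmt::layer())
        .init();

    let _ = FILTER_HANDLE.set(handle);
}
```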
2024-12-06 04:44:41 +00:00
Thomas Eizinger
90cf191a7c feat(linux): multi-threaded TUN device operations (#7449)
## Context

At present, we only have a single thread that reads and writes to the
TUN device on all platforms. On Linux, it is possible to open the file
descriptor of a TUN device multiple times by setting the
`IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then
spawn multiple threads that concurrently read and write to the TUN
device. This is critical for achieving a better throughput.

## Solution

`IFF_MULTI_QUEUE` is a Linux-only thing and therefore only applies to
headless-client, GUI-client on Linux and the Gateway (it may also be
possible on Android, I haven't tried). As such, we need to first change
our internal abstractions a bit to move the creation of the TUN thread
to the `Tun` abstraction itself. For this, we change the interface of
`Tun` to the following:

- `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver` where
multiple items in a channel can be batch-received.
- `poll_send_ready`: Mimics the API of `Sink` to check whether more
items can be written.
- `send`: Mimics the API of `Sink` to actually send an item.
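
A rough sketch of what this interface could look like (signatures simplified
for illustration; the actual trait may differ):

```
use std::io;
use std::task::{Context, Poll};

// Illustrative stand-in for an owned IP packet.
struct IpPacket(Vec<u8>);

trait Tun {
    /// Receive up to `max` packets in one call, appending them to `buf`.
    /// Returns how many packets were received.
    fn poll_recv_many(
        &mut self,
        cx: &mut Context<'_>,
        buf: &mut Vec<IpPacket>,
        max: usize,
    ) -> Poll<usize>;

    /// Wait until the device can accept another packet (backpressure).
    fn poll_send_ready(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>>;

    /// Send a single packet; only called after `poll_send_ready` returned
    /// `Ready(Ok(()))`.
    fn send(&mut self, packet: IpPacket) -> io::Result<()>;
}
```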

With these APIs in place, we can implement various (performance)
improvements for the different platforms.

- On Linux, this allows us to spawn multiple threads to read and write
from the TUN device and send all packets into the same channel. The `Io`
component of `connlib` then uses `poll_recv_many` to read batches of up
to 100 packets at once. This ties in well with #7210 because we can then
use GSO to send the encrypted packets in single syscalls to the OS.
- On Windows, we already have a dedicated recv thread because `WinTun`'s
most-convenient API uses blocking IO. As such, we can now also tie into
that by batch-receiving from this channel.
- In addition to using multiple threads, this API now also uses correct
readiness checks on Linux, Darwin and Android to uphold backpressure in
case we cannot write to the TUN device.

## Configuration

Local testing has shown that 2 threads give the best performance for a
local `iperf3` run. I suspect this is because there is only so much
traffic that a single application (i.e. `iperf3`) can generate. With
more than 2 threads, the throughput actually drops drastically because
`connlib`'s main thread is too busy with lock-contention and triggering
`Waker`s for the TUN threads (which mostly idle around if there are 4+
of them). I've made it configurable on the Gateway though so we can
experiment with this during concurrent speedtests etc.

In addition, switching `connlib` to a single-threaded tokio runtime
further increased the throughput, I suspect due to less task / context
switching.

## Results

Local testing with `iperf3` shows some very promising results. We now
achieve a throughput of 2+ Gbit/s.

```
Connecting to host 172.20.0.110, port 5201
Reverse mode, remote host 172.20.0.110 is sending
[  5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   274 MBytes  2.30 Gbits/sec
[  5]   1.00-2.00   sec   279 MBytes  2.34 Gbits/sec
[  5]   2.00-3.00   sec   216 MBytes  1.82 Gbits/sec
[  5]   3.00-4.00   sec   224 MBytes  1.88 Gbits/sec
[  5]   4.00-5.00   sec   234 MBytes  1.96 Gbits/sec
[  5]   5.00-6.00   sec   238 MBytes  2.00 Gbits/sec
[  5]   6.00-7.00   sec   229 MBytes  1.92 Gbits/sec
[  5]   7.00-8.00   sec   222 MBytes  1.86 Gbits/sec
[  5]   8.00-9.00   sec   223 MBytes  1.87 Gbits/sec
[  5]   9.00-10.00  sec   217 MBytes  1.82 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec  22247             sender
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec                  receiver

iperf Done.
```

This is a pretty solid improvement over what is in `main`:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  90.4 MBytes   758 Mbits/sec  1800    106 KBytes
[  5]   1.00-2.00   sec  93.4 MBytes   783 Mbits/sec  1550   51.6 KBytes
[  5]   2.00-3.00   sec  92.6 MBytes   777 Mbits/sec  1350   76.8 KBytes
[  5]   3.00-4.00   sec  92.9 MBytes   779 Mbits/sec  1800   56.4 KBytes
[  5]   4.00-5.00   sec  93.4 MBytes   783 Mbits/sec  1650   69.6 KBytes
[  5]   5.00-6.00   sec  90.6 MBytes   760 Mbits/sec  1500   73.2 KBytes
[  5]   6.00-7.00   sec  87.6 MBytes   735 Mbits/sec  1400   76.8 KBytes
[  5]   7.00-8.00   sec  92.6 MBytes   777 Mbits/sec  1600   82.7 KBytes
[  5]   8.00-9.00   sec  91.1 MBytes   764 Mbits/sec  1500   70.8 KBytes
[  5]   9.00-10.00  sec  92.0 MBytes   771 Mbits/sec  1550   85.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   917 MBytes   769 Mbits/sec  15700             sender
[  5]   0.00-10.00  sec   916 MBytes   768 Mbits/sec                  receiver

iperf Done.
```
2024-12-05 00:18:20 +00:00
Thomas Eizinger
48bd0f9804 chore: bump client versions to 1.4.0 (#7092)
In order to release the new control protocol to users, we need to bump
the versions of the clients to 1.4.0. The portal has a version gate to
only select gateways with version >= 1.4.0 for clients >= 1.4.0. Thus,
bumping these versions can only happen once testing has completed and
the gateway has actually been released as 1.4.0.

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2024-12-04 19:48:51 +00:00
Thomas Eizinger
b802021cc4 feat(connlib): implement idempotent control protocol for client (#6942)
Building on top of the gateway PR (#6941), this PR transitions the
clients to the new control protocol. Clients are **not**
backwards-compatible with old gateways. As a result, a certain customer
environment MUST have at least one gateway with the above PR running in
order for clients to be able to establish connections.

With this transition, Clients send explicit events to Gateways whenever
they assign IPs to a DNS resource name. The actual assignment only
happens once and the IPs then remain stable for the duration of the
client session.

When the Gateway receives such an event, it will perform a DNS
resolution of the requested domain name and set up the NAT between the
assigned proxy IPs and the IPs the domain actually resolves to. In order
to support self-healing of any problems that happen during this process,
the client will send an "Assigned IPs" event every time it receives a
DNS query for a particular domain. This in turn will trigger another DNS
resolution on the Gateway. Effectively, this means that DNS queries for
DNS resources propagate to the Gateway, triggering a DNS resolution
there. In case the domain resolves to the same set of IPs, no state is
changed to ensure existing connections are not interrupted.
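
A hypothetical sketch of the shape such an event could have (names and fields
are made up; the actual wire format may differ):

```
use std::net::IpAddr;

// Hypothetical illustration of the client -> gateway event described above.
struct DomainIpsAssigned {
    /// The DNS resource this domain was matched against.
    resource_id: u64,
    /// The domain the client's DNS query was for.
    domain: String,
    /// The proxy IPs the client handed out for that domain.
    assigned_ips: Vec<IpAddr>,
}
```

Because the event carries the full desired state (domain plus assigned proxy
IPs), the Gateway can apply it idempotently: it re-resolves the domain and only
touches the NAT when the resolved IPs actually changed.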

With this new functionality in place, we can delete the old logic around
detecting "expired" IPs. This is considered a bugfix as this logic isn't
currently working as intended. It has been observed multiple times that
the Gateway can loop on this behaviour and resolve the same domain
over and over again. The only theoretical "incompatibility" here is that
pre-1.4.0 clients won't have access to this functionality of triggering
DNS refreshes on a 1.4.2+ Gateway. However, as soon as this PR
merges, we expect all admins to have already upgraded to a 1.4.0+
Gateway anyway, which already mandates clients to be on 1.4.0+.

Resolves: #7391.
Resolves: #6828.
2024-12-04 12:05:35 +00:00
Jamil
15e75f80ba fix(apple/ios): Expose IPHONEOS_DEPLOYMENT_TARGET to tell rustc our iOS version (#7453)
Fixes a similar issue to #7443, where we were deleting the
`IPHONEOS_DEPLOYMENT_TARGET` variable in our Rust build script, which
caused lots of warnings about building for a different OS than being
linked against.
2024-12-03 14:12:20 -08:00
Thomas Eizinger
dd6b52b236 chore(rust): share edition key via workspace table (#7451) 2024-12-03 00:28:06 +00:00
Thomas Eizinger
9073bddaef fix(gateway): translate ICMP destination unreachable errors (#7398)
## Context

The Gateway implements a stateful NAT that translates the destination IP
and source protocol of every packet that targets a DNS resource IP. This
is necessary because the IPs for DNS resources are generated on the
client without actually performing a DNS lookup; instead, it always
generates 4 IPv4 and 4 IPv6 addresses. On the Gateway, these IPs are
then assigned in a round-robin fashion to the actual IPs that the domain
resolves to, necessitating a NAT64/46 translation in case a domain only
resolves to IPs of one family.

A domain may resolve to a set of IPs but not all of these IPs may be
routable. Whilst arguably poor practice by the domain administrator,
routing problems can occur for all kinds of reasons and are well handled
on the wider Internet.

When an IP packet cannot be routed further, the current routing node
generates an ICMP error describing the routing failure and sends it back
to the original sender. ICMP is a layer 4 protocol itself, same as TCP
and UDP. As such, sending out a UDP packet may result in receiving an
ICMP response. In order to allow the sender to learn which packet
failed to route, the ICMP error embeds parts of the original packet in
its payload [0] [1].

The Gateway's NAT table uses parts of the layer 4 protocol as part of
its key: the UDP and TCP source port and the ICMP echo request
identifier (further referred to as "source protocol"). An ICMP error
message doesn't have any of these, meaning the lookup in the NAT table
currently fails and the ICMP error is silently dropped.

A lot of software implements a happy-eyeballs approach and probes for
IPv6 and IPv4 connectivity simultaneously. The absence of the ICMP
errors confuses that algorithm as it detects the packet loss and starts
retransmits instead of giving up.

## Solution

Upon receiving an ICMP error on the Gateway, we now extract the
partially embedded packet in the ICMP error payload. We use the
destination IP and source protocol of _that_ packet for the lookup in
the NAT table. This returns us the original (client-assigned)
destination IP and source protocol. In order for the Gateway's NAT to be
transparent, we need to patch the packet embedded in the ICMP error to
use the original destination and source protocol. We also have to
account for the fact that the original packet may have been translated
with NAT64/46 and translate it back. Finally, we generate an ICMP error
with the appropriate code and embed the patched packet in its payload.
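
A simplified sketch of that lookup (the key and table types are illustrative,
not the Gateway's actual NAT implementation):

```
use std::collections::HashMap;
use std::net::IpAddr;

// "Source protocol" = UDP/TCP source port or ICMP echo identifier.
type NatKey = (IpAddr, u16);

struct NatTable {
    // Maps the translated (outside) key back to the original, client-assigned key.
    outside_to_inside: HashMap<NatKey, NatKey>,
}

impl NatTable {
    /// For an inbound ICMP error, the lookup key comes from the packet that is
    /// *embedded* in the error payload, not from the ICMP packet itself.
    fn translate_icmp_error(&self, embedded: NatKey) -> Option<NatKey> {
        self.outside_to_inside.get(&embedded).copied()
    }
}
```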

## Test implementation

To test that this works for all kinds of combinations, we extend
`tunnel_test` to sample a list of unreachable IPs from all IPs sampled
for DNS resources. Upon receiving a packet for one of these IPs, the
Gateway will send an ICMP error back instead of invoking its regular
echo reply logic. On the client-side, upon receiving an ICMP error, we
extract the originally failed packet from the body and treat it as a
successful response.

This may seem a bit hacky at first but is actually how operating systems
would treat ICMP errors as well. For example, a `TcpSocket::connect`
call (triggering a TCP SYN packet) may fail with an IO error if we
receive an ICMP error packet. Thus, in a way, the original packet got
answered, just not with what we expected.

In addition, by treating these ICMP errors as responses to the original
packet, we automatically perform other assertions on them, like ensuring
that they come from the right IP address, that there are no unexpected
packets etc.

## Test alternatives

It is tricky to solve this in other ways in the test suite because at
the time of generating a packet for a DNS resource, we don't know the
actual IP that is being targeted by a certain proxy IP unless we'd start
reimplementing the round-robin algorithm employed by the Gateway. To
"test" the transparency of the NAT, we'd like to avoid knowing about
these implementation details in the test.

## Future work

In this PR, we currently only deal with "Destination Unreachable" ICMP
errors. There are other ICMP messages such as ICMPv6's `PacketTooBig` or
`ParameterProblem`. We should eventually handle these as well. They are
being deferred because translating those between the different IP
versions is only partially implemented and would thus require more work.
The most pressing need is to translate destination unreachable errors to
enable happy-eyeballs algorithms to work correctly.

Resolves: #5614.
Resolves: #6371.

[0]: https://www.rfc-editor.org/rfc/rfc792
[1]: https://www.rfc-editor.org/rfc/rfc4443#section-3.1
2024-12-02 23:07:41 +00:00
Jamil
e1ed497d12 fix(apple): Expose MACOSX_DEPLOYMENT_TARGET in rust apple build script to signal to rustc which macOS to target (#7443)
`MACOSX_DEPLOYMENT_TARGET` is a standard env var read by gcc and rustc
that determines which version of macOS to target binaries for. This
variable was being removed inadvertently in our rust build script.

Exposing this var fixes a slew of warnings about object files being
built for a newer macOS target than the one being linked against, which
had been showing up in Xcode for some time now.

Hasn't caused any issues thus far from what I can tell, but possibly
related to #7442
2024-12-02 17:27:11 +00:00
Thomas Eizinger
0a6554122a feat(connlib): utilise GSO for UDP sockets (#7210)
## Context

At present, `connlib` sends UDP packets one at a time. Sending a packet
requires us to make a syscall which is quite expensive. Under load, i.e.
during a speedtest, syscalls account for over 50% of our CPU time [0].
In order to improve this situation, we need to somehow make use of GSO
(generic segmentation offload). With GSO, we can send multiple packets
to the same destination in a single syscall.

The tricky question here is, how can we achieve having multiple UDP
packets ready at once so we can send them in a single syscall? Our TUN
interface only feeds us packets one at a time and `connlib`'s state
machine is single-threaded. Additionally, we currently only have a
single `EncryptBuffer` in which the to-be-sent datagram sits.

## 1. Stack-allocating encrypted IP packets

As a first step, we get rid of the single `EncryptBuffer` and instead
stack-allocate each encrypted IP packet. Due to our small MTU, these
packets are only around 1300 bytes. Stack-allocating that requires a few
memcpy's but those are in the single-digit % range in the terms of CPU
time performance hit. That is nothing compared to how much time we are
spending on UDP syscalls. With the `EncryptBuffer` out the way, we can
now "freely" move around the `EncryptedPacket` structs and - technically
- we can have multiple of them at the same time.

## 2. Implementing GSO

The GSO interface allows you to pass multiple packets **of the same
length and for the same destination** in a single syscall, meaning we
cannot just batch-up arbitrary UDP packets. Counterintuitively, making
use of GSO requires us to do more copying: In particular, we change the
interface of `Io` such that "sending" a packet performs essentially a
lookup of a `BytesMut` buffer by destination and packet length and
appends the payload to that buffer.
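
A simplified sketch of that per-destination, per-length lookup structure
(illustrative; the real `GsoQueue` likely differs):

```
use std::collections::HashMap;
use std::net::SocketAddr;

use bytes::BytesMut;

// Datagrams for the same destination and of the same length are appended to
// one contiguous buffer, which can later be flushed in a single GSO syscall.
#[derive(Default)]
struct GsoQueue {
    buffers: HashMap<(SocketAddr, usize), BytesMut>,
}

impl GsoQueue {
    fn enqueue(&mut self, dst: SocketAddr, datagram: &[u8]) {
        self.buffers
            .entry((dst, datagram.len()))
            .or_default()
            .extend_from_slice(datagram);
    }

    /// Drain all buffers; each entry becomes one send with a segment size of
    /// the datagram length.
    fn drain(&mut self) -> impl Iterator<Item = ((SocketAddr, usize), BytesMut)> + '_ {
        self.buffers.drain()
    }
}
```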

## 3. Batch-read IP packets

In order to actually perform GSO, we need to process more than a single
IP packet in one event-loop tick. We achieve this by batch-reading up to
50 IP packets from the mpsc-channel that connects `connlib`'s main
event-loop with the dedicated thread that reads and writes to the TUN
device. These reads and writes happen concurrently to `connlib`'s packet
processing. Thus, it is likely that by the time `connlib` is ready to
process another IP packet, multiple have been read from the device and
are sitting in the channel. Batch-processing these IP packets means that
the buffers in our `GsoQueue` are more likely to contain more than a
single datagram.

Imagine you are running a file upload. The OS will send many packets to
the same destination IP, likely at max MTU, to the TUN device. It is
likely that we read 10-20 of these packets in one batch (i.e. within a
single "tick" of the event-loop). All packets will be appended to the
same buffer in the `GsoQueue` and on the next event-loop tick, they will
all be flushed out in a single syscall.

## Results

Overall, this results in a significant reduction of syscalls for sending
UDP messages. In [1], we spend only a total of 16% of our CPU time in
`udpv6_sendmsg` whereas in [0] (main), we spent a total of 34%. Do note
that these numbers are relative to the total CPU time spent per program
run and thus can't be compared directly (i.e. you cannot just do 34 - 16
and say we now spend 18% less time sending UDP packets). Nevertheless,
this appears to be a great improvement.

In terms of throughput, we achieve a ~60% improvement in our benchmark
suite. That one is running on localhost though, so it might not
necessarily be reflected like that in a real network.

[0]: https://share.firefox.dev/4hvoPju
[1]: https://share.firefox.dev/4frhCPv
2024-12-02 01:09:44 +00:00
Thomas Eizinger
5f4816ee46 fix(connlib): don't warn on non-UDP packet to DNS resolver IP (#7418)
Windows appears to sometimes send ICMP (error?) packets to our DNS
resolver IPs. There is nothing we can do with these but the current code
path logs them as a warning because we expect everything that isn't TCP
to be UDP.

With this patch, we slightly change the parsing logic to first attempt
extracting the UDP packet and fail only with a DEBUG log if it isn't one.
2024-12-01 16:01:42 +00:00
Thomas Eizinger
a3e3d4cac5 fix(gateway): filter packets not destined for a client (#7417)
This causes unnecessary logs higher up the stack.
2024-12-01 15:59:56 +00:00
Thomas Eizinger
932f6791fb fix(phoenix-channel): lazily create backoff timer (#7414)
Our `phoenix-channel` component is responsible for maintaining a
WebSocket connection to the portal. In case that connection fails, we
want to reconnect to it using an exponential backoff, eventually giving
up after a certain amount of time.

Unfortunately, the code we have today doesn't quite do that. An
`ExponentialBackoff` has a setting for the `max_elapsed_time`.
Regardless of how many times and how often we retry something, we won't
keep retrying for longer than this total amount of time. For the Relay, this
is set to 15 min. For other components it's indefinite (Gateway,
headless-client), or very long (30 days for Android, 1 day for Apple).

The point in time from which this duration is counted is when the
`ExponentialBackoff` is **constructed** which translates to when we
**first** connected to the portal. As a result, our backoff would
immediately fail on the first error if it has been longer than
`max_elapsed_time` since we first connected. For most components, this
codepath is not relevant because the `max_elapsed_time` is so long. For
the Relay however, that is only 15 minutes so chances are, the Relay
would immediately fail (and get rebooted) on the first connection error
with the portal.

To fix this, we now lazily create the `ExponentialBackoff` on the first
error.
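
A minimal sketch of the lazy construction, assuming the `backoff` crate's
`ExponentialBackoff` (the 15-minute value is the Relay example from above):

```
use std::time::Duration;

use backoff::backoff::Backoff as _;
use backoff::ExponentialBackoff;

struct Reconnect {
    // Only constructed on the first error, so the `max_elapsed_time` window
    // starts when the connection fails, not when it was first established.
    backoff: Option<ExponentialBackoff>,
}

impl Reconnect {
    fn next_delay(&mut self) -> Option<Duration> {
        self.backoff
            .get_or_insert_with(|| ExponentialBackoff {
                max_elapsed_time: Some(Duration::from_secs(15 * 60)),
                ..ExponentialBackoff::default()
            })
            .next_backoff()
    }

    fn on_connected(&mut self) {
        // A successful (re)connect resets the backoff; the next failure starts
        // a fresh `max_elapsed_time` window.
        self.backoff = None;
    }
}
```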

This bug has some interesting consequences: When a relay reboots, it
loses all its state, i.e. allocations, channel bindings, available
nonces, stamp-secret etc. Thus, all credentials and state that got
distributed to Clients and Gateways get invalidated, causing disconnects
from the Relay. We have observed these alerts in Sentry for a while and
couldn't explain them. Most likely, this is the root cause for those
because whilst a Relay is disconnected, the portal also cannot detect its
presence and pro-actively inform Clients and Gateways to no longer use
this Relay.
2024-11-29 20:19:11 +00:00
Thomas Eizinger
c6e7e6192e build(rust): bump Rust to 1.83 (#7409)
Rust 1.83 comes with a bunch of new lints for elidable lifetimes. Those
also trigger in the generated code of `derivative`. That crate is
actually unmaintained so we replace our usages of it with `derive_more`.
2024-11-29 01:04:06 +00:00
Thomas Eizinger
e46cb3f62b chore(snownet): improve log when MessageIntegrity is missing (#7399) 2024-11-29 00:27:53 +00:00
Thomas Eizinger
3ccf795195 test(connlib): don't waste shrinking time & cycles on IDs (#7402)
When `proptest` discovers a test failure, it will attempt to "shrink"
the input to identify, what exactly causes the issue. How this is done
depends on the data type but mostly performs things such as binary
search to be efficient. Not every input within our tests is relevant for
a failure. For example, which ID we have sampled for a client or a
gateway doesn't at all affect whether or not the test will fail.
`proptest` doesn't know that though so it will still happily spend
shrinking time and cycles on figuring out the minimal difference in IDs
(which is 1 because they have to be different). This is a huge waste of
time for no benefit. We are getting much more value out of `proptest` if
it tries to adjust other things such as the transitions involved in a
test, how many gateways and relays there are etc.

By marking the strategies for the IDs and private keys with `no_shrink`,
we can achieve that.
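
For example (the ID strategy below is illustrative):

```
use proptest::prelude::*;

// Illustrative: the concrete ID value never influences whether a test fails,
// so don't waste shrinking iterations on it.
fn client_id() -> impl Strategy<Value = u128> {
    any::<u128>().no_shrink()
}
```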
2024-11-28 20:45:34 +00:00
Thomas Eizinger
fd337dd465 test: reduce number of local rejects for generating IPs (#7401)
When generating random input data in property-based tests, we have to
ensure that the data conforms to certain criteria. For example, IP
addresses must not be multicast or unspecified addresses and they must
not be within our reserved IP ranges.

Currently, we ensure this using "filtering" which is a pretty poor
technique [0]. To improve on this, we refactor the generation of IPs to
automatically exclude all IPs within certain ranges. Previously, on very big
test-runs (i.e. > 30000 test cases), too many local rejections would lead to
the test suite being aborted early.
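
A sketch of the constructive approach (the allowed range below is illustrative,
not the actual reserved ranges):

```
use std::net::Ipv4Addr;

use proptest::prelude::*;

// Instead of filtering out unwanted addresses (which counts as a local
// reject), only construct addresses from an allowed range in the first place.
fn host_ipv4() -> impl Strategy<Value = Ipv4Addr> {
    // 1..=223.255.255.255 excludes 0.0.0.0/8 as well as multicast and above;
    // the real suite also excludes Firezone's reserved ranges.
    (1u32..=0xDFFF_FFFFu32).prop_map(Ipv4Addr::from)
}
```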

[0]:
https://proptest-rs.github.io/proptest/proptest/tutorial/filtering.html
2024-11-28 20:45:02 +00:00
Thomas Eizinger
2c26fc9c0e ci: lint Rust dependencies using cargo deny (#7390)
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.

This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).

Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.

This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.

Resolves: #7241.
2024-11-22 00:17:28 +00:00
Thomas Eizinger
56db250e2c feat(connlib): validate integrity of all relay responses (#7378)
In order to avoid processing of responses of relays that somehow got
altered on the network path, we now use the client's `password` as a
shared secret for the relay to also authenticate its responses. This
means that not all messages can be authenticated. In particular, BINDING
requests will still be unauthenticated.

Performing this validation now requires every component that crafts
input to the `Allocation` to include a valid `MessageIntegrity`
attribute. This is somewhat problematic for the regression tests of the
relay and the unit tests of `Allocation`. In both cases, we implement
workarounds so we don't have to actually compute a valid
`MessageIntegrity`. This is deemed acceptable because:

- Both of these are just tests.
- We do test the validation path using `tunnel_test` because there we
run an actual relay.
2024-11-19 18:32:33 +00:00
Thomas Eizinger
ecec00afed chore(snownet): print attributes for all requests and responses (#7380)
When debugging issues related to our TURN allocation code, we sometimes
only have the logs that the code submitted to Sentry. As part of the event,
we submit the last 500 debug logs as breadcrumbs to give more context to
the error.

Unconditionally printing the attributes of each request-response pair
will help us in more easily diagnosing why certain errors happen.
2024-11-19 14:32:23 +00:00
Thomas Eizinger
e8519cca0c chore(snownet): warn on exceeding number of candidate pairs (#7376)
In the latest version, we added a warning log to str0m when the maximum
number of candidate pairs is exceeded:
https://github.com/algesten/str0m/pull/587.

We only ever add the candidates of a single relay to an agent (2
candidates), plus at most 2 server-reflexive candidates and at most 2
host candidates. Unless there is a bug like what we fixed in #7334,
exceeding the default number of candidate _pairs_ (100) should never
happen.

In case it does, the newly added `warn` log in `str0m` will trigger a
Sentry alert.
2024-11-19 04:34:23 +00:00
Thomas Eizinger
de35bb067e fix(telemetry): don't embed error values in telemetry_event! (#7366)
Due to https://github.com/getsentry/sentry-rust/issues/702, errors which
are embedded as `tracing::Value` unfortunately get silently discarded
when reported as part of Sentry "Event"s and not "Exception"s.

The design idea of these telemetry events is that they aren't fatal
errors so we don't need to treat them with the highest priority. They
may also appear quite often, so to save performance and bandwidth, we
sample them at a rate of 1% at creation time.

In order to not lose the context of these errors, we instead format them
into the message. This makes them completely identical to the `debug!`
logs which we have on every call-site of `telemetry_event!`, which
prompted me to make that implicit as part of creating the
`telemetry_event!`.

Resolves: #7343.
2024-11-18 18:17:08 +00:00
Thomas Eizinger
d9fb9e53c8 chore(snownet): print error code for unhandled message (#7367)
All our logic for handling errors is based on the error code. Even
though there should be a 1:1 mapping between error code and reason
phrase, I am seeing odd reports in Sentry for a case that we should be
handling but aren't.
2024-11-18 18:15:04 +00:00
Thomas Eizinger
9536b8116c fix: don't exit TUN thread on errors (#7354)
I noticed that in case there is an error when reading from the TUN
device, we currently exit that thread and we don't have a mechanism at
the moment to restart it. Discarding the thread also means we can no
longer send new instances of `Tun` into it.

Instead of exiting the thread, we now just log the error and continue.
In case the error was caused by the FD being closed, we discard the
instance of `Tun` and wait for a new one.
2024-11-16 06:19:41 +00:00
Thomas Eizinger
4e423dc51c fix(connlib): send all unwritten packets before reading new ones (#7342)
With the parallelisation of TUN and UDP operations, we lost
backpressure: Packets can now be read quicker from the UDP sockets than
they can be sent out the TUN device, causing packet loss in extremely
high-throughput situations.

To avoid this, we don't directly send packets into the channel to the
TUN device thread. This channel is bounded, meaning sending can fail if
reading UDP packets is faster than writing packets to the TUN device.

Due to GRO, we may read multiple UDP packets in one go, requiring us to
write multiple IP packets to the TUN device as part of a single
iteration in the event-loop. Thus, we cannot know how much space we
need in the channel for outgoing IP packets.

By introducing a dedicated buffer, we can temporarily hold on to all of
these packets and on the next call to `poll`, we flush them out into the
channel. If the channel is full, we will suspend and only continue once
there is space in the channel. This behaviour restores backpressure
because we won't read UDP packets from the socket unless we have space
to write the corresponding packet to the TUN device.

UDP itself actually doesn't have any backpressure; instead, the packets
will simply get dropped once the receive buffer overflows. The UDP
packets however carry encrypted IP packets, meaning whatever protocol
sits inside these packets will detect the packet loss and should
throttle its sending pace accordingly.
2024-11-14 06:25:03 +00:00
Thomas Eizinger
8c5a5fa690 chore(rust): correctly disable ANSI escapes globally (#7336)
I think I finally understood and correctly traced where the use of ANSI
escape codes came from. It turns out, the `with_ansi` switch on
`tracing_subscriber::fmt::Layer` is what you want to toggle. From there,
it trickles down to the `Writer` which we can then test for in our
`Format`.

Resolves: #7284.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-11-14 05:00:53 +00:00
Thomas Eizinger
efeba55709 chore(snownet): fail TURN connection on unknown attribute (#7341)
A TURN server that doesn't understand certain attributes should return
"Unknown attributes" as part of its response. Whilst we aim to be as
spec-compliant as possible, Firezone doesn't officially support other
TURN servers than our own relay.

If we encounter a TURN server that sends us an "Unknown attribute", we
now immediately fail this allocation and clear it as we cannot make any
more assumptions about what the connected relay actually supports.
2024-11-14 02:43:17 +00:00
Thomas Eizinger
3cf5cbb989 chore(connlib): only send some tunnel errors to Sentry (#7340)
Errors from the tunnel can potentially happen on a per-packet basis. In
order to not flood Sentry, reduce the log-level down to `debug` and only
report 1% of all errors. We did the same thing for the gateway in #7299.
2024-11-14 02:32:37 +00:00
Thomas Eizinger
00c7c42113 fix(snownet): don't allow duplicate server-reflexive candidates (#7334)
In #7163, we introduced a shared cache of server-reflexive candidates
within a `snownet::Node`. What we unfortunately overlooked is that if a
node (i.e. a client or a gateway) is behind symmetric NAT, then we will
repeatedly create "new" server-reflexive candiates, thereby filling up
this cache.

This cache is used to initialise the agents with local candidates, which
manifests in us sending dozens if not hundreds of candidates to the
other party. Whilst not harmful in itself, it does create quite a lot of
spam. To fix this, we introduce a limit of only keeping around 1
server-reflexive candidate per IP version, i.e. only 1 IPv4 and 1 IPv6
address.

At present, `connlib` only supports a single egress interface meaning
for now, we are fine with making this assumption.

In case we encounter a new candidate of the same kind and same IP
version, we evict the old one and replace it with the new one. Thus, for
subsequent connections, only the new candidate is used.
2024-11-14 00:14:29 +00:00
Thomas Eizinger
3dd913f6df fix(snownet): emit correct event on invalidating srflx candidate (#7333)
This one has been lurking in the codebase for a while. Fortunately, it
is not very critical because invalidation of server-reflexive addresses
happens pretty rarely.
2024-11-13 20:12:20 +00:00
Thomas Eizinger
7e0d2ca59c chore: add telemetry event in case we see large datagrams (#7335)
If we see these, something fishy is going on (see #7332), so we should
definitely know about these by recording Sentry events. These can
potentially be per packet so we only send a telemetry event which gets
sampled at a rate of 1%.
2024-11-13 20:09:58 +00:00
Thomas Eizinger
48ba2869a8 chore(rust): ban the use of .unwrap except in tests (#7319)
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.

Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.

Resolves: #7292.
2024-11-13 03:59:22 +00:00
Thomas Eizinger
0e20f7d086 chore(connlib): better error reporting for invalid IP packets (#7320)
Currently, we don't report very detailed errors when we fail to parse
certain IP packets. With this patch, we use `Result` in more places and
also extend the validation of IP packets to:

a) enforce a length of at most 1280 bytes. This should already be the
case due to our MTU but bad things may happen if that is off for some
reason
b) validate the entire IP packet instead of just its header
2024-11-12 19:46:32 +00:00
Thomas Eizinger
19f51568c2 chore(rust): don't pass errors as values for debug logs (#7318)
Our logging library `tracing` supports structured logging. Structured
logging means we can include values within a `tracing::Event` without
having to immediately format them as strings. Processing these values -
such as errors - as their original type allows the various `tracing`
layers to capture and represent them as they see fit.

One of these layers is responsible for sending ERROR and WARN events to
Sentry, as part of which `std::error::Error` values get automatically
captured as so-called "sentry exceptions".

Unfortunately, there is a caveat: If an `std::error::Error` value is
included in an event that does not get mapped to an exception, the
`error` field is completely lost. See
https://github.com/getsentry/sentry-rust/issues/702 for details.

To work around this, we introduce an `err_with_sources` adapter that
formats an error and all its sources together into a string. For all
`tracing::debug!` statements, we then use this to report these errors.

It is really unfortunate that we have to do this and cannot use the same
mechanism, regardless of the log level. However, until this is fixed
upstream, this will do and gives us better information in the log
submitted to Sentry.
2024-11-12 04:00:02 +00:00
Thomas Eizinger
d38304b21f build(rust): depend on our boringtun fork (#7120)
This switches our dependency on `boringtun` over to our fork at
https://github.com/firezone/boringtun. The idea of the fork is to
carefully only patch selective parts such that upstreaming things later is
still possible. The complete diff can be seen here:
https://github.com/cloudflare/boringtun/compare/master...firezone:boringtun:master

So far, the only patches in the fork are dependency bumps, linter fixes,
adjustments to log levels and the removal of panics when the destination
buffer is too small.
2024-11-12 03:40:36 +00:00
Thomas Eizinger
237865c37b test(connlib): drain all Transmits at the end of advance (#7315)
Within our test suite, we "spin" for several (simulated) seconds after
each state transition to allow for packets being sent between the
different nodes. The test suite simulates different latencies by
delaying the delivery of some of these packets.

`connlib` has several timers for sending packets, i.e. STUN bindings, WG
keep-alives etc. These timers never end so we cannot simply spin "until
we no longer want to send any packets". Currently, we simply hard-stop
after a few seconds and drop the remaining packets and move on to the
next state transition.

At present, this isn't an issue because only our ICE agent adheres to
the simulated time advancement. `boringtun` is still impure and thus we
usually don't get to see any of the WireGuard packets like keep-alives
and session timeouts etc in our tests. The STUN messages are pretty
resilient to retransmissions so the current packet drop doesn't matter.

In the process of adopting our boringtun fork
(https://github.com/firezone/boringtun) where we will eventually fix the
time impurity, dropping some of these packets caused problems.

To fix this, we now drain all remaining packets that are sitting in the
"yet-to-be-delivered" buffer. These packets are delivered to an "inbox"
that is per-host, meaning the host (i.e. client, gateway or relay) will
still perceive the incoming packet with the correct latency.

We extract this functionality from #7120 because it is generally useful.
2024-11-12 03:19:07 +00:00