At present, and as a result of how `connlib` evolved, we still implement
a `Poll`-based function for receiving data on our UDP socket. Ever since
we moved to dedicated threads for the UDP socket, we can directly block
on "block" on receiving datagrams and don't have to poll the socket.
This simplifies the implementation a fair bit. Additionally, it made me
reailise that we currently don't expose any errors on the UDP socket.
Likely, those will be ephemeral but it is still better than completely
silencing them.
In #9498, the Cloudflare response was updated to match what appears to
be a transient change on their end. It looks like this has changed
again, so to prevent this from breaking CI in the future we relax the
assertion.
It appears that Cloudflare changed the response that it is sending for
the 1.1.1.1 IP so we need to adapt our integration test for packet loops
in order to make CI pass.
Currently, the platform-specific code for controlling DNS resolution on
a system sits in `firezone-headless-client`. This code is also used by
the GUI client. This creates a weird compile-time dependency from the
GUI client to the headless client.
For other components that have platform-specific implementations, we use
the `firezone-bin-shared` crate. As a first step of resolving the
compile-time dependency, we move the `dns_control` module to
`firezone-bin-shared`.
We have been using buffer pools for a while all over `connlib` as a way
to efficiently use heap-allocated memory. This PR harmonizes the usage
of buffer pools across the codebase by introducing a dedicated
`bufferpool` crate. This crate offers a convenient and easy-to-use API
for all the things we (currently) need from buffer pools. As a nice
bonus of having it all in one place, we can now also track metrics of
how many buffers we have currently allocated.
An example output from the local metrics exporter looks like this:
```
Name : system.buffer.count
Description : The number of buffers allocated in the pool.
Unit : {buffers}
Type : Sum
Sum DataPoints
Monotonic : false
Temporality : Cumulative
DataPoint #0
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 96
Attributes :
-> system.buffer.pool.name: udp-socket-v6
-> system.buffer.pool.buffer_size: 65535
DataPoint #1
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 7
Attributes :
-> system.buffer.pool.buffer_size: 131600
-> system.buffer.pool.name: gso-queue
DataPoint #2
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 128
Attributes :
-> system.buffer.pool.name: udp-socket-v4
-> system.buffer.pool.buffer_size: 65535
DataPoint #3
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 8
Attributes :
-> system.buffer.pool.buffer_size: 1336
-> system.buffer.pool.name: ip-packet
DataPoint #4
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 9
Attributes :
-> system.buffer.pool.buffer_size: 1336
-> system.buffer.pool.name: snownet
```
Resolves: #8385
Correctly implementing asynchronous IO is notoriously hard. In order to
not drop packets in the process, one has to ensure a given socket is
ready to accept packets, buffer them if it is not case, suspend
everything else until the socket is ready and then continue.
Until now, we did this because it was the only option to run the UDP
sockets on the same thread as the actual packet processing. That in turn
was motivated by wanting to pass around references of the received
packets for processing. Rust's borrow-checker does not allow to pass
references between threads which forced us to have the sockets on the
same thread as the packet processing.
Like we already did in other places in `connlib`, this can be solved
through the use of buffer pools. Using a buffer pool, we can use heap
allocations to store the received packets without having to make a new
allocation every time we read new packets. Instead, we can have a
dedicated thread that is connected to `connlib`'s packet processing
thread via two channels (one for inbound and one for outbound packets).
These channels are bounded, which ensures backpressure is maintained in
case one of the two threads lags behind. These bounds also mean that we
have at most N buffers from the buffer pool in-flight (where N is the
capacity of the channel).
Within those dedicated threads, we can then use `async/await` notation
to suspend the entire task when a socket isn't ready for sending.
Resolves: #8000
From the perspective of any application, Firezone is a layer-3 network
and will thus use the host's networking stack to form IP packets for
whichever application protocol is in use (UDP, TCP, etc). These packets
then get encapsulated into UDP packets by Firezone and sent to a
Gateway.
As a result of this design, the IP header seen by the networking stacks
of the Client and the receiving service are not visible to any
intermediary along the network path of the Client and Gateway.
In case this network path is congested and middleboxes such as routers
need to drop packets, they will look at the ECN bits in the IP header
(of the UDP packet generated by a Client or Gateway) and flip a bit in
case the previous value indicated support for ECN (`0x01` or `0x10`).
When received by a network stack that supports ECN, seeing `0x11` means
that the network path is congested and that it must reduce its
send/receive windows (or otherwise throttle the connection).
At present, this doesn't work with Firezone because of the
aforementioned encapsulation of IP packets. To support ECN, we need to
therefore:
- Copy ECN bits from a received IP packet to the datagram that
encapsulates it: This ensures that if the Client's network stack support
ECN, we mirror that support on the wire.
- Copy ECN bits from a received datagram to the IP packet the is sent to
the TUN device: This ensures that if the "Congestion Experienced" bit
get set along the network path between Client and Gateway, we reflect
that accordingly on the IP packet emitted by the TUN device.
Resolves: #3758
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
In #8159, we introduced a regression that could lead to a deadlock when
shutting down the TUN device. Whilst we did close the channel prior to
awaiting the thread to exit, we failed to notice that _another_ instance
of the sender could be alive as part of an internally stored "sending
permit" with the `PollSender` in case another packet is queued for
sending. We need to explicitly call `abort_send` to free that.
Judging from the comment and a prior bug, this shutdown logic has been
buggy before. To further avoid this deadlock, we introduce two changes:
- The worker threads only receive a `Weak` reference to the
`wintun::Session`
- We move all device-related state into a dedicated `TunState` struct
that we can drop prior to joining the threads
The combination of these features means that all strong references to
channels and the session are definitely dropped without having to wait
for anything. To provide a clean and synchronous shutdown, we wait for
at most 5s on the worker-threads. If they don't exit until then, we log
a warning and exit anyway.
This should greatly reduce the risk of future bugs here because the
session (and thus the WinTUN device) gets shutdown in any case and so at
worst, we have a few zombie threads around.
Resolves: #8265
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.
Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.
Resolves: #7292.
On Windows, the network notifier always notifies once at startup. We
make the DNS notifier and Linux match this behavior, and we assert it in
the unit test.
Part of a yak shave towards removing Tauri.