At present, listening for DNS server change and network change events is
handled in the GUI client. Upon an event, a message is sent to the
tunnel service which then applies the new state to `connlib`.
We can avoid some of this boilerplate by moving these listeners into the
tunnel service as part of the IPC handler (see the sketch after this
list). As a result, we get a few improvements:
- We don't need to ignore these events if we don't have a session
because the lifetime of these listeners is tied to the IPC handler on
the service side.
- We need fewer IPC messages
- We can retry the connection directly from within the tunnel service in
case we have no Internet at the time of startup
- We can more easily model out the state machine of a connlib session in
the tunnel service
- On Linux, this means we no longer shell out to `resolvectl` from the
GUI process, unifying access to the "resolvers" from the tunnel service
- On Windows, we no longer need admin privileges on the GUI client for
optimized network-change detection. This now happens in the Tunnel
process which already runs as admin.
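A minimal sketch of what tying the listeners' lifetime to the IPC handler could look like; `IpcHandler` and its field names are illustrative, not the actual types:

```rust
// Hypothetical shape: the listener tasks are owned by the handler, so
// dropping the handler when a GUI disconnects also stops the listeners.
struct IpcHandler {
    dns_listener: tokio::task::JoinHandle<()>,
    network_listener: tokio::task::JoinHandle<()>,
}

impl Drop for IpcHandler {
    fn drop(&mut self) {
        // Aborting the tasks means no events are processed (or need to
        // be explicitly ignored) outside of an active session.
        self.dns_listener.abort();
        self.network_listener.abort();
    }
}
```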
Resolves: #9465
In #9498, the Cloudflare response was updated to match what appears to
be a transient change on their end. It looks like this has changed
again, so to prevent this from breaking CI in the future we relax the
assertion.
It appears that Cloudflare changed the response that it is sending for
the 1.1.1.1 IP so we need to adapt our integration test for packet loops
in order to make CI pass.
When working on UI stuff for the Tauri clients on macOS, it's helpful
if the UI is buildable. This is a first stab at getting a stub client to
launch on macOS with the help of our AI overlords. Feel free to close or
heavily critique if there is a better approach.
The latest release now also sorts workspace dependencies, as well as
different dependency sections. Keeping these things sorted reduces the
chances of merge conflicts when multiple PRs edit these files.
For our telemetry sessions with Sentry, we need to know which
environment we are running in, i.e. staging, production or on-prem. The
GUI client's tunnel service doesn't have a concept of an environment
until a GUI connects and sends the `StartTelemetry` message. Therefore,
we should scope a telemetry session to a GUI being connected over IPC.
Any errors around setting up or tearing down the background service are
a catch-22: until a GUI connects, we can't initialise the telemetry
connection, but if we fail to set up the background service, no GUI can
ever connect. Hence, the current setup and teardown of the `Telemetry`
module around the `ipc_listen` calls can safely be removed, as they are
effectively no-ops anyway.
The name "IPC service" is not very descriptive. By nature of the GUI and
the service being separate processes, we need IPC to communicate between
them. The important thing is that the service process has control over
the tunnel. Therefore, we rename everything to "Tunnel service".
The only parts left unchanged are historic changelog entries.
Resolves: #9048
In order to avoid routing loops on Windows, our UDP and TCP sockets in
`connlib` embed a "source IP resolver" that finds the "next best"
interface after our TUN device according to Windows' routing metrics.
This ensures that packets don't get routed back into our TUN device.
Currently, errors during this process are only logged at TRACE and are
therefore not visible in Sentry. We fix this by moving around some of
the function interfaces and forwarding the error from the source IP
resolver together with the destination IP as context.
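A sketch of the shape of that change, assuming an `anyhow`-style error chain; `resolve_route` is a hypothetical stand-in for the actual Windows routing-table lookup:

```rust
use std::net::IpAddr;

use anyhow::Context as _;

// Stand-in for the Windows routing-table lookup.
fn resolve_route(_dst: IpAddr) -> anyhow::Result<IpAddr> {
    anyhow::bail!("no interface besides the TUN device has a route")
}

/// Attaches the destination IP as context so the error is actionable
/// once it reaches Sentry instead of being dropped at TRACE level.
fn source_ip_for(dst: IpAddr) -> anyhow::Result<IpAddr> {
    resolve_route(dst)
        .with_context(|| format!("Failed to resolve source IP for destination {dst}"))
}
```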
When this crate is compiled by itself, these features are required. This
doesn't show up in CI because there we compile the entire workspace, in
which another crate already activates these features.
For at least one user, the threads shut down correctly, but we didn't
seem to exit the loop. In
https://firezone-inc.sentry.io/issues/6335839279/events/c11596de18924ee3a1b64ced89b1fba2/?project=4508008945549312,
we can see that both flags are marked as `true` yet we still emitted the
message.
The only way I can explain this is that the thread shut down between the
two calls to `is_finished`. To ensure this doesn't happen, we now read
it only once.
This also shows that 5s may not be enough time for WinTUN to shut down.
Therefore, we increase the grace period to 10s.
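A sketch of the fix (variable names and the surrounding wait loop are illustrative): sample each thread's state exactly once and use that snapshot for both the check and the log message.

```rust
use std::thread::JoinHandle;
use std::time::Duration;

const GRACE_PERIOD: Duration = Duration::from_secs(10);

fn wait_for_shutdown(recv_thread: &JoinHandle<()>, send_thread: &JoinHandle<()>) {
    std::thread::sleep(GRACE_PERIOD); // simplified stand-in for the real wait loop

    // Read each flag exactly once; a thread finishing between two
    // separate `is_finished` calls can no longer confuse the log below.
    let recv_finished = recv_thread.is_finished();
    let send_finished = send_thread.is_finished();

    if !recv_finished || !send_finished {
        tracing::warn!(recv_finished, send_finished, "Worker threads didn't exit within grace period");
    }
}
```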
The current `rust/` directory is a bit of a wild west in terms of how
the crates are organised. Most of them sit at the top level when in
reality, they are all `connlib`-related. The Apple and Android FFI
crates - which are entrypoints into the Rust code - are defined several
layers deep.
To improve the situation, we move around and rename several crates. The
end result is that all top-level crates / directories are:
- Either entrypoints into the Rust code, i.e. applications such as
Gateway, Relay or a Client
- Or crates shared across all those entrypoints, such as `telemetry` or
`logging`
The module and crate structure around the GUI client and its background
service is currently a mess of circular dependencies. Most of the
service implementation actually sits in `firezone-headless-client`
because the headless-client and the service share certain modules. We
have recently moved most of these to `firezone-bin-shared` which is the
correct place for these modules.
In order to move the background service to `firezone-gui-client`, we
need to untangle a few more things in the GUI client. Those are done
commit-by-commit in this PR. With that out of the way, we can finally
move the service module to the GUI client, where it should actually live
given that it has nothing to do with the headless client.
As a result, the headless-client is - as one would expect - really just
a thin wrapper around connlib itself and is reduced down to 4 files with
this PR.
To make things more consistent in the GUI client, we also move the
`main.rs` file into `bin/`. By convention, `bin/` is where you define
binaries if a crate has more than one; Cargo will then build all of them.
Eventually, we can optimise the compile-times for `firezone-gui-client`
by splitting it into multiple crates:
- Shared structs like IPC messages
- Background service
- GUI client
This will be useful because it allows re-compiling only the GUI client
when nothing in `connlib` changes, and vice versa.
Resolves: #6913
Resolves: #5754
Both `device_id` and `device_info` are used by the headless-client and
the GUI client / IPC service. They should therefore be defined in the
`bin-shared` crate.
Currently, the platform-specific code for controlling DNS resolution on
a system sits in `firezone-headless-client`. This code is also used by
the GUI client. This creates a weird compile-time dependency from the
GUI client to the headless client.
For other components that have platform-specific implementations, we use
the `firezone-bin-shared` crate. As a first step of resolving the
compile-time dependency, we move the `dns_control` module to
`firezone-bin-shared`.
Presently, the network change detection on Windows is very naive and
simply emits a change event every time _anything_ changes. We can
optimise this and thereby improve the start-up time of Firezone by:
- Filtering out duplicate events
- Filtering out network change events for our own network adapter
This reduces the number of network change events to 1 during startup. As
far as I can tell from the code comments in this area, we explicitly
send this one to ensure we don't run into a race condition whilst we are
starting up.
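A rough sketch of the two filters, with hypothetical types standing in for the actual Windows notification data:

```rust
// `Snapshot` stands in for whatever state we derive from a Windows
// route / interface notification.
#[derive(PartialEq)]
struct Snapshot {
    routes: Vec<String>,
}

struct ChangeFilter {
    our_adapter_index: u32,
    last: Option<Snapshot>,
}

impl ChangeFilter {
    /// Returns `true` only for events that are neither duplicates nor
    /// caused by our own network adapter.
    fn should_emit(&mut self, adapter_index: u32, snapshot: Snapshot) -> bool {
        if adapter_index == self.our_adapter_index {
            return false; // triggered by our own TUN adapter
        }
        if self.last.as_ref() == Some(&snapshot) {
            return false; // nothing actually changed
        }

        self.last = Some(snapshot);
        true
    }
}
```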
Resolves: #8905
The `signals` module isn't something headless-client specific and should
live in our `bin-shared` crate. Once the `ipc_service` module is
decoupled from the headless-client crate, it will be used by both the
headless client and IPC service (which then will be defined in the GUI
client crate).
The `known_dirs` module is used across the headless-client and the GUI
client. It should live in `bin-shared` where all the other
cross-platform modules are.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
The `uptime` module from `firezone-headless-client` is also used in the
GUI client. In order to decouple this dependency, we move the module to
`bin-shared`, next to the other cross-platform modules.
This tunnel throughput benchmark isn't very useful and is very flaky.
We remove it entirely until we can replace it with something more robust
and useful.
Resolves: #8172
We have been using buffer pools for a while all over `connlib` as a way
to efficiently use heap-allocated memory. This PR harmonizes the usage
of buffer pools across the codebase by introducing a dedicated
`bufferpool` crate. This crate offers a convenient and easy-to-use API
for all the things we (currently) need from buffer pools. As a nice
bonus of having it all in one place, we can now also track metrics of
how many buffers we have currently allocated.
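The following is only an illustration of the kind of API such a crate can offer; the names (`BufferPool`, `pull`, `put_back`) and the metrics integration are assumptions, not the actual interface:

```rust
use std::sync::{Arc, Mutex};

/// A pool of equally-sized byte buffers, named so that the metrics
/// exporter can attribute counts to a specific pool.
#[derive(Clone)]
pub struct BufferPool {
    name: &'static str, // becomes the `system.buffer.pool.name` attribute
    buffer_size: usize,
    free: Arc<Mutex<Vec<Vec<u8>>>>,
}

impl BufferPool {
    pub fn new(buffer_size: usize, name: &'static str) -> Self {
        Self { name, buffer_size, free: Arc::default() }
    }

    /// Re-uses a previously returned buffer if one is available,
    /// otherwise allocates a fresh one (which would bump the metric).
    pub fn pull(&self) -> Vec<u8> {
        self.free
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0u8; self.buffer_size])
    }

    /// Hands a buffer back to the pool for re-use.
    pub fn put_back(&self, buffer: Vec<u8>) {
        debug_assert_eq!(buffer.len(), self.buffer_size, "pool {}", self.name);
        self.free.lock().unwrap().push(buffer);
    }
}
```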
An example output from the local metrics exporter looks like this:
```
Name : system.buffer.count
Description : The number of buffers allocated in the pool.
Unit : {buffers}
Type : Sum
Sum DataPoints
Monotonic : false
Temporality : Cumulative
DataPoint #0
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 96
Attributes :
-> system.buffer.pool.name: udp-socket-v6
-> system.buffer.pool.buffer_size: 65535
DataPoint #1
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 7
Attributes :
-> system.buffer.pool.buffer_size: 131600
-> system.buffer.pool.name: gso-queue
DataPoint #2
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 128
Attributes :
-> system.buffer.pool.name: udp-socket-v4
-> system.buffer.pool.buffer_size: 65535
DataPoint #3
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 8
Attributes :
-> system.buffer.pool.buffer_size: 1336
-> system.buffer.pool.name: ip-packet
DataPoint #4
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 9
Attributes :
-> system.buffer.pool.buffer_size: 1336
-> system.buffer.pool.name: snownet
```
Resolves: #8385
When working on the Rust code of Firezone from a macOS computer, it is
useful to have pretty much all of the code at least compile, to detect
problems early. Eventually, once we target features like a headless
macOS client, some of these stubs will actually be filled in and be
functional.
Correctly implementing asynchronous IO is notoriously hard. In order to
not drop packets in the process, one has to ensure a given socket is
ready to accept packets, buffer them if it is not, suspend everything
else until the socket is ready and then continue.
Until now, we did this because it was the only option to run the UDP
sockets on the same thread as the actual packet processing. That in turn
was motivated by wanting to pass around references of the received
packets for processing. Rust's borrow-checker does not allow passing
(non-`'static`) references between threads, which forced us to have the
sockets on the same thread as the packet processing.
Like we already did in other places in `connlib`, this can be solved
through the use of buffer pools. Using a buffer pool, we can use heap
allocations to store the received packets without having to make a new
allocation every time we read new packets. Instead, we can have a
dedicated thread that is connected to `connlib`'s packet processing
thread via two channels (one for inbound and one for outbound packets).
These channels are bounded, which ensures backpressure is maintained in
case one of the two threads lags behind. These bounds also mean that we
have at most N buffers from the buffer pool in-flight (where N is the
capacity of the channel).
Within those dedicated threads, we can then use `async/await` notation
to suspend the entire task when a socket isn't ready for sending.
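A simplified sketch of that threading model, using `std`'s bounded channels in place of whatever channel implementation is actually used; names and the placeholder body are illustrative:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

// The channel capacity is the upper bound on in-flight buffers.
const CAPACITY: usize = 128;

/// Spawns a dedicated socket thread and returns the two bounded
/// channels that connect it to the packet-processing thread.
fn spawn_socket_thread() -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    let (outbound_tx, outbound_rx) = sync_channel::<Vec<u8>>(CAPACITY);
    let (inbound_tx, inbound_rx) = sync_channel::<Vec<u8>>(CAPACITY);

    std::thread::spawn(move || {
        // In the real implementation, this thread runs an async task
        // that `await`s socket readiness; sending on a full `inbound`
        // channel blocks, which is the backpressure described above.
        for packet in outbound_rx {
            // ... write `packet` to the UDP socket ...
            let _ = inbound_tx.send(packet); // placeholder: echo back
        }
    });

    (outbound_tx, inbound_rx)
}
```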
Resolves: #8000
From the perspective of any application, Firezone is a layer-3 network
and will thus use the host's networking stack to form IP packets for
whichever application protocol is in use (UDP, TCP, etc). These packets
then get encapsulated into UDP packets by Firezone and sent to a
Gateway.
As a result of this design, the IP header seen by the networking stacks
of the Client and the receiving service is not visible to any
intermediary along the network path between the Client and Gateway.
In case this network path is congested and middleboxes such as routers
need to drop packets, they will look at the ECN bits in the IP header
(of the UDP packet generated by a Client or Gateway) and flip a bit in
case the previous value indicated support for ECN (`0x01` or `0x10`).
When a network stack that supports ECN receives `0x11`, it knows that
the network path is congested and that it must reduce its send/receive
windows (or otherwise throttle the connection).
At present, this doesn't work with Firezone because of the
aforementioned encapsulation of IP packets. To support ECN, we therefore
need to (see the sketch after this list):
- Copy the ECN bits from a received IP packet to the datagram that
encapsulates it: This ensures that if the Client's network stack
supports ECN, we mirror that support on the wire.
- Copy the ECN bits from a received datagram to the IP packet that is
sent to the TUN device: This ensures that if the "Congestion
Experienced" bit gets set along the network path between Client and
Gateway, we reflect that accordingly on the IP packet emitted by the TUN
device.
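The bit manipulation itself is small; a sketch (packet parsing elided, function name illustrative):

```rust
/// The ECN codepoint lives in the two least-significant bits of the
/// IPv4 TOS byte / IPv6 traffic-class byte.
const ECN_MASK: u8 = 0b0000_0011;

/// Copies the ECN bits of `from` onto `to`, leaving the DSCP bits of
/// `to` untouched. Used in both directions described above.
fn copy_ecn(from: u8, to: u8) -> u8 {
    (to & !ECN_MASK) | (from & ECN_MASK)
}
```

On the wire, the ECN bits of the outer datagram are typically set and read via socket options / per-packet control messages rather than by rewriting raw headers.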
Resolves: #3758
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
The bugfix we attempted in #8156 turned out to be wrong. Reading the
source code, we have to call `Session::shutdown` in order to actually
cancel the `Session::receive_blocking` call. Not doing so means we run
into the timeout when discarding the `Tun` device because the
recv-thread is stuck in `Session::receive_blocking`.
Fixes: #8395
Dependabot appears to have a hard time bumping the Tauri dependencies
together as a group. Additionally, our dependency linter `cargo deny`
disallows duplicate dependencies by default. To avoid introducing more
duplicate dependencies, we depend on the upstream `main` branch of two
projects that have already updated their dependencies but did not yet
cut a release.
In #8159, we introduced a regression that could lead to a deadlock when
shutting down the TUN device. Whilst we did close the channel prior to
waiting for the thread to exit, we failed to notice that _another_
instance of the sender could be alive as part of an internally stored
"sending permit" within the `PollSender`, in case another packet is
queued for sending. We need to explicitly call `abort_send` to free it.
Judging from the comment and a prior bug, this shutdown logic has been
buggy before. To further avoid this deadlock, we introduce two changes:
- The worker threads only receive a `Weak` reference to the
`wintun::Session`
- We move all device-related state into a dedicated `TunState` struct
that we can drop prior to joining the threads
The combination of these changes means that all strong references to
channels and the session are definitely dropped without having to wait
for anything. To provide a clean and synchronous shutdown, we wait for
at most 5s on the worker threads. If they haven't exited by then, we log
a warning and exit anyway.
This should greatly reduce the risk of future bugs here because the
session (and thus the WinTUN device) gets shut down in any case, so at
worst, we have a few zombie threads around.
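A sketch of the ownership structure; `Session` here is a hypothetical stand-in for `wintun::Session`:

```rust
use std::sync::{Arc, Weak};

struct Session; // stand-in for `wintun::Session`

/// All strong references live here; dropping `TunState` is enough to
/// end the session, no matter what the worker threads are doing.
struct TunState {
    session: Arc<Session>,
}

fn spawn_recv_thread(session: Weak<Session>) -> std::thread::JoinHandle<()> {
    std::thread::spawn(move || {
        // `upgrade` fails once the last strong reference is gone, so
        // the loop exits without any explicit shutdown signal.
        while let Some(session) = session.upgrade() {
            // ... receive one packet via `session`, then drop the
            // strong reference again ...
            let _ = session;
        }
    })
}
```

Dropping `TunState` (and with it the `Arc`) is what lets `upgrade` fail, so shutdown no longer depends on the threads cooperating.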
Resolves: #8265
This appears to have regressed in #8159. It is low-priority right now
and we need to unblock flaky CI, so we lower the expected BPS and will
investigate later.
Same as done for unix-based operating systems in #8117, we introduce a
dedicated "TUN send" thread for Windows in this PR. Not only does this
move the syscalls and the copying of outgoing packets away from
`connlib`'s main thread, it also properly establishes backpressure
between those threads.
WinTUN does not have any ability to signal that it has space in its send
buffer. If it fails to allocate a packet for sending, it returns
`ERROR_BUFFER_OVERFLOW` [0]. We now handle this case gracefully by
suspending the send thread for 10ms and then trying again. This isn't a
great way of establishing back-pressure, but at least we don't have any
packet loss.
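A simplified sketch of that retry loop; the constant matches Win32's `ERROR_BUFFER_OVERFLOW` and the send callback is a stand-in for the WinTUN allocation:

```rust
use std::time::Duration;

const ERROR_BUFFER_OVERFLOW: i32 = 111; // Win32: 0x6F

/// Retries the send while WinTUN's ring buffer is full instead of
/// dropping the packet; all other results are handed back to the caller.
fn send_with_backoff(mut try_send: impl FnMut() -> Result<(), i32>) -> Result<(), i32> {
    loop {
        match try_send() {
            Err(ERROR_BUFFER_OVERFLOW) => std::thread::sleep(Duration::from_millis(10)),
            other => return other,
        }
    }
}
```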
To test this, I temporarily lowered the ring buffer size and ran a speed
test. In that, I could confirm that `ERROR_BUFFER_OVERFLOW` is indeed
emitted and handled as intended.
[0]: https://git.zx2c4.com/wintun/tree/api/session.c#n267
The `wintun` crate will already shut down the session for us when the
last instance of `Session` gets dropped. Shutting down the session prior
to that results in an attempt to close an adapter that is no longer
present, causing WinTUN to log (unactionable) errors.
We appear to have caused a pretty big performance regression (~40%) in
037a2e64b6 (identified through
`git-bisect`). Specifically, the regression appears to have been caused
by [`aef411a`
(#7605)](aef411abf5).
Weirdly enough, undoing just that on top of `main` doesn't fix the
regression.
My hypothesis is that using the same file descriptor for read AND write
interests on the same runtime causes issues because those interests are
occasionally cleared (i.e. on false-positive wake-ups).
In this PR, we spawn a dedicated thread each for the sending and
receiving operations of the TUN device. On unix-based systems, a TUN
device is just a file descriptor and can therefore simply be copied and
read & written to from different threads. Most importantly, we only
construct the `AsyncFd` _within_ the newly spawned thread and runtime
because constructing an `AsyncFd` implicitly registers with the runtime
active on the current thread.
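A sketch of that pattern; function name and error handling are illustrative, and fd duplication details are elided:

```rust
use std::os::fd::OwnedFd;

use tokio::io::unix::AsyncFd;

/// Spawns a thread with its own current-thread runtime and registers
/// the TUN fd with *that* runtime's reactor by constructing the
/// `AsyncFd` inside the thread.
fn spawn_tun_recv_thread(fd: OwnedFd) -> std::thread::JoinHandle<()> {
    std::thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_current_thread()
            .enable_io()
            .build()
            .expect("Failed to build runtime");

        rt.block_on(async move {
            // `AsyncFd::new` registers with the runtime that is
            // current *here*, i.e. this thread's.
            let fd = AsyncFd::new(fd).expect("Failed to register TUN fd");

            loop {
                let mut guard = fd.readable().await.expect("TUN fd closed");
                // ... `read(2)` here; in a real loop, readiness is only
                // cleared when the read returns `WouldBlock` ...
                guard.clear_ready();
            }
        });
    })
}
```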
As a nice benefit, this allows us to get rid of a `future::select`.
Those are always kind of nasty because they cancel the future that
wasn't ready. My original intuition was that we drop packets due to
cancelled futures there but that could not be confirmed in experiments.
The error code we see here means "There are no more endpoints available
from the endpoint mapper." This has something to do with Windows'
internal RPC communication between components. DNS deactivation is on a
best-effort basis and it appears that everything else is working just
fine, despite this error.
It appears to happen when we shut down our own service, so perhaps it is
just a race condition.
We've previously tried to handle the "No such process" error from
netlink when it tries to remove a route that no longer exists. What we
failed to do is use the correct sign for the error code: netlink errors
are always negative, yet when printed, they appear as positive numbers.
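In code, the fix boils down to comparing against the negated errno; a minimal illustration:

```rust
const ESRCH: i32 = 3; // "No such process"

/// Netlink reports errors as negative errno values, so the check must
/// use `-ESRCH`, even though logs render the code as a positive number.
fn is_route_already_gone(netlink_error_code: i32) -> bool {
    netlink_error_code == -ESRCH
}
```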
When an IP stack is programmatically disabled, such as with:
> reg add
"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"
/v DisabledComponents /t REG_DWORD /d 255 /f
Attempting to interact with this IP stack will yield "NOT_FOUND" errors.
These aren't worth reporting to Sentry because there isn't much we can
do about it.
The errors returned from Win32 API calls are currently duplicated in
several places. This makes it error-prone to handle them correctly. With
this PR, we de-duplicate them and add proper docs and links for further
reading.
We also fix a case where we would currently fail to set IP addresses for
our tunnel interface if the IP stack is not supported.
As it turns out, the effort in #7104 was not a good idea. By logging
errors as values, most of our Sentry reports have the same title and
thus cannot be differentiated in the overview at all. To fix this, we
stringify errors with all their sources whenever they get logged. This
ensures log messages are unique and all Sentry issues have a useful
title.
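A sketch of such stringification using only `std` (the actual implementation may hook into `tracing` instead):

```rust
/// Renders an error and all its `source`s into a single string so the
/// resulting log line (and Sentry title) is unique per error chain.
fn error_chain(error: &(dyn std::error::Error + 'static)) -> String {
    let mut message = error.to_string();
    let mut source = error.source();

    while let Some(cause) = source {
        message.push_str(": ");
        message.push_str(&cause.to_string());
        source = cause.source();
    }

    message
}
```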
This updates `wintun` to include
https://github.com/nulldotblack/wintun/pull/27. The "Unable to find
matching {UUID}" error message from the Sentry issues linked in #7901 is
removed in that PR in favor of a Win32 function that converts the
adapter name directly to the corresponding index and UUID instead of
searching through the list of adapters.
Resolves: #7901.
Within `connlib` - on UNIX platforms - we have dedicated threads that
read from and write to the TUN device. These threads are connected with
`connlib`'s main thread via bounded channels: one in each direction.
When these channels are full, `connlib`'s main thread will suspend and
not read any network packets from the sockets in order to maintain
back-pressure. Reading more packets from the socket would mean most
likely sending more packets out the TUN device.
When debugging #7763, it became apparent that _something_ must be wrong
with these threads: somehow, we either consider them full or aren't
emptying them, and as a result, we don't read _any_ network packets from
our sockets.
To maintain back-pressure here, we currently use our own `AtomicWaker`
construct that is shared with the TUN thread(s). This is unnecessary: we
can instead convert the `flume::Sender` directly into a
`flume::async::SendSink` and thereby gain access to a `poll`-based
interface.
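Assuming `flume`'s `async` feature is enabled, the conversion is a one-liner; a sketch with an illustrative function name:

```rust
use futures::SinkExt as _;

/// Forwards packets into the bounded channel, suspending (instead of
/// hand-rolling an `AtomicWaker`) whenever the channel is full.
async fn forward_to_tun_thread(tx: flume::Sender<Vec<u8>>, packets: Vec<Vec<u8>>) {
    let mut sink = tx.into_sink();

    for packet in packets {
        if sink.send(packet).await.is_err() {
            break; // the TUN thread hung up, i.e. we are shutting down
        }
    }
}
```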
`rtnetlink` has some breaking changes in their latest version. To avoid
waiting until they actually cut a release, we temporarily depend on
their `main` branch.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Within `connlib`, we read batches of IP packets and process them at
once. Each encrypted packet is appended to a buffer shared with other
packets of the same length. Once the batch is successfully processed,
all of these buffers are written out using GSO to the network. This
allows UDP operations to be much more efficient because not every packet
has to traverse the entire syscall hierarchy of the operating system.
Until now, these buffers got re-allocated on every batch. This is pretty
wasteful and leads to a lot of repeated allocations. Measurements show
that most of the time, we only have a handful of packets with different
segment lengths _per batch_. For example, just booting up the
headless-client and running a speedtest showed that only 5 of these
buffers were needed at any one time.
By introducing a buffer pool, we can reuse these buffers between batches
and avoid reallocating them.
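Conceptually, one buffer accumulates all segments of a given length; an illustrative sketch (not the real data structure):

```rust
use std::collections::HashMap;

/// Accumulates equally-sized segments into one contiguous buffer per
/// segment length, ready to be flushed with a single GSO send.
#[derive(Default)]
struct GsoQueue {
    buffers: HashMap<usize, Vec<u8>>,
}

impl GsoQueue {
    fn enqueue(&mut self, segment: &[u8]) {
        self.buffers
            .entry(segment.len())
            .or_default() // with a pool, this would pull a reused buffer
            .extend_from_slice(segment);
    }

    /// Clearing instead of dropping keeps the allocations alive for
    /// the next batch, which is the re-use this PR introduces.
    fn flush(&mut self) {
        for (_segment_len, buffer) in self.buffers.iter_mut() {
            // ... GSO-send `buffer` in chunks of `_segment_len` ...
            buffer.clear();
        }
    }
}
```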
Related: #7747.
Reading and writing to the TUN device within `connlib` happens in a
separate thread. The task running within these threads is connected to
the rest of `connlib` via channels. When the application shuts down,
these threads also need to exit. Currently, we attempt to detect this
from within the task when these channels close. It appears that there is
a race condition here because we first attempt to read from the TUN
device before reading from the channels. We treat read & write errors on
the TUN device as non-fatal, so we loop around and attempt to read from
it again, causing an infinite loop and log spam.
To fix this, we swap the order in which we evaluate the two concurrent
tasks: The first task to be polled is now the channel for outbound
packets and only if that one is empty, we attempt to read new packets
from the TUN device. This is also better from a backpressure point of
view: We should attempt to flush out our local buffers of already
processed packets before taking on "new work".
As a defense-in-depth strategy, we also attempt to detect the particular
error from the tokio runtime when it is being shut down and exit the
task.
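A sketch of the reordered loop with hypothetical types (the real code uses async channels, but the ordering is the point):

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Polls the outbound channel *first*; only when it is empty do we go
/// back to reading from the TUN device, so a closed channel is noticed
/// promptly during shutdown.
fn tun_worker(outbound: Receiver<Vec<u8>>, mut read_from_tun: impl FnMut() -> Vec<u8>) {
    loop {
        match outbound.try_recv() {
            Ok(packet) => {
                // ... write `packet` to the TUN device ...
                let _ = packet;
                continue; // flush already-processed work before new work
            }
            Err(TryRecvError::Disconnected) => return, // application is shutting down
            Err(TryRecvError::Empty) => {}
        }

        let _new_packet = read_from_tun(); // hand off to connlib's main thread
    }
}
```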
Resolves: #7601.
Related: https://github.com/tokio-rs/tokio/issues/7056.