99 Commits

Author SHA1 Message Date
Thomas Eizinger
18ec6c6860 refactor(rust): move service implementation to GUI client (#9045)
The module and crate structure around the GUI client and its background
service are currently a mess of circular dependencies. Most of the
service implementation actually sits in `firezone-headless-client`
because the headless-client and the service share certain modules. We
have recently moved most of these to `firezone-bin-shared` which is the
correct place for these modules.

In order to move the background service to `firezone-gui-client`, we
need to untangle a few more things in the GUI client. Those are done
commit-by-commit in this PR. With that out of the way, we can finally move
the service module to the GUI client, where it should actually live
given that it has nothing to do with the headless client.

As a result, the headless-client is - as one would expect - really just
a thin wrapper around connlib itself and is reduced down to 4 files with
this PR.

To make things more consistent in the GUI client, we move the `main.rs`
file also into `bin/`. By convention `bin/` is where you define binaries
if a crate has more than one. cargo will then build all of them.

Eventually, we can optimise the compile-times for `firezone-gui-client`
by splitting it into multiple crates:

- Shared structs like IPC messages
- Background service
- GUI client

This will be useful because it allows re-compiling only the GUI client
if nothing in `connlib` changes, and vice versa.

Resolves: #6913
Resolves: #5754
2025-05-08 13:22:09 +00:00
Thomas Eizinger
f2b1fbe718 refactor(rust): move device_id to bin-shared (#9040)
Both `device_id` and `device_info` are used by the headless-client and
the GUI client / IPC service. They should therefore be defined in the
`bin-shared` crate.
2025-05-06 04:52:37 +00:00
Thomas Eizinger
f11a902b3d refactor(rust): move dns-control to bin-shared (#9023)
Currently, the platform-specific code for controlling DNS resolution on
a system sits in `firezone-headless-client`. This code is also used by
the GUI client. This creates a weird compile-time dependency from the
GUI client to the headless client.

For other components that have platform-specific implementations, we use
the `firezone-bin-shared` crate. As a first step of resolving the
compile-time dependency, we move the `dns_control` module to
`firezone-bin-shared`.
2025-05-06 01:29:09 +00:00
Thomas Eizinger
005b6fe863 feat(windows): optimise network change detection (#9021)
Presently, the network change detection on Windows is very naive and
simply emits a change event every time _anything_ changes. We can
optimise this and therefore improve the start-up time of Firezone by:

- Filtering out duplicate events
- Filtering out network change events for our own network adapter

This reduces the number of network change events to 1 during startup. As
far as I can tell from the code comments in this area, we explicitly
send this one to ensure we don't run into a race condition whilst we are
starting up.
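
The two filters listed above could be sketched roughly as follows. The `NetworkEvent` and `ChangeFilter` types and the adapter names are hypothetical and only illustrate the idea; the actual implementation works on top of the Windows notification APIs and may look quite different.

```rust
/// Hypothetical, simplified network change event (not the actual Firezone type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NetworkEvent {
    pub adapter: String,
    pub connected: bool,
}

/// Drops events for our own adapter and events that repeat the previous one.
pub struct ChangeFilter {
    own_adapter: String,
    last: Option<NetworkEvent>,
}

impl ChangeFilter {
    pub fn new(own_adapter: &str) -> Self {
        Self {
            own_adapter: own_adapter.to_owned(),
            last: None,
        }
    }

    /// Returns `true` if the event should be forwarded as a network change.
    pub fn should_emit(&mut self, event: &NetworkEvent) -> bool {
        if event.adapter == self.own_adapter {
            return false; // Changes to our own tunnel adapter are not interesting.
        }
        if self.last.as_ref() == Some(event) {
            return false; // Exact duplicate of the previously emitted event.
        }
        self.last = Some(event.clone());
        true
    }
}
```

Feeding the same event twice through `should_emit` returns `true` only for the first call, and events for the adapter passed to `ChangeFilter::new` are never emitted.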

Resolves: #8905
2025-05-06 00:23:27 +00:00
Thomas Eizinger
806996c245 refactor(rust): move signals to bin-shared (#9024)
The `signals` module isn't something headless-client specific and should
live in our `bin-shared` crate. Once the `ipc_service` module is
decoupled from the headless-client crate, it will be used by both the
headless client and IPC service (which then will be defined in the GUI
client crate).
2025-05-05 23:34:26 +00:00
Thomas Eizinger
ce51c40d0d refactor(rust): move known_dirs to bin-shared (#9026)
The `known_dirs` module is used across the headless-client and the GUI
client. It should live in `bin-shared` where all the other
cross-platform modules are.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2025-05-05 22:45:53 +00:00
Thomas Eizinger
80335676b1 refactor(rust): move uptime to bin-shared (#9027)
The `uptime` module from `firezone-headless-client` is also used in the
GUI client. In order to decouple this dependency, we move the module to
`bin-shared`, next to the other cross-platform modules.
2025-05-05 12:28:26 +00:00
Thomas Eizinger
8a201494d0 ci: remove flaky Windows benchmark (#8941)
This tunnel throughput benchmark isn't a very useful benchmark and it is
very flaky. Remove it entirely until we can replace it with something
more robust and useful.

Resolves: #8172
2025-04-30 07:24:21 -07:00
Thomas Eizinger
e031dfdb4a refactor(connlib): introduce our own bufferpool crate (#8928)
We have been using buffer pools for a while all over `connlib` as a way
to efficiently use heap-allocated memory. This PR harmonizes the usage
of buffer pools across the codebase by introducing a dedicated
`bufferpool` crate. This crate offers a convenient and easy-to-use API
for all the things we (currently) need from buffer pools. As a nice
bonus of having it all in one place, we can now also track metrics of
how many buffers we have currently allocated.

An example output from the local metrics exporter looks like this:

```
Name         : system.buffer.count
Description  : The number of buffers allocated in the pool.
Unit         : {buffers}
Type         : Sum
Sum DataPoints
Monotonic    : false
Temporality  : Cumulative
DataPoint #0
	StartTime    : 2025-04-29 12:41:25.278436
	EndTime      : 2025-04-29 12:42:25.278088
	Value        : 96
	Attributes   :
		 ->  system.buffer.pool.name: udp-socket-v6
		 ->  system.buffer.pool.buffer_size: 65535
DataPoint #1
	StartTime    : 2025-04-29 12:41:25.278436
	EndTime      : 2025-04-29 12:42:25.278088
	Value        : 7
	Attributes   :
		 ->  system.buffer.pool.buffer_size: 131600
		 ->  system.buffer.pool.name: gso-queue
DataPoint #2
	StartTime    : 2025-04-29 12:41:25.278436
	EndTime      : 2025-04-29 12:42:25.278088
	Value        : 128
	Attributes   :
		 ->  system.buffer.pool.name: udp-socket-v4
		 ->  system.buffer.pool.buffer_size: 65535
DataPoint #3
	StartTime    : 2025-04-29 12:41:25.278436
	EndTime      : 2025-04-29 12:42:25.278088
	Value        : 8
	Attributes   :
		 ->  system.buffer.pool.buffer_size: 1336
		 ->  system.buffer.pool.name: ip-packet
DataPoint #4
	StartTime    : 2025-04-29 12:41:25.278436
	EndTime      : 2025-04-29 12:42:25.278088
	Value        : 9
	Attributes   :
		 ->  system.buffer.pool.buffer_size: 1336
		 ->  system.buffer.pool.name: snownet
```
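
For illustration, a buffer pool that tracks how many buffers it has allocated might look like the sketch below. This is not the API of the actual `bufferpool` crate: `Pool`, `Buffer`, `pull` and `allocated` are assumed names, and the counter is a plain atomic rather than a real metrics instrument.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical minimal pool; not the API of the actual `bufferpool` crate.
#[derive(Clone)]
pub struct Pool {
    inner: Arc<Inner>,
}

struct Inner {
    buffer_size: usize,
    free: Mutex<Vec<Vec<u8>>>,
    allocated: AtomicUsize,
}

/// A buffer that returns itself to the pool when dropped.
pub struct Buffer {
    data: Option<Vec<u8>>,
    pool: Arc<Inner>,
}

impl Pool {
    pub fn new(buffer_size: usize) -> Self {
        let inner = Inner {
            buffer_size,
            free: Mutex::new(Vec::new()),
            allocated: AtomicUsize::new(0),
        };
        Self { inner: Arc::new(inner) }
    }

    /// Reuses a free buffer if one exists, otherwise allocates a new one.
    /// Reused buffers are not zeroed and keep their previous contents.
    pub fn pull(&self) -> Buffer {
        let reused = self.inner.free.lock().unwrap().pop();
        let data = reused.unwrap_or_else(|| {
            self.inner.allocated.fetch_add(1, Ordering::Relaxed);
            vec![0u8; self.inner.buffer_size]
        });
        Buffer { data: Some(data), pool: Arc::clone(&self.inner) }
    }

    /// Number of buffers ever allocated by this pool (the value a metric would report).
    pub fn allocated(&self) -> usize {
        self.inner.allocated.load(Ordering::Relaxed)
    }
}

impl Buffer {
    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        self.data.as_mut().expect("buffer is present until drop").as_mut_slice()
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        if let Some(data) = self.data.take() {
            self.pool.free.lock().unwrap().push(data);
        }
    }
}
```

Because dropped buffers go back onto the free list, pulling a buffer after earlier ones were dropped does not increase `allocated()`.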

Resolves: #8385
2025-04-30 08:52:18 +00:00
Thomas Eizinger
6114bb274f chore(rust): make most of the Rust code compile on MacOS (#8924)
When working on the Rust code of Firezone from a MacOS computer, it is
useful to have pretty much all of the code at least compile in order to
detect problems early. Eventually, once we target features like a
headless MacOS client, some of these stubs will actually be filled in and
be functional.
2025-04-29 11:20:09 +00:00
Thomas Eizinger
93036734ae build(rust): move our own windows dependency to 0.61.0 (#8730)
Version `0.61.0` is what most of our dependencies bring in, so depending
on that allows us to unify the dependency tree here.
2025-04-22 02:35:28 +00:00
Thomas Eizinger
b3746b330f refactor(connlib): spawn dedicated threads for UDP sockets (#7590)
Correctly implementing asynchronous IO is notoriously hard. In order to
not drop packets in the process, one has to ensure a given socket is
ready to accept packets, buffer them if that is not the case, suspend
everything else until the socket is ready and then continue.

Until now, we did this because it was the only way to run the UDP
sockets on the same thread as the actual packet processing. That in turn
was motivated by wanting to pass around references to the received
packets for processing. Rust's borrow-checker does not allow passing
references between threads, which forced us to have the sockets on the
same thread as the packet processing.

Like we already did in other places in `connlib`, this can be solved
through the use of buffer pools. Using a buffer pool, we can use heap
allocations to store the received packets without having to make a new
allocation every time we read new packets. Instead, we can have a
dedicated thread that is connected to `connlib`'s packet processing
thread via two channels (one for inbound and one for outbound packets).
These channels are bounded, which ensures backpressure is maintained in
case one of the two threads lags behind. These bounds also mean that we
have at most N buffers from the buffer pool in-flight (where N is the
capacity of the channel).

Within those dedicated threads, we can then use `async/await` notation
to suspend the entire task when a socket isn't ready for sending.
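
The threading model described above can be sketched with the standard library's bounded channels. This is a simplified, blocking stand-in: the PR uses `async/await` inside the socket threads, and `spawn_socket_thread` as well as the echo behaviour are assumptions made purely for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Spawns a hypothetical "socket" thread that is connected to the caller via
/// two bounded channels. Instead of performing real socket IO, the thread
/// echoes every outbound packet back as an inbound one.
pub fn spawn_socket_thread(
    capacity: usize,
) -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>, thread::JoinHandle<()>) {
    let (outbound_tx, outbound_rx) = sync_channel::<Vec<u8>>(capacity);
    let (inbound_tx, inbound_rx) = sync_channel::<Vec<u8>>(capacity);

    let handle = thread::spawn(move || {
        // `recv` fails once every sender is dropped, which ends the thread.
        while let Ok(packet) = outbound_rx.recv() {
            // `send` blocks while the inbound channel is full: this is the
            // backpressure between the two threads.
            if inbound_tx.send(packet).is_err() {
                break;
            }
        }
    });

    (outbound_tx, inbound_rx, handle)
}
```

With a capacity of N, at most N packets can sit in each channel, which mirrors the bound on in-flight buffers mentioned above.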

Resolves: #8000
2025-04-14 06:18:06 +00:00
Jamil
4afcdf1c53 test(windows): Expect 80 Mbps on slow actions runners (#8621)
These are still failing a good portion of the time:


https://github.com/firezone/firezone/actions/runs/14226461996/job/39867070540?pr=8620
2025-04-02 22:22:20 +00:00
Thomas Eizinger
58fe527b0e feat(connlib): mirror ECN bits on TUN device (#8511)
From the perspective of any application, Firezone is a layer-3 network
and will thus use the host's networking stack to form IP packets for
whichever application protocol is in use (UDP, TCP, etc). These packets
then get encapsulated into UDP packets by Firezone and sent to a
Gateway.

As a result of this design, the IP header seen by the networking stacks
of the Client and the receiving service is not visible to any
intermediary along the network path of the Client and Gateway.

In case this network path is congested and middleboxes such as routers
need to drop packets, they will look at the ECN bits in the IP header
(of the UDP packet generated by a Client or Gateway) and flip a bit in
case the previous value indicated support for ECN (`0b01` or `0b10`).
When received by a network stack that supports ECN, seeing `0b11` means
that the network path is congested and that it must reduce its
send/receive windows (or otherwise throttle the connection).

At present, this doesn't work with Firezone because of the
aforementioned encapsulation of IP packets. To support ECN, we need to
therefore:

- Copy ECN bits from a received IP packet to the datagram that
encapsulates it: This ensures that if the Client's network stack supports
ECN, we mirror that support on the wire.
- Copy ECN bits from a received datagram to the IP packet that is sent to
the TUN device: This ensures that if the "Congestion Experienced" bit
gets set along the network path between Client and Gateway, we reflect
that accordingly on the IP packet emitted by the TUN device.
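
As a rough sketch of the two copy operations: the ECN field occupies the two least-significant bits of the IPv4 TOS byte (and of the IPv6 Traffic Class). The function names below are hypothetical and the code operates on the raw byte only; real code would also have to recompute the IPv4 header checksum after changing it.

```rust
/// Mask for the two ECN bits in the IPv4 TOS / IPv6 Traffic Class byte.
const ECN_MASK: u8 = 0b11;

/// Extracts the ECN codepoint from a TOS / Traffic Class byte.
pub fn ecn(tos: u8) -> u8 {
    tos & ECN_MASK
}

/// Returns `tos` with its ECN bits replaced by `ecn`; the DSCP bits are untouched.
pub fn with_ecn(tos: u8, ecn: u8) -> u8 {
    (tos & !ECN_MASK) | (ecn & ECN_MASK)
}

/// Outbound direction: the encapsulating datagram mirrors the inner packet's ECN bits.
pub fn encapsulate_tos(inner_tos: u8, outer_tos: u8) -> u8 {
    with_ecn(outer_tos, ecn(inner_tos))
}

/// Inbound direction: the packet written to the TUN device mirrors the datagram's ECN bits.
pub fn decapsulate_tos(outer_tos: u8, inner_tos: u8) -> u8 {
    with_ecn(inner_tos, ecn(outer_tos))
}
```

For example, if a router along the path sets the outer ECN bits to `0b11`, `decapsulate_tos` carries that codepoint over to the inner packet while leaving its DSCP bits alone.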

Resolves: #3758

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2025-03-26 20:55:51 +00:00
Thomas Eizinger
84a2c275ca build(rust): upgrade to Rust 1.85 and Edition 2024 (#8240)
Updates our codebase to the 2024 Edition. For highlights on what
changes, see the following blogpost:
https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html
2025-03-19 02:58:55 +00:00
Thomas Eizinger
7af4b91ac5 fix(gui-client): call wintun::Session::shutdown on drop (#8464)
The bugfix we attempted in #8156 turned out to be wrong. Reading the
source-code, we have to call `Session::shutdown` in order to actually
cancel the `Session::receive_blocking` call. Not doing so means we run
into the timeout when discarding the `Tun` device because the
recv-thread is stuck in `Session::receive_blocking`.

Fixes: #8395
2025-03-17 12:58:03 +00:00
Thomas Eizinger
152939c7dd build(rust): bump Tauri dependencies (#8459)
Dependabot appears to have a hard time bumping the Tauri dependencies in
a group together. Additionally, our dependency linter `cargo deny`
disallows duplicate dependencies by default. To avoid introducing more
duplicate dependencies, we depend on the upstream `main` branch of two
projects that have already updated their dependencies but did not yet
cut a release.
2025-03-17 12:19:20 +00:00
Thomas Eizinger
2fe5c00c64 fix(windows): break from retry loop if we sent the packet (#8271)
Regression introduced in #8268.
2025-02-26 06:10:02 +00:00
Thomas Eizinger
96170be082 fix(gui-client): mitigate deadlock when shutting down TUN device (#8268)
In #8159, we introduced a regression that could lead to a deadlock when
shutting down the TUN device. Whilst we did close the channel prior to
awaiting the thread to exit, we failed to notice that _another_ instance
of the sender could be alive as part of an internally stored "sending
permit" with the `PollSender` in case another packet is queued for
sending. We need to explicitly call `abort_send` to free that.

Judging from the comment and a prior bug, this shutdown logic has been
buggy before. To further avoid this deadlock, we introduce two changes:

- The worker threads only receive a `Weak` reference to the
`wintun::Session`
- We move all device-related state into a dedicated `TunState` struct
that we can drop prior to joining the threads

The combination of these features means that all strong references to
channels and the session are definitely dropped without having to wait
for anything. To provide a clean and synchronous shutdown, we wait for
at most 5s on the worker-threads. If they don't exit by then, we log
a warning and exit anyway.

This should greatly reduce the risk of future bugs here because the
session (and thus the WinTUN device) gets shutdown in any case and so at
worst, we have a few zombie threads around.
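
The shutdown scheme can be sketched as follows. `Session` here is an empty stand-in for `wintun::Session`, and `spawn_worker` and `shutdown` are hypothetical helpers; the sketch only demonstrates the combination of `Weak` references, dropping the state first and waiting for the worker with a timeout.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::sync::{Arc, Weak};
use std::thread;
use std::time::Duration;

/// Empty stand-in for `wintun::Session` (hypothetical).
pub struct Session;

/// Owns the only long-lived strong reference to the session.
pub struct TunState {
    pub session: Arc<Session>,
}

/// Spawns a worker that only holds a `Weak` reference to the session.
/// The returned receiver disconnects once the worker thread has exited.
pub fn spawn_worker(state: &TunState) -> Receiver<()> {
    let session: Weak<Session> = Arc::downgrade(&state.session);
    let (done_tx, done_rx) = channel::<()>();
    thread::spawn(move || {
        let _done_tx = done_tx; // Dropped when the thread exits.
        // The worker stops as soon as the session can no longer be upgraded.
        while let Some(_session) = session.upgrade() {
            thread::sleep(Duration::from_millis(5)); // Placeholder for real IO.
        }
    });
    done_rx
}

/// Drops the state first, then waits at most `timeout` for the worker.
/// Returns `true` if the worker exited in time.
pub fn shutdown(state: TunState, worker_done: Receiver<()>, timeout: Duration) -> bool {
    drop(state); // Releases the last long-lived strong reference.
    matches!(
        worker_done.recv_timeout(timeout),
        Err(RecvTimeoutError::Disconnected)
    )
}
```

If `shutdown` returns `false`, a caller would log a warning and carry on, which corresponds to the "zombie thread" worst case described above.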

Resolves: #8265
2025-02-26 00:46:12 +00:00
Thomas Eizinger
643347ba0e test(windows): reduce expected BPS of WinTUN benchmark (#8171)
This appears to have regressed in #8159. It is low-priority right now
and we need to unblock a flaky CI so lower the expected BPS and
investigate later.
2025-02-18 03:34:14 +00:00
Thomas Eizinger
33c707dbf6 feat(windows): introduce dedicated "TUN send" thread (#8159)
Same as done for unix-based operating systems in #8117, we introduce a
dedicated "TUN send" thread for Windows in this PR. Not only does this
move the syscalls and copying of sending packets away from `connlib`'s
main thread but it also establishes backpressure between those threads
properly.

WinTUN does not have any ability to signal that it has space in its send
buffer. If it fails to allocate a packet for sending, it will return
`ERROR_BUFFER_OVERFLOW` [0]. We now handle this case gracefully by
suspending the send thread for 10ms and then trying again. This isn't a
great way of establishing back-pressure but at least we don't have any
packet loss.
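
A minimal sketch of such a retry loop is shown below. `SendError` and `send_with_retry` are hypothetical; the closure stands in for the actual WinTUN send call, and the `BufferOverflow` variant stands in for `ERROR_BUFFER_OVERFLOW`.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical error type; `BufferOverflow` stands in for `ERROR_BUFFER_OVERFLOW`.
#[derive(Debug, PartialEq, Eq)]
pub enum SendError {
    BufferOverflow,
    Other(String),
}

/// Calls `try_send` until it succeeds, sleeping for `backoff` whenever the
/// ring buffer is full. Returns the number of attempts that were needed.
pub fn send_with_retry<F>(mut try_send: F, backoff: Duration) -> Result<u32, SendError>
where
    F: FnMut() -> Result<(), SendError>,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match try_send() {
            Ok(()) => return Ok(attempts), // Packet sent: leave the retry loop.
            Err(SendError::BufferOverflow) => thread::sleep(backoff),
            Err(other) => return Err(other), // Anything else is not retried.
        }
    }
}
```

In the scenario described above, `backoff` would be 10ms. Returning from the loop on success is essential; otherwise the same packet would be sent over and over.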

To test this, I temporarily lowered the ring buffer size and ran a speed
test. In that, I could confirm that `ERROR_BUFFER_OVERFLOW` is indeed
emitted and handled as intended.

[0]: https://git.zx2c4.com/wintun/tree/api/session.c#n267
2025-02-17 20:33:45 +00:00
Thomas Eizinger
af9fc49b18 fix(windows): don't double shutdown session (#8156)
The `wintun` crate will already shutdown the session for us when the
last instance of `Session` gets dropped. Shutting down the session prior
to that already results in an attempt to close an adapter that is no
longer present, causing WinTUN to log (unactionable) errors.
2025-02-17 05:38:11 +00:00
Thomas Eizinger
10ba02e341 fix(connlib): split TUN send & recv into separate threads (#8117)
We appear to have caused a pretty big performance regression (~40%) in
037a2e64b6 (identified through `git-bisect`). Specifically, the
regression appears to have been caused by `aef411abf5` (#7605). Weirdly
enough, undoing just that on top of `main` doesn't fix the regression.

My hypothesis is that using the same file descriptor for read AND write
interests on the same runtime causes issues because those interests are
occasionally cleared (i.e. on false-positive wake-ups).

In this PR, we spawn a dedicated thread each for the sending and
receiving operations of the TUN device. On unix-based systems, a TUN
device is just a file descriptor and can therefore simply be copied and
read & written to from different threads. Most importantly, we only
construct the `AsyncFd` _within_ the newly spawned thread and runtime
because constructing an `AsyncFd` implicitly registers with the runtime
active on the current thread.

As a nice benefit, this allows us to get rid of a `future::select`.
Those are always kind of nasty because they cancel the future that
wasn't ready. My original intuition was that we drop packets due to
cancelled futures there but that could not be confirmed in experiments.
2025-02-14 05:32:51 +00:00
Thomas Eizinger
7dcda1dc74 fix(windows): silence 0x800706D9 when DNS deactivation fails (#8085)
The error code we see here means "There are no more endpoints available
from the endpoint mapper." This has something to do with Windows'
internal RPC communication between components. DNS deactivation is on a
best-effort basis and it appears that everything else is working just
fine, despite this error.

It appears to happen when we shut down our own service, so perhaps it is
just a race condition.
2025-02-11 05:38:37 +00:00
Thomas Eizinger
d7ebd07183 fix(linux): check for correct sign of netlink error code (#8087)
We've previously tried to handle the "No such process" error from
netlink when it tries to remove a route that no longer exists. What we
failed to do is use the correct sign for the error code as netlink
errors are always negative, yet when printed, they are positive numbers.
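
To illustrate the sign issue: `ESRCH` is errno 3 on Linux, so the raw netlink code to compare against is `-3`. The helper name below is hypothetical and the constant is defined inline to keep the sketch free of dependencies.

```rust
/// `ESRCH` ("No such process") as a positive errno value on Linux.
const ESRCH: i32 = 3;

/// Netlink reports errors as negated errno values, so the raw code must be
/// compared against `-ESRCH` rather than `ESRCH`.
pub fn is_no_such_process(netlink_code: i32) -> bool {
    netlink_code == -ESRCH
}
```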
2025-02-11 04:47:51 +00:00
Thomas Eizinger
b193dd91f6 fix(windows): don't warn on disabled IP stack (#8086)
When an IP stack is programmatically disabled, such as with:

> reg add
"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"
/v DisabledComponents /t REG_DWORD /d 255 /f

Attempting to interact with this IP stack will yield "NOT_FOUND" errors.
These aren't worth reporting to Sentry because there isn't much we can
do about it.
2025-02-11 04:37:17 +00:00
Thomas Eizinger
436b502eab fix(windows): handle disabled IPv6 stack gracefully (#8083)
Fixes: #8049.
2025-02-11 03:21:32 +00:00
Thomas Eizinger
f48df7585c refactor(windows): de-duplicate Win32 error codes (#8071)
The errors returned from Win32 API calls are currently duplicated in
several places. This makes it error-prone to handle them correctly. With
this PR, we de-duplicate this and add proper docs and links for further
reading to them.

We also fix a case where we would currently fail to set IP addresses for
our tunnel interface if the IP stack is not supported.
2025-02-10 23:33:06 +00:00
Thomas Eizinger
d2e9b09874 refactor(rust): stringify errors early (#8033)
As it turns out, the effort in #7104 was not a good idea. By logging
errors as values, most of our Sentry reports all have the same title and
thus cannot be differentiated from within the overview at all. To fix
this, we stringify errors with all their sources whenever they get
logged. This ensures log messages are unique and all Sentry issues will
have a useful title.
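
One way to stringify an error together with all of its sources is to walk `std::error::Error::source`. The `error_chain` helper and the `Wrapped` demo type below are hypothetical and not taken from the Firezone codebase; the `": "` separator is likewise an assumption.

```rust
use std::error::Error;
use std::fmt;

/// Joins an error and all of its sources into a single string.
pub fn error_chain(error: &(dyn Error + 'static)) -> String {
    let mut out = error.to_string();
    let mut source = error.source();
    while let Some(cause) = source {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        source = cause.source();
    }
    out
}

/// Small demo error that optionally wraps another error.
#[derive(Debug)]
pub struct Wrapped {
    pub context: &'static str,
    pub source: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.context)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref()
    }
}
```

Logging the result of `error_chain` instead of the error value yields one distinct message per failure path, which is what gives each Sentry issue its own title.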
2025-02-06 14:18:35 +00:00
Thomas Eizinger
90fb9b8478 refactor(connlib): use Win32 APIs instead of netsh to set IPs (#8003)
This should be faster and hopefully more reliable.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-02-03 06:24:28 +00:00
Thomas Eizinger
4c31ac958d build(deps): bump wintun to v0.5.1 (#7912)
This updates `wintun` to include
https://github.com/nulldotblack/wintun/pull/27. The error message of the
linked Sentry issues in #7901 of "Unable to find matching {UUID}" is
removed in that PR in favor of a Win32 function that converts the
adapter name directly to the corresponding index and UUID instead of
searching through the list of adapters.

Resolves: #7901.
2025-01-30 02:44:23 +00:00
Thomas Eizinger
8bd8098cab refactor(connlib): don't re-implement waker for TUN thread (#7944)
Within `connlib` - on UNIX platforms - we have dedicated threads that
read from and write to the TUN device. These threads are connected with
`connlib`'s main thread via bounded channels: one in each direction.
When these channels are full, `connlib`'s main thread will suspend and
not read any network packets from the sockets in order to maintain
back-pressure. Reading more packets from the socket would mean most
likely sending more packets out the TUN device.

When debugging #7763, it became apparent that _something_ must be wrong
with these threads and that somehow, we either consider them as full or
aren't emptying them and as a result, we don't read _any_ network
packets from our sockets.

To maintain back-pressure here, we currently use our own `AtomicWaker`
construct that is shared with the TUN thread(s). This is unnecessary. We
can also directly convert the `flume::Sender` into a
`flume::async::SendSink` and therefore directly access a `poll`
interface.
2025-01-29 15:48:48 +00:00
Thomas Eizinger
416e320319 revert: bump netlink-packet-route and rtnetlink (#7899)
Reverts: #6694
Related: https://github.com/rust-netlink/netlink-packet-route/issues/140
2025-01-28 06:29:07 +00:00
dependabot[bot]
0779757646 build(deps): netlink-packet-route and rtnetlink (#6694)
`rtnetlink` has some breaking changes in their latest version. To avoid
waiting until they actually cut a release, we temporarily depend on
their `main` branch.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2025-01-28 05:21:52 +00:00
Thomas Eizinger
46cdbbcc23 fix(connlib): use a buffer pool for the GSO queue (#7749)
Within `connlib`, we read batches of IP packets and process them at
once. Each encrypted packet is appended to a buffer shared with other
packets of the same length. Once the batch is successfully processed,
all of these buffers are written out using GSO to the network. This
allows UDP operations to be much more efficient because not every packet
has to traverse the entire syscall hierarchy of the operating system.

Until now, these buffers got re-allocated on every batch. This is pretty
wasteful and leads to a lot of repeated allocations. Measurements show
that most of the time, we only have a handful of packets with different
segment lengths _per batch_. For example, just booting up the
headless-client and running a speedtest showed that only 5 of these
buffers were needed at one time.

By introducing a buffer pool, we can reuse these buffers between batches
and avoid reallocating them.

Related: #7747.
2025-01-13 19:24:52 +00:00
Thomas Eizinger
037a2e64b6 fix(connlib): attempt to detect runtime shutdown within TUN task (#7605)
Reading and writing to the TUN device within `connlib` happens in a
separate thread. The task running within these threads is connected to
the rest of `connlib` via channels. When the application shuts down,
these threads also need to exit. Currently, we attempt to detect this
from within the task when these channels close. It appears that there is
a race condition here because we first attempt to read from the TUN
device before reading from the channels. We treat read & write errors on
the TUN device as non-fatal so we loop around and attempt to read from
it again, causing an infinite-loop and log spam.

To fix this, we swap the order in which we evaluate the two concurrent
tasks: The first task to be polled is now the channel for outbound
packets and only if that one is empty, we attempt to read new packets
from the TUN device. This is also better from a backpressure point of
view: We should attempt to flush out our local buffers of already
processed packets before taking on "new work".

As a defense-in-depth strategy, we also attempt to detect the particular
error from the tokio runtime when it is being shut down and exit the
task.

Resolves: #7601.
Related: https://github.com/tokio-rs/tokio/issues/7056.
2025-01-05 20:41:24 +00:00
Thomas Eizinger
26824fb3c7 fix(gateway): check if we run with correct permissions (#7565)
The gateway needs either the `CAP_NET_ADMIN` capability or to run as `root`
in order to access the TUN device as well as configure routes via
`netlink`. Running without either leads to "Permission denied" errors at
runtime. It is good to fail early in these kinds of situations.

By checking for this capability early on during startup, these should no
longer surface later. As a bonus, we won't receive (unactionable) Sentry
alerts.
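
One possible way to perform such a check on Linux is to inspect the effective capability mask in `/proc/self/status`. This is only a sketch under that assumption; the PR may well use a different mechanism (for example a capabilities crate), and `has_net_admin` is a hypothetical helper.

```rust
/// Bit index of `CAP_NET_ADMIN` in Linux capability sets.
const CAP_NET_ADMIN: u32 = 12;

/// Parses the `CapEff` line from text in the format of `/proc/self/status`
/// and reports whether `CAP_NET_ADMIN` is in the effective set.
/// Returns `None` if the line is missing or cannot be parsed.
pub fn has_net_admin(proc_status: &str) -> Option<bool> {
    let hex = proc_status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))?
        .trim();
    let mask = u64::from_str_radix(hex, 16).ok()?;
    Some((mask & (1u64 << CAP_NET_ADMIN)) != 0)
}
```

At startup, the input would come from something like `std::fs::read_to_string("/proc/self/status")`, and the process would exit with a clear message if the result is not `Some(true)`.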

Resolves: #7559.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-12-29 21:45:56 +00:00
Thomas Eizinger
e7cc0e5eef fix(linux): don't fail on unsupported IP version (#7583)
Firezone always attempts to handle IPv4 and IPv6. On Linux systems
without an IPv6 stack, attempts to add an IPv6 route may fail with "Not
supported (os error 95)". We don't need the IPv6 routes on those systems
as we will never receive IPv6 traffic. Therefore, we can safely ignore
these errors and not log them.
2024-12-25 11:09:22 +00:00
Thomas Eizinger
1b04b0eb2b fix(windows): don't warn on deleting non-existing route (#7507)
Similar to Linux (#7502), we don't want to log an error if we cannot
delete a route that doesn't exist.
2024-12-13 21:09:09 +00:00
Thomas Eizinger
b5d6c27680 fix(linux): don't print error when removing non-existent route (#7502)
We are already handling one case where we are trying to remove a route
that doesn't exist. `ESRCH` is another variant of this error that
manifests as "No such process". According to the Internet, this just
means the route doesn't exist so we can bail out early here.
2024-12-13 04:53:22 +00:00
Thomas Eizinger
90cf191a7c feat(linux): multi-threaded TUN device operations (#7449)
## Context

At present, we only have a single thread that reads and writes to the
TUN device on all platforms. On Linux, it is possible to open the file
descriptor of a TUN device multiple times by setting the
`IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then
spawn multiple threads that concurrently read and write to the TUN
device. This is critical for achieving a better throughput.

## Solution

`IFF_MULTI_QUEUE` is a Linux-only thing and therefore only applies to
headless-client, GUI-client on Linux and the Gateway (it may also be
possible on Android, I haven't tried). As such, we need to first change
our internal abstractions a bit to move the creation of the TUN thread
to the `Tun` abstraction itself. For this, we change the interface of
`Tun` to the following:

- `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver` where
multiple items in a channel can be batch-received.
- `poll_send_ready`: Mimics the API of `Sink` to check whether more
items can be written.
- `send`: Mimics the API of `Sink` to actually send an item.
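
The three operations above might be expressed as a trait along the following lines. The signatures are assumptions made for illustration and may not match the actual `Tun` abstraction; `Loopback` is an in-memory stand-in that exists only to exercise the trait.

```rust
use std::collections::VecDeque;
use std::io;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical shape of the interface; the real signatures may differ.
pub trait Tun {
    /// Moves up to `max` packets into `buf` and reports how many were moved.
    fn poll_recv_many(&mut self, cx: &mut Context<'_>, buf: &mut Vec<Vec<u8>>, max: usize) -> Poll<usize>;
    /// Resolves once another packet can be accepted (like `Sink::poll_ready`).
    fn poll_send_ready(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>>;
    /// Queues one packet; call only after `poll_send_ready` returned `Ready`.
    fn send(&mut self, packet: Vec<u8>) -> io::Result<()>;
}

/// In-memory loopback "device": sent packets can be received again.
pub struct Loopback {
    queue: VecDeque<Vec<u8>>,
    capacity: usize,
    recv_waker: Option<Waker>,
    send_waker: Option<Waker>,
}

impl Loopback {
    pub fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity, recv_waker: None, send_waker: None }
    }
}

impl Tun for Loopback {
    fn poll_recv_many(&mut self, cx: &mut Context<'_>, buf: &mut Vec<Vec<u8>>, max: usize) -> Poll<usize> {
        if self.queue.is_empty() {
            self.recv_waker = Some(cx.waker().clone());
            return Poll::Pending;
        }
        let n = max.min(self.queue.len());
        buf.extend(self.queue.drain(..n));
        if let Some(waker) = self.send_waker.take() {
            waker.wake(); // Space was freed: wake a pending sender.
        }
        Poll::Ready(n)
    }

    fn poll_send_ready(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        if self.queue.len() >= self.capacity {
            self.send_waker = Some(cx.waker().clone());
            return Poll::Pending; // Full: the caller must hold on to its packet.
        }
        Poll::Ready(Ok(()))
    }

    fn send(&mut self, packet: Vec<u8>) -> io::Result<()> {
        self.queue.push_back(packet);
        if let Some(waker) = self.recv_waker.take() {
            waker.wake(); // A packet arrived: wake a pending receiver.
        }
        Ok(())
    }
}

/// Waker that does nothing; handy for polling by hand in examples.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}
```

Separating `poll_send_ready` from `send` lets the caller keep its packet when the device is full, which is what makes backpressure possible without dropping anything.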

With these APIs in place, we can implement various (performance)
improvements for the different platforms.

- On Linux, this allows us to spawn multiple threads to read and write
from the TUN device and send all packets into the same channel. The `Io`
component of `connlib` then uses `poll_recv_many` to read batches of up
to 100 packets at once. This ties in well with #7210 because we can then
use GSO to send the encrypted packets in single syscalls to the OS.
- On Windows, we already have a dedicated recv thread because `WinTun`'s
most-convenient API uses blocking IO. As such, we can now also tie into
that by batch-receiving from this channel.
- In addition to using multiple threads, this API now also uses correct
readiness checks on Linux, Darwin and Android to uphold backpressure in
case we cannot write to the TUN device.

## Configuration

Local testing has shown that 2 threads give the best performance for a
local `iperf3` run. I suspect this is because there is only so much
traffic that a single application (i.e. `iperf3`) can generate. With
more than 2 threads, the throughput actually drops drastically because
`connlib`'s main thread is too busy with lock-contention and triggering
`Waker`s for the TUN threads (which mostly idle around if there are 4+
of them). I've made it configurable on the Gateway though so we can
experiment with this during concurrent speedtests etc.

In addition, switching `connlib` to a single-threaded tokio runtime
further increased the throughput. I suspect this is due to less task /
context switching.

## Results

Local testing with `iperf3` shows some very promising results. We now
achieve a throughput of 2+ Gbit/s.

```
Connecting to host 172.20.0.110, port 5201
Reverse mode, remote host 172.20.0.110 is sending
[  5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   274 MBytes  2.30 Gbits/sec
[  5]   1.00-2.00   sec   279 MBytes  2.34 Gbits/sec
[  5]   2.00-3.00   sec   216 MBytes  1.82 Gbits/sec
[  5]   3.00-4.00   sec   224 MBytes  1.88 Gbits/sec
[  5]   4.00-5.00   sec   234 MBytes  1.96 Gbits/sec
[  5]   5.00-6.00   sec   238 MBytes  2.00 Gbits/sec
[  5]   6.00-7.00   sec   229 MBytes  1.92 Gbits/sec
[  5]   7.00-8.00   sec   222 MBytes  1.86 Gbits/sec
[  5]   8.00-9.00   sec   223 MBytes  1.87 Gbits/sec
[  5]   9.00-10.00  sec   217 MBytes  1.82 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec  22247             sender
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec                  receiver

iperf Done.
```

This is a pretty solid improvement over what is in `main`:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  90.4 MBytes   758 Mbits/sec  1800    106 KBytes
[  5]   1.00-2.00   sec  93.4 MBytes   783 Mbits/sec  1550   51.6 KBytes
[  5]   2.00-3.00   sec  92.6 MBytes   777 Mbits/sec  1350   76.8 KBytes
[  5]   3.00-4.00   sec  92.9 MBytes   779 Mbits/sec  1800   56.4 KBytes
[  5]   4.00-5.00   sec  93.4 MBytes   783 Mbits/sec  1650   69.6 KBytes
[  5]   5.00-6.00   sec  90.6 MBytes   760 Mbits/sec  1500   73.2 KBytes
[  5]   6.00-7.00   sec  87.6 MBytes   735 Mbits/sec  1400   76.8 KBytes
[  5]   7.00-8.00   sec  92.6 MBytes   777 Mbits/sec  1600   82.7 KBytes
[  5]   8.00-9.00   sec  91.1 MBytes   764 Mbits/sec  1500   70.8 KBytes
[  5]   9.00-10.00  sec  92.0 MBytes   771 Mbits/sec  1550   85.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   917 MBytes   769 Mbits/sec  15700             sender
[  5]   0.00-10.00  sec   916 MBytes   768 Mbits/sec                  receiver

iperf Done.
```
2024-12-05 00:18:20 +00:00
Thomas Eizinger
dd6b52b236 chore(rust): share edition key via workspace table (#7451)
2024-12-03 00:28:06 +00:00
Thomas Eizinger
4f92a0d7ca refactor(gui-client): tidy up GUI controller code (#7444)
This PR intends to be a pure refactoring, i.e. no behaviour change. It
simplifies a few aspects of the GUI controller event-loop by getting rid
of the `select!` macro. We also remove some indirection of the
`gui_controller::Builder`.
2024-12-02 20:07:44 +00:00
Thomas Eizinger
0a6554122a feat(connlib): utilise GSO for UDP sockets (#7210)
## Context

At present, `connlib` sends UDP packets one at a time. Sending a packet
requires us to make a syscall which is quite expensive. Under load, i.e.
during a speedtest, syscalls account for over 50% of our CPU time [0].
In order to improve this situation, we need to somehow make use of GSO
(generic segmentation offload). With GSO, we can send multiple packets
to the same destination in a single syscall.

The tricky question here is, how can we achieve having multiple UDP
packets ready at once so we can send them in a single syscall? Our TUN
interface only feeds us packets one at a time and `connlib`'s state
machine is single-threaded. Additionally, we currently only have a
single `EncryptBuffer` in which the to-be-sent datagram sits.

## 1. Stack-allocating encrypted IP packets

As a first step, we get rid of the single `EncryptBuffer` and instead
stack-allocate each encrypted IP packet. Due to our small MTU, these
packets are only around 1300 bytes. Stack-allocating them requires a few
memcpys, but these account for only a single-digit percentage of CPU
time. That is nothing compared to how much time we are
spending on UDP syscalls. With the `EncryptBuffer` out of the way, we can
now "freely" move around the `EncryptedPacket` structs and - technically
- we can have multiple of them at the same time.
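
To illustrate the idea, here is a minimal sketch of an owned, fixed-size packet type. It is not connlib's actual `EncryptedPacket`; the `MAX_PACKET_LEN` constant and the method names are assumptions made for this example.

```rust
/// Hypothetical upper bound for an encrypted packet; the description above
/// only says packets are "around 1300 bytes".
const MAX_PACKET_LEN: usize = 1400;

/// An owned packet backed by a fixed-size array instead of a shared buffer.
/// Illustrative stand-in only, not connlib's real definition.
pub struct EncryptedPacket {
    buf: [u8; MAX_PACKET_LEN],
    len: usize,
}

impl EncryptedPacket {
    /// Copies `payload` into a fresh fixed-size buffer (this is the memcpy).
    /// Returns `None` if the payload does not fit.
    pub fn new(payload: &[u8]) -> Option<Self> {
        if payload.len() > MAX_PACKET_LEN {
            return None;
        }
        let mut buf = [0u8; MAX_PACKET_LEN];
        buf[..payload.len()].copy_from_slice(payload);
        Some(Self {
            buf,
            len: payload.len(),
        })
    }

    /// The bytes that were actually written.
    pub fn payload(&self) -> &[u8] {
        &self.buf[..self.len]
    }
}
```

Because each value owns its bytes, several packets can exist at once and be moved into a queue, which a single shared buffer does not allow.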

## 2. Implementing GSO

The GSO interface allows you to pass multiple packets **of the same
length and for the same destination** in a single syscall, meaning we
cannot just batch-up arbitrary UDP packets. Counterintuitively, making
use of GSO requires us to do more copying: In particular, we change the
interface of `Io` such that "sending" a packet essentially performs a
lookup of a `BytesMut`-buffer by destination and packet length and
appends the payload to that buffer.
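
The lookup-and-append step could look roughly like the sketch below. It is a simplified, std-only approximation: it uses `Vec<u8>` where the description mentions `BytesMut`, and the `enqueue` and `flush` method names are assumptions rather than the real `Io` or `GsoQueue` interface.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Illustrative queue that groups datagrams by (destination, length).
/// `Vec<u8>` keeps this sketch free of external crates.
#[derive(Default)]
pub struct GsoQueue {
    buffers: HashMap<(SocketAddr, usize), Vec<u8>>,
}

impl GsoQueue {
    /// "Sending" only appends the payload to the buffer for this key.
    pub fn enqueue(&mut self, dst: SocketAddr, payload: &[u8]) {
        self.buffers
            .entry((dst, payload.len()))
            .or_default()
            .extend_from_slice(payload);
    }

    /// Drains all buffers as (destination, segment size, concatenated bytes).
    /// Each entry holds equally sized datagrams for one destination, which is
    /// the shape GSO needs for a single send call.
    pub fn flush(&mut self) -> Vec<(SocketAddr, usize, Vec<u8>)> {
        self.buffers
            .drain()
            .map(|((dst, segment_size), bytes)| (dst, segment_size, bytes))
            .collect()
    }
}
```

Keying on the length as well as the destination is what guarantees that every buffer only ever contains segments of the same size.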

## 3. Batch-read IP packets

In order to actually perform GSO, we need to process more than a single
IP packet in one event-loop tick. We achieve this by batch-reading up to
50 IP packets from the mpsc-channel that connects `connlib`'s main
event-loop with the dedicated thread that reads and writes to the TUN
device. These reads and writes happen concurrently to `connlib`'s packet
processing. Thus, it is likely that by the time `connlib` is ready to
process another IP packet, multiple have been read from the device and
are sitting in the channel. Batch-processing these IP packets means that
the buffers in our `GsoQueue` are more likely to contain more than a
single datagram.

Imagine you are running a file upload. The OS will send many packets to
the same destination IP, most likely at the maximum MTU, to the TUN device.
It is likely that we read 10-20 of these packets in one batch (i.e. within a
single "tick" of the event-loop). All packets will be appended to the
same buffer in the `GsoQueue` and on the next event-loop tick, they will
all be flushed out in a single syscall.
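
A batch read of this kind can be sketched with a standard-library channel. This is only an approximation: the channel connlib uses may well be an async one, and the `recv_batch` function name is made up for this example.

```rust
use std::sync::mpsc::Receiver;

/// Batch size mentioned in the description above.
const MAX_BATCH: usize = 50;

/// Blocks for the first packet, then takes whatever else is already queued,
/// up to `MAX_BATCH` packets in total. Returns an empty batch if the channel
/// is empty and the sender side has been dropped.
pub fn recv_batch(rx: &Receiver<Vec<u8>>) -> Vec<Vec<u8>> {
    let mut batch = Vec::with_capacity(MAX_BATCH);
    let Ok(first) = rx.recv() else {
        return batch;
    };
    batch.push(first);
    while batch.len() < MAX_BATCH {
        match rx.try_recv() {
            Ok(packet) => batch.push(packet),
            Err(_) => break, // channel is empty or disconnected
        }
    }
    batch
}
```

The non-blocking `try_recv` loop is the important part: it never waits for more packets, it only collects those that piled up while the previous batch was being processed.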

## Results

Overall, this results in a significant reduction of syscalls for sending
UDP messages. In [1], we spend only a total of 16% of our CPU time in
`udpv6_sendmsg` whereas in [0] (main), we spent a total of 34%. Do note
that these numbers are relative to the total CPU time spent per program
run and thus can't be compared directly (i.e. you cannot just do 34 - 16
and say we now spend 18% less time sending UDP packets). Nevertheless,
this appears to be a great improvement.

In terms of throughput, we achieve a ~60% improvement in our benchmark
suite. That one is running on localhost though, so the gain might not
necessarily be reflected to the same degree in a real network.

[0]: https://share.firefox.dev/4hvoPju
[1]: https://share.firefox.dev/4frhCPv
2024-12-02 01:09:44 +00:00
Thomas Eizinger
c6e7e6192e build(rust): bump Rust to 1.83 (#7409)
Rust 1.83 comes with a bunch of new lints for elidable lifetimes. Those
also trigger in the generated code of `derivative`. That crate is
actually unmaintained so we replace our usages of it with `derive_more`.
2024-11-29 01:04:06 +00:00
Thomas Eizinger
2c26fc9c0e ci: lint Rust dependencies using cargo deny (#7390)
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
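
The failure mode can be reproduced in miniature. In the sketch below, two modules stand in for two versions of the same crate; the module, type and function names are all hypothetical and not taken from Sentry or `tracing-stackdriver`.

```rust
use std::any::Any;

// Two modules stand in for two versions of the same (hypothetical) crate.
// The structs are identical in name and layout but are distinct types.
mod dep_v1 {
    pub struct Event(pub u32);
}
mod dep_v2 {
    pub struct Event(pub u32);
}

/// A consumer compiled against "v2" that receives a type-erased value.
/// The downcast only succeeds for the `Event` type from the same "version".
pub fn handle(value: &dyn Any) -> Option<u32> {
    value.downcast_ref::<dep_v2::Event>().map(|event| event.0)
}
```

Passing a `dep_v1::Event` compiles without any warning, yet `handle` returns `None` at runtime, which is exactly the kind of silent breakage described above.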

This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).

Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.

This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.

Resolves: #7241.
2024-11-22 00:17:28 +00:00
dependabot[bot]
4014373dc2 build(deps): Bump clap from 4.5.20 to 4.5.21 in /rust (#7369)
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.20 to 4.5.21.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.21</h2>
<h2>[4.5.21] - 2024-11-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(parser)</em> Ensure defaults are filled in on error with
<code>ignore_errors(true)</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.21] - 2024-11-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(parser)</em> Ensure defaults are filled in on error with
<code>ignore_errors(true)</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="03d722625a"><code>03d7226</code></a>
chore: Release</li>
<li><a
href="3df70fb2b6"><code>3df70fb</code></a>
docs: Update changelog</li>
<li><a
href="3266c36abf"><code>3266c36</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5691">#5691</a>
from epage/custom</li>
<li><a
href="951762db57"><code>951762d</code></a>
feat(complete): Allow any OsString-compatible type to be a
CompletionCandidate</li>
<li><a
href="bb6493e890"><code>bb6493e</code></a>
feat(complete): Offer - as a path option</li>
<li><a
href="27b348dbcb"><code>27b348d</code></a>
refactor(complete): Simplify ArgValueCandidates code</li>
<li><a
href="49b8108f8c"><code>49b8108</code></a>
feat(complete): Add PathCompleter</li>
<li><a
href="82a360aa54"><code>82a360a</code></a>
feat(complete): Add ArgValueCompleter</li>
<li><a
href="47aedc6906"><code>47aedc6</code></a>
fix(complete): Ensure paths are sorted</li>
<li><a
href="431e2bc931"><code>431e2bc</code></a>
test(complete): Ensure ArgValueCandidates get filtered</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.20...clap_complete-v4.5.21">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-19 06:16:53 +00:00
Thomas Eizinger
24f7ba530d refactor(gui-client): add more context to connection failures (#7364)
Adding more context to these errors makes it easier to identify which
of the operations fails. In addition, we remove some usages of the "log
and return" anti-pattern to avoid duplicate reports of the same issue.
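
As a rough, std-only illustration of the pattern (the GUI client's real error types are not shown here, so `ConnectError`, `open_ipc_pipe` and `connect` are hypothetical names), an error can carry its context upwards and be logged exactly once by whoever finally handles it:

```rust
use std::fmt;
use std::io;

/// Hypothetical error type that records which operation failed.
#[derive(Debug)]
pub struct ConnectError {
    context: &'static str,
    source: io::Error,
}

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.source)
    }
}

impl std::error::Error for ConnectError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Stand-in for a fallible step; always fails for demonstration purposes.
fn open_ipc_pipe() -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::NotFound, "pipe missing"))
}

/// Attaches context and returns the error without logging it here.
/// Logging happens once, at the caller that finally handles the error.
pub fn connect() -> Result<(), ConnectError> {
    open_ipc_pipe().map_err(|source| ConnectError {
        context: "failed to open IPC pipe",
        source,
    })
}
```

If `connect` also logged the error before returning it, the same failure would show up twice: once here and once at the call site.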
2024-11-18 18:16:16 +00:00
Thomas Eizinger
48ba2869a8 chore(rust): ban the use of .unwrap except in tests (#7319)
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.

Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.
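
A minimal sketch of the reachable case, a user-supplied filter that may be invalid, is shown below. The `Level` enum and `parse_level` function are made up for this example, and the lint is enabled per function here only to keep the snippet self-contained; how the workspace enables it and exempts tests (clippy offers an `allow-unwrap-in-tests` option for that) is not shown in this message.

```rust
#[derive(Debug, PartialEq)]
pub enum Level {
    Debug,
    Info,
    Warn,
}

/// With this attribute, `cargo clippy` rejects any `.unwrap()` in the body.
/// Plain `rustc` accepts the attribute but does not run the lint.
#[deny(clippy::unwrap_used)]
pub fn parse_level(input: &str) -> Result<Level, String> {
    // User input can be anything, so an invalid value is reported as an
    // error instead of causing a panic.
    match input.trim().to_ascii_lowercase().as_str() {
        "debug" => Ok(Level::Debug),
        "info" => Ok(Level::Info),
        "warn" => Ok(Level::Warn),
        other => Err(format!("unknown log level `{other}`")),
    }
}
```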

Resolves: #7292.
2024-11-13 03:59:22 +00:00
Thomas Eizinger
ad4eea29ff chore(rust): don't panic in fallible functions (#7298)
"Just let it crash" is terrible advice for software that is shipped to
end users. Where possible, we should use proper error handling and only
fail the current function / task that is active, e.g. drop a particular
packet instead of failing all of connlib. We more or less already do
that.

Activating the clippy lint `unwrap_in_result` surfaced a few more places
where we panic despite being in a function that is fallible already.
These cases can easily be converted to not panic and return an error
instead.
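
A conversion of this kind might look as follows. The `ip_version` function is a hypothetical example rather than code from connlib; it shows a function that already returns `Result` and therefore has no reason to panic on bad input.

```rust
/// `cargo clippy` flags `.unwrap()` / `.expect()` inside functions that
/// return `Result` or `Option` when this lint is active.
#[deny(clippy::unwrap_in_result)]
pub fn ip_version(packet: &[u8]) -> Result<u8, String> {
    // Flagged by the lint: `let first = packet.first().unwrap();`
    let first = packet
        .first()
        .ok_or_else(|| "empty packet".to_string())?;

    // The IP version is stored in the upper four bits of the first byte.
    match *first >> 4 {
        version @ (4 | 6) => Ok(version),
        other => Err(format!("unsupported IP version {other}")),
    }
}
```

A caller can then drop the offending packet and continue, instead of the whole process going down.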
2024-11-11 23:55:23 +00:00