firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-03-21 16:41:47 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	60bdbb39cb	refactor(gui-client): move change listeners to tunnel service (#8160 ) At present, listening for DNS server change and network change events is handled in the GUI client. Upon an event, a message is sent to the tunnel service which then applies the new state to `connlib`. We can avoid some of this boilerplate by moving these listeners to the tunnel service as part of the handler. As a result, we get a few improvements: - We don't need to ignore these events if we don't have a session because the lifetime of these listeners is tied to the IPC handler on the service side. - We need fewer IPC messages - We can retry the connection directly from within the tunnel service in case we have no Internet at the time of startup - We can more easily model out the state machine of a connlib session in the tunnel service - On Linux, this means we no longer shell out to `resolvectl` from the GUI process, unifying access to the "resolvers" from the tunnel service - On Windows, we no longer need admin privileges on the GUI client for optimized network-change detection. This now happens in the Tunnel process which already runs as admin. Resolves: #9465	2025-06-11 06:18:14 +00:00
Jamil	822832e02b	chore(macos): allow tauri to build on macOS (#9391 ) When working on UI stuff for the Tauri clients on macOS it's helpful if the UI is buildable. This is a first stab at getting a stub client to launch on macOS with the help of our AI overlords. Feel free to close or heavily critique if there is a better approach.	2025-06-06 09:15:39 +00:00
Thomas Eizinger	d62f82787d	build(deps): bump `netlink` dependency group (#9315 ) In https://github.com/rust-netlink/netlink-packet-route/issues/140#issuecomment-2919539363, the author claims the issue we've been holding the dependency bump back for is resolved. We can now update to the latest versions of the `netlink` dependency group.	2025-05-31 02:34:55 +00:00
Thomas Eizinger	ae872980ae	refactor(gui-client): scope telemetry sessions to GUI client (#9179 ) For our telemetry sessions with Sentry, we need to know which environment we are running in, i.e. staging, production or on-prem. The GUI client's tunnel service doesn't have a concept of an environment until a GUI connects and sends the `StartTelemetry` message. Therefore, we should scope a telemetry session to a GUI being connected over IPC. Any errors around setting up / tearing down the background service are a catch-22. Until a GUI connects, we can't initialise the telemetry connection but if we fail to set up the background service, no GUI can ever connect. Hence, the current setup and tear down of the `Telemetry` module around the `ipc_listen` calls can safely be removed as they are effectively no-ops anyway.	2025-05-20 23:18:18 +00:00
Thomas Eizinger	1bdba3601a	feat(gui-client): rename IPC service to Tunnel service (#9154 ) The name IPC service is not very descriptive. By nature of being separate processes, we need to use IPC to communicate between them. The important thing is that the service process has control over the tunnel. Therefore, we rename everything to "Tunnel service". The only part that is not changed are historic changelog entries. Resolves: #9048	2025-05-19 09:52:06 +00:00
Thomas Eizinger	3300c0fe02	chore(rust): fix windows static analysis errors (#9162 ) The `static-analysis` job for Windows was not yet part of the rule set and therefore some clippy errors slipped through when we merged #9159.	2025-05-16 04:23:53 +00:00
Thomas Eizinger	6165555add	build(deps): bump Rust to 1.87.0 (#9159 )	2025-05-16 01:58:17 +00:00
Thomas Eizinger	b8738448df	refactor(connlib): forward error from source IP resolver (#9116 ) In order to avoid routing loops on Windows, our UDP and TCP sockets in `connlib` embed a "source IP resolver" that finds the "next best" interface after our TUN device according to Windows' routing metrics. This ensures that packets don't get routed back into our TUN device. Currently, errors during this process are only logged on TRACE and therefore not visible in Sentry. We fix this by moving around some of the function interfaces and forward the error from the source IP resolver together with some context of the destination IP.	2025-05-13 13:33:15 +00:00
Thomas Eizinger	4097ee0cdf	chore(gui-client): only read `is_finished` once (#9095 ) For at least 1 user, the threads shut down correctly, but we didn't seem to have exited the loop. In https://firezone-inc.sentry.io/issues/6335839279/events/c11596de18924ee3a1b64ced89b1fba2/?project=4508008945549312, we can see that both flags are marked as `true` yet we still emitted the message. The only way I can explain this is that the thread shut down in between the two times we've called the `is_finished` function. To ensure this doesn't happen, we now only read it once. This however also shows that 5s may not be enough time for WinTUN to shut down. Therefore, we increase the grace period to 10s.	2025-05-12 11:47:42 +00:00
Thomas Eizinger	5566f1847f	refactor(rust): move crates into a more sensical hierarchy (#9066 ) The current `rust/` directory is a bit of a wild-west in terms of how the crates are organised. Most of them are simply at the top-level when in reality, they are all `connlib`-related. The Apple and Android FFI crates - which are entrypoints in the Rust code - are defined several layers deep. To improve the situation, we move around and rename several crates. The end result is that all top-level crates / directories are: - Either entrypoints into the Rust code, i.e. applications such as Gateway, Relay or a Client - Or crates shared across all those entrypoints, such as `telemetry` or `logging`	2025-05-12 01:04:17 +00:00
Thomas Eizinger	f2b1fbe718	refactor(rust): move `device_id` to `bin-shared` (#9040 ) Both `device_id` and `device_info` are used by the headless-client and the GUI client / IPC service. They should therefore be defined in the `bin-shared` crate.	2025-05-06 04:52:37 +00:00
Thomas Eizinger	f11a902b3d	refactor(rust): move `dns-control` to `bin-shared` (#9023 ) Currently, the platform-specific code for controlling DNS resolution on a system sits in `firezone-headless-client`. This code is also used by the GUI client. This creates a weird compile-time dependency from the GUI client to the headless client. For other components that have platform-specific implementations, we use the `firezone-bin-shared` crate. As a first step of resolving the compile-time dependency, we move the `dns_control` module to `firezone-bin-shared`.	2025-05-06 01:29:09 +00:00
Thomas Eizinger	005b6fe863	feat(windows): optimise network change detection (#9021 ) Presently, the network change detection on Windows is very naive and simply emits a change event every time _anything_ changes. We can optimise this and therefore improve the start-up time of Firezone by: - Filtering out duplicate events - Filtering out network change events for our own network adapter This reduces the number of network change events to 1 during startup. As far as I can tell from the code comments in this area, we explicitly send this one to ensure we don't run into a race condition whilst we are starting up. Resolves: #8905	2025-05-06 00:23:27 +00:00
Thomas Eizinger	806996c245	refactor(rust): move `signals` to `bin-shared` (#9024 ) The `signals` module isn't something headless-client specific and should live in our `bin-shared` crate. Once the `ipc_service` module is decoupled from the headless-client crate, it will be used by both the headless client and IPC service (which then will be defined in the GUI client crate).	2025-05-05 23:34:26 +00:00
Thomas Eizinger	ce51c40d0d	refactor(rust): move `known_dirs` to `bin-shared` (#9026 ) The `known_dirs` module is used across the headless-client and the GUI client. It should live in `bin-shared` where all the other cross-platform modules are. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2025-05-05 22:45:53 +00:00
Thomas Eizinger	80335676b1	refactor(rust): move `uptime` to `bin-shared` (#9027 ) The `uptime` module from `firezone-headless-client` is also used in the GUI client. In order to decouple this dependency, we move the module to `bin-shared`, next to the other cross-platform modules.	2025-05-05 12:28:26 +00:00
Thomas Eizinger	6114bb274f	chore(rust): make most of the Rust code compile on MacOS (#8924 ) When working on the Rust code of Firezone from a MacOS computer, it is useful to have pretty much all of the code at least compile in order to detect problems early. Eventually, once we target features like a headless MacOS client, some of these stubs will actually be filled in and be functional.	2025-04-29 11:20:09 +00:00
Thomas Eizinger	93036734ae	build(rust): move our own `windows` dependency to `0.61.0` (#8730 ) Version `0.61.0` is what most of our dependencies bring in, so depending on that allows us to unify the dependency tree here.	2025-04-22 02:35:28 +00:00
Thomas Eizinger	84a2c275ca	build(rust): upgrade to Rust 1.85 and Edition 2024 (#8240 ) Updates our codebase to the 2024 Edition. For highlights on what changes, see the following blogpost: https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html	2025-03-19 02:58:55 +00:00
Thomas Eizinger	7af4b91ac5	fix(gui-client): call `wintun::Session::shutdown` on drop (#8464 ) The bugfix we attempted in #8156 turned out wrong. Reading the source-code, we have to call `Session::shutdown` in order to actually cancel the `Session::receive_blocking` call. Not doing so means we run into the timeout when discarding the `Tun` device because the recv-thread is stuck in `Session::receive_blocking`. Fixes: #8395	2025-03-17 12:58:03 +00:00
Thomas Eizinger	2fe5c00c64	fix(windows): break from retry loop if we sent the packet (#8271 ) Regression introduced in #8268.	2025-02-26 06:10:02 +00:00
Thomas Eizinger	96170be082	fix(gui-client): mitigate deadlock when shutting down TUN device (#8268 ) In #8159, we introduced a regression that could lead to a deadlock when shutting down the TUN device. Whilst we did close the channel prior to awaiting the thread to exit, we failed to notice that _another_ instance of the sender could be alive as part of an internally stored "sending permit" with the `PollSender` in case another packet is queued for sending. We need to explicitly call `abort_send` to free that. Judging from the comment and a prior bug, this shutdown logic has been buggy before. To further avoid this deadlock, we introduce two changes: - The worker threads only receive a `Weak` reference to the `wintun::Session` - We move all device-related state into a dedicated `TunState` struct that we can drop prior to joining the threads The combination of these features means that all strong references to channels and the session are definitely dropped without having to wait for anything. To provide a clean and synchronous shutdown, we wait for at most 5s on the worker-threads. If they don't exit until then, we log a warning and exit anyway. This should greatly reduce the risk of future bugs here because the session (and thus the WinTUN device) gets shutdown in any case and so at worst, we have a few zombie threads around. Resolves: #8265	2025-02-26 00:46:12 +00:00
Thomas Eizinger	33c707dbf6	feat(windows): introduce dedicated "TUN send" thread (#8159 ) Same as done for unix-based operating systems in #8117, we introduce a dedicated "TUN send" thread for Windows in this PR. Not only does this move the syscalls and copying of sending packets away from `connlib`'s main thread but it also establishes backpressure between those threads properly. WinTUN does not have any ability to signal that it has space in its send buffer. If it fails to allocate a packet for sending, it will return `ERROR_BUFFER_OVERFLOW` [0]. We now handle this case gracefully by suspending the send thread for 10ms and then trying again. This isn't a great way of establishing back-pressure but at least we don't have any packet loss. To test this, I temporarily lowered the ring buffer size and ran a speed test. In that, I could confirm that `ERROR_BUFFER_OVERFLOW` is indeed emitted and handled as intended. [0]: https://git.zx2c4.com/wintun/tree/api/session.c#n267	2025-02-17 20:33:45 +00:00
Thomas Eizinger	af9fc49b18	fix(windows): don't double shutdown session (#8156 ) The `wintun` crate will already shutdown the session for us when the last instance of `Session` gets dropped. Shutting down the session prior to that already results in an attempt to close an adapter that is no longer present, causing WinTUN to log (unactionable) errors.	2025-02-17 05:38:11 +00:00
Thomas Eizinger	10ba02e341	fix(connlib): split TUN send & recv into separate threads (#8117 ) We appear to have caused a pretty big performance regression (~40%) in `037a2e64b6` (identified through `git-bisect`). Specifically, the regression appears to have been caused by [`aef411a` (#7605)](`aef411abf5`). Weirdly enough, undoing just that on top of `main` doesn't fix the regression. My hypothesis is that using the same file descriptor for read AND write interests on the same runtime causes issues because those interests are occasionally cleared (i.e. on false-positive wake-ups). In this PR, we spawn a dedicated thread each for the sending and receiving operations of the TUN device. On unix-based systems, a TUN device is just a file descriptor and can therefore simply be copied and read & written to from different threads. Most importantly, we only construct the `AsyncFd` _within_ the newly spawned thread and runtime because constructing an `AsyncFd` implicitly registers with the runtime active on the current thread. As a nice benefit, this allows us to get rid of a `future::select`. Those are always kind of nasty because they cancel the future that wasn't ready. My original intuition was that we drop packets due to cancelled futures there but that could not be confirmed in experiments.	2025-02-14 05:32:51 +00:00
Thomas Eizinger	7dcda1dc74	fix(windows): silence `0x800706D9` when DNS deactivation fails (#8085 ) The error code we see here means "There are no more endpoints available from the endpoint mapper." This has something to do with Windows' internal RPC communication between components. DNS deactivation is on a best-effort basis and it appears that everything else is working just fine, despite this error. It appears to happen when we shut down our own service, so perhaps it is just a race condition.	2025-02-11 05:38:37 +00:00
Thomas Eizinger	d7ebd07183	fix(linux): check for correct sign of netlink error code (#8087 ) We've previously tried to handle the "No such process" error from netlink when it tries to remove a route that no longer exists. What we failed to do is use the correct sign for the error code as netlink errors are always negative, yet when printed, they are positive numbers.	2025-02-11 04:47:51 +00:00
Thomas Eizinger	b193dd91f6	fix(windows): don't warn on disabled IP stack (#8086 ) When an IP stack is programmatically disabled, such as with: > reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters" /v DisabledComponents /t REG_DWORD /d 255 /f Attempting to interact with this IP stack will yield "NOT_FOUND" errors. These aren't worth reporting to Sentry because there isn't much we can do about it.	2025-02-11 04:37:17 +00:00
Thomas Eizinger	436b502eab	fix(windows): handle disabled IPv6 stack gracefully (#8083 ) Fixes: #8049.	2025-02-11 03:21:32 +00:00
Thomas Eizinger	f48df7585c	refactor(windows): de-duplicate Win32 error codes (#8071 ) The errors returned from Win32 API calls are currently duplicated in several places. This makes it error-prone to handle them correctly. With this PR, we de-duplicate this and add proper docs and links for further reading to them. We also fix a case where we would currently fail to set IP addresses for our tunnel interface if the IP stack is not supported.	2025-02-10 23:33:06 +00:00
Thomas Eizinger	d2e9b09874	refactor(rust): stringify errors early (#8033 ) As it turns out, the effort in #7104 was not a good idea. By logging errors as values, most of our Sentry reports all have the same title and thus cannot be differentiated from within the overview at all. To fix this, we stringify errors with all their sources whenever they got logged. This ensures log messages are unique and all Sentry issues will have a useful title.	2025-02-06 14:18:35 +00:00
Thomas Eizinger	90fb9b8478	refactor(connlib): use Win32 APIs instead of `netsh` to set IPs (#8003 ) This should be faster and hopefully more reliable. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2025-02-03 06:24:28 +00:00
Thomas Eizinger	8bd8098cab	refactor(connlib): don't re-implement waker for TUN thread (#7944 ) Within `connlib` - on UNIX platforms - we have dedicated threads that read from and write to the TUN device. These threads are connected with `connlib`'s main thread via bounded channels: one in each direction. When these channels are full, `connlib`'s main thread will suspend and not read any network packets from the sockets in order to maintain back-pressure. Reading more packets from the socket would mean most likely sending more packets out the TUN device. When debugging #7763, it became apparent that _something_ must be wrong with these threads and that somehow, we either consider them as full or aren't emptying them and as a result, we don't read _any_ network packets from our sockets. To maintain back-pressure here, we currently use our own `AtomicWaker` construct that is shared with the TUN thread(s). This is unnecessary. We can also directly convert the `flume::Sender` into a `flume::async::SendSink` and therefore directly access a `poll` interface.	2025-01-29 15:48:48 +00:00
Thomas Eizinger	416e320319	revert: bump `netlink-packet-route` and `rtnetlink` (#7899 ) Reverts: #6694 Related: https://github.com/rust-netlink/netlink-packet-route/issues/140	2025-01-28 06:29:07 +00:00
dependabot[bot]	0779757646	build(deps): `netlink-packet-route` and `rtnetlink` (#6694 ) `rtnetlink` has some breaking changes in their latest version. To avoid waiting until they actually cut a release, we temporarily depend on their `main` branch. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2025-01-28 05:21:52 +00:00
Thomas Eizinger	46cdbbcc23	fix(connlib): use a buffer pool for the GSO queue (#7749 ) Within `connlib`, we read batches of IP packets and process them at once. Each encrypted packet is appended to a buffer shared with other packets of the same length. Once the batch is successfully processed, all of these buffers are written out using GSO to the network. This allows UDP operations to be much more efficient because not every packet has to traverse the entire syscall hierarchy of the operating system. Until now, these buffers got re-allocated on every batch. This is pretty wasteful and leads to a lot of repeated allocations. Measurements show that most of the time, we only have a handful of packets with different segment lengths _per batch_. For example, just booting up the headless-client and running a speedtest showed that only 5 of these buffers were needed at one time. By introducing a buffer pool, we can reuse these buffers between batches and avoid reallocating them. Related: #7747.	2025-01-13 19:24:52 +00:00
Thomas Eizinger	037a2e64b6	fix(connlib): attempt to detect runtime shutdown within TUN task (#7605 ) Reading and writing to the TUN device within `connlib` happens in a separate thread. The task running within these threads is connected to the rest of `connlib` via channels. When the application shuts down, these threads also need to exit. Currently, we attempt to detect this from within the task when these channels close. It appears that there is a race condition here because we first attempt to read from the TUN device before reading from the channels. We treat read & write errors on the TUN device as non-fatal so we loop around and attempt to read from it again, causing an infinite-loop and log spam. To fix this, we swap the order in which we evaluate the two concurrent tasks: The first task to be polled is now the channel for outbound packets and only if that one is empty, we attempt to read new packets from the TUN device. This is also better from a backpressure point of view: We should attempt to flush out our local buffers of already processed packets before taking on "new work". As a defense-in-depth strategy, we also attempt to detect the particular error from the tokio runtime when it is being shut down and exit the task. Resolves: #7601. Related: https://github.com/tokio-rs/tokio/issues/7056.	2025-01-05 20:41:24 +00:00
Thomas Eizinger	26824fb3c7	fix(gateway): check if we run with correct permissions (#7565 ) The gateway needs either the `CAP_NET_ADMIN` capability or to run as `root` in order to access the TUN device as well as configure routes via `netlink`. Running without either leads to "Permission denied" errors at runtime. It is good to fail early in these kinds of situations. By checking for this capability early on during startup, these should no longer surface later. As a bonus, we won't receive (unactionable) Sentry alerts. Resolves: #7559. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-12-29 21:45:56 +00:00
Thomas Eizinger	e7cc0e5eef	fix(linux): don't fail on unsupported IP version (#7583 ) Firezone always attempts to handle IPv4 and IPv6. On Linux systems without an IPv6 stack, attempts to add an IPv6 route may fail with "Not supported (os error 95)". We don't need the IPv6 routes on those systems as we will never receive IPv6 traffic. Therefore, we can safely ignore these errors and not log them.	2024-12-25 11:09:22 +00:00
Thomas Eizinger	1b04b0eb2b	fix(windows): don't warn on deleting non-existing route (#7507 ) Similar to Linux (#7502), we don't want to log an error if we cannot delete a route that doesn't exist.	2024-12-13 21:09:09 +00:00
Thomas Eizinger	b5d6c27680	fix(linux): don't print error when removing non-existent route (#7502 ) We are already handling one case where we are trying to remove a route that doesn't exist. `ESRCH` is another variant of this error that manifests as "No such process". According to the Internet, this just means the route doesn't exist so we can bail out early here.	2024-12-13 04:53:22 +00:00
Thomas Eizinger	90cf191a7c	feat(linux): multi-threaded TUN device operations (#7449 )

## Context

At present, we only have a single thread that reads and writes to the TUN device on all platforms. On Linux, it is possible to open the file descriptor of a TUN device multiple times by setting the `IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then spawn multiple threads that concurrently read and write to the TUN device. This is critical for achieving a better throughput.

## Solution

`IFF_MULTI_QUEUE` is a Linux-only thing and therefore only applies to headless-client, GUI-client on Linux and the Gateway (it may also be possible on Android, I haven't tried). As such, we need to first change our internal abstractions a bit to move the creation of the TUN thread to the `Tun` abstraction itself. For this, we change the interface of `Tun` to the following:

- `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver` where multiple items in a channel can be batch-received.
- `poll_send_ready`: Mimics the API of `Sink` to check whether more items can be written.
- `send`: Mimics the API of `Sink` to actually send an item.

With these APIs in place, we can implement various (performance) improvements for the different platforms.

- On Linux, this allows us to spawn multiple threads to read and write from the TUN device and send all packets into the same channel. The `Io` component of `connlib` then uses `poll_recv_many` to read batches of up to 100 packets at once. This ties in well with #7210 because we can then use GSO to send the encrypted packets in single syscalls to the OS.
- On Windows, we already have a dedicated recv thread because `WinTun`'s most-convenient API uses blocking IO. As such, we can now also tie into that by batch-receiving from this channel.
- In addition to using multiple threads, this API now also uses correct readiness checks on Linux, Darwin and Android to uphold backpressure in case we cannot write to the TUN device.

## Configuration

Local testing has shown that 2 threads give the best performance for a local `iperf3` run. I suspect this is because there is only so much traffic that a single application (i.e. `iperf3`) can generate. With more than 2 threads, the throughput actually drops drastically because `connlib`'s main thread is too busy with lock-contention and triggering `Waker`s for the TUN threads (which mostly idle around if there are 4+ of them). I've made it configurable on the Gateway though so we can experiment with this during concurrent speedtests etc.

In addition, switching `connlib` to a single-threaded tokio runtime further increased the throughput. I suspect due to less task / context switching.

## Results

Local testing with `iperf3` shows some very promising results. We now achieve a throughput of 2+ Gbit/s.

```
Connecting to host 172.20.0.110, port 5201
Reverse mode, remote host 172.20.0.110 is sending
[ 5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201
[ ID] Interval        Transfer     Bitrate
[ 5]  0.00-1.00 sec   274 MBytes   2.30 Gbits/sec
[ 5]  1.00-2.00 sec   279 MBytes   2.34 Gbits/sec
[ 5]  2.00-3.00 sec   216 MBytes   1.82 Gbits/sec
[ 5]  3.00-4.00 sec   224 MBytes   1.88 Gbits/sec
[ 5]  4.00-5.00 sec   234 MBytes   1.96 Gbits/sec
[ 5]  5.00-6.00 sec   238 MBytes   2.00 Gbits/sec
[ 5]  6.00-7.00 sec   229 MBytes   1.92 Gbits/sec
[ 5]  7.00-8.00 sec   222 MBytes   1.86 Gbits/sec
[ 5]  8.00-9.00 sec   223 MBytes   1.87 Gbits/sec
[ 5]  9.00-10.00 sec  217 MBytes   1.82 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval        Transfer     Bitrate         Retr
[ 5]  0.00-10.00 sec  2.30 GBytes  1.98 Gbits/sec  22247  sender
[ 5]  0.00-10.00 sec  2.30 GBytes  1.98 Gbits/sec         receiver
iperf Done.
```

This is a pretty solid improvement over what is in `main`:

```
Connecting to host 172.20.0.110, port 5201
[ 5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201
[ ID] Interval        Transfer     Bitrate        Retr  Cwnd
[ 5]  0.00-1.00 sec   90.4 MBytes  758 Mbits/sec  1800  106 KBytes
[ 5]  1.00-2.00 sec   93.4 MBytes  783 Mbits/sec  1550  51.6 KBytes
[ 5]  2.00-3.00 sec   92.6 MBytes  777 Mbits/sec  1350  76.8 KBytes
[ 5]  3.00-4.00 sec   92.9 MBytes  779 Mbits/sec  1800  56.4 KBytes
[ 5]  4.00-5.00 sec   93.4 MBytes  783 Mbits/sec  1650  69.6 KBytes
[ 5]  5.00-6.00 sec   90.6 MBytes  760 Mbits/sec  1500  73.2 KBytes
[ 5]  6.00-7.00 sec   87.6 MBytes  735 Mbits/sec  1400  76.8 KBytes
[ 5]  7.00-8.00 sec   92.6 MBytes  777 Mbits/sec  1600  82.7 KBytes
[ 5]  8.00-9.00 sec   91.1 MBytes  764 Mbits/sec  1500  70.8 KBytes
[ 5]  9.00-10.00 sec  92.0 MBytes  771 Mbits/sec  1550  85.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval        Transfer     Bitrate        Retr
[ 5]  0.00-10.00 sec  917 MBytes   769 Mbits/sec  15700  sender
[ 5]  0.00-10.00 sec  916 MBytes   768 Mbits/sec         receiver
iperf Done.
```

2024-12-05 00:18:20 +00:00
Thomas Eizinger	4f92a0d7ca	refactor(gui-client): tidy up GUI controller code (#7444 ) This PR intends to be a pure refactoring, i.e. no behaviour change. It simplifies a few aspects of the GUI controller event-loop by getting rid of the `select!` macro. We also remove some indirection of the `gui_controller::Builder`.	2024-12-02 20:07:44 +00:00
Thomas Eizinger	0a6554122a	feat(connlib): utilise GSO for UDP sockets (#7210 )

## Context

At present, `connlib` sends UDP packets one at a time. Sending a packet requires us to make a syscall which is quite expensive. Under load, i.e. during a speedtest, syscalls account for over 50% of our CPU time [0]. In order to improve this situation, we need to somehow make use of GSO (generic segmentation offload). With GSO, we can send multiple packets to the same destination in a single syscall.

The tricky question here is, how can we achieve having multiple UDP packets ready at once so we can send them in a single syscall? Our TUN interface only feeds us packets one at a time and `connlib`'s state machine is single-threaded. Additionally, we currently only have a single `EncryptBuffer` in which the to-be-sent datagram sits.

## 1. Stack-allocating encrypted IP packets

As a first step, we get rid of the single `EncryptBuffer` and instead stack-allocate each encrypted IP packet. Due to our small MTU, these packets are only around 1300 bytes. Stack-allocating that requires a few memcpy's but those are in the single-digit % range in terms of CPU time. That is nothing compared to how much time we are spending on UDP syscalls. With the `EncryptBuffer` out the way, we can now "freely" move around the `EncryptedPacket` structs and - technically - we can have multiple of them at the same time.

## 2. Implementing GSO

The GSO interface allows you to pass multiple packets of the same length and for the same destination in a single syscall, meaning we cannot just batch-up arbitrary UDP packets. Counterintuitively, making use of GSO requires us to do more copying: In particular, we change the interface of `Io` such that "sending" a packet performs essentially a lookup of a `BytesMut`-buffer by destination and packet length and appends the payload to that buffer.

## 3. Batch-read IP packets

In order to actually perform GSO, we need to process more than a single IP packet in one event-loop tick. We achieve this by batch-reading up to 50 IP packets from the mpsc-channel that connects `connlib`'s main event-loop with the dedicated thread that reads and writes to the TUN device. These reads and writes happen concurrently to `connlib`'s packet processing. Thus, it is likely that by the time `connlib` is ready to process another IP packet, multiple have been read from the device and are sitting in the channel. Batch-processing these IP packets means that the buffers in our `GsoQueue` are more likely to contain more than a single datagram.

Imagine you are running a file upload. The OS will send many packets to the same destination IP and likely max MTU to the TUN device. It is likely that we read 10-20 of these packets in one batch (i.e. within a single "tick" of the event-loop). All packets will be appended to the same buffer in the `GsoQueue` and on the next event-loop tick, they will all be flushed out in a single syscall.

## Results

Overall, this results in a significant reduction of syscalls for sending UDP messages. In [1], we spend only a total of 16% of our CPU time in `udpv6_sendmsg` whereas in [0] (main), we spent a total of 34%. Do note that these numbers are relative to the total CPU time spent per program run and thus can't be compared directly (i.e. you cannot just do 34 - 16 and say we now spend 18% less time sending UDP packets). Nevertheless, this appears to be a great improvement.

In terms of throughput, we achieve a ~60% improvement in our benchmark suite. That one is running on localhost though so it might not necessarily be reflected like that in a real network.

[0]: https://share.firefox.dev/4hvoPju
[1]: https://share.firefox.dev/4frhCPv

2024-12-02 01:09:44 +00:00
Thomas Eizinger	c6e7e6192e	build(rust): bump Rust to 1.83 (#7409 ) Rust 1.83 comes with a bunch of new lints for elidible lifetimes. Those also trigger in the generated code of `derivative`. That crate is actually unmaintained so we replace our usages of it with `derive_more`.	2024-11-29 01:04:06 +00:00
Thomas Eizinger	24f7ba530d	refactor(gui-client): add more context to connection failures (#7364 ) Adding more context to these errors makes it easier to identify, which of the operations fails. In addition, we remove some usages of the "log and return" anti-pattern to avoid duplicate reports of the same issue.	2024-11-18 18:16:16 +00:00
Thomas Eizinger	48ba2869a8	chore(rust): ban the use of `.unwrap` except in tests (#7319 ) Using the clippy lint `unwrap_used`, we can automatically lint against all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a few results actually. In most cases, they are invariants that can't actually be hit. For these, we change them to `Option`. In other cases, they can actually be hit. For example, if the user supplies an invalid log-filter. Activating this lint ensures the compiler will yell at us every time we use `.unwrap` to double-check whether we do indeed want to panic here. Resolves: #7292.	2024-11-13 03:59:22 +00:00
Thomas Eizinger	ad4eea29ff	chore(rust): don't panic in fallible functions (#7298 ) "Just let it crash" is terrible advice for software that is shipped to end users. Where possible, we should use proper error handling and only fail the current function / task that is active, e.g. drop a particular packet instead of failing all of connlib. We more or less already do that. Activating the clippy lint `unwrap_in_result` surfaced a few more places where we panic despite being in a function that is fallible already. These cases can easily be converted to not panic and return an error instead.	2024-11-11 23:55:23 +00:00
Thomas Eizinger	e261cb3c27	chore: remove `git_version!` (#7270 ) Reading the Git version requires the entire Git repository to be present, including all tags. The tags are only created _after_ the artifact is being built, when we publish the release. Therefore, these tags are never included in the actual released binary. For Sentry, we use the `CARGO_PKG_VERSION` variable instead. This doesn't tell us whether somebody built a client from source and then used it so there could be some confusion in Sentry events. It is quite unlikely that this happens though so for the majority of Sentry alerts, this will give us the correct version. For the Android client, we also depend on the `GITHUB_SHA` env variable at compile-time. We do the same thing for the GUI client here. Resolves: #6925.	2024-11-07 22:56:17 +00:00
Thomas Eizinger	78ebad13ab	chore(rust): log more errors as `tracing::Value`s (#7208 ) Logging these as structured values gives us a better stacktrace in Sentry (assuming the errors themselves make proper use of defining an error-chain).	2024-11-05 14:36:47 +00:00
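
Illustration for the GSO entries above (#7210 and #7749): those messages describe appending each encrypted datagram to a buffer selected by destination and packet length, flushing every buffer in one go, and later reusing the buffers through a pool. The following is only a minimal sketch of that idea using the Rust standard library. The names `BatchKey`, `GsoQueueSketch`, `enqueue`, `flush` and `recycle` are hypothetical and are not taken from `connlib`; the actual `GsoQueue` uses `BytesMut` buffers and may differ in every detail.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical key: per the commit message, datagrams can only share a GSO
/// batch if they go to the same destination and have the same length.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BatchKey {
    pub dst: SocketAddr,
    pub segment_len: usize,
}

/// Illustrative queue; NOT connlib's actual `GsoQueue`.
#[derive(Default)]
pub struct GsoQueueSketch {
    /// Buffers currently being filled, one per (destination, length) pair.
    pending: HashMap<BatchKey, Vec<u8>>,
    /// Emptied buffers kept around so later batches can reuse their capacity.
    pool: Vec<Vec<u8>>,
}

impl GsoQueueSketch {
    /// Appends one datagram to the buffer matching its destination and length.
    /// A new buffer is taken from the pool if one is available.
    pub fn enqueue(&mut self, dst: SocketAddr, payload: &[u8]) {
        let key = BatchKey {
            dst,
            segment_len: payload.len(),
        };
        let pool = &mut self.pool;
        self.pending
            .entry(key)
            .or_insert_with(|| pool.pop().unwrap_or_default())
            .extend_from_slice(payload);
    }

    /// Drains all pending buffers. In a real implementation, each returned
    /// buffer would be handed to a single send call with `segment_len` as the
    /// segment size.
    pub fn flush(&mut self) -> Vec<(BatchKey, Vec<u8>)> {
        self.pending.drain().collect()
    }

    /// Returns a flushed buffer to the pool, keeping its allocation.
    pub fn recycle(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.pool.push(buf);
    }
}
```

In this sketch, a caller would `enqueue` every packet of one batch, `flush` once per event-loop tick, and `recycle` each buffer after sending it, so that steady-state traffic with only a handful of distinct segment lengths does not allocate.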

82 Commits