firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 10:18:54 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	84a2c275ca	build(rust): upgrade to Rust 1.85 and Edition 2024 (#8240 ) Updates our codebase to the 2024 Edition. For highlights on what changes, see the following blogpost: https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html	2025-03-19 02:58:55 +00:00
Thomas Eizinger	10ba02e341	fix(connlib): split TUN send & recv into separate threads (#8117 ) We appear to have caused a pretty big performance regression (~40%) in `037a2e64b6` (identified through `git-bisect`). Specifically, the regression appears to have been caused by [`aef411a` (#7605)](`aef411abf5`). Weirdly enough, undoing just that on top of `main` doesn't fix the regression. My hypothesis is that using the same file descriptor for read AND write interests on the same runtime causes issues because those interests are occasionally cleared (i.e. on false-positive wake-ups). In this PR, we spawn a dedicated thread each for the sending and receiving operations of the TUN device. On unix-based systems, a TUN device is just a file descriptor and can therefore simply be copied and read & written to from different threads. Most importantly, we only construct the `AsyncFd` _within_ the newly spawned thread and runtime because constructing an `AsyncFd` implicitly registers with the runtime active on the current thread. As a nice benefit, this allows us to get rid of a `future::select`. Those are always kind of nasty because they cancel the future that wasn't ready. My original intuition was that we drop packets due to cancelled futures there but that could not be confirmed in experiments.	2025-02-14 05:32:51 +00:00
Thomas Eizinger	8bd8098cab	refactor(connlib): don't re-implement waker for TUN thread (#7944 ) Within `connlib` - on UNIX platforms - we have dedicated threads that read from and write to the TUN device. These threads are connected with `connlib`'s main thread via bounded channels: one in each direction. When these channels are full, `connlib`'s main thread will suspend and not read any network packets from the sockets in order to maintain back-pressure. Reading more packets from the socket would mean most likely sending more packets out the TUN device. When debugging #7763, it became apparent that _something_ must be wrong with these threads and that somehow, we either consider them as full or aren't emptying them and as a result, we don't read _any_ network packets from our sockets. To maintain back-pressure here, we currently use our own `AtomicWaker` construct that is shared with the TUN thread(s). This is unnecessary. We can also directly convert the `flume::Sender` into a `flume::async::SendSink` and therefore directly access a `poll` interface.	2025-01-29 15:48:48 +00:00
Thomas Eizinger	037a2e64b6	fix(connlib): attempt to detect runtime shutdown within TUN task (#7605 ) Reading and writing to the TUN device within `connlib` happens in a separate thread. The task running within these threads is connected to the rest of `connlib` via channels. When the application shuts down, these threads also need to exit. Currently, we attempt to detect this from within the task when these channels close. It appears that there is a race condition here because we first attempt to read from the TUN device before reading from the channels. We treat read & write errors on the TUN device as non-fatal so we loop around and attempt to read from it again, causing an infinite-loop and log spam. To fix this, we swap the order in which we evaluate the two concurrent tasks: The first task to be polled is now the channel for outbound packets and only if that one is empty, we attempt to read new packets from the TUN device. This is also better from a backpressure point of view: We should attempt to flush out our local buffers of already processed packets before taking on "new work". As a defense-in-depth strategy, we also attempt to detect the particular error from the tokio runtime when it is being shut down and exit the task. Resolves: #7601. Related: https://github.com/tokio-rs/tokio/issues/7056.	2025-01-05 20:41:24 +00:00
Thomas Eizinger	90cf191a7c	feat(linux): multi-threaded TUN device operations (#7449 ) ## Context At present, we only have a single thread that reads and writes to the TUN device on all platforms. On Linux, it is possible to open the file descriptor of a TUN device multiple times by setting the `IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then spawn multiple threads that concurrently read and write to the TUN device. This is critical for achieving a better throughput. ## Solution `IFF_MULTI_QUEUE` is a Linux-only thing and therefore only applies to headless-client, GUI-client on Linux and the Gateway (it may also be possible on Android, I haven't tried). As such, we need to first change our internal abstractions a bit to move the creation of the TUN thread to the `Tun` abstraction itself. For this, we change the interface of `Tun` to the following: - `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver` where multiple items in a channel can be batch-received. - `poll_send_ready`: Mimics the API of `Sink` to check whether more items can be written. - `send`: Mimics the API of `Sink` to actually send an item. With these APIs in place, we can implement various (performance) improvements for the different platforms. - On Linux, this allows us to spawn multiple threads to read and write from the TUN device and send all packets into the same channel. The `Io` component of `connlib` then uses `poll_recv_many` to read batches of up to 100 packets at once. This ties in well with #7210 because we can then use GSO to send the encrypted packets in single syscalls to the OS. - On Windows, we already have a dedicated recv thread because `WinTun`'s most-convenient API uses blocking IO. As such, we can now also tie into that by batch-receiving from this channel. - In addition to using multiple threads, this API now also uses correct readiness checks on Linux, Darwin and Android to uphold backpressure in case we cannot write to the TUN device. ## Configuration Local testing has shown that 2 threads give the best performance for a local `iperf3` run. I suspect this is because there is only so much traffic that a single application (i.e. `iperf3`) can generate. With more than 2 threads, the throughput actually drops drastically because `connlib`'s main thread is too busy with lock-contention and triggering `Waker`s for the TUN threads (which mostly idle around if there are 4+ of them). I've made it configurable on the Gateway though so we can experiment with this during concurrent speedtests etc. In addition, switching `connlib` to a single-threaded tokio runtime further increased the throughput. I suspect due to less task / context switching. ## Results Local testing with `iperf3` shows some very promising results. We now achieve a throughput of 2+ Gbit/s. ``` Connecting to host 172.20.0.110, port 5201 Reverse mode, remote host 172.20.0.110 is sending [ 5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 274 MBytes 2.30 Gbits/sec [ 5] 1.00-2.00 sec 279 MBytes 2.34 Gbits/sec [ 5] 2.00-3.00 sec 216 MBytes 1.82 Gbits/sec [ 5] 3.00-4.00 sec 224 MBytes 1.88 Gbits/sec [ 5] 4.00-5.00 sec 234 MBytes 1.96 Gbits/sec [ 5] 5.00-6.00 sec 238 MBytes 2.00 Gbits/sec [ 5] 6.00-7.00 sec 229 MBytes 1.92 Gbits/sec [ 5] 7.00-8.00 sec 222 MBytes 1.86 Gbits/sec [ 5] 8.00-9.00 sec 223 MBytes 1.87 Gbits/sec [ 5] 9.00-10.00 sec 217 MBytes 1.82 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 2.30 GBytes 1.98 Gbits/sec 22247 sender [ 5] 0.00-10.00 sec 2.30 GBytes 1.98 Gbits/sec receiver iperf Done. ``` This is a pretty solid improvement over what is in `main`: ``` Connecting to host 172.20.0.110, port 5201 [ 5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 90.4 MBytes 758 Mbits/sec 1800 106 KBytes [ 5] 1.00-2.00 sec 93.4 MBytes 783 Mbits/sec 1550 51.6 KBytes [ 5] 2.00-3.00 sec 92.6 MBytes 777 Mbits/sec 1350 76.8 KBytes [ 5] 3.00-4.00 sec 92.9 MBytes 779 Mbits/sec 1800 56.4 KBytes [ 5] 4.00-5.00 sec 93.4 MBytes 783 Mbits/sec 1650 69.6 KBytes [ 5] 5.00-6.00 sec 90.6 MBytes 760 Mbits/sec 1500 73.2 KBytes [ 5] 6.00-7.00 sec 87.6 MBytes 735 Mbits/sec 1400 76.8 KBytes [ 5] 7.00-8.00 sec 92.6 MBytes 777 Mbits/sec 1600 82.7 KBytes [ 5] 8.00-9.00 sec 91.1 MBytes 764 Mbits/sec 1500 70.8 KBytes [ 5] 9.00-10.00 sec 92.0 MBytes 771 Mbits/sec 1550 85.1 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 917 MBytes 769 Mbits/sec 15700 sender [ 5] 0.00-10.00 sec 916 MBytes 768 Mbits/sec receiver iperf Done. ```	2024-12-05 00:18:20 +00:00
Thomas Eizinger	dd6b52b236	chore(rust): share edition key via workspace table (#7451 )	2024-12-03 00:28:06 +00:00
Thomas Eizinger	2c26fc9c0e	ci: lint Rust dependencies using `cargo deny` (#7390 ) One of Rust's promises is "if it compiles, it works". However, there are certain situations in which this isn't true. In particular, when using dynamic typing patterns where trait objects are downcast to concrete types, having two versions of the same dependency can silently break things. This happened in #7379 where I forgot to patch a certain Sentry dependency. A similar problem exists with our `tracing-stackdriver` dependency (see #7241). Lastly, duplicate dependencies increase the compile-times of a project, so we should aim for having as few duplicate versions of a particular dependency as possible in our dependency graph. This PR introduces `cargo deny`, a linter for Rust dependencies. In addition to linting for duplicate dependencies, it also enforces that all dependencies are compatible with an allow-list of licenses and it warns when a dependency is referred to from multiple crates without introducing a workspace dependency. Thanks to existing tooling (https://github.com/mainmatter/cargo-autoinherit), transitioning all dependencies to workspace dependencies was quite easy. Resolves: #7241.	2024-11-22 00:17:28 +00:00
Reactor Scram	bfb3250ae2	chore(ci/rust): build and test more packages in Windows (#7036 ) Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-10-15 21:22:27 +00:00
Thomas Eizinger	50d6b865a1	refactor(connlib): move `Tun` implementations out of `firezone-tunnel` (#5903 ) The different implementations of `Tun` are the last platform-specific code within `firezone-tunnel`. By introducing a dedicated crate and a `Tun` trait, we can move this code into (platform-specific) leaf crates: - `connlib-client-android` - `connlib-client-apple` - `firezone-bin-shared` Related: #4473. --------- Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>	2024-07-24 01:10:50 +00:00

9 Commits