mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
Currently, `connlib` is entirely single-threaded. This allows us to reuse a single buffer for processing IP packets and makes reasoning of the packet processing code very simple. Being single-threaded also means we can only make use of a single CPU core and all operations have to be sequential. Analyzing `connlib` using `perf` shows that we spend 26% of our CPU time writing packets to the TUN interface [0]. Because we are single-threaded, `connlib` cannot do anything else during this time. If we could offload the writing of these packets to a different thread, `connlib` could already process the next packet while the current one is writing. Packets that we send to the TUN interface arrived as an encrypted WG packet over UDP and get decrypted into a - currently - shared buffer. Moving the writing to a different thread implies that we have to have more of these buffer that the next packet(s) can be decrypted into. To avoid IP fragmentation, we set the maximum IP MTU to 1280 bytes on the TUN interface. That actually isn't very big and easily fits into a stackframe. The default stack size for threads is 2MB [1]. Instead of creating more buffers and cycling through them, we can also simply stack-allocate our IP packets. This incurs some overhead from copying packets but it is only ~3.5% [2] (This was measured without a separate thread). With stack-allocated packets, almost all lifetime-annotations go away which in itself is already a welcome ergonomics boost. Stack-allocated packets also means we can simply spawn a new thread for the packet processing. This thread is connected with two channel to connlib's main thread. The capacity of 1000 packets will at most consume an additional 3.5 MB of memory which is fine even on our most-constrained devices such as iOS. [0]: https://share.firefox.dev/3z78CzD [1]: https://doc.rust-lang.org/std/thread/#stack-size [2]: https://share.firefox.dev/3Bf4zla Resolves: #6653. Resolves: #5541.