Files
firezone/website
Thomas Eizinger 0a6554122a feat(connlib): utilise GSO for UDP sockets (#7210)
## Context

At present, `connlib` sends UDP packets one at a time. Sending a packet
requires us to make a syscall which is quite expensive. Under load, i.e.
during a speedtest, syscalls account for over 50% of our CPU time [0].
In order to improve this situation, we need to somehow make use of GSO
(generic segmentation offload). With GSO, we can send multiple packets
to the same destination in a single syscall.

The tricky question here is, how can we achieve having multiple UDP
packets ready at once so we can send them in a single syscall? Our TUN
interface only feeds us packets one at a time and `connlib`'s state
machine is single-threaded. Additionally, we currently only have a
single `EncryptBuffer` in which the to-be-sent datagram sits.

## 1. Stack-allocating encrypted IP packets

As a first step, we get rid of the single `EncryptBuffer` and instead
stack-allocate each encrypted IP packet. Due to our small MTU, these
packets are only around 1300 bytes. Stack-allocating that requires a few
memcpy's but those are in the single-digit % range in the terms of CPU
time performance hit. That is nothing compared to how much time we are
spending on UDP syscalls. With the `EncryptBuffer` out the way, we can
now "freely" move around the `EncryptedPacket` structs and - technically
- we can have multiple of them at the same time.

## 2. Implementing GSO

The GSO interface allows you to pass multiple packets **of the same
length and for the same destination** in a single syscall, meaning we
cannot just batch-up arbitrary UDP packets. Counterintuitively, making
use of GSO requires us to do more copying: In particular, we change the
interface of `Io` such that "sending" a packet performs essentially a
lookup of a `BytesMut`-buffer by destination and packet length and
appends the payload to that packet.

## 3. Batch-read IP packets

In order to actually perform GSO, we need to process more than a single
IP packet in one event-loop tick. We achieve this by batch-reading up to
50 IP packets from the mpsc-channel that connects `connlib`'s main
event-loop with the dedicated thread that reads and writes to the TUN
device. These reads and writes happen concurrently to `connlib`'s packet
processing. Thus, it is likely that by the time `connlib` is ready to
process another IP packet, multiple have been read from the device and
are sitting in the channel. Batch-processing these IP packets means that
the buffers in our `GsoQueue` are more likely to contain more than a
single datagram.

Imagine you are running a file upload. The OS will send many packets to
the same destination IP and likely max MTU to the TUN device. It is
likely, that we read 10-20 of these packets in one batch (i.e. within a
single "tick" of the event-loop). All packets will be appended to the
same buffer in the `GsoQueue` and on the next event-loop tick, they will
all be flushed out in a single syscall.

## Results

Overall, this results in a significant reduction of syscalls for sending
UDP message. In [1], we spend only a total of 16% of our CPU time in
`udpv6_sendmsg` whereas in [0] (main), we spent a total of 34%. Do note
that these numbers are relative to the total CPU time spent per program
run and thus can't be compared directly (i.e. you cannot just do 34 - 16
and say we now spend 18% less time sending UDP packets). Nevertheless,
this appears to be a great improvement.

In terms of throughput, we achieve a ~60% improvement in our benchmark
suite. That one is running on localhost though so it might not
necessarily be reflect like that in a real network.

[0]: https://share.firefox.dev/4hvoPju
[1]: https://share.firefox.dev/4frhCPv
2024-12-02 01:09:44 +00:00
..
2024-11-18 19:29:32 +00:00

This is a Next.js project bootstrapped with create-next-app.

Getting Started

First, install dependencies and populate the timestamps.json file:

pnpm setup

Next, create files .env.local and .env.development.local in this directory.

Put this in .env.local:

NEXT_PUBLIC_MIXPANEL_TOKEN=""
NEXT_PUBLIC_GOOGLE_ANALYTICS_ID=""
NEXT_PUBLIC_LINKEDIN_PARTNER_ID=""
FIREZONE_DEPLOYED_SHA=""

And this in .env.development.local:

# Created by Vercel CLI
EDGE_CONFIG=""
FIREZONE_DEPLOYED_SHA=""
SITE_URL=""
VERCEL_DEEP_CLONE=""

After that, make sure to contact the team for their values.

Then, run the development server:

npm run dev
# or
yarn dev
# or
pnpm dev

Open http://localhost:3000 with your browser to see the result.

You can start editing the page by modifying app/page.tsx. The page auto-updates as you edit the file.

Linting

This project uses Prettier to format code and ensure a consistent style. Use the .prettierrc.json in the root of this repo to configure your editor.

Learn More

To learn more about Next.js, take a look at the following resources:

You can check out the Next.js GitHub repository - your feedback and contributions are welcome!

Deploy on Vercel

The easiest way to deploy your Next.js app is to use the Vercel Platform from the creators of Next.js.

Check out our Next.js deployment documentation for more details.