Files
firezone/rust
Thomas Eizinger 194eebd164 fix(connlib): de-prioritise timeout handling (#6077)
`connlib`'s event loop performs work in a very particular order:

1. Local buffers like IP, UDP and DNS packets are emptied.
2. Time-sensitive tasks, if any, are performed.
3. New UDP packets are processed.
4. New IP packets (from the TUN device) are processed.

This priority ensures we don't accept more work (i.e. new packets) until
we have finished processing existing work. As a result, we can keep
local buffers small and processing latencies low.

I am not completely confident on the issue of #6067 but if the busy-loop
originates from a bad timer, then the above priority means we never get
to the part where we read new UDP or IP packets and components such a
`PhoenixChannel` - which operate outside of `connlib'`s event loop -
don't get any CPU time.

A naive fix for this problem is to just de-prioritise the polling of the
timer within `Io::poll`. I say naive because without additional changes,
this could delay the processing of time-sensitive tasks on a very busy
client / gateway where packets are constantly arriving and thus we
never[^1] reach the part where the timer gets polled.

To fix this, we make two distinct changes:

1. We pro-actively break from `connlib'`s event loop every 5000
iterations. This ensures that even on a very busy system, other
components like the `PhoenixChannel` get a chance to do _some_ work once
in a while.
2. In case we force-yield from the event loop, we call `handle_timeout`
and immediately schedule a new wake-up. This ensures time does advance
in regular intervals as well and we don't get wrongly suspended by the
runtime.

These changes don't prevent any timer-loops by themselves. With a
timer-loop, we still busy-loop for 5000 iterations and thus
unnecessarily burn through some CPU cycles. The important bit however is
that we stay operational and can accept packets and portal messages. Any
of them might change the state such that the timer value changes, thus
allowing `connlib` to self-heal from this loop.

Fixes: #6067.

[^1]: This is an assumption based on the possible control flow. In
practise, I believe that reading from the sockets or the TUN device is a
much slower operation than processing the packets. Thus, we should
eventually hit the the timer path too.
2024-07-29 22:16:10 +00:00
..
2023-05-10 07:58:32 -07:00

Rust development guide

Firezone uses Rust for all data plane components. This directory contains the Linux and Windows clients, and low-level networking implementations related to STUN/TURN.

We target the last stable release of Rust using rust-toolchain.toml. If you are using rustup, that is automatically handled for you. Otherwise, ensure you have the latest stable version of Rust installed.

Reading Client logs

The Client logs are written as JSONL for machine-readability.

To make them more human-friendly, pipe them through jq like this:

cd path/to/logs  # e.g. `$HOME/.cache/dev.firezone.client/data/logs` on Linux
cat *.log | jq -r '"\(.time) \(.severity) \(.message)"'

Resulting in, e.g.

2024-04-01T18:25:47.237661392Z INFO started log
2024-04-01T18:25:47.238193266Z INFO GIT_VERSION = 1.0.0-pre.11-35-gcc0d43531
2024-04-01T18:25:48.295243016Z INFO No token / actor_name on disk, starting in signed-out state
2024-04-01T18:25:48.295360641Z INFO null