firezone/rust
Thomas Eizinger d4e9384a08 fix(connlib): don't add new relays after nomination (#6876)
When relays reboot or get redeployed, the portal sends us new relays to
use and/or relays we should discontinue using. To be more efficient with
battery and network usage, `connlib` only ever samples a single relay
out of all existing ones for a particular connection.

In case of a network topology where we need to use relays, there are
two situations we can end up in:

- The client connects to the gateway's relay, i.e. to the port the
gateway allocated on the relay.
- The gateway connects to the client's relay, i.e. to the port the client
allocated on the relay.

When we detect that a relay is down, the party that allocated the port
will now immediately invalidate the affected candidates (once #6666 is
merged). The other party needs to wait until it receives the invalidated
candidates from its peer.
Invalidating that candidate will also invalidate the currently nominated
socket and fail the connection. In theory at least. That only works if
there are no other candidates available to try.

This is where this patch becomes important. Say we have the following
setup:

- Client samples relay A.
- Gateway samples relay B.
- The nominated candidate pair is "client server-reflexive <=> relay B",
i.e. the client talks to the port the gateway allocated on relay B.

Next:

1. Client and portal get network-partitioned.
2. Relay B disappears.
3. Relay C appears.
4. Relay A reboots.
5. Client reconnects.

At this point, the client is told by the portal to use relays A & C.
Note that relay A rebooted and thus the allocation previously present on
the client is no longer valid. With #6666, we will detect this by
comparing credentials & IPs. The gateway is being told about the same
relays and as part of that, tests that relay B is still there. It learns
that it isn't, invalidates the candidates, which fails the connection to
the client (but only locally!).
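
The check described above (comparing credentials and IPs) can be sketched roughly as follows. This is only an illustration of the idea, not the code from #6666: the `RelayInfo` type, its fields and the `allocation_still_valid` function are hypothetical names invented for this example.

```rust
use std::net::IpAddr;

/// Hypothetical description of a relay as announced by the portal.
/// The field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RelayInfo {
    ip: IpAddr,
    username: String,
    password: String,
}

/// Returns `true` if the relay we hold an allocation on looks like the same
/// relay instance the portal just announced, i.e. both the IP and the
/// credentials are unchanged.
fn allocation_still_valid(known: &RelayInfo, announced: &RelayInfo) -> bool {
    known.ip == announced.ip
        && known.username == announced.username
        && known.password == announced.password
}

fn main() {
    let before_reboot = RelayInfo {
        ip: "203.0.113.10".parse().unwrap(),
        username: "user-1".to_owned(),
        password: "secret-1".to_owned(),
    };

    // Same IP, but the credentials differ: under the assumption of this
    // sketch, the old allocation is treated as gone.
    let after_reboot = RelayInfo {
        password: "secret-2".to_owned(),
        ..before_reboot.clone()
    };

    println!(
        "unchanged relay valid: {}",
        allocation_still_valid(&before_reboot, &before_reboot)
    );
    println!(
        "rebooted relay valid: {}",
        allocation_still_valid(&before_reboot, &after_reboot)
    );
}
```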

Meanwhile, as part of the regular `init` procedure, the client made a
new allocation with relays A & C. Because it had previously selected
relay A for the connection with the gateway, the new candidates are
added to the agent, forming new pairs. The gateway has already given up
on this connection, however, so it won't ever answer these STUN requests.

Concurrently, the gateway's invalidated candidates arrive at the client.
However, they don't fail the connection because the client is probing the
newly added candidates. This creates a state mismatch between the client
and gateway that is only resolved after the candidates start timing out,
adding an additional delay during which the connection isn't working.

With this PR, we prevent this from happening by only ever adding new
candidates while we are still in the nomination process of a socket. In
theory, there exists a race condition in which we nominate a relay
candidate first and therefore never add a server-reflexive candidate
that arrives later. In practice, this won't happen because:

- Our host candidates are always available first.
- We learn server-reflexive candidates already as part of the initial
BINDING, before creating the allocation.
- We learn server-reflexive candidates from all relays, not just the one
that has been assigned.
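
The rule introduced by this patch, accepting new candidates only while a socket is still being nominated, can be sketched as a small state machine. This is a simplified model under stated assumptions, not the actual `connlib` code: the `Connection`, `State` and `Candidate` types and their methods are hypothetical names chosen for this example.

```rust
use std::net::SocketAddr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CandidateKind {
    Host,
    ServerReflexive,
    Relayed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Candidate {
    kind: CandidateKind,
    addr: SocketAddr,
}

/// Hypothetical connection state: either still nominating or settled on a socket.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Nominating,
    Nominated { socket: SocketAddr },
}

struct Connection {
    state: State,
    local_candidates: Vec<Candidate>,
}

impl Connection {
    fn new() -> Self {
        Self {
            state: State::Nominating,
            local_candidates: Vec::new(),
        }
    }

    /// Adds a candidate only while nomination is still in progress.
    /// Returns `true` if the candidate was added.
    fn add_local_candidate(&mut self, candidate: Candidate) -> bool {
        if !matches!(self.state, State::Nominating) {
            // After nomination, new candidates would only create pairs that
            // the remote may never answer, so they are ignored.
            return false;
        }
        if self.local_candidates.contains(&candidate) {
            return false;
        }
        self.local_candidates.push(candidate);
        true
    }

    fn nominate(&mut self, socket: SocketAddr) {
        self.state = State::Nominated { socket };
    }
}

fn main() {
    let host: SocketAddr = "192.0.2.1:52625".parse().unwrap();
    let relay_a: SocketAddr = "203.0.113.1:49152".parse().unwrap();
    let relay_a_rebooted: SocketAddr = "203.0.113.1:49999".parse().unwrap();

    let mut conn = Connection::new();
    conn.add_local_candidate(Candidate { kind: CandidateKind::Host, addr: host });
    conn.add_local_candidate(Candidate { kind: CandidateKind::Relayed, addr: relay_a });
    conn.nominate(relay_a);

    // A fresh allocation made after nomination is not added to this connection.
    let added = conn.add_local_candidate(Candidate {
        kind: CandidateKind::Relayed,
        addr: relay_a_rebooted,
    });
    println!("added after nomination: {added}");
    println!("state: {:?}, candidates: {}", conn.state, conn.local_candidates.len());
}
```

In this model, once the peer's invalidated candidates arrive, there are no freshly added pairs left to probe, so the connection can fail right away instead of waiting for timeouts.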

Related: #6666.
2024-10-02 02:00:03 +00:00

Rust development guide

Firezone uses Rust for all data plane components. This directory contains the Linux and Windows clients, and low-level networking implementations related to STUN/TURN.

We target the latest stable release of Rust using rust-toolchain.toml. If you are using rustup, that is automatically handled for you. Otherwise, ensure you have the latest stable version of Rust installed.

Reading Client logs

The Client logs are written as JSONL for machine-readability.

To make them more human-friendly, pipe them through jq like this:

cd path/to/logs  # e.g. `$HOME/.cache/dev.firezone.client/data/logs` on Linux
cat *.log | jq -r '"\(.time) \(.severity) \(.message)"'

Resulting in, e.g.

2024-04-01T18:25:47.237661392Z INFO started log
2024-04-01T18:25:47.238193266Z INFO GIT_VERSION = 1.0.0-pre.11-35-gcc0d43531
2024-04-01T18:25:48.295243016Z INFO No token / actor_name on disk, starting in signed-out state
2024-04-01T18:25:48.295360641Z INFO null

Benchmarking on Linux

The recommended way to benchmark any of the Rust components is the Linux perf utility. For example, to attach to a running application, do:

  1. Ensure the binary you are profiling is compiled with the bench profile.
  2. sudo perf record -g --freq 10000 --pid $(pgrep <your-binary>).
  3. Run the speed test or whatever load-inducing task you want to measure.
  4. sudo perf script > profile.perf
  5. Open profiler.firefox.com and load profile.perf

Instead of attaching to a process with --pid, you can also specify the path to the executable directly. That is useful if you want to capture perf data for a test or a micro-benchmark.
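
As an illustration of the micro-benchmark case, the following is a minimal, standalone sketch of a binary that could be passed to perf by path. It is not taken from this repository: the `checksum` function, the payload size and the iteration count are arbitrary stand-ins for whatever hot path you actually want to measure.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Stand-in workload: a simple rotate-and-xor checksum over a byte slice.
fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &byte| acc.rotate_left(5) ^ u32::from(byte))
}

fn main() {
    // 1 MiB payload, processed repeatedly so that perf collects enough samples.
    let payload = vec![0xABu8; 1 << 20];
    let iterations = 200;

    let start = Instant::now();
    let mut result = 0u32;
    for _ in 0..iterations {
        // `black_box` keeps the optimizer from removing the work we want to measure.
        result ^= checksum(black_box(&payload));
    }
    let elapsed = start.elapsed();

    println!("result = {result:#x}, {iterations} iterations in {elapsed:?}");
}
```

Assuming the compiled binary ends up at a path like `<path-to-binary>`, it could then be recorded with `sudo perf record -g --freq 10000 <path-to-binary>` instead of using `--pid`.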