Our relays are essential for connectivity because they also perform STUN for us, through which we learn our server-reflexive address. Thus, we must at all times have at least one reachable relay in order to establish a connection. The portal tracks connectivity to the relays for us and, in case any of them go down, sends us a `relays_presence` message, meaning we can stop using that relay and migrate any relayed connections to a new one.

This works well as long as we are connected to the portal while the relay is rebooting / going down. If we are not currently connected to the portal when a relay we are using reboots, we don't learn about it. If we are actively using it, the connection will at least fail: further attempted communication with the relay will time out and we will stop using it. If we aren't currently using the relay, this gets a bit trickier: if it rebooted while we were partitioned from the portal, logging in again might return the same relay to us in the `init` message, but this time with different credentials.

The first bug that we are fixing in this PR is that we previously ignored those credentials because we already knew about the relay, assuming that we could keep using our existing credentials. The fix here is to also compare the credentials and ditch the local state if they differ.

The second bug, identified while fixing the first one, is that we need to proactively probe whether all other relays that we know about are actually still responsive. For that, we issue a `REFRESH` message to them. If that message times out or otherwise fails, we remove that relay from our list of `Allocation`s too.

Fixing the second bug required several changes:

1. We lower the log level of `Disconnecting from relay` from ERROR to WARN. Any ERROR emitted during a test run fails our test suite, which partially motivated this change: the test suite builds on the assumption that ERRORs are fatal and thus should never happen during our tests. This change surfaces that disconnecting from a relay can indeed happen during normal operation, which justifies lowering it to WARN. Users should monitor at least on WARN to be alerted about problems.
2. We reduce the total backoff duration for requests to relays from 60s to 10s. The current 60s amount to a total of 8 retries. UDP is unreliable, but it isn't THAT unreliable to justify retrying everything for 60s. We also use a 10s timeout for ICE, so the two timeouts are now aligned. We had to change the max backoff duration because we only idle-spin for at most 10s in the tests, and thus the current 60s were too long to detect that a relay actually disappeared.
3. We had to shuffle around some function calls to make sure all intermediary event buffers are emptied at the right point in time, making the test deterministic.

Fixes: #6648.
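To illustrate the first fix, here is a minimal sketch of the credential comparison, assuming a simplified `Allocation` that only carries its credentials. The names `Credentials` and `upsert_relay` are hypothetical; this is not the actual Firezone code, just the shape of the fix:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Relay credentials as handed out by the portal (simplified).
#[derive(Clone, PartialEq, Eq)]
struct Credentials {
    username: String,
    password: String,
}

/// Simplified stand-in for the per-relay `Allocation` state.
struct Allocation {
    credentials: Credentials,
}

/// Upsert a relay learned from the portal's `init` message.
///
/// Previously, a relay we already knew about was skipped entirely,
/// keeping the old credentials. The fix: if the portal hands us
/// different credentials for a known relay, it must have rebooted
/// while we were partitioned, so the local state is stale.
fn upsert_relay(
    allocations: &mut HashMap<SocketAddr, Allocation>,
    addr: SocketAddr,
    credentials: Credentials,
) {
    match allocations.get(&addr) {
        // Known relay with unchanged credentials: keep existing state.
        Some(existing) if existing.credentials == credentials => {}
        // New relay, or a known relay with different credentials:
        // ditch the local state and start over.
        _ => {
            allocations.insert(addr, Allocation { credentials });
        }
    }
}
```

The sketch only shows the comparison; in the actual fix, discarding the stale state also implies re-establishing anything that was relayed through that allocation.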
# Rust development guide
Firezone uses Rust for all data plane components. This directory contains the Linux and Windows clients, and low-level networking implementations related to STUN/TURN.
We target the latest stable release of Rust using `rust-toolchain.toml`.
If you are using `rustup`, this is handled for you automatically.
Otherwise, ensure you have the latest stable version of Rust installed.
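For reference, a minimal `rust-toolchain.toml` pinning the stable channel could look like this (a sketch; the actual file in this repository may differ, e.g. by pinning a concrete version or extra components):

```toml
# Picked up automatically by rustup when running cargo/rustc in this repo.
[toolchain]
channel = "stable"
```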
## Reading Client logs
The Client logs are written as JSONL for machine readability.
To make them more human-friendly, pipe them through `jq` like this:
```sh
cd path/to/logs # e.g. $HOME/.cache/dev.firezone.client/data/logs on Linux
cat *.log | jq -r '"\(.time) \(.severity) \(.message)"'
```
Resulting in, e.g.:

```
2024-04-01T18:25:47.237661392Z INFO started log
2024-04-01T18:25:47.238193266Z INFO GIT_VERSION = 1.0.0-pre.11-35-gcc0d43531
2024-04-01T18:25:48.295243016Z INFO No token / actor_name on disk, starting in signed-out state
2024-04-01T18:25:48.295360641Z INFO null
```
## Benchmarking on Linux
The recommended way to benchmark any of the Rust components is Linux's `perf` utility.
For example, to attach to a running application, do:
- Ensure the binary you are profiling is compiled with the `bench` profile.
- `sudo perf record -g --freq 10000 --pid $(pgrep <your-binary>)`
- Run the speed test or whatever load-inducing task you want to measure.
- `sudo perf script > profile.perf`
- Open [profiler.firefox.com](https://profiler.firefox.com) and load `profile.perf`.
Instead of attaching to a process with `--pid`, you can also pass the path to the executable directly, as shown below.
That is useful if you want to capture `perf` data for a test or a micro-benchmark.
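For example (a sketch using the same flags as above; substitute the path to your binary):

```sh
# Records a profile for the entire lifetime of the given command;
# perf stops recording when the process exits.
sudo perf record -g --freq 10000 <path-to-your-binary>
```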