mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
Refs #5441, but without a reliable way to replicate that issue, I'm not sure if this will completely fix it.

Before this PR, a deadlock can happen between 2 threads, call them "main thread" and "worker thread". The deadlock is more likely if more traffic is flowing through the tunnel.

# Test results

I ran a build from this PR inside the resource-constrained VM and it's likely the deadlock could have triggered there, since the packet channel had 0 capacity (it was full) when we reached `Tun::drop`:

```jsonl
{"time":"2024-06-26T22:43:33.2398441Z","target":"firezone_headless_client::ipc_service","logging.googleapis.com/sourceLocation":{"file":"headless-client\\src\\ipc_service.rs","line":"304"},"severity":"INFO","gitVersion":"e591bb9","logFilter":"\"str0m=warn,info\""}
..
{"time":"2024-06-26T22:45:42.9035226Z","target":"firezone_tunnel::device_channel::tun_windows","logging.googleapis.com/sourceLocation":{"file":"connlib\\tunnel\\src\\device_channel\\tun_windows.rs","line":"45"},"severity":"INFO","channelCapacity":0,"message":"Shutting down packet channel..."}
{"time":"2024-06-26T22:45:42.9035467Z","target":"firezone_tunnel::device_channel::tun_windows","logging.googleapis.com/sourceLocation":{"file":"connlib\\tunnel\\src\\device_channel\\tun_windows.rs","line":"274"},"severity":"INFO","message":"recv_task exiting gracefully"}
{"time":"2024-06-26T22:45:43.4978015Z","target":"connlib_client_shared","logging.googleapis.com/sourceLocation":{"file":"connlib\\clients\\shared\\src\\lib.rs","line":"150"},"severity":"INFO","message":"connlib exited gracefully"}
```

I followed these steps:

- Run Firezone and sign in
- Start a speed test using Cloudflare
- During the download phase, quit the GUI

I did the same test with 0fac698 (`main`) and got the "All pipe instances are busy" error dialog 3 out of 5 times.
# Details

The deadlock will happen in this scenario:

- The main thread enters `Tun::drop` here: `0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs` (L44)
- The worker thread is waiting for space in the packet channel (`packet_tx` and `packet_rx`) here: `0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs` (L249)
- The main thread tells wintun to shut down. If the worker was on line 247 waiting on wintun, this would unblock it, but the worker is not on line 247. `0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs` (L45)
- The main thread waits to join the worker thread: `0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs` (L52)

The threads are now deadlocked. The main thread is waiting for the worker thread to exit, and the worker thread is waiting for the main thread to either call `poll_recv`, which would cause `blocking_send` to return, or for the main thread to complete `Tun::drop`, which would cause Rust to drop `packet_rx`, which would cause `blocking_send` to return an error.

This PR makes 2 changes to prevent this deadlock. Each change alone should work, but for defense-in-depth we make both changes:

1. When the main thread starts `Tun::drop`, we `close` the packet channel, which would unblock any thread waiting on `Sender::blocking_send`
2. We use `Sender::try_send` instead of `Sender::blocking_send`. If the main thread can't consume packets fast enough, we're going to drop them anyway, because the ring buffer in wintun will eventually fill up. So dropping them here isn't much different from dropping them anywhere else, and this keeps the worker thread from locking up.
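The two changes can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: the PR's channel appears to be a Tokio mpsc channel (it mentions `blocking_send` and `poll_recv`), while this sketch substitutes the standard library's `std::sync::mpsc::sync_channel`, where dropping the receiver plays the role of `close`. The names `WorkerStats`, `worker_loop`, and `run_shutdown` are hypothetical and exist only for illustration.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

/// Counters reported by the worker when it exits (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
struct WorkerStats {
    sent: usize,
    dropped: usize,
}

/// Hypothetical stand-in for the worker thread's receive loop.
/// It pushes packets into a bounded channel without ever blocking on it.
fn worker_loop(packet_tx: SyncSender<Vec<u8>>) -> WorkerStats {
    let mut stats = WorkerStats::default();
    loop {
        // Placeholder for a packet read from the TUN device.
        let packet = vec![0u8; 4];
        match packet_tx.try_send(packet) {
            Ok(()) => stats.sent += 1,
            // Change 2: the channel is full, so drop the packet instead of waiting.
            Err(TrySendError::Full(_)) => {
                stats.dropped += 1;
                // Avoid a busy loop in this sketch; a real worker would
                // block on the device read instead.
                thread::sleep(Duration::from_millis(1));
            }
            // The receiver is gone, so shutdown is in progress: exit.
            Err(TrySendError::Disconnected(_)) => return stats,
        }
    }
}

/// Simulates shutdown while the packet channel is full and nobody is reading.
fn run_shutdown(capacity: usize) -> WorkerStats {
    let (packet_tx, packet_rx) = sync_channel::<Vec<u8>>(capacity);

    // Fill the channel up front so the worker only ever sees it full,
    // mirroring the "0 capacity" state in the logs above.
    for _ in 0..capacity {
        packet_tx.try_send(vec![0u8; 4]).expect("channel has room");
    }

    let worker = thread::spawn(move || worker_loop(packet_tx));

    // The main thread is busy and never consumes packets.
    thread::sleep(Duration::from_millis(20));

    // Change 1: close the channel *before* joining the worker.
    drop(packet_rx);
    worker.join().expect("worker thread panicked")
}
```

Under these assumptions, `run_shutdown(2)` returns as soon as the worker observes the closed channel. If the worker used a blocking `send` instead, and the receiver were only dropped after `join`, the call would be expected to hang in the same way the scenario above describes.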
# Rust development guide

Firezone uses Rust for all data plane components. This directory contains the Linux and Windows clients, and low-level networking implementations related to STUN/TURN.

We target the last stable release of Rust using `rust-toolchain.toml`.
If you are using `rustup`, that is automatically handled for you.
Otherwise, ensure you have the latest stable version of Rust installed.

## Reading Client logs

The Client logs are written as JSONL for machine-readability.

To make them more human-friendly, pipe them through `jq` like this:

```bash
cd path/to/logs # e.g. `$HOME/.cache/dev.firezone.client/data/logs` on Linux
cat *.log | jq -r '"\(.time) \(.severity) \(.message)"'
```

Resulting in, e.g.

```
2024-04-01T18:25:47.237661392Z INFO started log
2024-04-01T18:25:47.238193266Z INFO GIT_VERSION = 1.0.0-pre.11-35-gcc0d43531
2024-04-01T18:25:48.295243016Z INFO No token / actor_name on disk, starting in signed-out state
2024-04-01T18:25:48.295360641Z INFO null
```