Reactor Scram 990f98e60f fix(windows): prevent deadlock when closing wintun (#5571)
Refs #5441, but without a reliable way to replicate that issue, I'm not
sure if this will completely fix it.

Before this PR, a deadlock could happen between 2 threads, call them the
"main thread" and the "worker thread".
The deadlock is more likely when more traffic is flowing through the
tunnel.

# Test results

I ran a build from this PR inside the resource-constrained VM, and the
conditions for the deadlock were likely present there, since the packet
channel had 0 remaining capacity (it was full) when we reached `Tun::drop`:

```jsonl
{"time":"2024-06-26T22:43:33.2398441Z","target":"firezone_headless_client::ipc_service","logging.googleapis.com/sourceLocation":{"file":"headless-client\\src\\ipc_service.rs","line":"304"},"severity":"INFO","gitVersion":"e591bb9","logFilter":"\"str0m=warn,info\""}
..
{"time":"2024-06-26T22:45:42.9035226Z","target":"firezone_tunnel::device_channel::tun_windows","logging.googleapis.com/sourceLocation":{"file":"connlib\\tunnel\\src\\device_channel\\tun_windows.rs","line":"45"},"severity":"INFO","channelCapacity":0,"message":"Shutting down packet channel..."}
{"time":"2024-06-26T22:45:42.9035467Z","target":"firezone_tunnel::device_channel::tun_windows","logging.googleapis.com/sourceLocation":{"file":"connlib\\tunnel\\src\\device_channel\\tun_windows.rs","line":"274"},"severity":"INFO","message":"recv_task exiting gracefully"}
{"time":"2024-06-26T22:45:43.4978015Z","target":"connlib_client_shared","logging.googleapis.com/sourceLocation":{"file":"connlib\\clients\\shared\\src\\lib.rs","line":"150"},"severity":"INFO","message":"connlib exited gracefully"}
```

I followed these steps:
- Run Firezone and sign in
- Start a speed test using Cloudflare
- During the download phase, quit the GUI

I did the same test with 0fac698 (`main`) and got the "All pipe
instances are busy" error dialog 3 out of 5 times.

# Details

The deadlock will happen in this scenario:

- The main thread enters `Tun::drop` here
0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs (L44)
- The worker thread is waiting for space in the packet channel
(`packet_tx` and `packet_rx`) here
0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs (L249)
- The main thread tells wintun to shut down. If the worker were on line
247 waiting on wintun, this would unblock it, but the worker is not on
line 247.
0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs (L45)
- The main thread waits to join the worker thread
0fac698dfc/rust/connlib/tunnel/src/device_channel/tun_windows.rs (L52)

The threads are now deadlocked. The main thread is waiting for the
worker thread to exit, and the worker thread is waiting for the main
thread to do one of two things: call `poll_recv`, which would free space
in the channel and let `blocking_send` return, or complete `Tun::drop`,
which would cause Rust to drop `packet_rx` and make `blocking_send`
return an error. The main thread can do neither while it is blocked on
the join.
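
The blocked state described above can be reproduced in a small sketch.
This is an illustration under assumptions, not the code from
`tun_windows.rs`: it swaps in `std::sync::mpsc::sync_channel` for the
packet channel so that it has no dependencies, and since the std channel
has no `close` method, dropping the receiver stands in for it. The
function name is hypothetical.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

/// Returns true if the worker was stuck on a full channel and its send
/// returned an error once the receiver was dropped.
fn blocked_send_errors_when_receiver_dropped() -> bool {
    // Capacity 1 stands in for the bounded packet channel (assumption).
    let (packet_tx, packet_rx) = sync_channel::<Vec<u8>>(1);

    // Fill the channel so the next send has to wait for space.
    packet_tx.send(vec![0u8; 4]).expect("receiver is still alive");

    let worker = thread::spawn(move || {
        // The channel is full and nobody is receiving, so this blocks.
        packet_tx.send(vec![1u8; 4]).is_err()
    });

    // Joining the worker at this point would hang forever, which is the
    // deadlock. Instead we only observe that it has not finished.
    thread::sleep(Duration::from_millis(50));
    let still_blocked = !worker.is_finished();

    // Dropping the receiver disconnects the channel and wakes the sender.
    drop(packet_rx);
    let send_failed = worker.join().expect("worker thread panicked");

    still_blocked && send_failed
}
```

Calling `blocked_send_errors_when_receiver_dropped()` from `main` should
return `true`: the join only becomes safe after the receiving side is
gone.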

This PR makes 2 changes to prevent this deadlock. Each change alone
should work, but for defense-in-depth we make both changes:

1. When the main thread starts `Tun::drop`, we `close` the packet
channel, which unblocks any thread waiting on
`Sender::blocking_send`.
2. We use `Sender::try_send` instead of `Sender::blocking_send`. If the
main thread can't consume packets fast enough, we're going to drop them
anyway, because the ring buffer in wintun will eventually fill up. So
dropping them here isn't much different from dropping them anywhere
else, and this keeps the worker thread from locking up.
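
A minimal sketch of the second change, under the same assumptions as
before: `std::sync::mpsc::sync_channel` replaces the real packet channel,
and `Stats`, `forward_packets`, and `demo` are hypothetical names used
only for illustration. It shows a non-blocking send loop that drops
packets when the channel is full and stops when the receiving side has
gone away.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

/// Counters for the sketch (hypothetical, not part of the PR).
#[derive(Debug, Default, PartialEq)]
struct Stats {
    sent: usize,
    dropped: usize,
    channel_closed: bool,
}

/// Forwards packets without ever blocking the calling thread.
fn forward_packets(packet_tx: &SyncSender<Vec<u8>>, packets: Vec<Vec<u8>>) -> Stats {
    let mut stats = Stats::default();
    for packet in packets {
        match packet_tx.try_send(packet) {
            Ok(()) => stats.sent += 1,
            // The consumer is too slow: drop the packet and keep going.
            Err(TrySendError::Full(_)) => stats.dropped += 1,
            // The receiver is gone: stop so the thread can exit.
            Err(TrySendError::Disconnected(_)) => {
                stats.channel_closed = true;
                break;
            }
        }
    }
    stats
}

/// Runs the loop once against a full channel and once after the
/// receiver has been dropped.
fn demo() -> (Stats, Stats) {
    let (packet_tx, packet_rx) = sync_channel::<Vec<u8>>(2);
    let packets = || (0..5u8).map(|i| vec![i]).collect::<Vec<_>>();

    // The receiver exists but never drains, like a stalled main thread.
    let while_open = forward_packets(&packet_tx, packets());

    drop(packet_rx);
    let after_close = forward_packets(&packet_tx, packets());

    (while_open, after_close)
}
```

With a capacity of 2 and 5 packets, the first run sends 2 and drops 3
without blocking. The second run sends nothing and reports the channel
as closed, so a worker built this way always gets a chance to exit.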
2024-06-26 23:52:20 +00:00

Rust development guide

Firezone uses Rust for all data plane components. This directory contains the Linux and Windows clients, and low-level networking implementations related to STUN/TURN.

We target the latest stable release of Rust using `rust-toolchain.toml`. If you are using `rustup`, the toolchain is selected automatically. Otherwise, ensure you have the latest stable version of Rust installed.

Reading Client logs

The Client logs are written as JSONL for machine-readability.

To make them more human-friendly, pipe them through jq like this:

```bash
cd path/to/logs  # e.g. `$HOME/.cache/dev.firezone.client/data/logs` on Linux
cat *.log | jq -r '"\(.time) \(.severity) \(.message)"'
```

This produces output like:

```text
2024-04-01T18:25:47.237661392Z INFO started log
2024-04-01T18:25:47.238193266Z INFO GIT_VERSION = 1.0.0-pre.11-35-gcc0d43531
2024-04-01T18:25:48.295243016Z INFO No token / actor_name on disk, starting in signed-out state
2024-04-01T18:25:48.295360641Z INFO null
```