Despite our efforts in #8912, the current implementation still does not do enough to maintain packet ordering across GSO batches. At present, we very aggressively batch packets of the same length together. This is too eager when we consider packet flows such as the following:

```
09:03:49.585143 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 1:1229, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585151 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 1229:2063, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 834
09:03:49.585157 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 2063:3094, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1031
09:03:49.585187 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 3094:4322, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585188 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 4322:5156, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 834
09:03:49.585227 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 5156:6384, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585228 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 6384:7612, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585230 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 7612:8249, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 637
09:03:49.585846 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 8249:9477, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585851 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 9477:10705, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
```

As we can see here, the remote sends us packet batches of varying lengths:

- 1228, 834
- 1031
- 1228, 834
- 1228, 1228, 637
- 1228, 1228

1228 represents a "full" TCP packet, so any packet following a full packet SHOULD be grouped into the same GSO batch. Currently, we batch all the 1228 packets together and ignore the fact that there were smaller packets in between those that belong together.

To mitigate this, we refactor the `GsoQueue` to remove the `segment_size` from the binning key of our map and instead group batches only by their source, destination and ECN information. Within such a connection, we then keep an ordered list of batches. A new batch is started if the length differs or we have previously pushed a packet that isn't of the length of the batch, therefore signalling the end of the batch. A sketch of this rule is included at the end of this description.

The result looks very promising (this is loading `blog.firezone.dev` via the `lynx` browser from within the headless-client docker container, so going through a Gateway running this PR):

|main|this PR|
|---|---|
|||

Related: #8899
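To make the batching rule above more concrete, here is a minimal sketch of how such a queue could be structured. It is not the actual `GsoQueue` code from this PR: `ConnectionKey`, `Batch`, `enqueue` and the `Ecn` alias are hypothetical names, and the detail that a single shorter packet may still join a batch as its trailing segment (and then closes it) is an assumption derived from the grouping listed above.

```rust
// A minimal sketch of the grouping rule described above, not the actual
// `GsoQueue` implementation; all names here are hypothetical.
use std::collections::HashMap;
use std::net::SocketAddr;

/// Placeholder for the ECN bits carried alongside each packet.
type Ecn = u8;

/// Binning key without the segment size: only source, destination and ECN.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ConnectionKey {
    src: SocketAddr,
    dst: SocketAddr,
    ecn: Ecn,
}

/// One GSO batch: every segment is `segment_size` bytes, except for an
/// optional shorter trailing segment which closes the batch.
struct Batch {
    segment_size: usize,
    payload: Vec<u8>,
    closed: bool,
}

#[derive(Default)]
struct GsoQueue {
    /// Per-connection list of batches, kept in the order packets arrived.
    batches: HashMap<ConnectionKey, Vec<Batch>>,
}

impl GsoQueue {
    fn enqueue(&mut self, key: ConnectionKey, packet: &[u8]) {
        let batches = self.batches.entry(key).or_default();

        match batches.last_mut() {
            // The last batch is still open and the packet fits: append it.
            // A shorter packet is allowed once as the trailing segment but
            // then signals the end of the batch.
            Some(batch) if !batch.closed && packet.len() <= batch.segment_size => {
                batch.payload.extend_from_slice(packet);
                if packet.len() < batch.segment_size {
                    batch.closed = true;
                }
            }
            // Otherwise (no batch yet, the previous batch has ended, or the
            // packet is larger than the current segment size): start a new
            // batch so ordering across differently-sized packets is kept.
            _ => batches.push(Batch {
                segment_size: packet.len(),
                payload: packet.to_vec(),
                closed: false,
            }),
        }
    }
}
```

Under these assumptions, the capture above splits into exactly the batches listed earlier: 1228+834, 1031, 1228+834, 1228+1228+637 and 1228+1228, in that order.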
# Rust development guide
Firezone uses Rust for all data plane components. This directory contains the Linux and Windows clients, and low-level networking implementations related to STUN/TURN.
We target the latest stable release of Rust using `rust-toolchain.toml`.
If you are using `rustup`, that is automatically handled for you.
Otherwise, ensure you have the latest stable version of Rust installed.
## Reading Client logs
The Client logs are written as JSONL for machine readability.
To make them more human-friendly, pipe them through `jq` like this:
```
cd path/to/logs # e.g. `$HOME/.cache/dev.firezone.client/data/logs` on Linux
cat *.log | jq -r '"\(.time) \(.severity) \(.message)"'
```
Resulting in, e.g.:

```
2024-04-01T18:25:47.237661392Z INFO started log
2024-04-01T18:25:47.238193266Z INFO GIT_VERSION = 1.0.0-pre.11-35-gcc0d43531
2024-04-01T18:25:48.295243016Z INFO No token / actor_name on disk, starting in signed-out state
2024-04-01T18:25:48.295360641Z INFO null
```
## Benchmarking on Linux
The recommended way to benchmark any of the Rust components is Linux's `perf` utility.
For example, to attach to a running application, do:
- Ensure the binary you are profiling is compiled with the `release` profile.
- `sudo perf record -g --freq 10000 --pid $(pgrep <your-binary>)`
- Run the speed test or whatever load-inducing task you want to measure.
- `sudo perf script > profile.perf`
- Open [profiler.firefox.com](https://profiler.firefox.com) and load `profile.perf`.
Instead of attaching to a process with `--pid`, you can also specify the path to the executable directly.
That is useful if you want to capture perf data for a test or a micro-benchmark.
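For example, a command along these lines should work (the binary path here is only an illustrative placeholder; point it at whichever test or benchmark executable you built):

```
sudo perf record -g --freq 10000 -- ./target/release/<your-binary>
```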