We already handle one case where we try to remove a route
that doesn't exist. `ESRCH` is another variant of this error and
manifests as "No such process". According to the Internet, it also
means the route doesn't exist, so we can bail out early here.
## Context
At present, we only have a single thread that reads and writes to the
TUN device on all platforms. On Linux, it is possible to open the file
descriptor of a TUN device multiple times by setting the
`IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then
spawn multiple threads that concurrently read and write to the TUN
device. This is critical for achieving better throughput.
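For reference, a minimal sketch of how such a multi-queue TUN file descriptor can be opened (the constants mirror `<linux/if_tun.h>`; error handling is reduced to the essentials):
```rust
use std::fs::{File, OpenOptions};
use std::{io, os::fd::AsRawFd};

const TUNSETIFF: libc::c_ulong = 0x4004_54ca; // _IOW('T', 202, int)
const IFF_TUN: libc::c_short = 0x0001;
const IFF_NO_PI: libc::c_short = 0x1000;
const IFF_MULTI_QUEUE: libc::c_short = 0x0100;

#[repr(C)]
struct IfReq {
    name: [u8; libc::IFNAMSIZ],
    flags: libc::c_short,
}

/// Opens one queue of the named TUN device; calling this again with the
/// same name yields another file descriptor for the same device.
fn open_tun_queue(name: &str) -> io::Result<File> {
    let file = OpenOptions::new().read(true).write(true).open("/dev/net/tun")?;

    let mut req = IfReq {
        name: [0; libc::IFNAMSIZ],
        flags: IFF_TUN | IFF_NO_PI | IFF_MULTI_QUEUE,
    };
    req.name[..name.len()].copy_from_slice(name.as_bytes());

    // SAFETY: `req` is a valid `ifreq`-compatible struct and outlives the call.
    if unsafe { libc::ioctl(file.as_raw_fd(), TUNSETIFF, &mut req) } < 0 {
        return Err(io::Error::last_os_error());
    }

    Ok(file)
}
```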
## Solution
`IFF_MULTI_QUEUE` is a Linux-only feature and therefore only applies to
the headless-client, the GUI client on Linux and the Gateway (it may
also be possible on Android; I haven't tried). As such, we first need to
change our internal abstractions a bit and move the creation of the TUN
thread into the `Tun` abstraction itself. For this, we change the
interface of `Tun` to the following (see the sketch after this list):
- `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver`, where
multiple items in a channel can be batch-received.
- `poll_send_ready`: Mimics the API of `Sink` to check whether more
items can be written.
- `send`: Mimics the API of `Sink` to actually send an item.
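A rough sketch of that trait surface (the exact signatures in `connlib` may differ; `IpPacket` stands in for the real packet type):
```rust
use std::io;
use std::task::{Context, Poll};

struct IpPacket; // stand-in for the real packet type

trait Tun: Send + 'static {
    /// Receives up to `max` packets into `buf`, registering the waker if none are ready.
    fn poll_recv_many(
        &mut self,
        cx: &mut Context<'_>,
        buf: &mut Vec<IpPacket>,
        max: usize,
    ) -> Poll<usize>;

    /// Checks whether the device can accept another packet (`Sink`-style readiness).
    fn poll_send_ready(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>>;

    /// Sends a single packet; to be called only after `poll_send_ready` returned `Ready(Ok(()))`.
    fn send(&mut self, packet: IpPacket) -> io::Result<()>;
}
```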
With these APIs in place, we can implement various (performance)
improvements for the different platforms.
- On Linux, this allows us to spawn multiple threads that read from and
write to the TUN device and send all packets into the same channel. The `Io`
component of `connlib` then uses `poll_recv_many` to read batches of up
to 100 packets at once. This ties in well with #7210 because we can then
use GSO to send the encrypted packets in single syscalls to the OS.
- On Windows, we already have a dedicated recv thread because `WinTun`'s
most convenient API uses blocking IO. As such, we can now also tie into
that by batch-receiving from this channel.
- In addition to using multiple threads, this API now also uses correct
readiness checks on Linux, Darwin and Android to uphold backpressure in
case we cannot write to the TUN device.
## Configuration
Local testing has shown that 2 threads give the best performance for a
local `iperf3` run. I suspect this is because there is only so much
traffic that a single application (i.e. `iperf3`) can generate. With
more than 2 threads, throughput actually drops drastically because
`connlib`'s main thread is too busy with lock contention and triggering
`Waker`s for the TUN threads (which mostly idle if there are 4+ of
them). I've made the thread count configurable on the Gateway though, so
we can experiment with it during concurrent speedtests etc.
In addition, switching `connlib` to a single-threaded tokio runtime
further increased the throughput, presumably due to less task and
context switching.
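For illustration, the runtime part of that change boils down to something like this (a sketch, not the actual setup code):
```rust
/// All of `connlib`'s tasks run on one thread; the TUN I/O threads are
/// plain OS threads outside of the runtime.
fn build_runtime() -> std::io::Result<tokio::runtime::Runtime> {
    tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
}
```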
## Results
Local testing with `iperf3` shows some very promising results. We now
achieve a throughput of 2+ Gbit/s.
```
Connecting to host 172.20.0.110, port 5201
Reverse mode, remote host 172.20.0.110 is sending
[ 5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 274 MBytes 2.30 Gbits/sec
[ 5] 1.00-2.00 sec 279 MBytes 2.34 Gbits/sec
[ 5] 2.00-3.00 sec 216 MBytes 1.82 Gbits/sec
[ 5] 3.00-4.00 sec 224 MBytes 1.88 Gbits/sec
[ 5] 4.00-5.00 sec 234 MBytes 1.96 Gbits/sec
[ 5] 5.00-6.00 sec 238 MBytes 2.00 Gbits/sec
[ 5] 6.00-7.00 sec 229 MBytes 1.92 Gbits/sec
[ 5] 7.00-8.00 sec 222 MBytes 1.86 Gbits/sec
[ 5] 8.00-9.00 sec 223 MBytes 1.87 Gbits/sec
[ 5] 9.00-10.00 sec 217 MBytes 1.82 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.30 GBytes 1.98 Gbits/sec 22247 sender
[ 5] 0.00-10.00 sec 2.30 GBytes 1.98 Gbits/sec receiver
iperf Done.
```
This is a pretty solid improvement over what is in `main`:
```
Connecting to host 172.20.0.110, port 5201
[ 5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 90.4 MBytes 758 Mbits/sec 1800 106 KBytes
[ 5] 1.00-2.00 sec 93.4 MBytes 783 Mbits/sec 1550 51.6 KBytes
[ 5] 2.00-3.00 sec 92.6 MBytes 777 Mbits/sec 1350 76.8 KBytes
[ 5] 3.00-4.00 sec 92.9 MBytes 779 Mbits/sec 1800 56.4 KBytes
[ 5] 4.00-5.00 sec 93.4 MBytes 783 Mbits/sec 1650 69.6 KBytes
[ 5] 5.00-6.00 sec 90.6 MBytes 760 Mbits/sec 1500 73.2 KBytes
[ 5] 6.00-7.00 sec 87.6 MBytes 735 Mbits/sec 1400 76.8 KBytes
[ 5] 7.00-8.00 sec 92.6 MBytes 777 Mbits/sec 1600 82.7 KBytes
[ 5] 8.00-9.00 sec 91.1 MBytes 764 Mbits/sec 1500 70.8 KBytes
[ 5] 9.00-10.00 sec 92.0 MBytes 771 Mbits/sec 1550 85.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 917 MBytes 769 Mbits/sec 15700 sender
[ 5] 0.00-10.00 sec 916 MBytes 768 Mbits/sec receiver
iperf Done.
```
This PR intends to be a pure refactoring, i.e. no behaviour change. It
simplifies a few aspects of the GUI controller event-loop by getting rid
of the `select!` macro. We also remove some indirection of the
`gui_controller::Builder`.
## Context
At present, `connlib` sends UDP packets one at a time. Sending a packet
requires us to make a syscall which is quite expensive. Under load, i.e.
during a speedtest, syscalls account for over 50% of our CPU time [0].
In order to improve this situation, we need to somehow make use of GSO
(generic segmentation offload). With GSO, we can send multiple packets
to the same destination in a single syscall.
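On Linux, UDP GSO is enabled via a socket option (a sketch; the constants are copied from the kernel headers in case the libc version in use doesn't export them):
```rust
use std::{io, net::UdpSocket, os::fd::AsRawFd};

const SOL_UDP: libc::c_int = 17; // <netinet/udp.h>
const UDP_SEGMENT: libc::c_int = 103; // <linux/udp.h>

/// Tells the kernel to slice every send of `N * segment_size` bytes
/// into `N` datagrams of `segment_size` bytes each.
fn enable_udp_gso(socket: &UdpSocket, segment_size: u16) -> io::Result<()> {
    let size = libc::c_int::from(segment_size);

    // SAFETY: valid fd; pointer and length describe a valid `c_int`.
    let ret = unsafe {
        libc::setsockopt(
            socket.as_raw_fd(),
            SOL_UDP,
            UDP_SEGMENT,
            &size as *const libc::c_int as *const libc::c_void,
            std::mem::size_of::<libc::c_int>() as libc::socklen_t,
        )
    };

    if ret != 0 {
        return Err(io::Error::last_os_error());
    }

    Ok(())
}
```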
The tricky question is: how do we get multiple UDP packets ready at
once so that we can send them in a single syscall? Our TUN interface
only feeds us packets one at a time, and `connlib`'s state machine is
single-threaded. Additionally, we currently only have a single
`EncryptBuffer` in which the to-be-sent datagram sits.
## 1. Stack-allocating encrypted IP packets
As a first step, we get rid of the single `EncryptBuffer` and instead
stack-allocate each encrypted IP packet. Due to our small MTU, these
packets are only around 1300 bytes. Stack-allocating them requires a few
memcpys, but those amount to a performance hit in the single-digit
percent range of CPU time. That is nothing compared to how much time we
are spending on UDP syscalls. With the `EncryptBuffer` out of the way,
we can now "freely" move around the `EncryptedPacket` structs and -
technically - have multiple of them at the same time.
## 2. Implementing GSO
The GSO interface allows you to pass multiple packets **of the same
length and for the same destination** in a single syscall, meaning we
cannot just batch up arbitrary UDP packets. Counterintuitively, making
use of GSO requires us to do more copying: In particular, we change the
interface of `Io` such that "sending" a packet essentially performs a
lookup of a `BytesMut` buffer by destination and packet length and
appends the payload to that buffer.
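Conceptually, the queue behind that lookup can be pictured like this (a sketch; only the name `GsoQueue` is from the actual code):
```rust
use std::{collections::HashMap, net::SocketAddr};

use bytes::BytesMut;

/// Datagrams of the same length for the same destination share one
/// contiguous buffer that is later flushed with a single GSO syscall.
#[derive(Default)]
struct GsoQueue {
    buffers: HashMap<(SocketAddr, usize), BytesMut>,
}

impl GsoQueue {
    fn enqueue(&mut self, dst: SocketAddr, datagram: &[u8]) {
        self.buffers
            .entry((dst, datagram.len()))
            .or_default()
            .extend_from_slice(datagram);
    }
}
```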
## 3. Batch-read IP packets
In order to actually perform GSO, we need to process more than a single
IP packet in one event-loop tick. We achieve this by batch-reading up to
50 IP packets from the mpsc-channel that connects `connlib`'s main
event-loop with the dedicated thread that reads and writes to the TUN
device. These reads and writes happen concurrently with `connlib`'s
packet processing. Thus, it is likely that by the time `connlib` is ready to
process another IP packet, multiple have been read from the device and
are sitting in the channel. Batch-processing these IP packets means that
the buffers in our `GsoQueue` are more likely to contain more than a
single datagram.
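The batch-read itself maps onto tokio's channel API roughly like this (a sketch; the surrounding event loop and the `IpPacket` type are assumed):
```rust
struct IpPacket; // stand-in for the real packet type

/// Returns as soon as at least one packet is available, with up to 50 per call.
async fn read_batch(rx: &mut tokio::sync::mpsc::Receiver<IpPacket>) -> Vec<IpPacket> {
    let mut batch = Vec::with_capacity(50);
    rx.recv_many(&mut batch, 50).await;

    batch
}
```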
Imagine you are running a file upload. The OS will send many packets to
the TUN device, all to the same destination IP and likely at max MTU. It
is likely that we read 10-20 of these packets in one batch (i.e. within
a single "tick" of the event loop). All packets will be appended to the
same buffer in the `GsoQueue` and on the next event-loop tick, they will
all be flushed out in a single syscall.
## Results
Overall, this results in a significant reduction of syscalls for sending
UDP messages. In [1], we spend a total of only 16% of our CPU time in
`udpv6_sendmsg` whereas in [0] (main), we spent a total of 34%. Do note
that these numbers are relative to the total CPU time spent per program
run and thus can't be compared directly (i.e. you cannot just do 34 - 16
and say we now spend 18% less time sending UDP packets). Nevertheless,
this appears to be a great improvement.
In terms of throughput, we achieve a ~60% improvement in our benchmark
suite. That suite runs on localhost though, so the gain might not
translate one-to-one to a real network.
[0]: https://share.firefox.dev/4hvoPju
[1]: https://share.firefox.dev/4frhCPv
Rust 1.83 comes with a bunch of new lints for elidable lifetimes. Those
also trigger in the generated code of `derivative`. That crate is
actually unmaintained, so we replace our usages of it with `derive_more`.
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
Lastly, duplicate dependencies increase the compile times of a project,
so we should aim to have as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
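A minimal `deny.toml` expressing this kind of policy could look as follows (the exact license list and settings here are assumptions, not our actual configuration):
```toml
[licenses]
allow = ["MIT", "Apache-2.0"]

[bans]
# Fail the lint when two versions of the same crate end up in the graph.
multiple-versions = "deny"
```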
Resolves: #7241.
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.20 to 4.5.21.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.21</h2>
<h2>[4.5.21] - 2024-11-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(parser)</em> Ensure defaults are filled in on error with
<code>ignore_errors(true)</code></li>
</ul>
</blockquote>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Adding more context to these errors makes it easier to identify which
of the operations fails. In addition, we remove some usages of the "log
and return" anti-pattern to avoid duplicate reports of the same issue.
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results. In most cases, they are invariants that can't actually be
hit; for these, we change them to `Option`. In other cases, they can
actually be hit, for example if the user supplies an invalid
log-filter.
Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.
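As an example of the second category, a user-supplied log filter is a fallible input and deserves an error instead of a panic (a sketch; the `parse_filter` helper is hypothetical):
```rust
use anyhow::Context as _;
use tracing_subscriber::EnvFilter;

fn parse_filter(directives: &str) -> anyhow::Result<EnvFilter> {
    // Before: `EnvFilter::try_new(directives).unwrap()` would panic on a typo.
    EnvFilter::try_new(directives)
        .with_context(|| format!("invalid log filter `{directives}`"))
}
```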
Resolves: #7292.
"Just let it crash" is terrible advice for software that is shipped to
end users. Where possible, we should use proper error handling and only
fail the current function / task that is active, e.g. drop a particular
packet instead of failing all of connlib. We more or less already do
that.
Activating the clippy lint `unwrap_in_result` surfaced a few more places
where we panic despite being in a function that is fallible already.
These cases can easily be converted to not panic and return an error
instead.
Reading the Git version requires the entire Git repository to be
present, including all tags. However, the tags are only created _after_
the artifact is built, when we publish the release. Therefore, these
tags are never included in the actual released binary.
For Sentry, we use the `CARGO_PKG_VERSION` variable instead. This
doesn't tell us whether somebody built a client from source and then
used it, so there could be some confusion in Sentry events. It is quite
unlikely that this happens though, so for the majority of Sentry alerts,
this will give us the correct version.
For the Android client, we also depend on the `GITHUB_SHA` env variable
at compile-time. We do the same thing for the GUI client here.
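Combining both sources might look like this (a sketch; the `version` helper and the fallback order are assumptions):
```rust
/// Prefer the commit hash injected by CI at compile time,
/// fall back to the crate version for local builds.
pub fn version() -> &'static str {
    option_env!("GITHUB_SHA").unwrap_or(env!("CARGO_PKG_VERSION"))
}
```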
Resolves: #6925.
When building inside a docker container, like we do for the
headless-client and gateway, the `.git` directory is not available.
Thus, determining our current version fails and it gets reported as
"unknown". We now also use this version for Sentry, which is not very
helpful if all errors are categorised under the same version.
In case somebody builds a gateway / client from source, we will have the
full version available. Most users will use our docker containers
though, meaning the version will always be that of a full release.
Resolves: #7184.
Our logging library, `tracing`, supports structured logging. This is
useful because it preserves more than just the string representation of
a value and thus allows the active logging backend(s) to capture more
information about a particular value.
In the case of errors, this is especially useful because it allows us to
capture the sources of a particular error.
Unfortunately, recording an error as a tracing value is a bit cumbersome
because `tracing::Value` is only implemented for `&dyn
std::error::Error`. Casting an error to this type is quite verbose. To
make it easier, we introduce two utility functions in `firezone-logging`
(sketched below):
- `std_dyn_err`
- `anyhow_dyn_err`
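A plausible shape for these helpers (the real signatures may differ):
```rust
/// Upcasts a concrete error to the `&dyn std::error::Error` that `tracing` can record.
pub fn std_dyn_err<E>(e: &E) -> &(dyn std::error::Error + 'static)
where
    E: std::error::Error + 'static,
{
    e
}

/// Same for `anyhow::Error`, which does not implement `std::error::Error` itself.
pub fn anyhow_dyn_err(e: &anyhow::Error) -> &(dyn std::error::Error + 'static) {
    let err: &(dyn std::error::Error + Send + Sync + 'static) = e.as_ref();
    err // the auto-trait bounds are dropped by an implicit coercion
}
```
At the call-site, this reads as e.g. `tracing::warn!(error = std_dyn_err(&e), "Failed to update DNS servers")`.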
Recording errors as proper `tracing::Value`s will be especially helpful
once we enable Sentry's `tracing` integration:
https://docs.rs/sentry-tracing/latest/sentry_tracing/#tracking-errors
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.18 to 4.5.19.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.19</h2>
<h2>[4.5.19] - 2024-10-01</h2>
<h3>Internal</h3>
<ul>
<li>Update dependencies</li>
</ul>
</blockquote>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Currently, `connlib` is entirely single-threaded. This allows us to
reuse a single buffer for processing IP packets and makes reasoning
about the packet-processing code very simple. Being single-threaded also
means we can only make use of a single CPU core and all operations have
to be sequential.
Analyzing `connlib` using `perf` shows that we spend 26% of our CPU time
writing packets to the TUN interface [0]. Because we are
single-threaded, `connlib` cannot do anything else during this time. If
we could offload the writing of these packets to a different thread,
`connlib` could already process the next packet while the current one is
being written.
Packets that we send to the TUN interface arrive as encrypted WireGuard
packets over UDP and get decrypted into a - currently - shared buffer.
Moving the writing to a different thread implies that we need more of
these buffers so that the next packet(s) can be decrypted into them.
To avoid IP fragmentation, we set the MTU of the TUN interface to 1280
bytes. That isn't very big and easily fits into a stackframe; the
default stack size for threads is 2 MB [1].
Instead of creating more buffers and cycling through them, we can also
simply stack-allocate our IP packets. This incurs some overhead from
copying packets, but it is only ~3.5% [2] (measured without a separate
thread). With stack-allocated packets, almost all lifetime annotations
go away, which in itself is already a welcome ergonomics boost.
Stack-allocated packets also mean we can simply spawn a new thread for
the packet processing. This thread is connected to connlib's main thread
with two channels. Their capacity of 1000 packets will at most consume
an additional 3.5 MB of memory, which is fine even on our
most-constrained devices such as iOS.
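A sketch of what such a by-value packet can look like (the name and exact sizing are assumptions):
```rust
const MAX_LEN: usize = 1280; // tunnel MTU

/// A packet that lives wherever the struct lives: on the stack, in a
/// channel's buffer, etc. No lifetime annotations attached.
struct IpPacketBuf {
    buf: [u8; MAX_LEN],
    len: usize,
}

impl IpPacketBuf {
    fn packet(&self) -> &[u8] {
        &self.buf[..self.len]
    }
}
```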
[0]: https://share.firefox.dev/3z78CzD
[1]: https://doc.rust-lang.org/std/thread/#stack-size
[2]: https://share.firefox.dev/3Bf4zla
Resolves: #6653.
Resolves: #5541.
When deciding which interface we are going to use for connecting to the
portal API, we need to filter through all adapters on Windows and
exclude our own TUN adapter to avoid routing loops. In addition, we also
need to filter for online adapters; otherwise we might pick one that is
not actually routable.
Resolves: #6802.
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.4 to 4.5.13.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.13] - 2024-07-31</h2>
<h3>Fixes</h3>
<ul>
<li><em>(derive)</em> Improve error message when
<code>#[flatten]</code>ing an optional <code>#[group(skip)]</code></li>
<li><em>(help)</em> Properly wrap long subcommand descriptions in
help</li>
</ul>
<h2>[4.5.12] - 2024-07-31</h2>
<h2>[4.5.11] - 2024-07-25</h2>
<h2>[4.5.10] - 2024-07-23</h2>
<h2>[4.5.9] - 2024-07-09</h2>
<h3>Fixes</h3>
<ul>
<li><em>(error)</em> When defining a custom help flag, be sure to
suggest it like we do the built-in one</li>
</ul>
<h2>[4.5.8] - 2024-06-28</h2>
<h3>Fixes</h3>
<ul>
<li>Reduce extra flushes</li>
</ul>
<h2>[4.5.7] - 2024-06-10</h2>
<h3>Fixes</h3>
<ul>
<li>Clean up error message when too few arguments for
<code>num_args</code></li>
</ul>
<h2>[4.5.6] - 2024-06-06</h2>
<h2>[4.5.5] - 2024-06-06</h2>
<h3>Fixes</h3>
<ul>
<li>Allow <code>exclusive</code> to override
<code>required_unless_present</code>,
<code>required_unless_present_any</code>,
<code>required_unless_present_all</code></li>
</ul>
</blockquote>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The `expect` attribute is similar to `allow` in that it silences a
particular lint. Unlike `allow` however, `expect` itself warns as soon
as the expected lint is no longer emitted. This ensures we don't end up
with stale `allow` attributes in our codebase. Additionally, it provides
a way of adding a `reason` to document why the lint is being suppressed.
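For illustration (a hypothetical function, not from our codebase):
```rust
// If the `.unwrap()` below is ever removed or stops triggering the lint,
// the attribute itself warns via `unfulfilled_lint_expectations`.
#[expect(clippy::unwrap_used, reason = "a hard-coded port is always valid")]
fn default_port() -> u16 {
    "8080".parse().unwrap()
}
```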
Refs #6547; this fixes a similar error message, but it's not the exact
same issue.
When IPv6 is disabled on a system, our call to set the MTU was failing
with error code 0x80070490. This patch allows some of the MTU-related
syscalls to fail with a warning log.
To replicate the issue, run this command to set a registry value to
disable IPv6, then reboot the system:
`reg add
"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"
/v DisabledComponents /t REG_DWORD /d 255 /f`
```[tasklist]
- [x] Update changelog
- [x] Apply PR feedback
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
On Windows, the network notifier always notifies once at startup. We
make the DNS notifier and the Linux network notifier match this
behavior, and we assert it in the unit test.
Part of a yak shave towards removing Tauri.
As the final step in removing `pnet_packet`, we need to introduce `-Mut`
equivalent slices for UDP, TCP and ICMP packets. As a starting point,
introducing `UdpHeaderSliceMut` and `TcpHeaderSliceMut` is fairly
trivial. The ICMP variants are a bit trickier because they differ
between IPv4 and IPv6. Additionally, ICMP for IPv4 is quite complex
because it can have a variable header length. Lastly, for both variants,
the values in byte range 5-8 are semantically different depending on the
ICMP type.
This requires us to design an API that balances ergonomics and
correctness. Technically, an ICMP identifier and sequence can only be
set if the ICMP type is "echo request" or "echo reply". However, adding
an additional parsing step to guarantee this in the type system is quite
verbose.
The trade-off implemented in this PR allows us to write directly to
bytes 5-8 using the `set_identifier` and `set_sequence` functions. To
catch errors early, these functions have debug assertions built in that
ensure the packet is indeed an ICMP echo packet.
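A condensed sketch of that trade-off (the real types in `ip-packet` are more elaborate):
```rust
struct IcmpV4SliceMut<'a> {
    slice: &'a mut [u8],
}

impl<'a> IcmpV4SliceMut<'a> {
    fn is_echo(&self) -> bool {
        matches!(self.slice[0], 0 | 8) // echo reply / echo request
    }

    /// Writes the identifier (bytes 5-6 of the header, i.e. index 4..6).
    fn set_identifier(&mut self, id: u16) {
        debug_assert!(self.is_echo(), "identifier only exists on echo packets");
        self.slice[4..6].copy_from_slice(&id.to_be_bytes());
    }

    /// Writes the sequence number (bytes 7-8 of the header, i.e. index 6..8).
    fn set_sequence(&mut self, seq: u16) {
        debug_assert!(self.is_echo(), "sequence only exists on echo packets");
        self.slice[6..8].copy_from_slice(&seq.to_be_bytes());
    }
}
```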
Resolves: #6366.
Currently, we have two structs for representing IP packets: `IpPacket`
and `MutableIpPacket`. As the name suggests, they mostly differ in
mutability. This design was originally inspired by the `pnet_packet`
crate which we based our `IpPacket` on. With subsequent iterations, we
added more and more functionality onto our `IpPacket`, like NAT64 &
NAT46 translation. As a result of that, the `MutableIpPacket` is no
longer directly based on `pnet_packet` but instead just keeps an
internal buffer.
This duplication can be resolved by merging the two structs into a
single `IpPacket`. We do this by first replacing all usages of
`IpPacket` with `MutableIpPacket`, then deleting `IpPacket` and finally
renaming `MutableIpPacket` to `IpPacket`. The final design now has
different `self`-receivers: some functions take `&self`, some
`&mut self` and some consume the packet using `self`.
This results in more ergonomic usage of `IpPacket` across the codebase
and deletes a fair bit of code. It also takes us one step closer towards
using `etherparse` for all our IP packet interaction needs. Lastly, I am
currently exploring a performance-optimisation idea that stack-allocates
all IP packets, and for that, the current split between `IpPacket` and
`MutableIpPacket` does not really work.
Related: #6366.
In #6032, we attempted to fix routing loops on Windows and did so
successfully for UDP packets. For TCP sockets, we believed that binding
the socket to an interface is enough to prevent routing loops. This
assumption is wrong [0]:
> On Windows, a call to bind() affects card selection only for incoming
traffic, not outgoing traffic.
>
> Thus, on a client running in a multi-homed system (i.e., more than one
interface card), it's the network stack that selects the card to use,
and it makes its selection based solely on the destination IP, which in
turn is based on the routing table. A call to bind() will not affect the
choice of the card in any way.
On most of our testing machines, this problem didn't surface, but it
turns out that on some machines, especially ones with WiFi cards, there
is a conflict between the routes added on the system. In particular,
with the Internet resource active, we also add a catch-all route that we
_want_ to have the highest priority, i.e. Windows SHOULD send all
traffic to our TUN device, except for traffic that we generate
ourselves, like TCP connections to the portal or UDP packets sent to
gateways, relays or DNS servers.
It appears that on some systems, mostly ones with Ethernet adapters,
Windows picks the "correct" interface for our socket and sends traffic
via that, but on other systems, it doesn't. TCP sockets are only used for the
WebSocket connection to the portal. Without that one, Firezone
completely stops working because we can't send any control messages.
To reliably fix this issue, we need to add a dedicated route for the
target IP of each TCP socket that is more specific than the Internet
resource route (`0.0.0.0/0`) but otherwise identical. We do this as part
of creating a new TCP socket. This route is for the _default_ interface
and thus doesn't get automatically removed when Firezone exits.
We implement a RAII guard that attempts to remove the route on a
best-effort basis (sketched below). Despite this guard, the route can
linger in case Firezone is forced to exit or exits in otherwise unclean
ways. To avoid lingering routes, we always delete all routing-table
entries matching the IP of the portal just before we are about to add a
new one.
Fixes: #6591.
[0]:
https://forums.codeguru.com/showthread.php?487139-Socket-binding-with-routing-table&s=a31637836c1bf7f0bc71c1955e47bdf9&p=1891235#post1891235
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Foo Bar <foo@bar.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
Currently, we buffer UDP packets whenever the socket is busy and try to
flush them out at a later point. This requires allocations and is tricky
to get right.
In order to solve both of these problems, we refactor `snownet` to
return us an `EncryptedPacket` instead of a `Transmit`. An
`EncryptedPacket` is an indirection-abstraction that can be turned into
a `Transmit` given an `EncryptBuffer`. This combination of types allows
us to hold on to the `EncryptedPacket` (which does not contain any
references itself) in the `io` component whilst we are waiting for the
socket to be ready to send again.
This means we will immediately suspend the event loop in case the socket
is no longer ready for sending and resend the datagram in the
`EncryptBuffer` once we get re-polled.
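In code, the indirection might look roughly like this (field names are assumptions; the type names are from the actual change):
```rust
use std::net::SocketAddr;

/// Holds no references, so it can be kept in `Io` while the socket is busy.
struct EncryptedPacket {
    dst: SocketAddr,
    len: usize,
}

struct EncryptBuffer {
    inner: Vec<u8>,
}

struct Transmit<'a> {
    dst: SocketAddr,
    payload: &'a [u8],
}

impl EncryptedPacket {
    fn to_transmit<'a>(&self, buffer: &'a EncryptBuffer) -> Transmit<'a> {
        Transmit {
            dst: self.dst,
            payload: &buffer.inner[..self.len],
        }
    }
}
```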
Updates `windows` crates to 0.58 without the bug in #6551.
Supersedes #6556.
The bug was calling `try_send()?` on an MPSC channel of capacity 1,
which would bail out of the worker thread if we got 2 DNS change
notifications faster than the controller task / thread could process the
first one.
This reverts commit d8f25f9bf8.
#6506 broke the Clients and I guess I didn't do any manual smoke test,
so I didn't catch it.
I have leads for a permanent fix in #6551 but I don't want to leave
`main` broken since it will screw up bisects.
Supersedes #5913
This required a big refactor because `HANDLE` is no longer `Send` and
was never supposed to be.
So we add a worker thread for listening to DNS changes, since that
requires us to hold a `HANDLE` across `await` points and I couldn't find
any simpler way to do it.
I could add integration tests for this in a future PR that prove the
notifiers work by poking the registry or setting DNS servers and seeing
if we pick up the changes on time. But setting DNS servers without the
tunnel up may be tricky, so I left it out of scope for this PR.
```[tasklist]
- [x] Fix force-kill bug
```
The output of `git describe` always refers to the last tag that it can
find. This leads to confusing versions being printed such as:
```
2024-08-19T00:24:08.983891Z INFO firezone_headless_client: arch="x86_64" git_version="gateway-1.1.5-30-gf82fee162-modified"
```
Note that this is code running in the headless-client and it refers to
the gateway tag. Whilst not wrong from git's PoV, it is certainly
confusing.
We can fix this by providing a glob-pattern to `git describe` via
`--match`. This makes git ignore any other tags and print a version
identifier that refers to the current program:
```
2024-08-19T00:39:48.634191Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.1.7-31-ga08a3411d-modified"
```
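For the headless-client, the fixed invocation is then presumably something like `git describe --always --match 'headless-client-*'`.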
Parsing the current Git version within `firezone-bin-shared` means this
crate (and all its dependents) needs to be rebuilt every time one makes
a commit, even if none of the code actually changes.
To avoid this whilst still allowing `firezone-bin-shared` to export a
useful, shared function, we export a macro instead that can be called
from the respective crates that need the Git version. This means only
those binaries will be marked as dirty, and rebuilds of e.g. unit tests
don't need to rebuild these workspace crates.
Now that we have the `bin-shared` crate, it is easy to move the
health-check functionality into there. That allows us to get rid of a
crate which makes navigating the workspace a bit easier.
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that bundles together how we want to do logging.
This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs you want to see. Without a preset filter, the
only alternative is to explicitly turn off the logs for `netlink_proto`
by typing something like `RUST_LOG=netlink_proto=off,debug`. Especially
when debugging issues with customers, this is annoying.
Log filters can be overridden, i.e. a second directive that matches the
exact same scope overrides a previous one. Thus, with this design it is
still possible to activate certain logs at runtime, even if they are
silenced by default.
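In terms of `tracing_subscriber`'s `EnvFilter`, the idea might look like this (a sketch; the actual set of silenced crates is likely larger):
```rust
use tracing_subscriber::EnvFilter;

/// Prepends the quiet-by-default directive; a user-supplied directive for
/// the same target comes later in the list and therefore wins.
fn filter(user_directives: &str) -> EnvFilter {
    EnvFilter::new(format!("netlink_proto=off,{user_directives}"))
}
```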
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The `firezone-bin-shared` crate is meant to house non-tunnel related
things. That allows it to compile in parallel to everything else. It
currently only depends on `connlib-shared` to access the `DEFAULT_MTU`
constant. We can remove that by requiring the MTU as a ctor parameter of
`TunDeviceManager`.
A longer write-up of the intended dependency structure is in #4470.
When forwarding DNS queries, we need to remember the original source
socket in order to send the response back. Previously, this mapping was
indexed by the DNS query ID alone. As it turns out, at least Windows
doesn't have a global DNS query ID counter and may reuse IDs across
different DNS servers. If that happens and two of these queries overlap,
we match the wrong responses together.
In the best case, this produces bad DNS results on the client. In the
worst case, those queries were for DNS servers with different IP
versions in which case we triggered a panic in connlib further down the
stack where we created the IP packet for the response.
To fix this, we first and foremost remove the explicit `panic!` from the
`make::` functions in `ip-packet`. Originally, these functions were only
used in tests, but we started to use them in production code too and
unfortunately forgot about this panic. By introducing a `Result`, all
call-sites are made aware that this can fail.
Second, we fix the actual indexing into the data structure for forwarded
DNS queries to also include the DNS server's socket (see the sketch
below). This ensures we don't treat DNS query IDs as globally unique.
Third, we replace the panicking path in
`try_handle_forwarded_dns_response` with a log statement, meaning if the
above assumption turns out wrong for some reason, we still don't panic
and simply don't handle the packet.
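The fixed index can be pictured like this (a sketch; names are assumptions):
```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Query IDs are only unique per upstream server, so the server's socket
/// address has to be part of the key; the value is the original source.
#[derive(Default)]
struct ForwardedQueries {
    by_server_and_id: HashMap<(SocketAddr, u16), SocketAddr>,
}
```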
Currently, `connlib` depends on `hickory-resolver` to perform DNS
queries for non-resources. This is unnecessary. Instead of buffering the
original UDP DNS query, consulting hickory to resolve the name and
mapping the response back, we can simply take the UDP payload and send
it via our protected socket directly to the original upstream DNS
server.
This ensures `connlib` is as transparent as possible for DNS queries for
non-resources. Additionally, it removes a lot of error handling and
other cruft that we currently have to perform because we are using
hickory. For example, hickory will automatically retry a DNS query after
a certain timeout. However, the OS / client talking to `connlib` will
also retry after a certain timeout because it is making DNS queries over
an unreliable transport (UDP). It is thus unnecessary for us to do that
internally.
To correctly test this change, our test-suite needed some refactoring.
Specifically, DNS servers are now modelled as dedicated `Host`s that can
receive (UDP) traffic.
Lastly, we can remove our dependency on `hickory-proto` and
`hickory-resolver` everywhere and only use `domain` for parsing DNS
messages.
Resolves: #6141.
Related: #6033.
Related: #4800. (Impossible to happen with this design)
Closes #5063, supersedes #5850
Other refactors and changes made as part of this:
- Adds the ability to disable DNS control on Windows
- Removes the spooky-action-at-a-distance `from_env` functions that used
to be buried in `tunnel`
- `FIREZONE_DNS_CONTROL` is now a regular `clap` argument again
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes #5846
Will be moved down to the IPC service eventually.
The goal for connection roaming is not totally transparent "change
Wi-Fi networks without dropping SSH" handoffs, but just for Firezone to
re-connect itself as quickly as possible, so that everything above us
can re-connect as soon as it times out and won't be hung up with a
broken tunnel.
In `connlib`, traffic is sent through sockets in one of three ways:
1. Direct p2p traffic between clients and gateways: For these, we always
explicitly set the source IP (and thus interface).
2. UDP traffic to the relays: For these, we let the OS pick an
appropriate source interface.
3. WebSocket traffic over TCP to the portal: For this too, we let the OS
pick the source interface.
For (2) and (3), it is possible to run into routing loops, depending on
the routes that we have configured on the TUN device.
On Linux, we can prevent routing loops by marking a socket [0] and
repeating the mark when we add routes [1]. Packets sent via a marked
socket won't be routed by a rule that contains this mark. On Android, we
can do something similar by "protecting" a socket via a syscall on the
Java side [2].
On Windows, routing works slightly differently. There, the source
interface is determined based on a computed metric [3] [4]. To prevent
routing loops on Windows, we thus need to find the "next best" interface
after our TUN interface. We can achieve this with a combination of
several syscalls:
1. List all interfaces on the machine
2. Ask Windows for the best route on each interface, except our TUN
interface.
3. Sort by Windows' routing metric and pick the lowest one (lower is
better).
Thanks to the `SocketFactory` abstraction that we introduced previously,
integrating this into `connlib` isn't too difficult:
1. For TCP sockets, we simply resolve the best route after creating the
socket and then bind it to that local interface. That way, all packets
will always go via that interface, regardless of which routes are
present on our TUN interface.
2. UDP is connection-less, so we need to decide per packet which
interface to use. "Pick the best interface for me" is modelled in
`connlib` via the `DatagramOut::src` field being `None`.
- To ensure those packets don't cause a routing loop, we introduce a
"source IP resolver" for our `UdpSocket` (see the sketch after this
list). This function gets called every time we need to send a packet
without a source IP.
- For improved performance, we cache these results. The Windows client
uses this source IP resolver to apply the above strategy and find a
suitable source IP.
- In case the source IP resolution fails, we don't send the packet. This
is important; otherwise the kernel might choose our TUN interface again
and trigger a routing loop.
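A sketch of that resolver contract, including the cache (names are assumptions):
```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Given a destination, returns the local IP to send from,
/// or `None` to drop the packet rather than risk a routing loop.
type SourceIpResolver = dyn Fn(IpAddr) -> Option<IpAddr> + Send + Sync;

struct CachingResolver {
    resolve_source: Box<SourceIpResolver>,
    cache: HashMap<IpAddr, Option<IpAddr>>,
}

impl CachingResolver {
    fn source_for(&mut self, dst: IpAddr) -> Option<IpAddr> {
        *self
            .cache
            .entry(dst)
            .or_insert_with(|| (self.resolve_source)(dst))
    }
}
```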
The last remark to make here is that this also works for connection
roaming. The TCP socket gets thrown away when we reconnect to the
portal. Thus, the new socket will pick the new best interface as it is
re-created. The UDP sockets also get thrown away as part of roaming.
That clears the above cache which is what we want: Upon roaming, the
best interface for a given destination IP will likely have changed.
[0]:
59014a9622/rust/headless-client/src/linux.rs (L19-L29)
[1]:
59014a9622/rust/bin-shared/src/tun_device_manager/linux.rs (L204-L224)
[2]:
59014a9622/rust/connlib/clients/android/src/lib.rs (L535-L549)
[3]:
https://learn.microsoft.com/en-us/previous-versions/technet-magazine/cc137807(v=msdn.10)?redirectedfrom=MSDN
[4]:
https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metric
Fixes: #5955.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>