I am not sure if this is currently breaking anything but it seems more
correct to flush all events first and then end the session.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
The tunnel service of the GUI client can only handle one process at a
time. The OS however will happily connect multiple clients to the socket
/ pipe. They will simply idle until the previous process disconnects.
To avoid this situation, we introduce a `Hello` message from the tunnel
service to the GUI client. If the GUI client doesn't receive this
message within 5s, it considers the tunnel service to be unresponsive.
If our duplicate instance detection works as intended, users are not
expected to hit this.
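The GUI side of this handshake can be sketched with a plain std channel standing in for the IPC stream; all names here (`ServiceMsg`, `await_hello`) are illustrative, not the real IPC types:

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Hypothetical message type; stands in for the real IPC `Hello` described above.
#[derive(Debug, PartialEq)]
enum ServiceMsg {
    Hello,
}

/// Wait for the tunnel service's `Hello`. A timeout means the OS accepted our
/// connection but another client is being served, i.e. the service is
/// effectively unresponsive from our point of view.
fn await_hello(rx: &mpsc::Receiver<ServiceMsg>, timeout: Duration) -> Result<(), &'static str> {
    match rx.recv_timeout(timeout) {
        Ok(ServiceMsg::Hello) => Ok(()),
        Err(_) => Err("tunnel service not responsive"),
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(ServiceMsg::Hello).unwrap();
    assert!(await_hello(&rx, Duration::from_secs(5)).is_ok());
}
```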
We already have a pretty powerful IPC framework in place to communicate
between the GUI and the service process. The deeplink implementation uses
the same IPC mechanisms (UDS / pipes), yet it is effectively a
re-implementation of what we already have, just with less functionality.
In order to provide a more sophisticated handling of the case where
Firezone is launched again while it is already running, we refactor the
deeplink module to reuse the existing IPC framework. This then makes it
easy to notify the already running Firezone process that a new instance
was launched.
For now, this doesn't do anything other than write a log entry. This
however lays enough groundwork for us to implement more sophisticated
handling of that case in the future, e.g. opening new windows.
One caveat here is that we are now trying to connect to an existing IPC
socket on every startup, even the first one. Our IPC code has a retry
loop of 10 iterations to be more resilient on Windows when connecting to
pipes. Without any further changes, this would always delay the start
of Firezone by 1s because we would try to connect to the socket 10
times before concluding that we are the first instance. To fix this, we
make the number of attempts configurable and set it to 1 when
attempting to connect to the GUI IPC socket, avoiding unnecessary
delays in starting up the Client.
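The configurable retry loop might look roughly like this; a sketch only, since the real IPC code connects to a UDS / named pipe rather than invoking a closure:

```rust
use std::time::Duration;

/// Sketch of a connect helper with a configurable number of attempts. With
/// 10 attempts and a 100ms pause this matches the ~1s worst case described
/// above; passing 1 makes first-instance detection return immediately.
fn connect_with_attempts<T, E>(
    max_attempts: u32,
    mut connect: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                attempt += 1;
                std::thread::sleep(Duration::from_millis(100));
            }
        }
    }
}

fn main() {
    // Single attempt: no existing socket simply means we are the first instance.
    let result: Result<(), &str> = connect_with_attempts(1, || Err("connection refused"));
    assert!(result.is_err());
}
```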
Related: #5143.
We already use `vite` as a bundling tool but only to rollup some of the
pre-built files. This setup (and therefore our buildscripts) can be
massively simplified by instructing `vite` to also build our TypeScript
code and compile tailwind.
This makes it much easier to develop locally because one only needs to
run `pnpm vite build --watch` to keep everything up to date.
Currently, on machines that have multiple routable egress interfaces,
`connlib` may bounce between them instead of settling on one. This
happens because we have a dedicated `CandidateSet` that we use to filter
out "duplicate" candidates of the same type. Doing that is important
because if the other party is behind a symmetric NAT, they will send us
many server-reflexive candidates that differ only by their port, yet
none of them will actually be routable.
To prevent sending many of these candidates to the remote, we first
gather them locally in our `CandidateSet` and de-duplicate them.
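A minimal sketch of such a set, assuming candidates are keyed by kind and IP while the port is ignored; the types are simplified from the real `connlib` code:

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Simplified candidate kinds; the real type is richer.
#[derive(Clone, Copy, Hash, PartialEq, Eq)]
enum CandidateKind {
    Host,
    ServerReflexive,
    Relayed,
}

/// De-duplicates candidates by kind and IP, ignoring the port: a peer behind
/// a symmetric NAT yields many server-reflexive candidates that differ only
/// by port, and at most one of them is worth advertising.
#[derive(Default)]
struct CandidateSet {
    seen: HashSet<(CandidateKind, IpAddr)>,
}

impl CandidateSet {
    /// Returns `true` iff the candidate is new and should be signalled.
    fn insert(&mut self, kind: CandidateKind, ip: IpAddr, _port: u16) -> bool {
        self.seen.insert((kind, ip))
    }
}

fn main() {
    let mut set = CandidateSet::default();
    let ip: IpAddr = "203.0.113.5".parse().unwrap();
    assert!(set.insert(CandidateKind::ServerReflexive, ip, 50_000));
    // Same IP, different port: treated as a duplicate and suppressed.
    assert!(!set.insert(CandidateKind::ServerReflexive, ip, 50_001));
}
```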
Bumps the aya group in /rust with 5 updates:
| Package | From | To |
| --- | --- | --- |
| [aya](https://github.com/aya-rs/aya) | `583709f` | `f74a157` |
| [aya-build](https://github.com/aya-rs/aya) | `583709f` | `f74a157` |
| [aya-ebpf](https://github.com/aya-rs/aya) | `583709f` | `f74a157` |
| [aya-log](https://github.com/aya-rs/aya) | `583709f` | `f74a157` |
| [aya-log-ebpf](https://github.com/aya-rs/aya) | `583709f` | `f74a157` |
Updates `aya` from `583709f` to `f74a157`
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f74a157907"><code>f74a157</code></a>
Merge pull request <a
href="https://redirect.github.com/aya-rs/aya/issues/1262">#1262</a> from
dave-tucker/stabilize-links</li>
<li><a
href="7dba5a41ad"><code>7dba5a4</code></a>
feat(aya): Make LinkInfo and loaded_links public</li>
<li><a
href="92c73a72e7"><code>92c73a7</code></a>
Merge pull request <a
href="https://redirect.github.com/aya-rs/aya/issues/1263">#1263</a> from
aya-rs/dependabot/cargo/cargo-crates-b22b264ef4</li>
<li><a
href="95c03e6600"><code>95c03e6</code></a>
build(deps): update nix requirement in the cargo-crates group</li>
<li><a
href="33c9f2b2b2"><code>33c9f2b</code></a>
aya: uprobe: use PathBuf for
ResolveSymbolError::DebuglinkAccessError</li>
<li><a
href="90d5604877"><code>90d5604</code></a>
aya: uprobe attach: use mmap instead of reading whole binaries to
memory</li>
<li><a
href="647100faa7"><code>647100f</code></a>
aya: clean up resolve_symbol a bit</li>
<li><a
href="3aded0e0a5"><code>3aded0e</code></a>
aya: move Mmap struct to aya::util</li>
<li>See full diff in <a
href="583709f6a0...f74a157907">compare
view</a></li>
</ul>
</details>
<br />
Updates `aya-build` from `583709f` to `f74a157`
Updates `aya-ebpf` from `583709f` to `f74a157`
Updates `aya-log` from `583709f` to `f74a157`
Updates `aya-log-ebpf` from `583709f` to `f74a157`
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
To simulate varying network conditions in our tests, each `Host` in our
test network has an "inbox" that contains all incoming network packets
with an added latency. When another host sends a packet, the packet
gets added to the inbox. Internally, the inbox has a binary heap that
sorts incoming `Transmits` by their latency and only delivers them to
the node when that delay is up.
Currently, this delivery doesn't always happen because we fail to take
into account the timestamp at which the next `Transmit` is due when we
figure out what to do next.
Instead of just looking at the inner state via `poll_transmit`, we now
also consult the inbox to determine when the next message is due and
wake up at the correct time.
Not doing this caused our state machine to think that packets got
dropped because `REFRESH` messages to the relays were timing out.
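The inbox and its "next due" query can be sketched like this; a simplified model, not the actual test-harness types:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::time::{Duration, Instant};

/// Simplified inbox: packets ordered by the instant they become deliverable.
#[derive(Default)]
struct Inbox {
    queue: BinaryHeap<Reverse<(Instant, Vec<u8>)>>,
}

impl Inbox {
    fn push(&mut self, now: Instant, latency: Duration, packet: Vec<u8>) {
        self.queue.push(Reverse((now + latency, packet)));
    }

    /// When the next packet is due, if any. The test driver must take this
    /// into account (in addition to `poll_transmit` and the node's timers)
    /// when deciding when to wake up next.
    fn next_due(&self) -> Option<Instant> {
        self.queue.peek().map(|Reverse((due, _))| *due)
    }

    fn pop_due(&mut self, now: Instant) -> Option<Vec<u8>> {
        if self.next_due()? <= now {
            self.queue.pop().map(|Reverse((_, packet))| packet)
        } else {
            None
        }
    }
}

fn main() {
    let now = Instant::now();
    let mut inbox = Inbox::default();
    inbox.push(now, Duration::from_millis(50), vec![0xde, 0xad]);
    // Not due yet: the driver should sleep until `next_due()` instead of dropping it.
    assert!(inbox.pop_due(now).is_none());
    assert_eq!(inbox.pop_due(now + Duration::from_millis(50)), Some(vec![0xde, 0xad]));
}
```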
Resolves: #9118
In order to harvest some of the fruits of our lobbying efforts upstream,
we bump `moka` to latest `main` whilst they are still preparing the next
release.
This removes several dependencies from our dependency tree & lockfile.
Related: https://github.com/moka-rs/moka/pull/509
To send UDP DNS queries to upstream DNS servers, we have a
`UdpSocket::handshake` function that turns a UDP socket into a
single-use object where exactly one datagram is expected from the
address we send a message to. The way this is enforced is via an
equality check.
It appears that this equality check fails if users run an upstream DNS
server on a link-local IPv6 address within a setup that utilises IPv6
scopes. At the time when we receive the response, the packet has already
been successfully routed back to us so we should accept it, even if we
didn't specify a scope as the destination address.
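A sketch of the relaxed equality check, treating a zero scope ID on either side as a wildcard; the function name is hypothetical:

```rust
use std::net::{Ipv6Addr, SocketAddr, SocketAddrV6};

/// Compare the actual sender against the expected upstream resolver, treating
/// an unset IPv6 scope ID (0) on either side as a wildcard. Without this, a
/// reply routed back with a scope ID would fail a strict equality check even
/// though it is the answer to our query.
fn is_expected_sender(expected: SocketAddr, actual: SocketAddr) -> bool {
    match (expected, actual) {
        (SocketAddr::V6(e), SocketAddr::V6(a)) => {
            e.ip() == a.ip()
                && e.port() == a.port()
                && (e.scope_id() == 0 || a.scope_id() == 0 || e.scope_id() == a.scope_id())
        }
        (e, a) => e == a,
    }
}

fn main() {
    let ip: Ipv6Addr = "fe80::1".parse().unwrap();
    let sent_to = SocketAddr::V6(SocketAddrV6::new(ip, 53, 0, 0)); // we gave no scope
    let reply_from = SocketAddr::V6(SocketAddrV6::new(ip, 53, 0, 2)); // scoped to iface 2
    assert!(is_expected_sender(sent_to, reply_from));
}
```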
In order to avoid routing loops on Windows, our UDP and TCP sockets in
`connlib` embed a "source IP resolver" that finds the "next best"
interface after our TUN device according to Windows' routing metrics.
This ensures that packets don't get routed back into our TUN device.
Currently, errors during this process are only logged on TRACE and
therefore not visible in Sentry. We fix this by moving around some of
the function interfaces and forwarding the error from the source IP
resolver together with some context about the destination IP.
As part of launching the Tauri GUI client, we need to observe a specific
initialisation order. In particular, we need to wait until Tauri sends
us a `RunEvent::Ready` before we can initialise things like the tray
menu.
To make this more convenient, Tauri offers a so-called "setup hook" that
can be set on the app builder. Unfortunately, Tauri internally panics if
this provided setup-hook returns an `Err`. Removing this is tracked
upstream: https://github.com/tauri-apps/tauri/issues/12815.
Until this is fixed, we stop using this "setup hook" and instead spawn
our own task that performs this work. This task needs to wait until
Tauri is ready. To achieve that, we introduce an additional mpsc channel
that sends a notification every time we receive a `RunEvent::Ready`.
That should only happen once. We only read from the receiver once, which
is why we ignore the error on the sending side in case the receiver has
already been dropped.
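The "ignore the send error" pattern can be sketched with a plain std channel, Tauri types elided:

```rust
use std::sync::mpsc;

/// Forward a readiness notification into the channel. The receiving task reads
/// exactly once, so a failed send only means readiness was already observed;
/// we deliberately discard the error instead of unwrapping.
fn notify_ready(tx: &mpsc::Sender<()>) {
    let _ = tx.send(());
}

fn main() {
    let (tx, rx) = mpsc::channel();
    notify_ready(&tx); // first `RunEvent::Ready`
    assert!(rx.recv().is_ok());
    drop(rx);
    notify_ready(&tx); // receiver already gone: must not panic
}
```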
Resolves: #9101
When this crate is compiled by itself, these features are required. This
doesn't show up in CI because there we compile the entire workspace and
some other crate already activates these features.
Bumps [divan](https://github.com/nvzqz/divan) from 0.1.17 to 0.1.21.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/nvzqz/divan/blob/main/CHANGELOG.md">divan's
changelog</a>.</em></p>
<blockquote>
<h2>[0.1.21] - 2025-04-09</h2>
<h3>Fixed</h3>
<ul>
<li><code>Divan::skip_exact</code> behaved incorrectly in
<code>v0.1.19</code>.</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Improved handling of internal code around filters.</li>
</ul>
<h2>[0.1.20] - 2025-04-09</h2>
<h3>Fixed</h3>
<ul>
<li><code>Divan::skip_regex</code> accidentally dropped
<a
href="https://docs.rs/regex-lite/latest/regex_lite/struct.Regex.html"><code>regex_lite::Regex</code></a>
and behaved incorrectly in <code>v0.1.19</code>.</li>
</ul>
<h2>[0.1.19] - 2025-04-09</h2>
<h3>Fixed</h3>
<ul>
<li>[<code>cargo-nextest</code>] no longer skips benchmarks with
argument parameters (<a
href="https://redirect.github.com/nvzqz/divan/issues/75">#75</a>).</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Organized positive and negative filters into a split buffer.</li>
</ul>
<h2>[0.1.18] - 2025-04-05</h2>
<h3>Added</h3>
<ul>
<li>
<p>Support for [<code>cargo-nextest</code>] running benchmarks as
tests.</p>
</li>
<li>
<p>[<code>prelude</code>] module for simplifying imports of
[<code>#[bench]</code>][bench_attr],
[<code>#[bench_group]</code>][bench_group_attr],
[<code>black_box</code>], [<code>black_box_drop</code>],
[<code>AllocProfiler</code>], [<code>Bencher</code>], and
[<code>Divan</code>].</p>
</li>
<li>
<p>Support <code>wasi</code> and <code>emscripten</code> targets.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="52f9d4983e"><code>52f9d49</code></a>
Release v0.1.21</li>
<li><a
href="5afb095486"><code>5afb095</code></a>
Fix broken <code>Divan::skip_exact</code></li>
<li><a
href="e4d112ccbc"><code>e4d112c</code></a>
Release v0.1.20</li>
<li><a
href="1d74108bbe"><code>1d74108</code></a>
Fix broken <code>Divan::skip_regex</code></li>
<li><a
href="58988fc304"><code>58988fc</code></a>
Release v0.1.19</li>
<li><a
href="f43a742d0f"><code>f43a742</code></a>
docs: Change "changes" to "changed" in
changelog</li>
<li><a
href="f348769fb4"><code>f348769</code></a>
docs: List most recent changes in changelog</li>
<li><a
href="07111fb6f8"><code>07111fb</code></a>
Abstract filters into <code>FilterSet</code></li>
<li><a
href="2c865cdf24"><code>2c865cd</code></a>
docs: Add link to <code>prelude</code> module in changelog</li>
<li><a
href="9075b9e0ed"><code>9075b9e</code></a>
fix nextest support for parameterized benches</li>
<li>Additional commits viewable in <a
href="https://github.com/nvzqz/divan/compare/v0.1.17...v0.1.21">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
For at least one user, the threads shut down correctly, but we didn't seem
to have exited the loop. In
https://firezone-inc.sentry.io/issues/6335839279/events/c11596de18924ee3a1b64ced89b1fba2/?project=4508008945549312,
we can see that both flags are marked as `true` yet we still emitted the
message.
The only way I can explain this is that the thread shut down in between
the two times we called the `is_finished` function. To ensure this
doesn't happen, we now only read it once.
This however also shows that 5s may not be enough time for WinTUN to
shut down. Therefore, we increase the grace period to 10s.
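The single-read fix can be sketched as a hypothetical helper around `JoinHandle::is_finished`:

```rust
use std::thread::JoinHandle;

/// Read each thread's finished flag exactly once and branch on the snapshot.
/// Calling `is_finished` a second time later can race: the thread may finish
/// between the two reads, producing the contradictory output seen in Sentry.
fn both_finished(a: &JoinHandle<()>, b: &JoinHandle<()>) -> bool {
    let a_done = a.is_finished(); // single read
    let b_done = b.is_finished(); // single read
    a_done && b_done
}

fn main() {
    let a = std::thread::spawn(|| {});
    let b = std::thread::spawn(|| {});
    while !both_finished(&a, &b) {
        std::thread::yield_now();
    }
    assert!(both_finished(&a, &b));
}
```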
This can easily happen if we are briefly disconnected from the portal.
It is not the end of the world and not worth creating Sentry alerts for.
Originally, this was intended to be a way of detecting "bad
connectivity" but that didn't really work.
Our DNS over TCP implementation uses `smoltcp` which requires us to
manage sockets individually, i.e. there is no such thing as a listening
socket. Instead, we have to create multiple sockets and rotate through
them.
Whenever we receive new DNS servers from the host app, we throw away all
of those sockets and create new ones.
The way we refer to these sockets internally is via `smoltcp`'s
`SocketHandle`. These are just indices into a `Vec` and this access can
panic when it is out of range. Normally that doesn't happen because such
a `SocketHandle` is only created when the socket is created and
therefore, each `SocketHandle` in existence should be valid.
What we overlooked is that these sockets get destroyed and re-created
when we call `set_listen_addresses` which happens when the host app
tells us about new DNS servers. In that case, the handles of sockets on
which we had just received a query and are still waiting to send a
response for remain stored in a temporary `HashMap`. Attempting to send
back a response for one of those queries will then either fail with an
error that the socket is not in the right state or - worse - panic with
an out-of-bounds error if we previously had more listen addresses than
we have now.
To fix this, we need to clear this map of pending queries every time we
call `set_listen_addresses`.
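A sketch of the fix, with `usize` standing in for `smoltcp`'s index-like `SocketHandle`; all names here are illustrative:

```rust
use std::collections::HashMap;

/// Hypothetical reduction of the TCP DNS server.
#[derive(Default)]
struct TcpDnsServer {
    sockets: Vec<String>,                 // listening sockets, one per address
    pending_queries: HashMap<u16, usize>, // DNS query ID -> handle of its socket
}

impl TcpDnsServer {
    fn set_listen_addresses(&mut self, addresses: Vec<String>) {
        // Recreating the sockets invalidates every previously issued handle...
        self.sockets = addresses;
        // ...so any pending query referring to an old handle must be dropped,
        // or answering it later may panic with an out-of-bounds access.
        self.pending_queries.clear();
    }
}

fn main() {
    let mut server = TcpDnsServer::default();
    server.pending_queries.insert(42, 0);
    server.set_listen_addresses(vec!["10.0.0.2:53".to_string()]);
    assert!(server.pending_queries.is_empty());
}
```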
When calculating the maximum size of the UDP payload we can send in a
single syscall, we need to take into account the overhead of the IP and
UDP headers.
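The arithmetic, as a sketch, assuming no IPv4 options and no IPv6 extension headers:

```rust
/// Maximum UDP payload that fits in one datagram on a link with the given
/// MTU: the IP header (20 bytes for IPv4, 40 for IPv6 without extension
/// headers) and the 8-byte UDP header must be subtracted first.
fn max_udp_payload(mtu: usize, ipv6: bool) -> usize {
    const UDP_HEADER: usize = 8;
    let ip_header = if ipv6 { 40 } else { 20 };
    mtu - ip_header - UDP_HEADER
}

fn main() {
    assert_eq!(max_udp_payload(1500, false), 1472); // classic Ethernet, IPv4
    assert_eq!(max_udp_payload(1280, true), 1232);  // IPv6 minimum MTU
}
```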
These run every minute and add a lot of noise to the logs.
```
May 11 18:21:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:21:14.154Z INFO firezone_tunnel::io::nameserver_set: Evaluating fastest nameserver ips={127.0.0.53}
May 11 18:21:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:21:14.155Z INFO firezone_tunnel::io::nameserver_set: Evaluated fastest nameserver fastest=127.0.0.53
May 11 18:22:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:22:14.154Z INFO firezone_tunnel::io::nameserver_set: Evaluating fastest nameserver ips={127.0.0.53}
May 11 18:22:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:22:14.155Z INFO firezone_tunnel::io::nameserver_set: Evaluated fastest nameserver fastest=127.0.0.53
May 11 18:23:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:23:14.153Z INFO firezone_tunnel::io::nameserver_set: Evaluating fastest nameserver ips={127.0.0.53}
May 11 18:23:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:23:14.155Z INFO firezone_tunnel::io::nameserver_set: Evaluated fastest nameserver fastest=127.0.0.53
May 11 18:24:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:24:14.154Z INFO firezone_tunnel::io::nameserver_set: Evaluating fastest nameserver ips={127.0.0.53}
May 11 18:24:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:24:14.155Z INFO firezone_tunnel::io::nameserver_set: Evaluated fastest nameserver fastest=127.0.0.53
May 11 18:25:14 gateway-z1w4 firezone-gateway[2007]: 2025-05-11T18:25:14.153Z INFO firezone_tunnel::io::nameserver_set: Evaluating fastest nameserver ips={127.0.0.53}
```
The current `rust/` directory is a bit of a wild-west in terms of how
the crates are organised. Most of them are simply at the top-level when
in reality, they are all `connlib`-related. The Apple and Android FFI
crates - which are entrypoints into the Rust code - are defined several
layers deep.
To improve the situation, we move around and rename several crates. The
end result is that all top-level crates / directories are:
- Either entrypoints into the Rust code, i.e. applications such as
Gateway, Relay or a Client
- Or crates shared across all those entrypoints, such as `telemetry` or
`logging`
Before a Client can send packets to a DNS resource, the Gateway must
first set up a NAT table between the IPs assigned by the Client and the
IPs the domain actually resolves to. This is what we call the DNS
resource NAT.
The communication for this process happens over IP through the tunnel
which is an unreliable transport. To ensure that this works reliably
even in the presence of packet loss on the wire, the Client uses an
idempotent algorithm where it tracks the state of the NAT for each
domain that it has ever assigned IPs for (i.e. received an A or AAAA
query from an application). This algorithm ensures that if we don't hear
anything back from the Gateway within 2s, another packet for setting up
the NAT is sent as soon as we receive _any_ DNS query.
This design balances efficiency (we don't try forever) with reliability
(we always check all of them).
In case a domain does not resolve at all or there are resolution errors,
the Gateway replies with `NatStatus::Inactive`. At present, the Client
doesn't handle this in any particular way other than logging that it was
not able to successfully set up the NAT.
The combination of the above results in an undesirable behaviour: If an
application queries a domain without A and AAAA records once, we will
keep retrying forever to resolve it upon every other DNS query issued to
the system. To fix this, we introduce `dns_resource_nat::State::Failed`.
Entries in this state are ignored as part of the above algorithm and
only recreated when explicitly told to do so which we only do when we
receive another DNS query for this domain.
To handle the increased complexity around this system, we extract it
into its own component and add a fleet of unit tests for its behaviour.
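The state machine can be sketched roughly like this; a hypothetical reduction of `dns_resource_nat::State`, with the 2s retry threshold taken from the description above:

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-domain state of the DNS resource NAT.
enum NatState {
    /// Setup packet sent; waiting for the Gateway's response.
    Pending { sent_at: Instant },
    /// Gateway confirmed the NAT is in place.
    Active,
    /// Gateway replied `NatStatus::Inactive` (e.g. no A/AAAA records).
    /// Ignored by the retry sweep; only a fresh DNS query for the domain
    /// recreates the entry.
    Failed,
}

impl NatState {
    /// Run on every incoming DNS query: should we resend the setup packet?
    fn needs_retry(&self, now: Instant) -> bool {
        match self {
            NatState::Pending { sent_at } => {
                now.duration_since(*sent_at) >= Duration::from_secs(2)
            }
            NatState::Active | NatState::Failed => false,
        }
    }
}

fn main() {
    let sent_at = Instant::now();
    let pending = NatState::Pending { sent_at };
    assert!(!pending.needs_retry(sent_at));
    assert!(pending.needs_retry(sent_at + Duration::from_secs(2)));
    // A failed entry is never retried implicitly, fixing the endless loop.
    assert!(!NatState::Failed.needs_retry(sent_at + Duration::from_secs(60)));
}
```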
Currently, the Tauri build is broken on `main` because #9045 was
accidentally merged a bit too soon. In that PR, the two binaries that
the `gui-client` crate is composed of are now both defined in `src/bin`.
For some reason, this breaks Tauri's bundler and now on aarch64, it
stops including the `firezone-client-ipc` binary in the bundle. I don't
fully understand why and how that even works for x64 in the first place.
Nowhere in our repository can I find a configuration for the bundler as
to why it should even include that binary in the first place.
To fix this, we now explicitly copy this binary into the correct path
and also rebuild the `data` archive in addition to the `control`
archive.
When we implemented #8350, we chose an error handling strategy that
would shutdown the Gateway in case we didn't have a nameserver selected
for handling those SRV and TXT queries. At the time, this was deemed to
be sufficiently rare to be an adequate strategy. We have since learned
that this can indeed happen when the Gateway starts without network
connectivity which is quite common when using tools such as terraform to
provision infrastructure.
In #9060, we fix this by re-evaluating the fastest nameserver on a
timer. This however doesn't change the error handling strategy when we
don't have a working nameserver at all. It is practically impossible to
have a working Gateway while being unable to select a nameserver. We
read them from `/etc/resolv.conf` which is what `libc` uses to also
resolve the domain we connect to for the WebSocket. A working WebSocket
connection is required for us to establish connections to Clients, which
in turn is a precursor to us receiving DNS queries from a Client.
It causes unnecessary complexity to have a code path that can
potentially terminate the Gateway, yet is practically unreachable. To
fix this situation, we remove this code path and instead reply with a
DNS SERVFAIL error.
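The changed error-handling strategy, sketched with hypothetical names:

```rust
use std::net::IpAddr;

/// Outcome of handling a forwarded SRV/TXT query.
#[derive(Debug, PartialEq)]
enum Outcome {
    Forward { nameserver: IpAddr },
    Servfail, // DNS RCODE 2
}

fn handle_forwarded_query(fastest_nameserver: Option<IpAddr>) -> Outcome {
    match fastest_nameserver {
        Some(nameserver) => Outcome::Forward { nameserver },
        // Previously this code path shut down the whole Gateway; now the
        // Client just receives a SERVFAIL answer for this one query.
        None => Outcome::Servfail,
    }
}

fn main() {
    let ns: IpAddr = "127.0.0.53".parse().unwrap();
    assert_eq!(handle_forwarded_query(Some(ns)), Outcome::Forward { nameserver: ns });
    assert_eq!(handle_forwarded_query(None), Outcome::Servfail);
}
```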
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
OpenTelemetry logs a DEBUG log every time it creates a new meter. That
happens fairly often now in our codebase which spams the logs on the
DEBUG level.
```
2025-05-09T02:31:51.147Z DEBUG opentelemetry_sdk: name="MeterProvider.ExistingMeterReturned" meter_name="connlib"
```
We fix that by setting `opentelemetry_sdk` to default to `INFO` if it
is not specified explicitly.
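The fix amounts to injecting a default directive, roughly like this; a sketch only, the real code feeds the resulting string into `tracing`'s `EnvFilter`:

```rust
/// Append a default `opentelemetry_sdk=info` directive to a `RUST_LOG`-style
/// filter string unless the user already configured that crate explicitly.
fn with_otel_default(filter: &str) -> String {
    let mentions_otel = filter
        .split(',')
        .any(|directive| directive.trim().starts_with("opentelemetry_sdk"));
    if mentions_otel {
        filter.to_string()
    } else if filter.is_empty() {
        "opentelemetry_sdk=info".to_string()
    } else {
        format!("{filter},opentelemetry_sdk=info")
    }
}

fn main() {
    assert_eq!(with_otel_default("debug"), "debug,opentelemetry_sdk=info");
    // An explicit user setting wins over the default.
    assert_eq!(with_otel_default("opentelemetry_sdk=trace"), "opentelemetry_sdk=trace");
}
```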
Currently, the Gateway reads all nameservers from `/etc/resolv.conf` on
startup and evaluates the fastest one to use for SRV and TXT DNS queries
that are forwarded by the Client. If the machine just booted and we do
not have Internet connectivity just yet, this fails, which leaves the
Gateway in a state where it cannot fulfill those queries.
In order to ensure we always use the fastest one and to self-heal from
such situations, we add a 60s timer that refreshes this state.
Currently, this will **not** re-read the nameservers from
`/etc/resolv.conf` but still use the same IPs read on startup.
When the Gateway is handed an IP packet for a DNS resource that it
cannot route, it sends back an ICMP unreachable error. According to RFC
792 [0] (for ICMPv4) and RFC 4443 [1] (for ICMPv6), parts of the
original packet should be included in the ICMP error payload to allow
the sending party to correlate what could not be sent.
For ICMPv4, the RFC says:
```
Internet Header + 64 bits of Data Datagram
The internet header plus the first 64 bits of the original
datagram's data. This data is used by the host to match the
message to the appropriate process. If a higher level protocol
uses port numbers, they are assumed to be in the first 64 data
bits of the original datagram's data.
```
For ICMPv6, the RFC says:
```
As much of invoking packet as possible without the ICMPv6 packet exceeding the minimum IPv6 MTU
```
[0]: https://datatracker.ietf.org/doc/html/rfc792
[1]: https://datatracker.ietf.org/doc/html/rfc4443#section-3.1
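The two truncation rules can be sketched as follows, assuming the standard 8-byte ICMP/ICMPv6 error header and the 1280-byte IPv6 minimum MTU:

```rust
/// How many bytes of the offending packet to quote in an ICMP error.
/// ICMPv4 (RFC 792): the IP header plus the first 64 bits (8 bytes) of data.
/// ICMPv6 (RFC 4443): as much as fits without the whole error exceeding the
/// 1280-byte IPv6 minimum MTU (40-byte IPv6 header + 8-byte ICMPv6 header).
fn quoted_payload_len(original_len: usize, ip_header_len: usize, ipv6: bool) -> usize {
    if ipv6 {
        original_len.min(1280 - 40 - 8)
    } else {
        original_len.min(ip_header_len + 8)
    }
}

fn main() {
    assert_eq!(quoted_payload_len(1500, 20, false), 28);  // IPv4: header + 8 bytes
    assert_eq!(quoted_payload_len(100, 40, true), 100);   // small packet fits fully
    assert_eq!(quoted_payload_len(5000, 40, true), 1232); // capped by minimum MTU
}
```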
The module and crate structure around the GUI client and its background
service are currently a mess of circular dependencies. Most of the
service implementation actually sits in `firezone-headless-client`
because the headless-client and the service share certain modules. We
have recently moved most of these to `firezone-bin-shared` which is the
correct place for these modules.
In order to move the background service to `firezone-gui-client`, we
need to untangle a few more things in the GUI client. Those are done
commit-by-commit in this PR. With that out of the way, we can finally
move the service module to the GUI client, where it should actually
live given that it has nothing to do with the headless client.
As a result, the headless-client is - as one would expect - really just
a thin wrapper around connlib itself and is reduced to 4 files with
this PR.
To make things more consistent in the GUI client, we move the `main.rs`
file also into `bin/`. By convention, `bin/` is where you define
binaries if a crate has more than one; Cargo will then build all of
them.
Eventually, we can optimise the compile-times for `firezone-gui-client`
by splitting it into multiple crates:
- Shared structs like IPC messages
- Background service
- GUI client
This will be useful because it allows re-compiling only the GUI client
if nothing in `connlib` changes and vice versa.
Resolves: #6913
Resolves: #5754
Both `device_id` and `device_info` are used by the headless-client and
the GUI client / IPC service. They should therefore be defined in the
`bin-shared` crate.
In order to experiment with alternative GUI libraries, we extracted a
`gui-client-common` crate that would hold GUI-library agnostic code.
We've since upgraded to Tauri v2 and settled on that as the GUI
framework for the Windows and Linux Firezone Clients. Therefore this
abstraction is unnecessary and can be removed again.
This makes it easier to work on the GUI client and also allows the
compiler to flag unused code more easily.
Currently, the platform-specific code for controlling DNS resolution on
a system sits in `firezone-headless-client`. This code is also used by
the GUI client. This creates a weird compile-time dependency from the
GUI client to the headless client.
For other components that have platform-specific implementations, we use
the `firezone-bin-shared` crate. As a first step of resolving the
compile-time dependency, we move the `dns_control` module to
`firezone-bin-shared`.
By default, we send all WARN and ERROR logs to Sentry. This also
includes logs emitted via the `log` crate through a facade that
`tracing` installs. The wintun-rs bindings install such a logger in the
native WinTUN code, which has a bug where it doesn't handle adapter
removal idempotently: if the adapter has already been removed, it logs
an error instead of succeeding.
Typically, such logs can easily be suppressed in Sentry. In this case
however, Sentry fails to group these different logs together because
WinTUN's error message contains a path to a temporary directory which is
different every time it gets executed. As such, these logs spam our
Sentry instance with no way for us to disable them with existing tools.
The WireGuard mailing list for WinTUN also appears to be dead. We
attempted to contact the list in February and have not received a reply
yet. The last archive goes back to November 2024 [0]. We use WinTUN as a
prebuilt and signed DLL, so we also cannot meaningfully patch this on our
end without upstreaming it.
Thus, as a last resort, we add some infrastructure to our logging setup
that allows us to filter events client-side based on certain patterns
in the message combined with a log level.
[0]: https://lists.zx2c4.com/pipermail/wireguard/
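A sketch of such a pattern-plus-level filter; shape, names, and the example message are illustrative, not the real API or the literal WinTUN error:

```rust
/// Client-side log filter: drop an event when its level matches and its
/// message contains the pattern. The real implementation hooks into our
/// `tracing` setup before events reach Sentry.
struct MessageFilter {
    level: &'static str,
    pattern: &'static str,
}

fn should_drop(filters: &[MessageFilter], level: &str, message: &str) -> bool {
    filters
        .iter()
        .any(|f| f.level == level && message.contains(f.pattern))
}

fn main() {
    // Matching on a stable substring sidesteps the ever-changing temp path
    // that defeats Sentry's server-side grouping.
    let filters = [MessageFilter { level: "ERROR", pattern: "Failed to remove adapter" }];
    assert!(should_drop(&filters, "ERROR", "Failed to remove adapter C:\\Temp\\a1b2\\wintun.dll"));
    assert!(!should_drop(&filters, "ERROR", "some unrelated error"));
}
```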
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Presently, the network change detection on Windows is very naive and
simply emits a change event every time _anything_ changes. We can
optimise this and therefore improve the start-up time of Firezone by:
- Filtering out duplicate events
- Filtering out network change events for our own network adapter
This reduces the number of network change events to 1 during startup. As
far as I can tell from the code comments in this area, we explicitly
send this one to ensure we don't run into a race condition whilst we are
starting up.
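The two filters can be sketched together as follows; a simplified model, whereas the real code inspects Windows interface-change notifications:

```rust
use std::collections::BTreeSet;

/// Deduplicate network-change notifications: ignore our own adapter and only
/// report when the remaining interface set actually changed.
struct ChangeDetector {
    own_adapter: String,
    last: Option<BTreeSet<String>>,
}

impl ChangeDetector {
    fn new(own_adapter: impl Into<String>) -> Self {
        Self { own_adapter: own_adapter.into(), last: None }
    }

    /// Returns `true` if this notification should be forwarded.
    fn on_notification(&mut self, interfaces: &[&str]) -> bool {
        let current: BTreeSet<String> = interfaces
            .iter()
            .copied()
            .filter(|&name| name != self.own_adapter) // ignore our own adapter
            .map(String::from)
            .collect();
        if self.last.as_ref() == Some(&current) {
            return false; // nothing relevant changed: suppress the duplicate
        }
        self.last = Some(current);
        true
    }
}

fn main() {
    let mut detector = ChangeDetector::new("Firezone");
    assert!(detector.on_notification(&["Ethernet"])); // first observation
    assert!(!detector.on_notification(&["Ethernet", "Firezone"])); // only our adapter appeared
    assert!(detector.on_notification(&["Ethernet", "Wi-Fi"])); // real change
}
```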
Resolves: #8905
The `signals` module isn't something headless-client specific and should
live in our `bin-shared` crate. Once the `ipc_service` module is
decoupled from the headless-client crate, it will be used by both the
headless client and IPC service (which then will be defined in the GUI
client crate).