This extracts the initial refactoring required for #6944. Currently,
`connlib` sends all DNS queries over the same UDP socket as all the p2p
traffic for gateways and relays. In an earlier design of `connlib`, we
did something similar to what we are doing here, but used
`hickory_resolver` for the actual DNS resolution.
Instead of depending on hickory, we implement DNS resolution ourselves
by sending a UDP DNS query to the mapped upstream DNS server. There are
no retries; instead, we rely on the original DNS client to retry in case
a packet gets lost on the way.
Modelling recursive DNS queries as explicit events from the
`ClientState` is necessary for implementing DNS over TCP and DNS over
HTTPS. In both cases, the query to the upstream server isn't as simple
as emitting a `Transmit`. By modelling the queries as `async fn`s within
`Io`, it will be possible to perform them all in one place.
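To make this concrete, here is a minimal sketch of the shape this takes, assuming tokio and hypothetical names (`Event`, `send_dns_query`); the real types in `connlib` differ:

```rust
use std::net::SocketAddr;

/// Hypothetical event emitted by `ClientState`; `Io` performs the round-trip.
enum Event {
    SendDnsQuery { upstream: SocketAddr, payload: Vec<u8> },
}

/// One UDP round-trip with no retries: if the datagram is lost, the original
/// DNS client re-sends its query and we simply perform the lookup again.
async fn send_dns_query(upstream: SocketAddr, payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let socket = tokio::net::UdpSocket::bind("0.0.0.0:0").await?;
    socket.connect(upstream).await?;
    socket.send(payload).await?;

    let mut response = vec![0u8; 512]; // plain UDP DNS messages fit into 512 bytes
    let len = socket.recv(&mut response).await?;
    response.truncate(len);

    Ok(response)
}
```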
Resolves: #6297.
This brings us one step closer to completing #6140. In Firezone, users
can define custom upstream DNS servers that take priority over
system-defined DNS servers. The IPs of these servers could also be
resources, meaning the DNS queries must be sent through the WireGuard
tunnel to the gateway.
For UDP DNS queries, that is easy because each query is only a single
packet. For TCP DNS queries, we need to have a dedicated TCP-capable DNS
server that parses all incoming queries. If they need to be forwarded to
the gateway, we then need a TCP-capable DNS client that can
send them to the actual upstream DNS server.
This PR implements such a DNS client. The design is tailored for what we
need in `connlib`: We maintain a permanent TCP connection to each
upstream DNS server and send queries to them. Most likely, users will
only have a handful of DNS servers defined. Because TCP requires a
three-way handshake before any application data can be sent, maintaining
a connection should greatly improve DNS resolution latency.
DNS resolvers are encouraged to keep TCP connections open but may close
them if they run out of resources. We only reconnect once we have more
queries to send, in order not to spam the resolver with connections.
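For illustration, the exchange on an established connection could look like the sketch below. DNS over TCP prefixes each message with its length as a big-endian `u16` (RFC 7766); the helper name is made up, and connection management (lazy reconnects, etc.) is omitted:

```rust
use std::io;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

/// Send one DNS query over an existing TCP connection and read the response.
async fn exchange(stream: &mut TcpStream, query: &[u8]) -> io::Result<Vec<u8>> {
    // Length prefix: 2 bytes, big-endian, followed by the DNS message itself.
    stream.write_all(&(query.len() as u16).to_be_bytes()).await?;
    stream.write_all(query).await?;

    let mut len = [0u8; 2];
    stream.read_exact(&mut len).await?;

    let mut response = vec![0u8; u16::from_be_bytes(len) as usize];
    stream.read_exact(&mut response).await?;

    Ok(response)
}
```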
Resolves: #7000.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Closes #7008.
We already signed the GUI exe and the entire MSI package, but when
adding the IPC service, we overlooked its exe.
This PR:
- Modifies the signing script to accept multiple EXEs
- Modifies the Tauri bundle command to sign both exes
- Updates the changelog

Currently, we emit a single TRACE log for each DNS resource entry that
doesn't match. This is quite spammy and often not needed.
When debugging DNS resources, it is useful to know which resources we
are matching against. To balance this, we now build a list of all DNS
resource domain patterns that we have and log a single "No resources
matched" log with that list in case none match.
This splits out the actual DNS server from #6944 into a separate crate.
At present, it only contains a DNS server. Later, we will likely add a
DNS client to it as well because the proptests and connlib itself will
need a user-space DNS TCP client.
The implementation uses `smoltcp` but that is entirely encapsulated. The
`Server` struct exposes only a high-level interface for
- feeding inbound packets as well as retrieving outbound packets
- retrieving parsed DNS queries and sending DNS responses
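A sketch of what that interface could look like; the names are illustrative and the real `Server` wraps `smoltcp` state instead of plain queues:

```rust
use std::collections::VecDeque;

#[derive(Default)]
struct Server {
    outbound: VecDeque<Vec<u8>>,
    queries: VecDeque<Vec<u8>>,
}

impl Server {
    /// Feed an inbound IP packet; TCP reassembly happens internally.
    fn handle_inbound(&mut self, _packet: &[u8]) { /* ... */ }

    /// Drain outbound IP packets (TCP handshakes, ACKs, DNS responses, ...).
    fn poll_outbound(&mut self) -> Option<Vec<u8>> {
        self.outbound.pop_front()
    }

    /// Drain fully parsed DNS queries, ready to be answered.
    fn poll_query(&mut self) -> Option<Vec<u8>> {
        self.queries.pop_front()
    }

    /// Queue a DNS response; it will surface as an outbound packet.
    fn send_response(&mut self, response: Vec<u8>) {
        self.outbound.push_back(response);
    }
}
```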
Related: #6140.
Closes #6989
- The tunnel daemon (IPC service) now explicitly sets the ID file's
perms to 0o640, even if the file already exists.
- The GUI error is now non-fatal. If the file can't be read, we just
won't get the device ID in Sentry.
- More specific error message when the GUI fails to read the ID file
We attempted to set the tunnel daemon's umask, but this caused the smoke
tests to fail. Fixing the regression is more urgent than getting the
smoke tests to match local debugging.
---------
Co-authored-by: _ <ReactorScram@users.noreply.github.com>
This has been a long-standing issue.
The base PR fixes the issue for Firefox, and apparently all other
browsers will _not_ change your DNS server; they only opportunistically
enable DoH if they find that your current servers support it.
With the latest version of `proptest-state-machine`, we no longer need
to use their traits because `Sequential::new` is now exposed. This makes
the overall setup less magical because there is less indirection.
This incorporates the feedback from #6939 after a discussion with
@conectado. We agreed that the protocol should be more event-based,
where each message has its own event type. Events MAY appear in pairs or
other cardinality combinations, meaning semantically they could be seen
as requests and responses. In general though, due to the unreliable
nature of IP, it is better to view them as events. Events are typically
designed to be idempotent which is important to make this protocol work.
Using events also means it is not as easy to fall into the "trap" of
modelling requests / responses on the control protocol level.
Also allows windows to be closed and re-opened. Tauri behaved exactly
the same way: if you "close" a window, it completely destroys it and
panics if you ask to show it again.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
In #6909, we introduced a regression that wasn't caught by CI.
Previously, we were using a different function to resolve the domain
name of the portal. That function took care of handling the case where
the host didn't have a port number.
In the docker-compose file, we always specify a port number; therefore,
the host-only case doesn't get tested.
This currently prevents all clients from signing in to staging & prod.
To correctly handle overlapping CIDR resources, we need to recompute
which ones are active. The `RoamClient` transition was missing that
step, even though the system-under-test actually performs it.
Adding the necessary function call fixes the two regression seeds
detected in CI.
Resolves: #6924.
Do we want to track 401s in sentry? If we see a lot of them, something
is likely wrong but I guess there is some level of 401s that users will
just run into.
Is there a way of marking these as "might not be a really bad error"?
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
At present, `connlib` utilises the portal as a signalling layer for any
kind of control message that needs to be exchanged between clients and
gateways. For anything regarding connectivity, this is crucial: Before
we have a direct connection to the gateway, we don't really have a
choice other than using the portal as a "relay" to e.g. exchange address
candidates for ICE.
However, once a direct connection has been established, exchanging
information directly with the gateway is faster and removes the portal
as a potential point of failure for the data plane.
For DNS resources, `connlib` intercepts all DNS requests on the client
and assigns its own IPs within the CG-NAT range to all domains that are
configured as resources. Thus, all packets targeting DNS resources will
have one of these IPs set as their destination. The gateway needs to
learn about all the IPs that have been assigned to a certain domain by
the client and perform NAT. We call this concept "DNS resource NAT".
Currently, the domain + the assigned IPs are sent together with the
`allow_access` or `request_connection` message via the portal. The new
control protocol defined in #6732 purposely excludes this information
and only authorises traffic to the entire resource, which could also be
a wildcard DNS resource.
To exchange the assigned IPs for a certain domain with the gateway, we
introduce our own p2p control protocol built on top of IP. All control
protocol messages are sent through the tunnel and thus encrypted at all
times. They are differentiated from regular application traffic as
follows:
- IP src is set to the unspecified IPv6 address (`::`)
- IP dst is set to the unspecified IPv6 address (`::`)
- IP protocol is set to reserved (`0xFF`)
The combination of all three should never appear as regular traffic.
To ensure forwards-compatibility, the control protocol utilises a fixed
8-byte header where the first byte denotes the message kind. In this
current design, there is no concept of a request or response in the
wire-format. Each message is unidirectional and the fact that the two
messages we define here appear in tandem is purely by convention. We
use the IPv6 payload length to determine the total length of the packet.
The payloads are JSON-encoded. Message types are free to choose
whichever encoding they'd like.
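Based only on the rules above, recognising and splitting such a packet could look like this sketch (function names are illustrative):

```rust
use std::net::Ipv6Addr;

const CONTROL_PROTOCOL: u8 = 0xFF; // "reserved" IP protocol number
const HEADER_LEN: usize = 8; // fixed header; first byte is the message kind

/// A packet is a control-protocol packet iff all three markers are present.
fn is_control_packet(src: Ipv6Addr, dst: Ipv6Addr, protocol: u8) -> bool {
    src == Ipv6Addr::UNSPECIFIED && dst == Ipv6Addr::UNSPECIFIED && protocol == CONTROL_PROTOCOL
}

/// Split the IPv6 payload into (message kind, encoded body).
fn parse(payload: &[u8]) -> Option<(u8, &[u8])> {
    let header = payload.get(..HEADER_LEN)?;
    Some((header[0], &payload[HEADER_LEN..]))
}
```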
This protocol is sent through the WireGuard tunnel, meaning we are
effectively limited by our device MTU of 1280; otherwise, we'd have to
implement fragmentation. For the messages that set up the DNS resource
NAT, we are below this limit:
- UUIDs are 16 bytes
- Domain names are at most 255 bytes
- IPv6 addresses are 16 bytes * 4
- IPv4 addresses are 4 bytes * 4
Including the JSON serialisation overhead, this results in a total
maximum payload size of 402 bytes, which is well below our MTU.
Finally, another thing to consider here is that IP is unreliable,
meaning each use of this protocol needs to make sure that:
- It is resilient against message re-ordering
- It is resilient against packet loss
The details of how this is ensured for setting up the DNS resource NAT
are left to #6732.
One of the key differences of the new control protocol designed in #6461
is that creating new connections is idempotent. We achieve this by
having the portal generate the ICE credentials and the preshared-key for
the WireGuard tunnel. As long as the ICE credentials don't change, we
don't need to make a new connection.
For `snownet`, this means we are deprecating the previous APIs for
making connections. The client-side APIs will have to stay around until
we merge the client-part of the new control protocol. The server-side
APIs will have to stay around until we remove backwards-compatibility
from the gateway.
As part of #6732, we will be using the tunnel for the p2p control
protocol to set up the DNS resource NAT on the gateway. These messages
will be immediately queued after creating the connection, before ICE is
finished. In order to avoid additional retries within `firezone_tunnel`,
we directly attempt to encapsulate these packets.
`boringtun` internally has a buffer for these because prior to having a
WireGuard session, we don't actually have the necessary keys to encrypt
a packet. Thus, the packet passed here won't actually end up in the
referenced buffer of the `Connecting` state. Instead, if we haven't
already initiated a WireGuard handshake, attempting to encapsulate a
packet will trigger a WireGuard handshake initiation and _that_ is the
packet we need to buffer.
Dropping this packet would require us to wait until `boringtun`
retransmits it as part of the handshake timeout, causing an unnecessary
delay in the connection setup.
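A sketch of that interaction, assuming boringtun's `noise::Tunn` API (simplified; error handling omitted):

```rust
use boringtun::noise::{Tunn, TunnResult};

/// Attempt to encapsulate `packet`, returning how many bytes to put on the wire.
fn try_send(tunn: &mut Tunn, packet: &[u8], buf: &mut [u8]) -> Option<usize> {
    match tunn.encapsulate(packet, buf) {
        // With an established session, this is the encrypted data packet.
        // Without one, boringtun queues `packet` internally and hands us a
        // handshake initiation instead; *that* is what must go on the wire.
        TunnResult::WriteToNetwork(datagram) => Some(datagram.len()),
        TunnResult::Done => None, // nothing to send right now
        _ => None,
    }
}
```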
We call the `add_ips_with_resource` function with a list of `IpAddr` or
`IpNetwork`s. To make this more ergonomic for the caller, we can accept
an iterator that converts the items on the fly.
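For illustration, the signature could look roughly like this (the `IpNetwork` type and the body are assumptions, not connlib's actual code):

```rust
use ip_network::IpNetwork;

/// Accept anything iterable whose items convert into `IpNetwork`, so callers
/// can pass addresses or networks without collecting into a `Vec` first.
fn add_ips_with_resource<I>(ips: I)
where
    I: IntoIterator,
    I::Item: Into<IpNetwork>,
{
    for network in ips.into_iter().map(Into::into) {
        let _ = network; // insert `network` into the routing table ...
    }
}
```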
The `len` specified in the constructor of `IpPacket` is user-provided.
Technically, it can be longer than the actual packet. To make sure
we only ever pass out the precise payload of the IP packet, we read the
length from the IP header and cut the slice at the specified length.
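A sketch of that length calculation (simplified; the real `IpPacket` parses the headers properly):

```rust
/// Derive the exact packet length from the IP header rather than trusting
/// the caller-provided buffer length.
fn packet_len(buf: &[u8]) -> Option<usize> {
    match *buf.first()? >> 4 {
        // IPv4: "total length" lives in header bytes 2..4.
        4 => Some(u16::from_be_bytes([*buf.get(2)?, *buf.get(3)?]) as usize),
        // IPv6: "payload length" (bytes 4..6) excludes the fixed 40-byte header.
        6 => Some(u16::from_be_bytes([*buf.get(4)?, *buf.get(5)?]) as usize + 40),
        _ => None,
    }
}

/// Cut the buffer down to the precise packet, discarding any trailing bytes.
fn exact_slice(buf: &[u8]) -> Option<&[u8]> {
    buf.get(..packet_len(buf)?)
}
```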
For #6461, we will build a control protocol on top of IP that runs
through the WireGuard tunnel. Reading the exact length of the payload is
important for that.
To reduce the TTFB, we immediately force a WireGuard handshake as soon
as ICE completes. Currently, both the client and the gateway do this
which is somewhat counter-productive as there can only be one active
session.
With this patch, only the client will force a new WireGuard handshake as
soon as ICE completes.
With the new control protocol specified in #6461, the client will no
longer initiate new connections. Instead, the credentials are generated
deterministically by the portal based on the gateway's and the client's
public key. For as long as they use the same public key, they also have
the same in-memory state which makes creating connections idempotent.
What we didn't consider in the new design at first is that when clients
roam, they discard all connections but keep the same private key. As a
result, the portal would generate the same ICE credentials which means
the gateway thinks it can reuse the existing connection when new flows
get authorized. The client however discarded all connections (and
rotated its ports and maybe IPs), meaning the previous candidates sent
to the gateway are no longer valid and connectivity fails.
We fix this by also rotating the private keys upon reset. Rotating the
keys itself isn't enough; we also need to propagate the new public key
all the way "over" to the phoenix channel component which lives
separately from connlib's data plane.
To achieve this, we change `PhoenixChannel` to now start in the
"disconnected" state and require an explicit `connect` call. In
addition, the `LoginUrl` constructed by various components now acts
merely as a "prototype", which may require additional data to construct
a fully valid URL. In the case of client and gateway, this is the public
key of the `Node`. This additional parameter needs to be passed to
`PhoenixChannel` in the `connect` call, thus forming a type-safe
contract that ensures we never attempt to connect without providing a
public key.
For the relay, this doesn't apply.
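A sketch of the resulting contract; all names and shapes are illustrative:

```rust
struct PublicKey([u8; 32]);

/// A "prototype" URL: everything known up-front, minus the public key.
struct LoginUrl {
    base: String,
}

impl LoginUrl {
    /// The prototype only becomes a full URL once the missing piece
    /// (the `Node`'s public key) is supplied.
    fn to_url(&self, public_key: &PublicKey) -> String {
        let key_hex: String = public_key.0.iter().map(|b| format!("{b:02x}")).collect();
        format!("{}?public_key={key_hex}", self.base)
    }
}

/// Starts in the "disconnected" state; there is no socket yet.
struct PhoenixChannel {
    url: LoginUrl,
}

impl PhoenixChannel {
    /// Connecting *requires* a public key, so we can never connect without one.
    fn connect(&mut self, public_key: PublicKey) {
        let _url = self.url.to_url(&public_key);
        // ... open the websocket using the completed URL
    }
}
```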
Lastly, this allows us to tidy up the code a bit by:
a) generating the `Node`'s private key from the existing RNG
b) removing `ConnectArgs` which only had two members left
Related: #6461.
Related: #6732.
This makes it easier to ignore random issues from my dev system.
Also added OS tag (`linux` or `windows`) since that doesn't seem to be a
default for Sentry.
```[tasklist]
- [ ] Bikeshed the name `firezone_id` since it'll be hard to change later
```
<img width="367" alt="image"
src="https://github.com/user-attachments/assets/2e936aea-5c36-4208-965a-c578ff8407b7">
Refs #6927
This PR creates a GTK+ event loop, a blank window, and the tray menu. It
connects to the IPC service, you can sign in and everything, but the
About window, Settings window, and Welcome window aren't implemented.
We build a deb package in CI but it isn't pushed to the draft releases
in CD yet.

Pros over Iced:
- More mature
- Easy integration with `tray-icon`
- Small binaries (< 1 MB for this example)
Cons:
- GTK 3.x is abandoned as of March. GTK 4 isn't packaged for Ubuntu
20.04.
- Widgets might be hard to use
- Hard to set up on Windows, only using this for Linux for now
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Bumps
[tauri-winrt-notification](https://github.com/tauri-apps/winrt-notification)
from 0.5.0 to 0.6.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tauri-apps/winrt-notification/releases">tauri-winrt-notification's
releases</a>.</em></p>
<blockquote>
<h2>tauri-winrt-notification v0.6.0</h2>
<h2>[0.6.0]</h2>
<ul>
<li><a
href="30d14afed6"><code>30d14af</code></a>
(<a
href="https://redirect.github.com/tauri-apps/winrt-notification/pull/33">#33</a>
by <a
href="https://github.com/tauri-apps/winrt-notification/../../amrbashir"><code>@amrbashir</code></a>)
Update <code>windows</code> crate to <code>0.58</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tauri-apps/winrt-notification/blob/dev/CHANGELOG.md">tauri-winrt-notification's
changelog</a>.</em></p>
<blockquote>
<h2>[0.6.0]</h2>
<ul>
<li><a
href="30d14afed6"><code>30d14af</code></a>
(<a
href="https://redirect.github.com/tauri-apps/winrt-notification/pull/33">#33</a>
by <a
href="https://github.com/tauri-apps/winrt-notification/../../amrbashir"><code>@amrbashir</code></a>)
Update <code>windows</code> crate to <code>0.58</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="41d83462df"><code>41d8346</code></a>
Publish New Versions (<a
href="https://redirect.github.com/tauri-apps/winrt-notification/issues/36">#36</a>)</li>
<li><a
href="f0a6582996"><code>f0a6582</code></a>
ci: simplify covector config</li>
<li><a
href="30d14afed6"><code>30d14af</code></a>
chore(deps): update windows crate to 0.58 (<a
href="https://redirect.github.com/tauri-apps/winrt-notification/issues/33">#33</a>)</li>
<li>See full diff in <a
href="https://github.com/tauri-apps/winrt-notification/compare/tauri-winrt-notification-v0.5...tauri-winrt-notification-v0.6">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.18 to 4.5.19.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.19</h2>
<h2>[4.5.19] - 2024-10-01</h2>
<h3>Internal</h3>
<ul>
<li>Update dependencies</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.19] - 2024-10-01</h2>
<h3>Internal</h3>
<ul>
<li>Update dependencies</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="108907385c"><code>1089073</code></a>
chore: Release</li>
<li><a
href="c9b8c85f09"><code>c9b8c85</code></a>
docs: Update changelog</li>
<li><a
href="8b3de18a8d"><code>8b3de18</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5685">#5685</a>
from epage/engine</li>
<li><a
href="b38538d7c4"><code>b38538d</code></a>
fix(complete)!: Rename dynamic to engine</li>
<li><a
href="232af62f7d"><code>232af62</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5684">#5684</a>
from epage/endless</li>
<li><a
href="0209a79031"><code>0209a79</code></a>
fix(complete): Don't cause endless completions for bash/zsh</li>
<li>See full diff in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.18...clap_complete-v4.5.19">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the npm_and_yarn group in /rust/gui-client with 1 update:
[micromatch](https://github.com/micromatch/micromatch).
Updates `micromatch` from 4.0.5 to 4.0.8
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/micromatch/micromatch/releases">micromatch's
releases</a>.</em></p>
<blockquote>
<h2>4.0.8</h2>
<p>Ultimate release that fixes both CVE-2024-4067 and CVE-2024-4068. We
consider the issues low-priority, so even if you see automated scanners
saying otherwise, don't be scared.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/micromatch/micromatch/blob/master/CHANGELOG.md">micromatch's
changelog</a>.</em></p>
<blockquote>
<h2>[4.0.8] - 2024-08-22</h2>
<ul>
<li>backported CVE-2024-4067 fix (from v4.0.6) over to 4.x branch</li>
</ul>
<h2>[4.0.7] - 2024-05-22</h2>
<ul>
<li>this is basically v4.0.5, with some README updates</li>
<li><strong>it is vulnerable to CVE-2024-4067</strong></li>
<li>Updated braces to v3.0.3 to avoid CVE-2024-4068</li>
<li>does NOT break API compatibility</li>
</ul>
<h2>[4.0.6] - 2024-05-21</h2>
<ul>
<li>Added <code>hasBraces</code> to check if a pattern contains
braces.</li>
<li>Fixes CVE-2024-4067</li>
<li><strong>BREAKS API COMPATIBILITY</strong></li>
<li>Should be labeled as a major release, but it's not.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8bd704ec0d"><code>8bd704e</code></a>
4.0.8</li>
<li><a
href="a0e68416a4"><code>a0e6841</code></a>
run verb to generate README documentation</li>
<li><a
href="4ec288484f"><code>4ec2884</code></a>
Merge branch 'v4' into hauserkristof-feature/v4.0.8</li>
<li><a
href="03aa805217"><code>03aa805</code></a>
Merge pull request <a
href="https://redirect.github.com/micromatch/micromatch/issues/266">#266</a>
from hauserkristof/feature/v4.0.8</li>
<li><a
href="814f5f70ef"><code>814f5f7</code></a>
lint</li>
<li><a
href="67fcce6a10"><code>67fcce6</code></a>
fix: CHANGELOG about braces & CVE-2024-4068, v4.0.5</li>
<li><a
href="113f2e3fa7"><code>113f2e3</code></a>
fix: CVE numbers in CHANGELOG</li>
<li><a
href="d9dbd9a266"><code>d9dbd9a</code></a>
feat: updated CHANGELOG</li>
<li><a
href="2ab13157f4"><code>2ab1315</code></a>
fix: use actions/setup-node@v4</li>
<li><a
href="1406ea38f3"><code>1406ea3</code></a>
feat: rework test to work on macos with node 10,12 and 14</li>
<li>Additional commits viewable in <a
href="https://github.com/micromatch/micromatch/compare/4.0.5...4.0.8">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This matches roughly when the IPC service deactivates DNS control.
I did this because, while debugging #6777, I accidentally trashed my
DNS; Windows then said it had no Internet, so the headless Client
couldn't connect to the portal until I ran the IPC service to deactivate
DNS control.
Following up from #6919, this PR introduces a dedicated, internal model
for resources, reflecting how the client uses them. This separation
serves several purposes:
1. It allows us to introduce an `Unknown` resource type, ensuring
forwards-compatibility with future resource types.
2. It allows us to remove trait implementations like `PartialEq` or
`PartialOrd` from the message types. With #6732, the messages will
include types like `SecretKey`s which cannot be compared.
3. A decoupling of serialisation and domain models is good practice in
general and has long been overdue.
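A sketch of the idea; the types are illustrative:

```rust
/// Wire format, deserialised from the portal.
enum ResourceDescription {
    Dns { address: String },
    Cidr { address: String },
    Other(String), // a type this client version doesn't know yet
}

/// Internal model, decoupled from serialisation.
enum Resource {
    Dns { address: String },
    Cidr { address: String },
    /// Forwards-compatibility: unknown resource types are carried along
    /// instead of failing outright.
    Unknown,
}

impl From<ResourceDescription> for Resource {
    fn from(msg: ResourceDescription) -> Self {
        match msg {
            ResourceDescription::Dns { address } => Resource::Dns { address },
            ResourceDescription::Cidr { address } => Resource::Cidr { address },
            ResourceDescription::Other(_) => Resource::Unknown,
        }
    }
}
```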
Extracted from #6838
`tray-icon` as used by the GTK prototype handles checkboxes a little
differently from the older `tray-icon` exposed by Tauri v1; this
accounts for that difference.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The `connlib-shared` crate has become a bit of a dependency magnet
without a clear purpose. It hosts utilities like `get_user_agent`,
messages for the client and gateway to communicate with the portal and
domain types like `ResourceId`.
To create a better dependency structure in our workspace, we repurpose
`connlib-shared` as a `connlib-model` crate. Its purpose is to host
domain-specific model types that multiple crates may want to use. For
that purpose, we rename the `callbacks::ResourceDescription` type to
`ResourceView`, designating that this is a _view_ onto a resource as
seen by `connlib`. The message types, which currently double up as a
connlib-internal model, thus become an implementation detail of
`firezone-tunnel` and shouldn't be used for anything else.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Extracted from #6838
This leads to extra cloning of strings, but if there are fewer than
1,000 Resources and the tray doesn't update often, it should be fine. We
can
sample performance with sentry.io if we're worried.
As part of iterating on the correct approach in #6857, we at some point
removed the hashing of the Firezone-generated device ID. This will break
customers because all of the device IDs as seen by the portal are
changing.
We've since settled on a different approach for device verification. To
not break anyone, we are re-introducing hashing of the device ID.
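For illustration, the hashing could look like this, assuming SHA-256 via the `sha2` crate and hex encoding; the actual algorithm and encoding may differ:

```rust
use sha2::{Digest, Sha256};

/// The portal only ever sees the hash, never the raw device ID.
fn hashed_device_id(raw_id: &str) -> String {
    let digest = Sha256::digest(raw_id.as_bytes());
    hex::encode(digest)
}
```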
Related: #6857.
When debugging DNS-related issues, it is useful to see all DNS queries
that go into `connlib` and the responses that we generate. Analogous to
the `wire::net` and `wire::dev` TRACE logs, we introduce `wire::dns`
which logs incoming queries and the responses on TRACE. The output looks
like this:
```
2024-10-02T00:16:47.522847Z TRACE wire::dns::qry: A caldav.fastmail.com qid=55845
2024-10-02T00:16:47.522926Z TRACE wire::dns::qry: AAAA caldav.fastmail.com qid=56277
2024-10-02T00:16:47.531347Z TRACE wire::dns::res: AAAA caldav.fastmail.com => [] qid=56277 rcode=NOERROR
2024-10-02T00:16:47.538984Z TRACE wire::dns::res: A caldav.fastmail.com => [103.168.172.46 | 103.168.172.61] qid=55845 rcode=NOERROR
2024-10-02T00:16:47.580237Z TRACE wire::dns::qry: HTTPS cloudflare-dns.com qid=21518
2024-10-02T00:16:47.580338Z TRACE wire::dns::qry: A example.org qid=35459
2024-10-02T00:16:47.580364Z TRACE wire::dns::qry: AAAA example.org qid=60073
2024-10-02T00:16:47.580699Z TRACE wire::dns::qry: AAAA ipv4only.arpa qid=17280
2024-10-02T00:16:47.580782Z TRACE wire::dns::qry: A ipv4only.arpa qid=47215
2024-10-02T00:16:47.581134Z TRACE wire::dns::qry: A detectportal.firefox.com qid=34970
2024-10-02T00:16:47.581261Z TRACE wire::dns::qry: AAAA detectportal.firefox.com qid=39505
2024-10-02T00:16:47.609502Z TRACE wire::dns::res: AAAA example.org => [2606:2800:21f:cb07:6820:80da:af6b:8b2c] qid=60073 rcode=NOERROR
2024-10-02T00:16:47.609640Z TRACE wire::dns::res: AAAA ipv4only.arpa => [] qid=17280 rcode=NOERROR
2024-10-02T00:16:47.610407Z TRACE wire::dns::res: A ipv4only.arpa => [192.0.0.170 | 192.0.0.171] qid=47215 rcode=NOERROR
2024-10-02T00:16:47.617952Z TRACE wire::dns::res: HTTPS cloudflare-dns.com => [1 alpn=h3,h2 ipv4hint=104.16.248.249,104.16.249.249 ipv6hint=2606:4700::6810:f8f9,2606:4700::6810:f9f9] qid=21518 rcode=NOERROR
2024-10-02T00:16:47.631124Z TRACE wire::dns::res: A example.org => [93.184.215.14] qid=35459 rcode=NOERROR
2024-10-02T00:16:47.640286Z TRACE wire::dns::res: AAAA detectportal.firefox.com => [detectportal.prod.mozaws.net. | prod.detectportal.prod.cloudops.mozgcp.net. | 2600:1901:0:38d7::] qid=39505 rcode=NOERROR
2024-10-02T00:16:47.641332Z TRACE wire::dns::res: A detectportal.firefox.com => [detectportal.prod.mozaws.net. | prod.detectportal.prod.cloudops.mozgcp.net. | 34.107.221.82] qid=34970 rcode=NOERROR
2024-10-02T00:16:48.737608Z TRACE wire::dns::qry: AAAA myfiles.fastmail.com qid=52965
2024-10-02T00:16:48.737710Z TRACE wire::dns::qry: A myfiles.fastmail.com qid=5114
2024-10-02T00:16:48.745282Z TRACE wire::dns::res: AAAA myfiles.fastmail.com => [] qid=52965 rcode=NOERROR
2024-10-02T00:16:49.027932Z TRACE wire::dns::res: A myfiles.fastmail.com => [103.168.172.46 | 103.168.172.61] qid=5114 rcode=NOERROR
2024-10-02T00:16:49.190054Z TRACE wire::dns::qry: HTTPS github.com qid=64696
2024-10-02T00:16:49.190171Z TRACE wire::dns::qry: A github.com qid=11912
2024-10-02T00:16:49.190502Z TRACE wire::dns::res: A github.com => [100.96.0.1 | 100.96.0.2 | 100.96.0.3 | 100.96.0.4] qid=11912 rcode=NOERROR
2024-10-02T00:16:49.190619Z TRACE wire::dns::qry: A github.com qid=63366
2024-10-02T00:16:49.190730Z TRACE wire::dns::res: A github.com => [100.96.0.1 | 100.96.0.2 | 100.96.0.3 | 100.96.0.4] qid=63366 rcode=NOERROR
```
As with the other filters, seeing both queries and responses can be
achieved with `RUST_LOG=wire::dns=trace`. If you are only interested in
the responses, you can activate a more specific log filter using
`RUST_LOG=wire::dns::res=trace`. All responses also print the original
query that they are answering.
Resolves: #6862.
Our relays are essential for connectivity because they also perform STUN
for us through which we learn our server-reflexive address. Thus, we
must at all times have at least one relay that we can reach in order to
establish a connection.
The portal tracks the connectivity to the relays for us and in case any
of them go down, sends us a `relays_presence` message, meaning we can
stop using that relay and migrate any relayed connections to a new one.
This works well for as long as we are connected to the portal while the
relay is rebooting / going-down. If we are not currently connected to
the portal and a relay we are using reboots, we don't learn about it. At
least if we are actively using it, the connection will fail and further
attempted communication with the relay will time out and we will stop
using it.
In case we aren't currently using the relay, this gets a bit trickier.
If we aren't using the relay but it rebooted while we were partitioned
from the portal, logging in again might return the same relay to us in
the `init` message, but this time with different credentials.
The first bug that we are fixing in this PR is that we previously
ignored those credentials because we already knew about the relay,
thinking that we could still use our existing credentials. The fix here is
to also compare the credentials and ditch the local state if they
differ.
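A sketch of the first fix; the types are illustrative and the real `Allocation` tracks much more state:

```rust
#[derive(PartialEq)]
struct Credentials {
    username: String,
    password: String,
}

struct Allocation {
    credentials: Credentials,
}

impl Allocation {
    /// On `init`, compare the stored credentials against the fresh ones: a
    /// mismatch means the relay rebooted while we were partitioned from the
    /// portal, so the allocation is gone and local state must go too.
    fn update_credentials(&mut self, new: Credentials) {
        if self.credentials != new {
            *self = Allocation { credentials: new };
        }
    }
}
```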
The second bug, identified while fixing the first one, is that we need
to proactively probe whether all other relays we know about are actually
still responsive. For that, we issue a `REFRESH` message to them. If
that times out or otherwise fails, we remove that relay from our list of
`Allocation`s too.
To fix the second bug, several changes were necessary:
1. We lower the log-level of `Disconnecting from relay` from ERROR to
WARN. Any ERROR emitted during a test run fails our test suite, which is
what partially motivated this. The test suite builds on the assumption
that ERRORs are fatal and thus should never happen during our tests.
This change surfaces that disconnecting from a relay can indeed happen
during normal operation, which justifies lowering this to WARN. Users
should at the minimum monitor on WARN to be alerted about problems.
2. We reduce the total backoff duration for requests to relays from 60s
to 10s. The current 60s results in a total of 8 retries. UDP is
unreliable, but it isn't THAT unreliable to justify retrying everything
for 60s. We
also use a 10s timeout for ICE, which means these are now aligned to
better match each other. We had to change the max backoff duration
because we only idle-spin for at most 10s in the tests and thus the
current 60s were too long to detect that a relay actually disappeared.
3. We had to shuffle around some function calls to make sure all
intermediary event buffers are emptied at the right point in time to
make the test deterministic.
Fixes: #6648.
We want to associate additional device information with device
verification; these are all parameters that try to uniquely identify
the hardware.
For that reason, we read system information and send it as part of the
query params to the portal. That way, the portal can store this when a
device is verified and match against it later on.
These are the parameters for each platform:
|Platform|Query Field|Field Meaning|
|-----|----|-----|
|macOS|`device_serial`|Hardware's serial|
|macOS|`device_uuid`|Hardware's UUID|
|iOS|`identifier_for_vendor`|Identifier for vendor; resets only on uninstall/install|
|Android|`firebase_installation_id`|Firebase installation ID; resets only on uninstall/install|
|Windows|`device_serial`|Motherboard's serial|
|Linux|`device_serial`|Motherboard's serial|
Fixes #6837
When relays reboot or get redeployed, the portal sends us new relays to
use and/or relays we should discontinue using. To be more efficient with
battery and network usage, `connlib` only ever samples a single relay
out of all existing ones for a particular connection.
In case of a network topology where we need to use relays, there are two
situations we can end up in:
- The client connects to the gateway's relay, i.e. to the port the
gateway allocated on the relay.
- The gateway connects to the client's relay, i.e to the port the client
allocated on the relay.
When we detect that a relay is down, the party that allocated the port
will now notice immediately (once #6666 is merged). The other party needs to
wait until it receives the invalidated candidates from its peer.
Invalidating that candidate will also invalidate the currently nominated
socket and fail the connection. In theory at least. That only works if
there are no other candidates available to try.
This is where this patch becomes important. Say we have the following
setup:
- Client samples relay A.
- Gateway samples relay B.
- The nominated candidate pair is "client server-reflexive <=> relay B",
i.e. the client talks to the allocated port on the gateway.
Next:
1. Client and portal get network-partitioned.
2. Relay B disappears.
3. Relay C appears.
4. Relay A reboots.
5. Client reconnects.
At this point, the client is told by the portal to use relays A & C.
Note that relay A rebooted and thus the allocation previously present on
the client is no longer valid. With #6666, we will detect this by
comparing credentials & IPs. The gateway is being told about the same
relays and as part of that, tests that relay B is still there. It learns
that it isn't, invalidates the candidates which fails the connection to
the client (but only locally!).
Meanwhile, as part of the regular `init` procedure, the client made a
new allocation with relays A & C. Because it had previously selected
relay A for the connection with the gateway, the new candidates are
added to the agent, forming new pairs. The gateway, however, has already
given up on this connection, so it won't ever answer these STUN requests.
Concurrently, the gateway's invalidated candidates arrive at the client.
They however don't fail the connection because the client is probing the
newly added candidates. This creates a state mismatch between the client
and gateway that is only resolved after the candidates start timing out,
adding an additional delay during which the connection isn't working.
With this PR, we prevent this from happening by only ever adding new
candidates while we are still in the nomination process of a socket. In
theory, there exists a race condition in which we nominate a relay
candidate first and then miss a server-reflexive candidate that would no
longer be added. In practice, this won't happen because:
- Our host candidates are always available first.
- We learn server-reflexive candidates already as part of the initial
BINDING, before creating the allocation.
- We learn server-reflexive candidates from all relays, not just the one
that has been assigned.
Related: #6666.