At present, `connlib` can gracefully process a resource list that
contains unknown resource types. If a known type fails to match the
schema, however, we fail to deserialise the entire list.
To reduce the blast radius of potential bugs here, we accept anything
that is valid JSON as the "value" of a resource. Only when processing
the individual items do we attempt to deserialise each one into the
expected model, skipping any resources that cannot be deserialised.
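A minimal sketch of this pattern with serde (the `Resource` type below is illustrative, not connlib's actual model):

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Resource {
    id: String,
    name: String,
}

// Accept any valid JSON per list item, then deserialise each item lazily and
// skip the ones that don't match the expected schema.
fn parse_resources(raw: &str) -> serde_json::Result<Vec<Resource>> {
    let values: Vec<serde_json::Value> = serde_json::from_str(raw)?;

    let resources = values
        .into_iter()
        .filter_map(|value| match serde_json::from_value::<Resource>(value) {
            Ok(resource) => Some(resource),
            Err(e) => {
                tracing::warn!("Skipping resource that failed to deserialise: {e}");
                None
            }
        })
        .collect();

    Ok(resources)
}
```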
At present, Clients are only allowed to send packets to resources
accessible via the Gateway but not to the Gateway itself. Thus, any
application (including Firezone itself) that opens a listening socket on
the TUN device will never receive any traffic.
This opens up interesting features like hosting additional services
on the machine that the Gateway is running on. Concretely, in order to
implement #8221, we will run a DNS server on port 53 of the TUN device
as part of the Gateway.
The diff for this ended up being a bit larger because we are introducing
an `IpConfig` abstraction so we don't have to track 4 IP addresses as
separate fields within `ClientOnGateway`, the connection-specific state
on a Gateway. This is where we allow / deny traffic from a Client. To
allow traffic for this particular Gateway, we need to know our own TUN
IP configuration within the component.
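A rough sketch of such an abstraction, assuming one IPv4 and one IPv6 address per interface (field and method names are illustrative):

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

#[derive(Debug, Clone, Copy)]
struct IpConfig {
    v4: Ipv4Addr,
    v6: Ipv6Addr,
}

impl IpConfig {
    /// Returns `true` if a packet with destination `dst` is addressed to this
    /// interface itself (e.g. the DNS server on port 53 of the Gateway's TUN
    /// device) rather than to a resource behind it.
    fn is_local(&self, dst: IpAddr) -> bool {
        match dst {
            IpAddr::V4(dst) => dst == self.v4,
            IpAddr::V6(dst) => dst == self.v6,
        }
    }
}
```

With both the Client's and the Gateway's addresses wrapped this way, the connection state can carry two such structs instead of four loose address fields.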
Currently, we wait for the first "backoff" duration when the WebSocket
disconnects. Instead, we should just try to reconnect immediately and
only wait if we hit another error.
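Sketched as a small state machine (not the actual `phoenix-channel` code), the policy looks roughly like this:

```rust
use std::time::Duration;

struct ReconnectBackoff {
    consecutive_failures: u32,
}

impl ReconnectBackoff {
    /// The WebSocket dropped; the next reconnect attempt happens immediately.
    fn on_disconnect(&mut self) {
        self.consecutive_failures = 0;
    }

    /// A reconnect attempt itself failed; start (or continue) backing off.
    fn on_failed_attempt(&mut self) {
        self.consecutive_failures += 1;
    }

    fn next_delay(&self) -> Duration {
        if self.consecutive_failures == 0 {
            return Duration::ZERO;
        }

        // 1s, 2s, 4s, ... capped at 64s (illustrative values).
        Duration::from_secs(1u64 << (self.consecutive_failures - 1).min(6))
    }
}
```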
Whilst we are establishing a connection, the host network stack may run
into timeouts and retransmit packets. Buffering these copies doesn't
make any sense because we are then just flooding the remote with e.g. 4
TCP SYNs for the same connection.
This check is O(N) with the number of buffered packets. There are at
most a few dozen of those, so there shouldn't be a need for anything
more efficient.
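A sketch of the duplicate check, assuming packets are compared byte-for-byte (the real buffer type in `snownet` may differ):

```rust
fn buffer_unless_duplicate(buffer: &mut Vec<Vec<u8>>, packet: Vec<u8>) {
    // O(N) over the buffered packets; N is at most a few dozen.
    if buffer.iter().any(|buffered| buffered == &packet) {
        // A retransmission from the host network stack (e.g. a repeated TCP
        // SYN); flushing several identical copies later helps nobody.
        return;
    }

    buffer.push(packet);
}
```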
This effectively reverts #8223 due to how this interacts with the
generated packages on Linux. The _package_ itself should still be called
`firezone-client-gui` because that is what we are installing. Perhaps we
will one day add a headless-client package so the naming chosen here
should allow for that.
To customize the desktop entry, we instead make use of the
`desktopTemplate` configuration of the Tauri bundler, which lets us
provide a custom `.desktop` file that specifies a particular
application name.
As part of this, we are also updating the docs on the website to mention
the new name `Firezone Client`.
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.11.0 to 1.14.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>v1.14.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add FromStr impls to the fmt structs by <a
href="https://github.com/tysen"><code>@tysen</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/806">uuid-rs/uuid#806</a></li>
<li>Prepare for 1.14.0 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/807">uuid-rs/uuid#807</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/tysen"><code>@tysen</code></a> made
their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/806">uuid-rs/uuid#806</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/v1.13.2...v1.14.0">https://github.com/uuid-rs/uuid/compare/v1.13.2...v1.14.0</a></p>
<h2>v1.13.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Add a compile_error when no source of randomness is available on
wasm32-unknown-unknown by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/804">uuid-rs/uuid#804</a></li>
<li>Prepare for 1.13.2 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/805">uuid-rs/uuid#805</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.13.1...v1.13.2">https://github.com/uuid-rs/uuid/compare/1.13.1...v1.13.2</a></p>
<h2>1.13.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix <code>wasm32</code> with <code>atomics</code> by <a
href="https://github.com/bushrat011899"><code>@bushrat011899</code></a>
in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/797">uuid-rs/uuid#797</a></li>
<li>Prepare for 1.13.1 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/799">uuid-rs/uuid#799</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/bushrat011899"><code>@bushrat011899</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/797">uuid-rs/uuid#797</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.13.0...1.13.1">https://github.com/uuid-rs/uuid/compare/1.13.0...1.13.1</a></p>
<h2>1.13.0</h2>
<h2>⚠️ Potential Breakage</h2>
<p>This release updates our version of <code>getrandom</code> to
<code>0.3</code> and <code>rand</code> to <code>0.9</code>. It is a
<strong>potentially breaking change</strong> for the following
users:</p>
<h3>no-std users who enable the <code>rng</code> feature</h3>
<p><code>uuid</code> still uses <code>getrandom</code> by default on
these platforms. Upgrade your version of <code>getrandom</code> and <a
href="https://docs.rs/getrandom/0.3.1/getrandom/index.html#custom-backend">follow
its new docs</a> on configuring a custom backend.</p>
<h3><code>wasm32-unknown-unknown</code> users who enable the
<code>rng</code> feature without the <code>js</code> feature</h3>
<p>Upgrade your version of <code>getrandom</code> and <a
href="https://docs.rs/getrandom/0.3.1/getrandom/index.html#custom-backend">follow
its new docs</a> on configuring a backend.</p>
<p>You'll also need to enable the <code>rng-getrandom</code> or
<code>rng-rand</code> feature of <code>uuid</code> to force it to use
<code>getrandom</code> as its backend:</p>
<pre lang="diff"><code>[dependencies.uuid]
version = "1.13.0"
- features = ["v4"]
+ features = ["v4", "rng-getrandom"]
<p>[dependencies.getrandom]
</tr></table>
</code></pre></p>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="bf5b0b84d2"><code>bf5b0b8</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/807">#807</a> from
uuid-rs/cargo/v1.14.0</li>
<li><a
href="daa07949e9"><code>daa0794</code></a>
prepare for 1.14.0 release</li>
<li><a
href="6bd7bc791b"><code>6bd7bc7</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/806">#806</a> from
tysen/add-fromstr-impls-to-fmt</li>
<li><a
href="5b0ca42c80"><code>5b0ca42</code></a>
Add FromStr impls to the fmt structs</li>
<li><a
href="d8871b3b03"><code>d8871b3</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/805">#805</a> from
uuid-rs/cargo/v1.13.2</li>
<li><a
href="704421094a"><code>7044210</code></a>
prepare for 1.13.2 release</li>
<li><a
href="7893ecce7f"><code>7893ecc</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/804">#804</a> from
uuid-rs/fix/wasm-no-rng</li>
<li><a
href="bf28001d53"><code>bf28001</code></a>
update feature docs</li>
<li><a
href="920e8b183f"><code>920e8b1</code></a>
add a more descriptive compile error when no rng source is available on
wasm</li>
<li><a
href="54214179a6"><code>5421417</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/799">#799</a> from
uuid-rs/cargo/1.13.1</li>
<li>Additional commits viewable in <a
href="https://github.com/uuid-rs/uuid/compare/1.11.0...v1.14.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In #8159, we introduced a regression that could lead to a deadlock when
shutting down the TUN device. Whilst we did close the channel prior to
awaiting the thread to exit, we failed to notice that _another_ instance
of the sender could be alive as part of an internally stored "sending
permit" within the `PollSender` in case another packet is queued for
sending. We need to explicitly call `abort_send` to free that.
Judging from the comment and a prior bug, this shutdown logic has been
buggy before. To further avoid this deadlock, we introduce two changes:
- The worker threads only receive a `Weak` reference to the
`wintun::Session`
- We move all device-related state into a dedicated `TunState` struct
that we can drop prior to joining the threads
The combination of these changes means that all strong references to
channels and the session are definitely dropped without having to wait
for anything. To provide a clean and synchronous shutdown, we wait at
most 5s for the worker threads. If they haven't exited by then, we log
a warning and exit anyway.
This should greatly reduce the risk of future bugs here because the
session (and thus the WinTUN device) gets shut down in any case, and at
worst we have a few zombie threads around.
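The bounded wait could look roughly like this (a sketch only; the real `TunState` handling and thread bookkeeping differ):

```rust
use std::time::{Duration, Instant};

fn join_with_timeout(workers: Vec<std::thread::JoinHandle<()>>) {
    let deadline = Instant::now() + Duration::from_secs(5);

    for worker in workers {
        // Poll instead of a blocking `join()` so a stuck worker cannot block
        // shutdown forever.
        while !worker.is_finished() && Instant::now() < deadline {
            std::thread::sleep(Duration::from_millis(50));
        }

        if worker.is_finished() {
            let _ = worker.join();
        } else {
            tracing::warn!("TUN worker thread did not exit within 5s; continuing shutdown");
        }
    }
}
```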
Resolves: #8265
This configures the GUI client to log to journald in addition to files
as well. For better or worse, this logs all events such that structured
information is preserved, e.g. all additional fields next to the message
are also saved as fields in the journal. By default, when viewing the
logs via `journalctl`, those fields are not displayed. This makes the
default output of `journalctl` for the Firezone GUI not as useful as it
could be. Fixing that is left to a later stage.
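Wiring this up is roughly a matter of adding a journald layer next to the existing file layer (a sketch assuming the `tracing-journald` and `tracing-subscriber` crates; the GUI's actual layer setup is more involved):

```rust
use tracing_subscriber::prelude::*;

fn init_logging<L>(file_layer: L)
where
    L: tracing_subscriber::Layer<tracing_subscriber::Registry> + Send + Sync + 'static,
{
    // `None` if journald is unavailable (e.g. outside of systemd); an
    // `Option<Layer>` acts as a no-op layer in that case.
    let journald = tracing_journald::layer().ok();

    tracing_subscriber::registry()
        .with(file_layer)
        .with(journald)
        .init();
}
```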
Related: #8173
Using `rustup`, even on NixOS, makes it easier to manage the Rust
toolchain because some tools rely on being able to use the `rustup`
shims such as `+nightly` to run a nightly toolchain.
With the addition of the Firezone Control Protocol, we are now issuing a
lot more DNS queries on the Gateway. Specifically, every DNS query for a
DNS resource name always triggers a DNS query on the Gateway. This
ensures that changes to DNS entries for resources are picked up without
having to build any sort of "stale detection" in the Gateway itself. As
a result though, a Gateway has to issue a lot of DNS queries to upstream
resolvers, which in 99% or more of cases will return the same result.
To reduce the load on these upstream resolvers, we cache successful
results of DNS queries for 5 minutes.
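A minimal sketch of such a positive cache (names and key type are illustrative; the real Gateway code works on full DNS messages):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(5 * 60);

#[derive(Default)]
struct DnsCache {
    entries: HashMap<(String, u16), (Instant, Vec<IpAddr>)>, // (qname, qtype) -> (inserted_at, answers)
}

impl DnsCache {
    fn get(&self, qname: &str, qtype: u16) -> Option<&[IpAddr]> {
        let (inserted_at, answers) = self.entries.get(&(qname.to_owned(), qtype))?;

        if inserted_at.elapsed() > TTL {
            return None; // expired; the caller queries upstream again
        }

        Some(answers.as_slice())
    }

    fn insert(&mut self, qname: String, qtype: u16, answers: Vec<IpAddr>) {
        // Only successful results end up here; failures are never cached.
        self.entries.insert((qname, qtype), (Instant::now(), answers));
    }
}
```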
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
On Linux, logs sent to stdout from a systemd-service are automatically
captured by `journald`. This is where most admins expect logs to be and
frankly, doing any kind of debugging of Firezone is much easier if you
can do `journalctl -efu firezone-client-ipc.service` in a terminal and
check what the IPC service is doing.
On Windows, stdout from a service is (unfortunately) ignored.
To achieve this and also allow dynamically changing the log-filter, I
had to introduce a (long-overdue) abstraction over tracing's "reload"
layer that allows us to combine multiple reload-handles into one.
Unfortunately, neither `reload::Layer` nor `reload::Handle`
implements `Clone`, which makes this unnecessarily difficult.
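The combined handle could be sketched roughly like this (the subscriber type parameters are simplified; the real abstraction has to thread the concrete `Layered<...>` types through):

```rust
use tracing_subscriber::{reload, EnvFilter, Registry};

struct FilterHandles {
    file: reload::Handle<EnvFilter, Registry>,
    stdout: reload::Handle<EnvFilter, Registry>,
}

impl FilterHandles {
    /// Applies the same directives to every registered reload handle.
    fn set_filter(&self, directives: &str) -> anyhow::Result<()> {
        self.file.reload(EnvFilter::new(directives))?;
        self.stdout.reload(EnvFilter::new(directives))?;

        Ok(())
    }
}
```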
Related: #8173
The Gateway keeps some state for each client connection. Part of this
state are filters which can be controlled via the Firezone portal. Even
if no filters are set in the portal, the Gateway uses this data
structure to ensure only packets to allowed resources are forwarded. If
a resource is not allowed, its IP won't exist in the `IpNetworkTable` of
filters and thus won't be allowed.
When a Client disconnects, the Gateway cleans up this data structure and
thus all filters etc are gone. As soon as a Client reconnects, default
filters are installed (which don't allow anything) under the same IP
(the portal always assigns the same IP to Clients).
These filters are only applied on _outbound_ traffic (i.e. from the
Client towards Resources). As a result, packets arriving from Resources
to a Client will still be routed back, causing "Source not allowed"
errors on the client (which has lost all of its state when restarting).
To fix this, we apply the Gateway's filters also on the reverse path of
packets from Resources to Clients.
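Conceptually, the reverse path now performs the same lookup as the forward path (a sketch with the `ip_network_table` crate; the real filter values are richer than a unit type):

```rust
use std::net::IpAddr;

use ip_network_table::IpNetworkTable;

/// A packet coming *from* a Resource is only routed back to the Client if the
/// Resource's IP is covered by the Client's filter table.
fn allow_inbound(filters: &IpNetworkTable<()>, resource_ip: IpAddr) -> bool {
    filters.longest_match(resource_ip).is_some()
}
```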
Resolves: #5568
Resolves: #7521
Resolves: #6091
At present our Rust implementation of the Phoenix Channel client tries
to detect missing heartbeat responses from the portal. This is
unnecessary and causes brittleness in production.
The WebSocket connection runs over TCP, meaning any kind of actual
network problem / partition will be detected by TCP itself and cause an
IO error further up the stack. In order to keep NAT bindings alive, we
only need to send _some_ traffic every so often, meaning sending a
heartbeat is good enough. We don't need to actually handle the response
in any particular way.
Lastly, by just using an interval, I realised that we can very easily
implement an optimisation from the Phoenix spec: Only send heartbeats if
you haven't sent anything else.
In theory, WebSocket ping/pong frames could be used for this keep-alive
mechanism. Unfortunately, as I understand the Phoenix spec, it requires
its own heartbeat to be sent; otherwise it will disconnect the
WebSocket.
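The idle check itself is tiny (a sketch; the interval length and names are illustrative):

```rust
use std::time::{Duration, Instant};

const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(30); // illustrative value

struct Heartbeat {
    last_outbound_message: Instant,
}

impl Heartbeat {
    /// Called on every interval tick; returns `true` if we should send a
    /// heartbeat now.
    fn should_send(&self) -> bool {
        // If we sent *any* message within the last interval, the NAT binding
        // is already kept alive and the portal knows we are here.
        self.last_outbound_message.elapsed() >= HEARTBEAT_INTERVAL
    }

    /// Called whenever any frame is written to the WebSocket.
    fn message_sent(&mut self) {
        self.last_outbound_message = Instant::now();
    }
}
```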
This is pretty confusing when reading logs. For inbound packets, we
assume that if we don't have a NAT session, they belong to the Internet
Resource or a CIDR resource, meaning this log shows up for all packets
for those resources and even for packets that don't belong to any
resource at all.
On Linux desktops, we install a dedicated `.desktop` file that is
responsible for handling our deep-links for sign-in. This desktop entry
is not meant to be launched manually and therefore should be hidden from
the application menus.
The WebSocket connection to the portal from within the Clients, Gateways
and Relays may be temporarily interrupted by IO errors. In such cases we
simply reconnect to it. This isn't as much of a problem for Clients and
Gateways. For Relays however, a disconnect can be disruptive for
customers because the portal will send `relays_presence` events to all
Clients and Gateways. Any relayed connection will therefore be
interrupted. See #8177.
Relays run on our own infrastructure and we want to be notified if their
connection flaps.
In order to differentiate between these scenarios, we remove the logging
from within `phoenix-channel` and report these connection hiccups one
layer up. This allows Clients and Gateways to log them on DEBUG whereas
the Relay can log them on WARN.
Related: #8177
Related: #7004
Now that we have error reporting via Sentry in Swift-land as well, we
can handle errors in the FFI layer more gracefully and return them to
Swift.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
At present, the relay uses a priority in the event-loop that favors
routing traffic. Whenever a task further up in the loop is
`Poll::Ready`, we loop back to the top to continue processing. The issue
with that is that in very busy times, this can lead to starvation in
processing timers and messages from the portal. If we then finally get
to process portal messages, we think that the portal hasn't replied in
some time and proactively cut the connection and reconnect.
As a result, the portal will send `relays_presence` messages to the
clients and gateways which in turn will locally remove the relay. This
breaks relayed connections.
To fix this, instead of immediately traversing to the top of the
event-loop with `continue`, we only set a boolean. This gives each
element of the event-loop a chance to execute, even when a certain
component is very busy.
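The shape of the change, sketched with made-up component types:

```rust
use std::task::{Context, Poll};

trait Component {
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<()>;
}

fn poll_event_loop(cx: &mut Context<'_>, components: &mut [Box<dyn Component>]) -> Poll<()> {
    loop {
        let mut made_progress = false;

        for component in components.iter_mut() {
            if component.poll(cx).is_ready() {
                // Previously we would `continue` the outer loop right here,
                // starving the components further down the list whenever this
                // one was busy.
                made_progress = true;
            }
        }

        if !made_progress {
            // Every component registered a waker; suspend until one fires.
            return Poll::Pending;
        }
    }
}
```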
Related: #8165
Related: #8176
Whilst ICE for a connection is in progress, it might happen that packets
for a particular client are arriving at the Gateway's TUN device. I
assume that these might be from a previous session?
We can only negotiate a WireGuard session once we have a nominated
socket. Thus, the very first packet sent on a session will always
trigger a new handshake. We don't want Gateways to start handshakes,
though; those should always be initiated by the Clients.
To avoid this, we add a conditional to `snownet::Node` that drops
packets iff the current node is a `ServerNode` and we haven't nominated
a socket yet.
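Reduced to its essence (illustrative names, not `snownet`'s actual API), the rule is:

```rust
enum Role {
    Client,
    Server, // the Gateway side of a connection
}

fn may_send_before_nomination(role: &Role, has_nominated_socket: bool) -> bool {
    match role {
        // Clients may buffer and flush once ICE completes; they are the ones
        // that should initiate the WireGuard handshake.
        Role::Client => true,
        // Gateways drop packets until a socket is nominated so they never
        // initiate a handshake themselves.
        Role::Server => has_nominated_socket,
    }
}
```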
The following log output from a Gateway motivated this change:
```
2025-02-17T15:36:45.372Z INFO snownet::node: Connection failed (ICE timeout) cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
// Here the previous connection failed.
2025-02-17T15:36:45.989Z DEBUG firezone_tunnel::gateway: Unknown client, perhaps already disconnected? dst=100.64.69.110
2025-02-17T15:36:45.989Z DEBUG firezone_tunnel::gateway: Unknown client, perhaps already disconnected? dst=100.64.69.110
2025-02-17T15:36:45.989Z DEBUG firezone_tunnel::gateway: Unknown client, perhaps already disconnected? dst=100.64.69.110
2025-02-17T15:36:46.213Z DEBUG firezone_tunnel::gateway: Unknown client, perhaps already disconnected? dst=100.64.69.110
// Until here, packets for this client got dropped but now a new connection (for the same IP!) is being created.
2025-02-17T15:36:46.474Z DEBUG snownet::node: Sampled relay rid=b7198983-0cf6-48ba-a459-e7d27ef7d6c9 client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO str0m::ice_::agent: Set local credentials: IceCreds { ufrag: "ipcg", pass: "eyy6s27emu2joisw7aqc7q" } client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO str0m::ice_::agent: Set remote credentials: IceCreds { ufrag: "up5k", pass: "4q6uvhawhcbnhbqrddvy5x" } client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO str0m::ice_::agent: Add local candidate: Candidate(host=10.0.0.4:38621/udp prio=2130706175) client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO str0m::ice_::agent: Add local candidate: Candidate(relay=34.16.221.134:62250/udp prio=37748479) client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO str0m::ice_::agent: Add local candidate: Candidate(relay=[2600:1900:4180:ee3:0:78::]:62250/udp prio=37748735) client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO str0m::ice_::agent: State change (new connection): New -> Checking client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.474Z INFO snownet::node: Created new connection client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.475Z INFO firezone_tunnel::peer: Allowing access to resource client=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 resource=dca3fcc6-b5e0-470a-bc7b-6446cdd03bb3 expires=Some("2025-02-24T15:09:11+00:00") client_id=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
// The connection has been created and very likely another packet has arrived at the TUN interface. This time though, we have an entry in our connection map for this IP and try to route it.
2025-02-17T15:36:46.546Z DEBUG boringtun::noise: Sending handshake_initiation cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.546Z DEBUG snownet::node: ICE is still in progress, buffering WG handshake num_buffered=1 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
// We buffered the handshake packet. This is only meant to be done by clients.
2025-02-17T15:36:46.572Z INFO str0m::ice_::agent: Created peer reflexive remote candidate from STUN request: Candidate(prflx=107.197.104.68:49376/udp prio=1862270719) cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.572Z DEBUG str0m::ice_::agent: Created new pair for STUN request: CandidatePair(1-0 prio=162128486503284223 state=Waiting attempts=0 unanswered=0 remote=0 last=None nom=None) cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.574Z INFO str0m::ice_::agent: Created peer reflexive remote candidate from STUN request: Candidate(prflx=[2600:1700:3ecb:2410:7499:175a:5c9:9bc5]:57622/udp prio=1862270975) cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.574Z DEBUG str0m::ice_::agent: Created new pair for STUN request: CandidatePair(2-1 prio=162129586014912511 state=Waiting attempts=0 unanswered=0 remote=0 last=None nom=None) cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.611Z DEBUG str0m::ice_::pair: Nominated pair: CandidatePair(2-1 prio=162129586014912511 state=Succeeded attempts=1 unanswered=0 remote=2 last=Some(Instant { tv_sec: 286264, tv_nsec: 840170135 }) nom=Nominated) cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.612Z INFO str0m::ice_::agent: State change (got nomination, still trying others): Checking -> Connected cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.612Z DEBUG snownet::node: Flushing packets buffered during ICE num_buffered=1 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.612Z INFO snownet::node: Updating remote socket old=None new=Relay { relay: b7198983-0cf6-48ba-a459-e7d27ef7d6c9, dest: [2600:1700:3ecb:2410:7499:175a:5c9:9bc5]:57622 } duration_since_intent=137.48517ms cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
// The connection has been established and we receive the (forced) handshake initiation by the client. However, we also flushed a handshake initiation.
2025-02-17T15:36:46.612Z DEBUG boringtun::noise: Received handshake_initiation remote_idx=731337473 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.613Z DEBUG boringtun::noise: Sending handshake_response local_idx=185230594 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.613Z DEBUG boringtun::noise: Sending handshake_initiation cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.629Z DEBUG snownet::node: Unknown connection or socket has already been nominated ignored_candidate=candidate:fffeff021b36b51d6f7abdc3 1 udp 50331391 34.94.63.38 55487 typ relay cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.629Z DEBUG snownet::node: Unknown connection or socket has already been nominated ignored_candidate=candidate:fffeff64a52b02479dab9c4 1 udp 1694498559 107.197.104.68 49376 typ srflx cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.629Z DEBUG snownet::node: Unknown connection or socket has already been nominated ignored_candidate=candidate:fffeff7ec9b7a7db40ec1c44 1 udp 2130706175 192.168.1.150 49376 typ host cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.630Z DEBUG snownet::node: Unknown connection or socket has already been nominated ignored_candidate=candidate:ffffff026d81f5c8a4d5600e 1 udp 50331647 2600:1900:4120:521c:0:78:: 55487 typ relay cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.630Z DEBUG snownet::node: Unknown connection or socket has already been nominated ignored_candidate=candidate:ffffff64e2c91c4ff6f343f5 1 udp 1694498815 2600:1700:3ecb:2410:7499:175a:5c9:9bc5 57622 typ srflx cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.630Z DEBUG snownet::node: Unknown connection or socket has already been nominated ignored_candidate=candidate:ffffff7ed64262b110d1f279 1 udp 2130706431 2600:1700:3ecb:2410:7499:175a:5c9:9bc5 57622 typ host cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
// We are receiving a response for our handshake initiation. Let the fight begin!
2025-02-17T15:36:46.651Z DEBUG boringtun::noise: Received handshake_response local_idx=185230593 remote_idx=731337474 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.651Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: UnexpectedPacket
2025-02-17T15:36:46.651Z DEBUG boringtun::noise: Received handshake_initiation remote_idx=731337475 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.652Z DEBUG boringtun::noise: Sending handshake_response local_idx=185230596 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.652Z DEBUG boringtun::noise: Sending handshake_initiation cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.652Z DEBUG boringtun::noise: Received handshake_response local_idx=185230595 remote_idx=731337476 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.652Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: UnexpectedPacket
2025-02-17T15:36:46.652Z DEBUG boringtun::noise: Received handshake_initiation remote_idx=731337477 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.653Z DEBUG boringtun::noise: Sending handshake_response local_idx=185230598 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.653Z DEBUG boringtun::noise: Sending handshake_initiation cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.691Z DEBUG boringtun::noise: Received handshake_response local_idx=185230597 remote_idx=731337478 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.691Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: UnexpectedPacket
2025-02-17T15:36:46.691Z DEBUG boringtun::noise: Received handshake_initiation remote_idx=731337479 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.692Z DEBUG boringtun::noise: Sending handshake_response local_idx=185230600 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
2025-02-17T15:36:46.692Z INFO snownet::node: Completed wireguard handshake cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5 duration_since_intent=217.247362ms
2025-02-17T15:36:46.692Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: NoCurrentSession
2025-02-17T15:36:46.692Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: NoCurrentSession
2025-02-17T15:36:46.692Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: NoCurrentSession
2025-02-17T15:36:46.692Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: NoCurrentSession
2025-02-17T15:36:46.708Z DEBUG firezone_gateway::eventloop: Tunnel error: Failed to decapsulate: Failed to decapsulate: NoCurrentSession
2025-02-17T15:36:46.731Z DEBUG boringtun::noise: New session session=185230600 cid=8b106344-ba59-4050-8f9a-e2f0bab6e9e5
```
As you can see, with both parties initiating handshakes, they end up
fighting over who should initiate the session.
When `snownet` establishes a connection to another peer, we may end up
in one of four different connection types:
- `PeerToPeer`
- `PeerToRelay`
- `RelayToPeer`
- `RelayToRelay`
From the perspective of the local node, it only matters whether or not
we are sending data from our local socket or a relay's socket because in
the latter case, we have to encapsulate it in a channel data message.
Hence, at present, we often see logs that say "Direct" but really, we
are talking to a port allocated by the remote on a relay.
We know whether or not the remote candidate is a relay by looking at the
candidates they sent us.
To make our logs more descriptive, we now model out all 4 possibilities
here.
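A sketch of the classification (the actual enum in `snownet` may be named differently):

```rust
#[derive(Debug, Clone, Copy)]
enum ConnectionKind {
    PeerToPeer,
    PeerToRelay,
    RelayToPeer,
    RelayToRelay,
}

/// `local_is_relay`: we send from a relay allocation rather than our own socket.
/// `remote_is_relay`: the nominated remote candidate is of type "relay".
fn classify(local_is_relay: bool, remote_is_relay: bool) -> ConnectionKind {
    match (local_is_relay, remote_is_relay) {
        (false, false) => ConnectionKind::PeerToPeer,
        (false, true) => ConnectionKind::PeerToRelay,
        (true, false) => ConnectionKind::RelayToPeer,
        (true, true) => ConnectionKind::RelayToRelay,
    }
}
```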
When we detect a timed-out request to a relay, we print the duration we
were waiting for. Currently, this is offset by one "backoff tick"
because we advance the backoff too early.
Here is the log output of a test prior to the change:
```
snownet::allocation: Sending BINDING requests to pick active socket relay_socket=V4(127.0.0.1:3478)
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 1.5s, re-sending id=TransactionId(0x0BFA13E983FEF36EE4877719) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 2.25s, re-sending id=TransactionId(0x0BFA13E983FEF36EE4877719) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 3.375s, re-sending id=TransactionId(0x0BFA13E983FEF36EE4877719) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 3.375s, re-sending id=TransactionId(0x0BFA13E983FEF36EE4877719) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Backoff expired, giving up id=TransactionId(0x0BFA13E983FEF36EE4877719) method=binding dst=127.0.0.1:3478
```
and with this change:
```
snownet::allocation: Sending BINDING requests to pick active socket relay_socket=V4(127.0.0.1:3478)
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 1s, re-sending id=TransactionId(0x6C79DD3607DF96806C4A7D8C) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 1.5s, re-sending id=TransactionId(0x6C79DD3607DF96806C4A7D8C) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 2.25s, re-sending id=TransactionId(0x6C79DD3607DF96806C4A7D8C) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Request timed out after 3.375s, re-sending id=TransactionId(0x6C79DD3607DF96806C4A7D8C) method=binding dst=127.0.0.1:3478
handle_timeout{active_socket=None}: snownet::allocation: Backoff expired, giving up id=TransactionId(0x6C79DD3607DF96806C4A7D8C) method=binding dst=127.0.0.1:3478
```
There is no functional difference; we were just logging the wrong
duration.
This appears to have regressed in #8159. It is low-priority right now
and we need to unblock a flaky CI, so we lower the expected BPS and
will investigate later.
Same as done for unix-based operating systems in #8117, we introduce a
dedicated "TUN send" thread for Windows in this PR. Not only does this
move the syscalls and the copying of outgoing packets away from
`connlib`'s main thread, but it also properly establishes backpressure
between those threads.
WinTUN does not have any ability to signal that it has space in its send
buffer. If it fails to allocate a packet for sending, it will return
`ERROR_BUFFER_OVERFLOW` [0]. We now handle this case gracefully by
suspending the send thread for 10ms and then trying again. This isn't a
great way of establishing back-pressure, but at least we don't have any
packet loss.
To test this, I temporarily lowered the ring buffer size and ran a speed
test, which confirmed that `ERROR_BUFFER_OVERFLOW` is indeed emitted
and handled as intended.
[0]: https://git.zx2c4.com/wintun/tree/api/session.c#n267
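The retry loop on the send thread looks roughly like this (the session type and error variant below are stand-ins, not the actual `wintun` API):

```rust
use std::time::Duration;

// Stand-ins for the WinTUN session API; the real code uses the `wintun` crate.
struct Session;

enum SendError {
    BufferFull, // corresponds to ERROR_BUFFER_OVERFLOW
    Fatal(std::io::Error),
}

impl Session {
    fn try_send(&self, _packet: &[u8]) -> Result<(), SendError> {
        Ok(())
    }
}

fn send_with_backpressure(session: &Session, packet: &[u8]) -> std::io::Result<()> {
    loop {
        match session.try_send(packet) {
            Ok(()) => return Ok(()),
            Err(SendError::BufferFull) => {
                // The ring buffer is full; crude back-pressure: wait 10ms and retry.
                std::thread::sleep(Duration::from_millis(10));
            }
            Err(SendError::Fatal(e)) => return Err(e),
        }
    }
}
```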
To ensure that our test suite represents production as much as possible,
we introduce a dedicated "Internet" site into the `StubPortal` that only
hosts the Internet resource. All other resources that get created are
assigned to other sites.
Bumps [os_info](https://github.com/stanislav-tkach/os_info) from 3.9.2
to 3.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/stanislav-tkach/os_info/releases">os_info's
releases</a>.</em></p>
<blockquote>
<h2>os_info 3.10.0</h2>
<ul>
<li>Bluefin Linux support has been added. (<a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/394">#394</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/stanislav-tkach/os_info/blob/master/CHANGELOG.md">os_info's
changelog</a>.</em></p>
<blockquote>
<h2>[3.10.0] (2025-02-09)</h2>
<ul>
<li>Bluefin Linux support has been added. (<a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/394">#394</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0554ec580d"><code>0554ec5</code></a>
Merge pull request <a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/396">#396</a>
from stanislav-tkach/release-3-10-0</li>
<li><a
href="9a0980c375"><code>9a0980c</code></a>
Fix markdown indent</li>
<li><a
href="d189e7de3f"><code>d189e7d</code></a>
Release the 3.10.0 version</li>
<li><a
href="6d7ea4f231"><code>6d7ea4f</code></a>
Merge pull request <a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/395">#395</a>
from stanislav-tkach/fix-spellcheck</li>
<li><a
href="e81339bd5d"><code>e81339b</code></a>
Fix spellcheck</li>
<li><a
href="9c6d24ead9"><code>9c6d24e</code></a>
Merge pull request <a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/394">#394</a>
from sargunv/feature/add-bluefin-support</li>
<li><a
href="a41a664650"><code>a41a664</code></a>
feat: Add support for Bluefin Linux</li>
<li>See full diff in <a
href="https://github.com/stanislav-tkach/os_info/compare/v3.9.2...v3.10.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The `wintun` crate will already shut down the session for us when the
last instance of `Session` gets dropped. Shutting down the session
before that results in an attempt to close an adapter that is no
longer present, causing WinTUN to log (unactionable) errors.
Every time we start a new session, our telemetry context potentially
changes, i.e. the user may sign into a new account. This should ensure
that both the IPC service and the GUI always use the most up-to-date
`account_slug` as part of Sentry events. In addition, this will also set
the `account_slug` for clients that just signed in. Previously, the
`account_slug` would only get populated on the next start of the client.
By setting the `system_certs` cfg at compile-time, any TLS connections
from `phoenix-channel` will use the system-provided CA store instead of
the embedded one.
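Illustratively, a compile-time cfg switch of this kind selects between the two stores (the `CaStore` type is a stand-in; such a flag would typically be passed via `RUSTFLAGS="--cfg system_certs"`):

```rust
// `CaStore` is a stand-in; connlib's actual TLS setup differs.
struct CaStore;

impl CaStore {
    fn from_system() -> Self {
        Self
    }

    fn embedded() -> Self {
        Self
    }
}

#[cfg(system_certs)]
fn ca_store() -> CaStore {
    CaStore::from_system() // OS-provided CA store
}

#[cfg(not(system_certs))]
fn ca_store() -> CaStore {
    CaStore::embedded() // CA bundle baked into the binary
}
```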
Resolves: #8065
Co-authored-by: oddlama <oddlama@oddlama.org>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Alternative to #8128. If the user dismissed the unlock prompt or has
their keyring otherwise misconfigured, it is still useful to allow them
to sign in. They just won't stay signed in across reboots of the device.
When we receive an inbound packet from the TUN device on the Gateway, we
make a lookup in the NAT table to see if it needs to be translated back
to a DNS proxy IP.
At present, the non-existence of such a NAT entry results in the packet
being forwarded entirely unmodified because that is what needs to
happen for CIDR resources. Whilst that is important, the same code path
is currently being executed for DNS resources whose NAT session has
expired! Those packets should be dropped instead, which is what we do
in this PR.
To differentiate between not having a NAT session at all and having had
one that has since expired, we keep around all previous "outside"
tuples of NAT sessions. Those have only a very small memory footprint.
The entire NAT table is scoped to a connection to the given peer and
will thus eventually be freed once the peer disconnects. This allows us
to reliably and cheaply detect whether a packet is using an expired NAT
session. The check must be cheap because all traffic of CIDR resources
and the Internet resource has to go through it so we know those packets
don't have to be translated.
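A sketch of the resulting three-way decision on inbound packets (types and keys are illustrative, not the Gateway's actual NAT table):

```rust
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;

type OutsideTuple = (SocketAddr, SocketAddr); // (src, dst) as seen on the wire

struct ProxyMapping; // placeholder for the DNS-proxy translation state

struct NatTable {
    active: HashMap<OutsideTuple, ProxyMapping>,
    expired: HashSet<OutsideTuple>, // only the tuples, no per-session state
}

enum Inbound<'a> {
    Translate(&'a ProxyMapping), // DNS resource: rewrite back to the proxy IP
    PassThrough,                 // CIDR / Internet resource: forward unmodified
    Drop,                        // a session existed for this tuple but expired
}

impl NatTable {
    fn classify(&self, tuple: &OutsideTuple) -> Inbound<'_> {
        if let Some(mapping) = self.active.get(tuple) {
            return Inbound::Translate(mapping);
        }

        if self.expired.contains(tuple) {
            return Inbound::Drop;
        }

        Inbound::PassThrough
    }
}
```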
This might be the source of some of the "Source not allowed" errors we
have been seeing in client logs.
At present, `connlib` communicates with its host app via callbacks.
These callbacks are executed synchronously as part of `connlib`'s
event-loop, meaning `connlib` cannot do anything else whilst the
callback is executing in the host app. Additionally, this callback runs
within the `Future` that represents `connlib` and thus runs on a `tokio`
worker thread.
Attempting to interact with the session from within the callback can
lead to panics, for example when `Session::disconnect` is called which
uses `Runtime::block_on`. This isn't allowed by `tokio`: You cannot
block on the execution of an async task from within one of the worker
threads.
To solve both of these problems, we introduce a thread-pool of size 1
that is responsible for executing `connlib` callbacks. Not only does
this allow `connlib` to perform more work, such as routing packets or
processing portal messages, it also means that it is not possible for the
host app to cause these panics within the `tokio` runtime because the
callbacks run on a different thread.
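A minimal sketch of such a single-threaded callback executor (not connlib's actual implementation):

```rust
use std::sync::mpsc;
use std::thread;

struct CallbackExecutor {
    tx: mpsc::Sender<Box<dyn FnOnce() + Send>>,
}

impl CallbackExecutor {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();

        thread::spawn(move || {
            // Callbacks run here, outside of connlib's event loop and outside
            // of any tokio worker thread, so the host app may block freely.
            for callback in rx {
                callback();
            }
        });

        Self { tx }
    }

    fn dispatch(&self, callback: impl FnOnce() + Send + 'static) {
        let _ = self.tx.send(Box::new(callback));
    }
}
```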
We appear to have caused a pretty big performance regression (~40%) in
037a2e64b6 (identified through
`git-bisect`). Specifically, the regression appears to have been caused
by [`aef411a`
(#7605)](aef411abf5).
Weirdly enough, undoing just that on top of `main` doesn't fix the
regression.
My hypothesis is that using the same file descriptor for read AND write
interests on the same runtime causes issues because those interests are
occasionally cleared (i.e. on false-positive wake-ups).
In this PR, we spawn a dedicated thread each for the sending and
receiving operations of the TUN device. On unix-based systems, a TUN
device is just a file descriptor and can therefore simply be copied and
read & written to from different threads. Most importantly, we only
construct the `AsyncFd` _within_ the newly spawned thread and runtime
because constructing an `AsyncFd` implicitly registers with the runtime
active on the current thread.
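A rough sketch of one of these threads, assuming the fd has already been duplicated for the writing side (error handling and packet framing differ in the real code):

```rust
use std::io::Write;
use std::os::fd::OwnedFd;

fn spawn_tun_writer(fd: OwnedFd, mut outbound: tokio::sync::mpsc::Receiver<Vec<u8>>) {
    std::thread::spawn(move || {
        // A single-threaded runtime owned by this thread; the `AsyncFd` below
        // registers with *this* runtime, not with connlib's main one.
        let runtime = tokio::runtime::Builder::new_current_thread()
            .enable_io()
            .build()
            .expect("failed to build TUN writer runtime");

        runtime.block_on(async move {
            let fd = tokio::io::unix::AsyncFd::new(std::fs::File::from(fd))
                .expect("failed to register TUN fd");

            while let Some(packet) = outbound.recv().await {
                loop {
                    let mut guard = fd.writable().await.expect("TUN fd closed");

                    // `try_io` clears readiness on `WouldBlock`, so we go back
                    // to waiting instead of busy-looping.
                    match guard.try_io(|fd| {
                        let mut file = fd.get_ref();
                        file.write(&packet)
                    }) {
                        Ok(_) => break,
                        Err(_would_block) => continue,
                    }
                }
            }
        });
    });
}
```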
As a nice benefit, this allows us to get rid of a `future::select`.
Those are always kind of nasty because they cancel the future that
wasn't ready. My original intuition was that we drop packets due to
cancelled futures there but that could not be confirmed in experiments.