We use several buffer pools across `connlib` that are all backed by the
same buffer-pool library. Within that library, we currently use another
object-pool library to provide the actual pooling functionality.
Benchmarking has shown that spend quite a bit of time (a few % of total
CPU time), fighting for the lock to either add or remote a buffer from
the pool. This is unnecessary. By using a queue, we can remove buffers
from the front and add buffers at the back, both of which can be
implemented in a lock-free way such that they don't contend.
Using the well-known `crossbeam-queue` library, we have such a queue
directly available.
I wasn't able to directly measure a performance gain in terms of
throughput. What we can measure though, is how much time we spend
dealing with our buffer pool vs everything else. If we compare the
`perf` outputs that were recorded during an `iperf` run each, we can see
that we spend about 60% less time dealing with the buffer pool than we
did before.
|Before|After|
|---|---|
|<img width="1982" height="553" alt="Screenshot From 2025-07-24
20-27-50"
src="https://github.com/user-attachments/assets/1698f28b-5821-456f-95fa-d6f85d901920"
/>|<img width="1982" height="553" alt="Screenshot From 2025-07-24
20-27-53"
src="https://github.com/user-attachments/assets/4f26a2d1-03e3-4c0d-84da-82c53b9761dd"
/>|
The number in the thousands on the left is how often the respective
function was the currently executing function during the profiling run.
Resolves: #9972
Presently, for each UDP packet that we process in `snownet`, we check if
we have already seen this local address of ours and if not, add it to
our list of host candidates. This is a safe way for ensuring that we
consider all addresses that we receive data on as ones that we tell our
peers that they should try and contact us on.
Performance profiling has shown that hashing the socket address of each
packet that is coming in is quite wasteful. We spend about 4-5% of our
main thread time doing this. For comparison, decrypting packets is only
about 30%.
Most of the time, we will already know about this address and therefore,
spending all this CPU time is completely pointless. At the same time
though, we need to be sure that we do discover our local address
correctly.
Inspired by STUN, we therefore move this responsibility to the
`allocation` module. The `allocation` module is responsible for
interacting with our TURN servers and will yield server-reflexive and
relay candidates as a result. It also knows, what the local address is
that it received traffic on so we simply extend that to yield host
candidates as well in addition to server-reflexive and relay candidates.
On my local machine, this bumps us across the 3.5 Gbits/sec mark:
```
Connecting to host 172.20.0.110, port 5201
[ 5] local 100.93.174.92 port 57890 connected to 172.20.0.110 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 319 MBytes 2.67 Gbits/sec 18 548 KBytes
[ 5] 1.00-2.00 sec 413 MBytes 3.46 Gbits/sec 4 884 KBytes
[ 5] 2.00-3.00 sec 417 MBytes 3.50 Gbits/sec 4 1.10 MBytes
[ 5] 3.00-4.00 sec 425 MBytes 3.56 Gbits/sec 415 785 KBytes
[ 5] 4.00-5.00 sec 430 MBytes 3.60 Gbits/sec 154 820 KBytes
[ 5] 5.00-6.00 sec 434 MBytes 3.64 Gbits/sec 251 793 KBytes
[ 5] 6.00-7.00 sec 436 MBytes 3.66 Gbits/sec 123 811 KBytes
[ 5] 7.00-8.00 sec 435 MBytes 3.65 Gbits/sec 2 788 KBytes
[ 5] 8.00-9.00 sec 423 MBytes 3.55 Gbits/sec 0 1.06 MBytes
[ 5] 9.00-10.00 sec 433 MBytes 3.63 Gbits/sec 8 1017 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-20.00 sec 8.21 GBytes 3.53 Gbits/sec 1728 sender
[ 5] 0.00-20.00 sec 8.21 GBytes 3.53 Gbits/sec receiver
iperf Done.
```
These are otherwise hit pretty often in the hot-path and slow packet
routing down because tracing needs to evaluate whether it should log the
statement.
Despite still being in development, the `tauri-specta` project already
proves to be quite useful. It allows us to generate TypeScript bindings
for our commands and events, creating a type-safe contract between the
frontend and the backend.
For example, this ensures that the TypeScript code calls a command
actually with the required parameters and thus avoids runtime failures.
Similarly, the frontend can listen on type-safe events without having to
use any magic strings.
The full `account` struct is only used to render the client's interface,
and doesn't need to be stored in the `client` struct when the `subject`
struct already tracks it.
Oban includes its own configuration validation, which seems to prevent
`runtime.exs` from overriding any compile-time options. This prevents us
from using ENV vars to configure it, such as restricting job execution
to `domain` nodes by setting `queues: []`. To fix that, we make sure to
set Oban configuration in env-specific files `config/dev.exs` and
`config/test.exs`, and at runtime for prod with `config/runtime.exs`.
Fixes#10016
Whilst entering and leaving a span for every packet is very expensive,
doing the same whenever we make timeout related changes is just fine.
Thus, we re-introduce a span removed in #9949 but only for the
`handle_timeout` function.
This gives us the context of the connection ID for not just our own
logs, but also the ones from `boringtun`.
Unfortunately this seems to be a race condition with read or setting the
path properly for this exec block. I've verified `cargo` is in the PATH,
and have tried this on a fresh Mac with Android Studio (latest release
version).
Spawning cargo from `sh -c` fixes the issue.
In 45466e3b78, the `networkSettings`
variable was no longer saved on the `adapter` instance, causing all
calls of the iOS-specific version of getting system resolvers to return
the connlib sentinels after the tunnel first came up.
This PR fixes that logic bug and also cleans this area of the codebase
up just a tiny bit so it's easier to follow.
Lastly, we also fix a bug where if the tunnel came up while Firezone was
already running, `networkSettings` would be `nil`, and we would read the
default system resolvers, which were the connlib sentinels.
Fixes https://github.com/firezone/firezone/issues/10017
On iOS, we were using the tempfile directory to stage the log export,
and then moving this into place from the share sheet presented to the
user.
For some reason, this has stopped working in iOS 18.5.0, and we need to
stage the file in the standard documents directory instead.
Fixes#10014
By chance, I've discovered in a CI failure that we won't be able to
handshake a new session if the `preshared_key` changes. This makes a lot
of sense. The `preshared_key` needs to be the same on both ends as it is
a shared secret that gets mixed into the Noise handshake.
In following sequence of events, we would thus previously run into a
"failed to decrypt handshake packet" scenario:
1. Client requests a connection.
2. Gateway authorizes the connection.
3. Portal restarts / gets deployed. To my knowledge, this will rotate
the `preshared_key` to a new secret. Restarting the portal also cuts all
WebSockets and therefore, the Gateways response never arrives.
4. Client reconnects to the WebSocket, requests a new connection.
5. Gateway reuses the local connection but this connection still uses
the old `preshared_key`!
6. Client needs to wait for the Gateway's ICE timeout before it can
establish a new connection.
How exactly (3) happens doesn't matter. There are probably other
conditions as to where the WebSocket connections get cut and we cannot
complete our connection handshake.
Time-based policy conditions are tricky. When they authorize a flow, we
correctly tell the Gateway to remove access when the time window
expires.
However, we do nothing on the client to reset the connectivity state.
This means that whenever the window of time of access was re-entered,
the client would essentially never be able to connect to it again until
the resource was toggled.
To fix this, we add a 1-minute check in the client channel that
re-checks allowed resources, and updates the client state with the
difference. This means that policies that have time-based conditions are
only accurate to the minute, but this is how they're presented anyhow.
For good measure, we also add a periodic job that runs every minute to
delete expired Flows. This will propagate to the Gateway where, if the
access for a particular client-resource is determined to be actually
gone, will receive `reject_access`.
Zooming out a bit, this PR furthers the theme that:
- Client channels react to underlying resource / policy / membership
changes directly, while
- Gateway channels react primarily to flows being deleted, or the
downstream effects of a prior client authorization
Whenever a client requests a connection to gateway, we need to generate
a preshared key that will be used for the underlying WireGuard tunnel.
When the connection setup broke or otherwise was lost, _after_ the
gateway the received the authorize_flow call, but _before_ the client
could receive the response (and initiate a tunnel), we would have to
wait until an ICE timeout occurred in order to reset state on the
gateway.
This is because the psk was not used to determine if this was a _new_
flow authorization. So the old authorization would be matched, and the
client would never be able to connect, since its tunnel was using the
new psk, and the gateway the old.
To fix this, we generate a secure random 32-byte `psk_base` on each
client and gateway. When a client wishes to connect to a gateway, we
compute the WireGuard preshared key as an HMAC over these two inputs.
This fixes the issue by ensuring that subsequent flow authorization
requests from a particular client to a particular gateway will yield the
same psk.
Related: #9999
Related: https://github.com/firezone/infra/issues/99
Previously, our idle timer was only driven by incoming and outgoing
packets. To detect whether the tunnel is idle, we checked whether either
the last incoming or last outgoing packet was more than 20s ago.
For one, having two timestamps here is unnecessarily complex. We can
simply combine them and always update this timestamp as `last_activity`.
Two, recently, we have started to also take into account not only
packets but other changes to the tunnel, such as an upsert of the
connection or adding new candidate. What we failed to do though, is
update these timestamps because their variable name was related to
packets and not to any activity.
The problem with not updating these timestamps however is that we will
very quickly move out of "connected" back to "idle" because the old
timestamps are still more than 20s ago. Hence, the previous fixes of
moving out of idle on new candidates and connection upsert were
ineffective.
By combining and renaming the timestamps, it is now much more obvious
that we need to update this timestamp in the respective handler
functions which then grants us another 20s of non-idling. This is
important for e.g. connection upserts to ensure the Gateway runs into an
ICE timeout within a short amount of time, should there be something
wrong with the connection that the Client just upserted.
As profiling shows, even if the log target isn't enabled, simply
checking whether or not it is enabled is a significant performance hit.
By guarding these behind `debug_assertions`, I was able to almost
achieve 3.75 Gbits/s locally (when rebased onto #9998). Obviously, this
doesn't quite translate into real-world improvements but it is
nonetheless a welcome improvement.
```
Connecting to host 172.20.0.110, port 5201
[ 5] local 100.93.174.92 port 34678 connected to 172.20.0.110 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 401 MBytes 3.37 Gbits/sec 14 644 KBytes
[ 5] 1.00-2.00 sec 448 MBytes 3.76 Gbits/sec 3 976 KBytes
[ 5] 2.00-3.00 sec 453 MBytes 3.80 Gbits/sec 43 979 KBytes
[ 5] 3.00-4.00 sec 449 MBytes 3.77 Gbits/sec 21 911 KBytes
[ 5] 4.00-5.00 sec 452 MBytes 3.79 Gbits/sec 4 1.15 MBytes
[ 5] 5.00-6.00 sec 451 MBytes 3.78 Gbits/sec 81 1.01 MBytes
[ 5] 6.00-7.00 sec 445 MBytes 3.73 Gbits/sec 39 705 KBytes
[ 5] 7.00-8.00 sec 436 MBytes 3.66 Gbits/sec 3 1016 KBytes
[ 5] 8.00-9.00 sec 460 MBytes 3.85 Gbits/sec 1 956 KBytes
[ 5] 9.00-10.00 sec 453 MBytes 3.80 Gbits/sec 0 1.19 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 4.34 GBytes 3.73 Gbits/sec 209 sender
[ 5] 0.00-10.00 sec 4.34 GBytes 3.73 Gbits/sec receiver
```
I didn't want to remove the `wire` logs entirely because they are quite
useful for debugging. However, they are also exactly this: A debugging
tool. In a production build, we are very unlikely to turn these on which
makes `debug_assertions` a good tool for keeping these around without
interfering with performance.
Fixes an edge case where a WiFi interface could go offline, then come
back online with the same connectivity, preventing the path update
handler from reset connlib state.
This would cause an issue especially if the WiFi was disabled for more
than 30 seconds / 2 minutes.
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 22.15.30 to 24.0.15.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Our `ThreadedUdpSocket` uses a background thread for the actual socket
operation. It merely represents a handle to send and receive from these
sockets but not the socket itself. Dropping the handle will shutdown the
background thread but that is an asynchronous operation.
In order to be sure that we can rebind the same port, we need to wait
for the background thread to stop.
We thus add a `Drop` implementation for the `ThreadedUdpSocket` that
waits for its background thread to disappear before it continues.
Resolves: #9992
The `flows` table tracks authorizations we've made for a resource and
persists them, so that we can determine which authorizations are still
valid across deploys or hiccups in the control plane connections.
Before, when the "in-use" authorization for a resource was deleted, we
would have flapped the resource in the client, and sent `reject_access`
to the gateway. However, that would cause issues in the following edge
case:
- Client is currently connected to Resource A through Policy B
- Client websocket goes down
- Policy B is created for Resource A (for another actor group), and
Policy A is deleted by admin
- Client reconnects
- Client sees that its resource list is the same
- Gateway has since received `reject_access` because no new flows were
created for this client-resource combination
To prevent this from happening, we now try to "reauthorize" the flow
whenever the last cached flow is removed for a particular
client-resource pair. This avoids needing to toggle the resource on the
client since we won't have sent `reject_access` to the gateway.
Bumps [mixpanel-browser](https://github.com/mixpanel/mixpanel-js) and
[@types/mixpanel-browser](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/mixpanel-browser).
These dependencies needed to be updated together.
Updates `mixpanel-browser` from 2.65.0 to 2.67.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/mixpanel/mixpanel-js/releases">mixpanel-browser's
releases</a>.</em></p>
<blockquote>
<h2>Fixes and minor updates</h2>
<ul>
<li><code>get_api_host()</code> is now used consistently across the SDK
to ensure that per-endpoint API host configs are respected
everywhere</li>
<li>A fix is included for the ordering of (asynchronous) operations when
calling <code>mixpanel.reset()</code> while a session recording is
active</li>
<li>Default Feature Flag context now includes <code>device_id</code>
alongside <code>distinct_id</code></li>
<li><code>$experiment_started</code> events now include several
API-latency-tracking properties</li>
</ul>
<h2>Fine-grained API host configuration and session recording fixes</h2>
<p>A new <code>api_hosts</code> configuration option enables different
endpoints (events, profiles, groups, session recordings) to be sent to
different hosts, for selective proxying, e.g.:</p>
<pre lang="js"><code>mixpanel.init('<TOKEN>', {
api_hosts: {
// proxy only session-recording requests, and leave the rest on the
default host api-js.mixpanel.com
'record': 'https://my-proxy.com',
},
});
</code></pre>
<p>This release also fixes a race condition when calling
<code>mixpanel.reset()</code> while a session recording is active, and
adds an initial TypeScript <code>types.d.ts</code> file.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/mixpanel/mixpanel-js/blob/master/CHANGELOG.md">mixpanel-browser's
changelog</a>.</em></p>
<blockquote>
<p><strong>2.67.0</strong> (17 Jul 2025)</p>
<ul>
<li>Use <code>get_api_host()</code> consistently across the SDK</li>
<li>Include <code>device_id</code> in default Feature Flag context</li>
<li>Track latency props in <code>$experiment_started</code> event</li>
<li>Fix async behavior in <code>mixpanel.reset()</code> when a session
recording is active</li>
<li>Fix recorder integration test race conditions</li>
</ul>
<p><strong>2.66.0</strong> (8 Jul 2025)</p>
<ul>
<li>Add <code>api_host</code> configuration option to support different
hosts/proxies for different endpoints (thanks <a
href="https://github.com/chrisknu"><code>@chrisknu</code></a>)</li>
<li>Add types.d.ts from existing public repo</li>
<li>Fix race condition when calling <code>mixpanel.reset()</code> while
a session recording is active</li>
</ul>
<p><strong>2.65.0</strong> (20 May 2025)</p>
<ul>
<li><code>mixpanel.people.track_charge()</code> (deprecated) no longer
sets profile property</li>
<li>Adds page height and width tracking to autocapture click
tracking</li>
<li>Session recording now stops when mixpanel.reset() is called</li>
<li>Support for adding arbitrary query string params to tracking
requests (thanks <a
href="https://github.com/dylan-asos"><code>@dylan-asos</code></a>)</li>
<li>Feature flagging API revisions</li>
<li>Whale Browser detection</li>
</ul>
<p><strong>2.64.0</strong> (15 Apr 2025)</p>
<ul>
<li>Add <code>record_heatmap_data</code> init option for Session
Recording to ensure click events are captured for Heat Maps</li>
<li>Initial support for feature flagging</li>
</ul>
<p><strong>2.63.0</strong> (1 Apr 2025)</p>
<ul>
<li>Update rrweb to latest alpha version</li>
<li>Refactor SDK build process to rely mainly on Rollup</li>
</ul>
<p><strong>2.62.0</strong> (26 Mar 2025)</p>
<ul>
<li>Replace UUID generator with UUIDv4 (using native API when
available)</li>
<li>Consistently use native JSON serialization when available</li>
<li>Fix for session recording idle timeout race condition</li>
</ul>
<p><strong>2.61.2</strong> (14 Mar 2025)</p>
<ul>
<li>Revert 10ms throttle on enqueueing events to improve tracking
reliability on page unload</li>
</ul>
<p><strong>2.61.1</strong> (11 Mar 2025)</p>
<ul>
<li>Session recording stops if initial DOM snapshot fails</li>
<li>Errors triggered by rrweb's record function are now caught</li>
<li>Fix for issue causing opt-out check error messages in
<code>debug</code> mode</li>
</ul>
<p><strong>2.61.0</strong> (6 Mar 2025)</p>
<ul>
<li>Session recordings now continue across page loads within the same
tab, using IndexedDB for persistence</li>
</ul>
<p><strong>2.60.0</strong> (31 Jan 2025)</p>
<ul>
<li>Expanded Autocapture configs</li>
<li>Prevent duplicate values in persistence when using people.union
(thanks <a
href="https://github.com/chrisdeely"><code>@chrisdeely</code></a>)</li>
</ul>
<p><strong>2.59.0</strong> (21 Jan 2025)</p>
<ul>
<li>Initial Autocapture support</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ec87a0ab39"><code>ec87a0a</code></a>
2.67.0</li>
<li><a
href="952dea91ab"><code>952dea9</code></a>
changelog for 2.67.0</li>
<li><a
href="0ea33fd603"><code>0ea33fd</code></a>
Merge branch '2.67.0-rc'</li>
<li><a
href="e51b679cdd"><code>e51b679</code></a>
Update conventions, ignore is no longer used</li>
<li><a
href="742fabb992"><code>742fabb</code></a>
Create dependabot yml from template, set ignore rule from example
directory (...</li>
<li><a
href="6cdecd6ad0"><code>6cdecd6</code></a>
Push to 2.67</li>
<li><a
href="2033e9eb48"><code>2033e9e</code></a>
Dist files</li>
<li><a
href="b1c6a3f796"><code>b1c6a3f</code></a>
send in variant fetch start/complete as date strings</li>
<li><a
href="ba23ab0404"><code>ba23ab0</code></a>
Merge pull request <a
href="https://redirect.github.com/mixpanel/mixpanel-js/issues/281">#281</a>
from mixpanel/jg-final-flush</li>
<li><a
href="591fa5050f"><code>591fa50</code></a>
test fix</li>
<li>Additional commits viewable in <a
href="https://github.com/mixpanel/mixpanel-js/compare/v2.65.0...v2.67.0">compare
view</a></li>
</ul>
</details>
<br />
Updates `@types/mixpanel-browser` from 2.60.0 to 2.66.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/mixpanel-browser">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Currently, packets for allocations, i.e. from relays are parsed inside
the `Allocation` struct. We have one of those structs for each relay
that `snownet` is talking to. When we disconnect from a relay because it
is e.g. not responding, then we deallocate this struct. As a result,
message that arrive from this relay can no longer be handled. This can
happen when the response time is longer than our timeout.
These packets then fall-through and end up being logged as "packet has
unknown format".
To prevent this, we make the signature on `Allocation` strongly-typed
and expect a fully parsed `Message` to be given to us. This allows us to
parse the message early and discard it with a DEBUG log in case we don't
have the necessary local state to handle it.
The functionality here is essentially the same, we just change at what
level this is being logged at from WARN to DEBUG.
We have to make one additional adjustment to make this work: Guard all
messages to be parsed by any `Allocation` to come from port 3478. This
is the assigned port that all relays are expected to listen on. If we
don't have any local state for a given address, we cannot decide whether
it is a STUN message for an agent or a STUN message for a relay that we
have disconnected from. Therefore, we need to de-multiplex based on the
source port.
Bumps [pnpm/action-setup](https://github.com/pnpm/action-setup) from
4.0.0 to 4.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/pnpm/action-setup/releases">pnpm/action-setup's
releases</a>.</em></p>
<blockquote>
<h2>v4.1.0</h2>
<p>Add support for <code>package.yaml</code> <a
href="https://redirect.github.com/pnpm/action-setup/pull/156">#156</a>.</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a7487c7e89"><code>a7487c7</code></a>
feat: update dist</li>
<li><a
href="fff70888d0"><code>fff7088</code></a>
test: update pnpm to v9</li>
<li><a
href="6e3017af18"><code>6e3017a</code></a>
docs: support <code>package.yaml</code> (<a
href="https://redirect.github.com/pnpm/action-setup/issues/157">#157</a>)</li>
<li><a
href="0cb0538c33"><code>0cb0538</code></a>
feat: support <code>package.yaml</code> (<a
href="https://redirect.github.com/pnpm/action-setup/issues/156">#156</a>)</li>
<li><a
href="e303250a24"><code>e303250</code></a>
docs: update pnpm version in readme examples (<a
href="https://redirect.github.com/pnpm/action-setup/issues/154">#154</a>)</li>
<li><a
href="ac5bf11548"><code>ac5bf11</code></a>
Update examples to use pnpm v9 (<a
href="https://redirect.github.com/pnpm/action-setup/issues/142">#142</a>)</li>
<li><a
href="18ac635edf"><code>18ac635</code></a>
docs: remove redundant manual cache due to setup-node cache (<a
href="https://redirect.github.com/pnpm/action-setup/issues/131">#131</a>)</li>
<li><a
href="0d0b43217a"><code>0d0b432</code></a>
docs: add warning about v2</li>
<li><a
href="0eb0e97082"><code>0eb0e97</code></a>
Add readme example for omitting <code>version</code> (<a
href="https://redirect.github.com/pnpm/action-setup/issues/134">#134</a>)</li>
<li><a
href="23657c8550"><code>23657c8</code></a>
docs: change order of setup node and pnpm (<a
href="https://redirect.github.com/pnpm/action-setup/issues/129">#129</a>)</li>
<li>Additional commits viewable in <a
href="fe02b34f77...a7487c7e89">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
On Apple platforms, we tried to be clever about filtering path updates
from the network connectivity change monitor, because there can be a
flurry of them upon waking from sleep or network roaming.
However, because of this, we had a bug that could occur in certain
situations (such as waking from sleep) where we could effectively "land"
on an empty DNS resolver list. This could happen if:
1. We receive a path update handler that meaningfully changes
connectivity, but its `supportsDNS` property is `false`. This means it
hasn't received any resolvers from DHCP yet. We would then setDns with
an empty resolver list.
2. We then receive a path update handler with the _only_ change being
`supportDNS=true`. Since we didn't count this change as a meaningful
path change, we skipped the `setDns` call, and connlib would be stuck
without DNS resolution.
To fix the above, we stop trying to be clever about connectivity
changes, and just use `oldPath != path`. That will increase reset a bit,
but it will now handle other edge cases such as an IP address changing
on the primary interface, any other interfaces change, and the like.
Fixes#9866
Bumps the flowbite group in /rust/gui-client with 1 update:
[flowbite-react](https://github.com/themesberg/flowbite-react/tree/HEAD/packages/ui).
Updates `flowbite-react` from 0.11.8 to 0.11.9
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/themesberg/flowbite-react/releases">flowbite-react's
releases</a>.</em></p>
<blockquote>
<h2>flowbite-react@0.11.9</h2>
<h3>Patch Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/themesberg/flowbite-react/pull/1587">#1587</a>
<a
href="3028f83f89"><code>3028f83</code></a>
Thanks <a href="https://github.com/raahed"><code>@raahed</code></a>! -
feat(Datepicker): Implemented a filter function prop</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/themesberg/flowbite-react/blob/main/packages/ui/CHANGELOG.md">flowbite-react's
changelog</a>.</em></p>
<blockquote>
<h2>0.11.9</h2>
<h3>Patch Changes</h3>
<ul>
<li><a
href="https://redirect.github.com/themesberg/flowbite-react/pull/1587">#1587</a>
<a
href="3028f83f89"><code>3028f83</code></a>
Thanks <a href="https://github.com/raahed"><code>@raahed</code></a>! -
feat(Datepicker): Implemented a filter function prop</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="213be8eb96"><code>213be8e</code></a>
Version Packages (<a
href="https://github.com/themesberg/flowbite-react/tree/HEAD/packages/ui/issues/1590">#1590</a>)</li>
<li><a
href="3028f83f89"><code>3028f83</code></a>
feat: Add 'filterDate' prop function on Datepicker (<a
href="https://github.com/themesberg/flowbite-react/tree/HEAD/packages/ui/issues/1587">#1587</a>)</li>
<li>See full diff in <a
href="https://github.com/themesberg/flowbite-react/commits/flowbite-react@0.11.9/packages/ui">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
When validating the found system resolvers on macOS and iOS, we would
stop after validating the first found resolver (usually IPv4) because
`break` was used instead of `continue`.
Fixes#9914
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Due to network partitions between the Client and the Portal, it is
possible that a Client requests a new connection, then disconnects from
the portal and re-requests the connection once it is reconnected.
On the Gateway, we would have already authorized the first request and
initialise our ICE agents with our local candidates. The second time
around, the connection would be reused. The Client however has lost its
state and therefore, we need to tell it our candidates again.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Room join requests on the portal are only valid whilst we have a
WebSocket connection. To make sure the portal processes all our requests
correctly, we need to hold all other messages back while we are waiting
to join the room.
If the connection flaps while we are waiting to join a room, we may have
a lingering join request that never gets fulfilled and thus blocks the
sending of messages forever.
---------
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Now that we are capable of migrating a connection to another relay with
#9979, our test suite exposed an edge-case: If we are in the middle of
migrating a connection, it could be that the idle timer triggers because
we have not seen any application traffic in the last 20s.
Moving to idle mode drastically reduces the number of STUN bindings we
send and if this happens whilst we are still checking candidates, the
nomination doesn't happen in time for our boringtun handshake to
succeed.
Thus, we add a condition to our idle timer to not trigger unless ICE has
completed and reports us as `connected`.
When looking at logs, reducing noise is critical to make it easier to
spot important information. When sending logs to Sentry, we currently
append the fields of certain spans to message to make the output similar
to that of `tracing_subscriber::fmt`.
The actual name of a field inside a span is separated from the span name
by a colon. For example, here is a log message as we see it in Sentry
today:
> handle_input:class=success response
handle_input:from=C1A0479AA153FACA0722A5DF76343CF2BEECB10E:3478
handle_input:method=binding handle_input:rtt=34.7479ms
handle_input:tid=BB30E859ED88FFDF0786B634 request=["Software(snownet;
session=BCA42EF159C794F41AE45BF5099E54D3A193A7184C4D2C3560C2FE49C4C6CFB7)"]
response=["Software(firezone-relay; rev=e4ba5a69)",
"XorMappedAddress(B824B4035A78A6B188EF38BE13AA3C1B1B1196D6:52625)"]
Really, what we would like to see is only this:
> class=success response
from=C1A0479AA153FACA0722A5DF76343CF2BEECB10E:3478 method=binding
rtt=34.7479ms tid=BB30E859ED88FFDF0786B634 request=["Software(snownet;
session=BCA42EF159C794F41AE45BF5099E54D3A193A7184C4D2C3560C2FE49C4C6CFB7)"]
response=["Software(firezone-relay; rev=e4ba5a69)",
"XorMappedAddress(B824B4035A78A6B188EF38BE13AA3C1B1B1196D6:52625)"]
The duplication of `handle_input:` is just noise. In our local log
output, we already strip the name of the span to make it easier to read.
Here we now also do the same for the logs reported to Sentry.
When looking through customer logs, we see a lot of "Resolved best route
outside of tunnel" messages. Those get logged every time we need to
rerun our re-implementation of Windows' weighting algorithm as to which
source interface / IP a packet should be sent from.
Currently, this gets cached in every socket instance so for the
peer-to-peer socket, this is only computed once per destination IP.
However, for DNS queries, we make a new socket for every query. Using a
new source port DNS queries is recommended to avoid fingerprinting of
DNS queries. Using a new socket also means that we need to re-run this
algorithm every time we make a DNS query which is why we see this log so
often.
To fix this, we need to share this cache across all UDP sockets. Cache
invalidation is one of the hardest problems in computer science and this
instance is no different. This cache needs to be reset every time we
roam as that changes the weighting of which source interface to use.
To achieve this, we extend the `SocketFactory` trait with a `reset`
method. This method is called whenever we roam and can then reset a
shared cache inside the `UdpSocketFactory`. The "source IP resolver"
function that is passed to the UDP socket now simply accesses this
shared cache and inserts a new entry when it needs to resolve the IP.
As an added benefit, this may speed up DNS queries on Windows a bit
(although I haven't benchmarked it). It should certainly drastically
reduce the amount of syscalls we make on Windows.
In #6876, we added functionality that would only make use of new remote
candidates whilst we haven't nominated a socket yet with the remote. The
reason for that was because in the described edge-case where relays
reboot or get replaced whilst the client is partitioned from the portal
(or we experience a connection hiccup), only one of the two peers, i.e.
Client or Gateway would migrate to the new relay, leaving the other one
in an inconsistent state.
Looking at recent customer logs, I've been seeing a lot of these
messages:
> Unknown connection or socket has already been nominated
For this particular customer, these are then very quickly followed by
ICE timeouts, leaving the connection unusable.
Considering that, I no longer think that the above change was a good
idea and we should instead always make use of all candidates that we are
given. What we are seeing is that in deployment scenarios where the
latency link between Client and Gateway is very short (5-10ms) yet the
latency to the portal is longer (~30-50ms), we trigger a race condition
where we are temporarily nominating a _peer-reflexive_ candidate pair
instead of a regular one. This happens because with such a short latency
link, Client and Gateway are _faster_ in sending back and forth several
STUN bindings than the control plane is in delivering all the
candidates.
Due to the functionality added in #6876, this then results in us not
accepting the candidates. It further appears that a nominated
peer-reflexive candidate does not provide a stable connection which is
why we then run into an ICE timeout, requiring Firezone to establish a
new connection only to have the same thing happen again.
This is very disruptive for the user experience as the connection only
works for a few moments at a time.
With #9793, we have actually added a feature that is also at play here.
Now that we don't immediately act on an ICE timeout, it is actually
possible for both Client and Gateway to migrate a connection to a
different relay, should the one that they are using get disconnected. In
#9793, we added a timeout of 2s for this.
To make this fully work, we need to patch str0m to transition to
`Checking` early. Presently, str0m would directly transition from
`Disconnected` to `Connected` in this case which in some of the
high-latency scenarios that we are testing in CI is not enough to
recover the connection within 2s. By transitioning to `Checking` early,
we abort this timer.
Related: https://github.com/algesten/str0m/pull/676
In case we received a newly nominated socket from `str0m` whilst our
connection was in idle mode, we mistakenly did not apply that and kept
using the old one. ICE would still be functioning in this case because
`str0m` would have updated its internal state but we would be sending
packets into Nirvana.
I don't think that this is likely to be hit in production though as it
would be quite unusual to receive a new nomination whilst the connection
was completely idle.
When encrypting IP packets, `snownet` needs to prepare a buffer where
the encrypted packet is going to end up. Depending on whether we are
sending data via a relayed connection or direct, this buffer needs to be
offset by 4 bytes to allow for the 4-byte channel-data header of the
TURN protocol.
At present, we always first encrypt the packet and then on-demand move
the packet by 4-bytes to the left if we **don't** need to send it via a
relay. Internally, this translates to a `memmove` instruction which
actually turns out to be very cheap (I couldn't measure a speed
difference between this and `main`).
All of this code has grown historically though so I figured, it is
better to clean it up a bit to first evaluate, whether we have a direct
or relayed connection and based on that, write the encrypted packet
directly to the front of the buffer or offset it by 4 bytes.
Profiling has shown that using a spinlock-based buffer pool is
marginally (~1%) faster than the mutex-based one because it resolves
contention quicker.