In order to allow the portal to more easily classify, what kind of
component is connecting, we extend the `get_user_agent` header to
include a component type instead of the generic `connlib/`.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
With the introduction of the pre-resolved Sentry host, all Firezone
clients now require Internet on startup. That is a signficant usability
hit that we can easily fix by simply falling back to resolving the host
on-demand.
We always end up allow this lint when it pops up so we can also just
allow it for the whole repo in general. Most of the time, the reason for
too many arguments are borrow-checker limitations of Rust where mutable
references need to be tracked explicitly.
Right now, the Client event-loops have a channel with 1000 items for
sending new resource lists and updates to the TUN device to the host
app. This is kind of unnecessary as we always only care about the last
version of these. Intermediate updates that the host app doesn't process
are effectively irrelevant.
We've had an issue before where a bug in the portal caused us to receive
many updates to resources which ended up crashing Client apps because
this channel filled up.
To be more resilient on this front, we refactor the Client event loop to
use a `watch` channel for this. Watch channels only retain the last
value that got sent into them.
Our Sentry client needs to resolve DNS before being able to send logs or
errors to the backend. Currently, this DNS resolution happens on-demand
as we don't take any control of the underlying HTTP client.
In addition, this will use HTTP/1.1 by default which isn't as efficient
as it could be, especially with concurrent requests.
Finally, if we decide to ever proxy all Sentry for traffic through our
own domain, we have to take control of the underlying client anyway.
To resolve all of the above, we create a custom `TransportFactory` where
we reuse the existing `ReqwestHttpTransport` but provide an already
configured `reqwest::Client` that always uses HTTP/2 with a
pre-configured set of DNS records for the given ingest host.
By default, dropping a `tokio` runtime waits until all tasks have
finished. The tasks we spawn within `connlib` can have complex
dependencies with each other. To ensure that we can shut down in any
case and don't hang, we apply a timeout of 1s to the runtime.
These don't really tell us much. It appears that Windows is sometimes
failing to access the pipe but then succeeds on the next attempt, hence
why we have the retry loop in the first place. Logging a warning here
just spams Sentry unnecessarily.
Despite still being in development, the `tauri-specta` project already
proves to be quite useful. It allows us to generate TypeScript bindings
for our commands and events, creating a type-safe contract between the
frontend and the backend.
For example, this ensures that the TypeScript code calls a command
actually with the required parameters and thus avoids runtime failures.
Similarly, the frontend can listen on type-safe events without having to
use any magic strings.
When looking through customer logs, we see a lot of "Resolved best route
outside of tunnel" messages. Those get logged every time we need to
rerun our re-implementation of Windows' weighting algorithm as to which
source interface / IP a packet should be sent from.
Currently, this gets cached in every socket instance so for the
peer-to-peer socket, this is only computed once per destination IP.
However, for DNS queries, we make a new socket for every query. Using a
new source port DNS queries is recommended to avoid fingerprinting of
DNS queries. Using a new socket also means that we need to re-run this
algorithm every time we make a DNS query which is why we see this log so
often.
To fix this, we need to share this cache across all UDP sockets. Cache
invalidation is one of the hardest problems in computer science and this
instance is no different. This cache needs to be reset every time we
roam as that changes the weighting of which source interface to use.
To achieve this, we extend the `SocketFactory` trait with a `reset`
method. This method is called whenever we roam and can then reset a
shared cache inside the `UdpSocketFactory`. The "source IP resolver"
function that is passed to the UDP socket now simply accesses this
shared cache and inserts a new entry when it needs to resolve the IP.
As an added benefit, this may speed up DNS queries on Windows a bit
(although I haven't benchmarked it). It should certainly drastically
reduce the amount of syscalls we make on Windows.
Rust 1.88 has been released and brings with it a quite exciting feature:
let-chains! It allows us to mix-and-match `if` and `let` expressions,
therefore often reducing the "right-drift" of the relevant code, making
it easier to read.
Rust.188 also comes with a new clippy lint that warns when creating a
mutable reference from an immutable pointer. Attempting to fix this
revealed that this is exactly what we are doing in the eBPF kernel.
Unfortunately, it doesn't seem to be possible to design this in a way
that is both accepted by the borrow-checker AND by the eBPF verifier.
Hence, we simply make the function `unsafe` and document for the
programmer, what needs to be upheld.
At present, our primary indicator as to whether telemetry is active is
whether we have a Sentry session. For our analytics events however, we
currently require passing in the Firezone ID and API url again. This
makes it difficult to send analytics events from areas of the code that
don't have this information available.
To still allow for that, we integrate the `analytics` module more
tightly with the Sentry session. This allows us to drop two parameters
from the `$identify` event and also means we now respect the
`NO_TELEMETRY` setting for these events except for `new_session`. This
event is sent regardless because it allows us to track, how many on-prem
installations of Firezone are out there.
A recent release of `tslink` now supports configuration via the
`package.metadata` table which resolved a warning about "unknown key"
that we have seeing for a while.
This log is misplaced within the current `try_connect` function because
that will also be called when we didn't have Internet while we tried to
first connect and then witnessed a network change (in which case the
token is cached in the `WaitingForNetwork` session state).
Thus, we move the log to the `Connect` msg handler where we shouldn't
have any existing session.
Bumps [tslink](https://github.com/icsmw/tslink) from 0.3.0 to 0.4.2.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/icsmw/tslink/blob/master/changelog.md">tslink's
changelog</a>.</em></p>
<blockquote>
<h1>0.4.2 (08.06.2025)</h1>
<h2>Changes</h2>
<ul>
<li>Migrate settings to <code>package.metadata.tslink</code></li>
</ul>
<h1>0.4.1 (08.06.2025)</h1>
<h2>Changes</h2>
<ul>
<li>Add support arrays in the context of <code>const</code></li>
</ul>
<h1>0.4.0 (08.06.2025)</h1>
<h2>Features</h2>
<ul>
<li>Add support of <code>const</code> for primitive types</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/icsmw/tslink/commits">compare view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
You can trigger a rebase of this PR by commenting `@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
> **Note**
> Automatic rebases have been disabled on this pull request as it has
been open for over 30 days.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Instead of conditionally enabling the `logs` feature in the Sentry
client, we always enable it and control via the `tracing` integration,
which events should get forwarded to Sentry. The feature-flag check
accesses only shared-memory and is therefore really fast.
We already re-evaluate feature flags on a timer which means this boolean
will flip over automatically and logs will be streamed to Sentry.
Customer hit what seems to be a rare race condition where we try to
connect whilst we already have a session. I don't know which state it is
in so I am replacing it with a WARN log to learn more about this in
Sentry in case it gets hit again.
I believe some of the recent changes around how we load the
`firezone-id.json` from the GUI client surfaced that we in fact don't
always have access to it. Previously, this was silenced because we would
only optionally add it as context to the Sentry client.
Now, we need it to initialise telemetry so we know whether or not to
send logs to Sentry.
In order to be able to access the file, we need to change the config's
directory and the file to be owned by the `firezone-client` group.
The GUI client binary performs quite a few checks prior to setting up
logging. In order to log at least something, we have a bootstrap logger
config that logs to stdout based on the `RUST_LOG` env var.
However, in the context of an error, the logger guard was dropped to
early and therefore we couldn't actually see the error.
To fix this, we pass a mutable `Option` in to `try_main` instead. This
allows the function to drop the bootstrap logger once the real one is
set up but also keep logging using the bootstrap logger in case of an
error.
Originally, we introduced these to gather some data from logs / warnings
that we considered to be too spammy. We've since merged a
burst-protection that will at most submit the same event once every 5
minutes.
The data from the telemetry spans themselves have not been used at all.
I suspect that the new Windows runners are "too fast" and we hit a race
condition in the use of the keyring on Windows which causes failing CI
jobs. The attempt to fix this is to sleep for 1 seconds before every
assert in the test.
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.
In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:
- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and
afterwards, every minute. The feature-flag context itself is only
checked every 5 minutes though so it might take up to 5 minutes before
this takes effect.
The default value - like all feature flags - is `false`. Therefore, if
there is any issue with the PostHog service, we will fallback to the
previous behaviour where logs are simply stored locally.
Resolves: #9600
In order to more easily target customers with certain feature flags, we
include the `account_slug` in the `$identify` event to PostHog. This
will allow us to create Cohorts in PostHog and enable / disable feature
flags for all installations of Firezone for a particular customer.
At present, listening for DNS server change and network change events is
handled in the GUI client. Upon an event, a message is sent to the
tunnel service which then applies the new state to `connlib`.
We can avoid some of this boilerplate by moving these listeners to the
tunnel service as part of the handler. As a result, we get a few
improvements:
- We don't need to ignore these events if we don't have a session
because the lifetime of these listeners is tied to the IPC handler on
the service side.
- We need fewer IPC messages
- We can retry the connection directly from within the tunnel service in
case we have no Internet at the time of startup
- We can more easily model out the state machine of a connlib session in
the tunnel service
- On Linux, this means we no longer shell out to `resolvectl` from the
GUI process, unifying access to the "resolvers" from the tunnel service
- On Windows, we no longer need admin privileges on the GUI client for
optimized network-change detection. This now happens in the Tunnel
process which already runs as admin.
Resolves: #9465
With the introduction of the "connect on start" configuration option, we
introduced a bug where the GUI client said "Signed in as ..." even
though we did not have a `connlib` session. The tray-menu handles this
state correctly and clicking sign out and sign in restores Firezone to a
functional state.
This disparity happened because we assumed that having a token means we
must have a session.
To fix this, we introduce a new `SessionViewModel` that combines the
state of the auth session and the `connlib` state. Only if we have both
do we infer that we are "signed in". This also requires us to introduce
an intermediary state where we are "loading". This is represented as a
spinner in the UI.
Last but not least, this also removes the automated hiding of the client
window. In a prior design, the only job of this window was to show the
"Sign in" button so it wasn't useful beyond clicking that. Now that we
show more things in this window, automatically hiding it might confuse
the user.
Here is what this new design looks like:
[Login
flow](https://github.com/user-attachments/assets/276e390b-4837-48e2-aaf1-eea007472816)
As a result of other improvements around "zero-click sign-in", the user
often doesn't even have to switch to the browser window because sign-in
happens in the background. Unfortunately, the tab still remains open but
that is outside of our control (at least on Linux).
As part of investigating #9400, I refactored the startup code of the GUI
client to play around with the initialization order of the runtime and
other parts. After finding the root cause (fixed in #9469), I figured it
would still be nice to land these improvements.
This refactors the app startup:
- Only initialize what is absolutely necessary outside of `try_main`:
The runtime and the telemetry instance.
- Settings are loaded earlier
- Telemetry is initializer earlier
- Add a bootstrap logger that logs to stdout whilst we are booting
- Re-order some code inside `gui::run` to bundle the initialization of
Tauri into a single command chain
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
When introducing the MDM config, we took into account the log directives
from the config when applying it via the GUI but failed to apply the new
directives on startup.
When debugging logging is enabled, we see that Tauri currently iterates
through several possible paths of the frontend page before it finds the
one that it actually renders. We can fix this by specifying the URL
without the directory path.
```
2025-06-06T08:23:06.538Z DEBUG tauri::manager: Asset `src-frontend/index.html` not found; fallback to src-frontend/index.html.html
2025-06-06T08:23:06.538Z DEBUG tauri::manager: Asset `src-frontend/index.html` not found; fallback to src-frontend/index.html/index.html
2025-06-06T08:23:06.539Z DEBUG tauri::manager: Asset `src-frontend/index.html` not found; fallback to index.html
2025-06-06T08:23:06.566Z DEBUG firezone_gui_client::controller: Starting new instance of `Controller`
```
The GUI client's update checker is a task that runs concurrently to the
main event-loop of the client. When a new update is found, it sends a
notification through a channel. When the update checker is disabled via
MDM, this channel is instantly closed, resulting in `None` being read as
part of the event-loop in the client.
Previously, we would then return
`Poll::Ready(EventloopTick::UpdateNotification(None))` and then simply
ignore this event. This however is wrong. Returning `Poll::Ready` means
that the future needs to be polled again, resulting in an effective
busy-loop where we constantly emit the above event and then ignore it.
Due to the priority order that was previously defined, this led to us
never checking another channel: The one where we receive deep-links from
2nd GUI instances.
As a result, signing into the client when the update checker was
disabled does not work and instead, the newly launched instance hangs
because it can never send over the deep-link.
Resolves: #9420
As part of the introduction of General settings, we split up "Advanced
settings" and also renamed one of the fields. Upon first start, the
settings are migrated to the new format. What we failed to notice is
that one the next subsequent start, the legacy settings struct will fail
to parse the now migrated configuration and fall back to the default.
This then appears as if the settings are not getting saved.
Resolves: #9417
---------
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
This PR adds basic analytics to `connlib` by sending two events to
PostHog:
1. `new_session` which is sent every time we establish a new session
with a Firezone backend. This could be our production or staging
instance but also a session to an on-premise installation of Firezone.
We include the API URL in the event payload to further distinguish
these.
2. `$identify` to link the client + version as well as the operating
system to the user. The user is identified by the Firezone ID.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>