Our Sentry client needs to resolve DNS before being able to send logs or
errors to the backend. Currently, this DNS resolution happens on-demand
as we don't take any control of the underlying HTTP client.
In addition, this will use HTTP/1.1 by default which isn't as efficient
as it could be, especially with concurrent requests.
Finally, if we decide to ever proxy all Sentry for traffic through our
own domain, we have to take control of the underlying client anyway.
To resolve all of the above, we create a custom `TransportFactory` where
we reuse the existing `ReqwestHttpTransport` but provide an already
configured `reqwest::Client` that always uses HTTP/2 with a
pre-configured set of DNS records for the given ingest host.
By default, dropping a `tokio` runtime waits until all tasks have
finished. The tasks we spawn within `connlib` can have complex
dependencies with each other. To ensure that we can shut down in any
case and don't hang, we apply a timeout of 1s to the runtime.
When looking through customer logs, we see a lot of "Resolved best route
outside of tunnel" messages. Those get logged every time we need to
rerun our re-implementation of Windows' weighting algorithm as to which
source interface / IP a packet should be sent from.
Currently, this gets cached in every socket instance so for the
peer-to-peer socket, this is only computed once per destination IP.
However, for DNS queries, we make a new socket for every query. Using a
new source port DNS queries is recommended to avoid fingerprinting of
DNS queries. Using a new socket also means that we need to re-run this
algorithm every time we make a DNS query which is why we see this log so
often.
To fix this, we need to share this cache across all UDP sockets. Cache
invalidation is one of the hardest problems in computer science and this
instance is no different. This cache needs to be reset every time we
roam as that changes the weighting of which source interface to use.
To achieve this, we extend the `SocketFactory` trait with a `reset`
method. This method is called whenever we roam and can then reset a
shared cache inside the `UdpSocketFactory`. The "source IP resolver"
function that is passed to the UDP socket now simply accesses this
shared cache and inserts a new entry when it needs to resolve the IP.
As an added benefit, this may speed up DNS queries on Windows a bit
(although I haven't benchmarked it). It should certainly drastically
reduce the amount of syscalls we make on Windows.
As a first step in preparation for sending OTEL metrics from Clients and
Gateways to a cloud-hosted OTEL collector, we extend the CLI of the
Gateway with configuration options to provide a gRPC endpoint to an OTEL
collector.
If `FIREZONE_METRICS` is set to `otel-collector` and an endpoint is
configured via `OTLP_GRPC_ENDPOINT`, we will report our metrics to that
collector.
The future plan for extending this is such that if `FIREZONE_METRICS` is
set to `otel-collector` (which will likely be the default) and no
`OTLP_GRPC_ENDPOINT` is set, then we will use our own, hosted OTEL
collector and report metrics IF the `export-metrics` feature-flag is set
to `true`.
This is a similar integration as we have done it with streaming logs to
Sentry. We can therefore enable it on a similar granularity as we do
with the logs and e.g. only enable it for the `firezone` account to
start with.
In meantime, customers can already make use of those metrics if they'd
like by using the current integration.
Resolves: #1550
Related: #7419
---------
Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>
At present, our primary indicator as to whether telemetry is active is
whether we have a Sentry session. For our analytics events however, we
currently require passing in the Firezone ID and API url again. This
makes it difficult to send analytics events from areas of the code that
don't have this information available.
To still allow for that, we integrate the `analytics` module more
tightly with the Sentry session. This allows us to drop two parameters
from the `$identify` event and also means we now respect the
`NO_TELEMETRY` setting for these events except for `new_session`. This
event is sent regardless because it allows us to track, how many on-prem
installations of Firezone are out there.
Instead of conditionally enabling the `logs` feature in the Sentry
client, we always enable it and control via the `tracing` integration,
which events should get forwarded to Sentry. The feature-flag check
accesses only shared-memory and is therefore really fast.
We already re-evaluate feature flags on a timer which means this boolean
will flip over automatically and logs will be streamed to Sentry.
Originally, we introduced these to gather some data from logs / warnings
that we considered to be too spammy. We've since merged a
burst-protection that will at most submit the same event once every 5
minutes.
The data from the telemetry spans themselves have not been used at all.
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.
In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:
- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and
afterwards, every minute. The feature-flag context itself is only
checked every 5 minutes though so it might take up to 5 minutes before
this takes effect.
The default value - like all feature flags - is `false`. Therefore, if
there is any issue with the PostHog service, we will fallback to the
previous behaviour where logs are simply stored locally.
Resolves: #9600
In order to more easily target customers with certain feature flags, we
include the `account_slug` in the `$identify` event to PostHog. This
will allow us to create Cohorts in PostHog and enable / disable feature
flags for all installations of Firezone for a particular customer.
A bit of legacy that we have inherited around our Firezone ID is that
the ID stored on the user's device is sha'd before being passed to the
portal as the "external ID". This makes it difficult to correlate IDs in
Sentry and PostHog with the data we have in the portal. For Sentry and
PostHog, we submit the raw UUID stored on the user's device.
As a first step in overcoming this, we embed an "external ID" in those
services as well IF the provided Firezone ID is a valid UUID. This will
allow us to immediately correlate those events.
As a second step, we automatically generate all new Firezone IDs for the
Windows and Linux Client as `hex(sha256(uuid))`. These won't parse as
valid UUIDs and therefore will be submitted as is to the portal.
As a third step, we update all documentation around generating Firezone
IDs to use `uuidgen | sha256` instead of just `uuidgen`. This is
effectively the equivalent of (2) but for the Headless Client and
Gateway where the Firezone ID can be configured via environment
variables.
Resolves: #9382
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Bumps [nix](https://github.com/nix-rust/nix) from 0.29.0 to 0.30.1.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/nix-rust/nix/blob/master/CHANGELOG.md">nix's
changelog</a>.</em></p>
<blockquote>
<h2>[0.30.1] - 2025-05-04</h2>
<h3>Fixed</h3>
<ul>
<li>doc.rs build
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2634">#2634</a>)</li>
</ul>
<h2>[0.30.0] - 2025-04-29</h2>
<h3>Added</h3>
<ul>
<li>Add socket option <code>IPV6_PKTINFO</code> for BSDs/Linux/Android,
also
<code>IPV6_RECVPKTINFO</code> for DragonFlyBSD
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2113">#2113</a>)</li>
<li>Add <code>fcntl</code>'s <code>F_PREALLOCATE</code> constant for
Apple targets.
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2393">#2393</a>)</li>
<li>Improve support for extracting the TTL / Hop Limit from incoming
packets
and support for DSCP (ToS / Traffic Class).
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2425">#2425</a>)</li>
<li>Add socket option IP_TOS (nix::sys::socket::sockopt::IpTos)
IPV6_TCLASS
(nix::sys::socket::sockopt::Ipv6TClass) on Android/FreeBSD
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2464">#2464</a>)</li>
<li>Add <code>SeekData</code> and <code>SeekHole</code> to
<code>Whence</code> for hurd and apple targets
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2473">#2473</a>)</li>
<li>Add <code>From</code> trait implementation between
<code>SocketAddr</code> and <code>Sockaddr</code>,
<code>Sockaddr6</code> (<a
href="https://redirect.github.com/nix-rust/nix/pull/2474">#2474</a>)</li>
<li>Added wrappers for <code>posix_spawn</code> API
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2475">#2475</a>)</li>
<li>Add the support for Emscripten.
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2477">#2477</a>)</li>
<li>Add fcntl constant <code>F_RDADVISE</code> for Apple target
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2480">#2480</a>)</li>
<li>Add fcntl constant <code>F_RDAHEAD</code> for Apple target
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2482">#2482</a>)</li>
<li>Add <code>F_LOG2PHYS</code> and <code>F_LOG2PHYS_EXT</code> for
Apple target
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2483">#2483</a>)</li>
<li><code>MAP_SHARED_VALIDATE</code> was added for all linux targets.
& <code>MAP_SYNC</code> was added
for linux with the exclusion of mips architecures, and uclibc
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2499">#2499</a>)</li>
<li>Add
<code>getregs()</code>/<code>getregset()</code>/<code>setregset()</code>
for Linux/musl/aarch64
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2502">#2502</a>)</li>
<li>Add FcntlArgs <code>F_TRANSFEREXTENTS</code> constant for Apple
targets
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2504">#2504</a>)</li>
<li>Add <code>MapFlags::MAP_STACK</code> in <code>sys::man</code> for
netbsd
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2526">#2526</a>)</li>
<li>Add support for <code>libc::LOCAL_PEERTOKEN</code> in
<code>getsockopt</code>.
(<a
href="https://redirect.github.com/nix-rust/nix/pull/2529">#2529</a>)</li>
<li>Add support for <code>syslog</code>, <code>openlog</code>,
<code>closelog</code> on all <code>unix</code>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3cf9007216"><code>3cf9007</code></a>
chore: drop 0.30.1</li>
<li><a
href="2845ab9e4e"><code>2845ab9</code></a>
Compile sys::mman on Redox (<a
href="https://redirect.github.com/nix-rust/nix/issues/2637">#2637</a>)</li>
<li><a
href="fccb4abfc8"><code>fccb4ab</code></a>
Fix fuchsia target triple to unbreak docs.rs build (<a
href="https://redirect.github.com/nix-rust/nix/issues/2634">#2634</a>)</li>
<li><a
href="b834171547"><code>b834171</code></a>
ci: disable hurd (<a
href="https://redirect.github.com/nix-rust/nix/issues/2638">#2638</a>)</li>
<li><a
href="9c97e1df15"><code>9c97e1d</code></a>
Clippy cleanup: dangerous_implicit_autorefs and
uninlined_format_args</li>
<li><a
href="989291d5bf"><code>989291d</code></a>
chore: release 0.30.0</li>
<li><a
href="6a1c5b8d5b"><code>6a1c5b8</code></a>
Remove Copy from PollFd (<a
href="https://redirect.github.com/nix-rust/nix/issues/2631">#2631</a>)</li>
<li><a
href="eba0f41bff"><code>eba0f41</code></a>
chore: pin libc to 0.2.171 & bump CI image (<a
href="https://redirect.github.com/nix-rust/nix/issues/2632">#2632</a>)</li>
<li><a
href="b561476e1d"><code>b561476</code></a>
socket::sockopt AttachReusePortCbpf for Linux addition. (<a
href="https://redirect.github.com/nix-rust/nix/issues/2621">#2621</a>)</li>
<li><a
href="684b79edb6"><code>684b79e</code></a>
Add sockopt::PeerPidfd (SO_PEERPIDFD) sockopt support to socket::sockopt
(<a
href="https://redirect.github.com/nix-rust/nix/issues/2620">#2620</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/nix-rust/nix/compare/v0.29.0...v0.30.1">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
This PR adds basic analytics to `connlib` by sending two events to
PostHog:
1. `new_session` which is sent every time we establish a new session
with a Firezone backend. This could be our production or staging
instance but also a session to an on-premise installation of Firezone.
We include the API URL in the event payload to further distinguish
these.
2. `$identify` to link the client + version as well as the operating
system to the user. The user is identified by the Firezone ID.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
As relict from very early designs of `connlib`, the `Callbacks` trait is
still present and defines how the host app receives events from a
running `Session`. Callbacks are not a great design pattern however
because they force the running code, i.e. `connlib`s event-loop to
execute unknown code. For example, if that code panics, all of `connlib`
is taken down. Additionally, not all consumers may want to receive
events via callbacks. The GUI and headless client for example already
have their own event-loop in which they process all kinds of things.
Having to deal with the `Callbacks` interface introduces an odd
indirection here.
To fix this, we instead return an `EventStream` when constructing a
`Session`. This essentially aligns the API of `Session` with that of a
channel. You receive two handles, one for sending in commands and one
for receiving events. A `Session` will automatically spawn itself onto
the given runtime so progress is made even if one does not poll on these
channel handles.
This greatly simplifies the code:
- We get to delete the `Callbacks` interface.
- We can delete the threaded callback adapter. This was only necessary
because we didn't want to block `connlib` with the handling of the
event. By using a channel for events, this is automatically guaranteed.
- The GUI and headless client can directly integrate the event handling
in their event-loop, without having to create an indirection with a
channel.
- It is now clear that only the Apple and Android FFI layers actually
use callbacks to communicate these events.
- We net-delete 100 LoC
The name IPC service is not very descriptive. By nature of being
separate processes, we need to use IPC to communicate between them. The
important thing is that the service process has control over the tunnel.
Therefore, we rename everything to "Tunnel service".
The only part that is not changed are historic changelog entries.
Resolves: #9048
The current `rust/` directory is a bit of a wild-west in terms of how
the crates are organised. Most of them are simply at the top-level when
in reality, they are all `connlib`-related. The Apple and Android FFI
crates - which are entrypoints in the Rust code are defined several
layers deep.
To improve the situation, we move around and rename several crates. The
end result is that all top-level crates / directories are:
- Either entrypoints into the Rust code, i.e. applications such as
Gateway, Relay or a Client
- Or crates shared across all those entrypoints, such as `telemetry` or
`logging`
The module and crate structure around the GUI client and its background
service are currently a mess of circular dependencies. Most of the
service implementation actually sits in `firezone-headless-client`
because the headless-client and the service share certain modules. We
have recently moved most of these to `firezone-bin-shared` which is the
correct place for these modules.
In order to move the background service to `firezone-gui-client`, we
need to untangle a few more things in the GUI client. Those are done
commit-by-commit in this PR. With that out the way, we can finally move
the service module to the GUI client; where is should actually live
given that it has nothing to do with the headless client.
As a result, the headless-client is - as one would expect - really just
a thin wrapper around connlib itself and is reduced down to 4 files with
this PR.
To make things more consistent in the GUI client, we move the `main.rs`
file also into `bin/`. By convention `bin/` is where you define binaries
if a crate has more than one. cargo will then build all of them.
Eventually, we can optimise the compile-times for `firezone-gui-client`
by splitting it into multiple crates:
- Shared structs like IPC messages
- Background service
- GUI client
This will be useful because it allows only re-compiling of the GUI
client alone if nothing in `connlib` changes and vice versa.
Resolves: #6913Resolves: #5754
Both `device_id` and `device_info` are used by the headless-client and
the GUI client / IPC service. They should therefore be defined in the
`bin-shared` crate.
Currently, the platform-specific code for controlling DNS resolution on
a system sits in `firezone-headless-client`. This code is also used by
the GUI client. This creates a weird compile-time dependency from the
GUI client to the headless client.
For other components that have platform-specific implementations, we use
the `firezone-bin-shared` crate. As a first step of resolving the
compile-time dependency, we move the `dns_control` module to
`firezone-bin-shared`.
The `signals` module isn't something headless-client specific and should
live in our `bin-shared` crate. Once the `ipc_service` module is
decoupled from the headless-client crate, it will be used by both the
headless client and IPC service (which then will be defined in the GUI
client crate).
The `known_dirs` module is used across the headless-client and the GUI
client. It should live in `bin-shared` where all the other
cross-platform modules are.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
The `uptime` module from `firezone-headless-client` is also used in the
GUI client. In order to decouple this dependency, we move the module to
`bin-shared`, next to the other cross-plaform modules.
Once we start collecting metrics across various Clients and Gateways,
these metrics need to be tagged with the correct `service.name`,
`service.version` as well as an instance ID to differentiate metrics
from different instances.
Currently, when `connlib`'s log file gets deleted, we write logs into
nirvana until the corresponding process gets restarted. This is painful
for users to do because they need to restart the IPC service or Network
Extension. Instead, we can simply check if the log file exists prior to
writing to it and re-create it if it doesn't.
Resolves: #6850
Related: #7569
When working on the Rust code of Firezone from a MacOS computer, it is
useful to have pretty much all of the code at least compile to ensure
detect problems early. Eventually, once we target features like a
headless MacOS client, some of these stubs will actually be filled in an
be functional.
Having multiple threads for reading and writing the TUN device can cause
packet re-orderings on the client. All other clients only use a single
TUN thread, so aligning this value means a more consistent behaviour of
Firezone across all platforms.
This PR adds opentelemetry-based packet counter metrics to `connlib`. By
default, the collection of these metrics of disabled. Without a
registered metrics-provider, gathering these metrics are effectively
no-ops. They will still incur 1 or 2 function calls per packet but that
should be negligible compared to other operations such as encryption /
decryption.
With this system in place, we can in the future add more metrics to make
debugging easier.
Bumps [windows-service](https://github.com/mullvad/windows-service-rs)
from 0.7.0 to 0.8.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/mullvad/windows-service-rs/blob/main/CHANGELOG.md">windows-service's
changelog</a>.</em></p>
<blockquote>
<h2>[0.8.0] - 2025-02-19</h2>
<h3>Added</h3>
<ul>
<li>Add missing ServiceAccess flags <code>READ_CONTROL</code>,
<code>WRITE_DAC</code> and <code>WRITE_OWNER</code>.</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Upgrade <code>windows-sys</code> dependency to 0.59 and bump the
MSRV to 1.60.0.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ffaaf80ae3"><code>ffaaf80</code></a>
Bump version to 0.8.0 and add changelog</li>
<li><a
href="c6afc56e86"><code>c6afc56</code></a>
Bump windows-sys version to 0.59</li>
<li><a
href="96efa4ee71"><code>96efa4e</code></a>
Merge commit '9dc8af8'</li>
<li><a
href="9dc8af8513"><code>9dc8af8</code></a>
Add missing standard access rights</li>
<li>See full diff in <a
href="https://github.com/mullvad/windows-service-rs/compare/v0.7.0...v0.8.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
As a follow-up from #7959, we can now simplify the error handling a fair
bit as all codepaths that can fail in the client are threaded back to
the main function.
Within the event-loop, we already react to the channel being closed
which happens when the `Sender` within the `Session` gets dropped. As
such, there is no need to send an explicit `Stop` command, dropping the
`Session` is equivalent.
As it turns out, `swift-bridge` already calls `Drop` for us when the
last pointer is set to `nil`:
280a9dd999/swift/apple/FirezoneNetworkExtension/Connlib/Generated/connlib-client-apple/connlib-client-apple.swift (L24-L28)
Thus, we can also remove the explicit `disconnect` call to
`WrappedSession` entirely.
In order for search-domains to work on Windows, we need to set the
`SearchList` registry key for our interface. This will result in Windows
sending us a DNS query with the expanded domain name from the search
list which we can then process like normal DNS queries.
Related: #8410