Commit Graph

53 Commits

Author SHA1 Message Date
Thomas Eizinger
85ab395276 chore(telemetry): flush before ending session (#9150)
I am not sure if this is currently breaking anything but it seems more
correct to flush all events first and then end the session.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2025-05-15 11:58:58 +00:00
Thomas Eizinger
37529803ce build(rust): bump otel ecosystem crates to 0.29 (#9029) 2025-05-05 12:33:07 +00:00
Thomas Eizinger
ea5709e8da chore(rust): initialise OTEL with useful metadata (#8945)
Once we start collecting metrics across various Clients and Gateways,
these metrics need to be tagged with the correct `service.name`,
`service.version` as well as an instance ID to differentiate metrics
from different instances.
2025-05-01 05:19:07 +00:00
Thomas Eizinger
07a82d2254 chore(relay): remove feature flag for eBPF TURN router (#8681)
The original idea of this feature flag was that we can easily disable
the eBPF router in case it causes issues in production. However,
something seems to be not working in reliably turning this on / off.
Without an explicit toggle of the feature-flag, the eBPF program doesn't
seem to be loaded correctly. The uncertainty in this makes me not the
trust the metrics that we are seeing because we don't know, whether
really all relays are using the eBPF router to relay TURN traffic.

In order to draw truthful conclusions as too how much traffic we are
relaying via eBPF, this patch removes the feature flag again. As of
#8656, we can disable the eBPF program by not setting the
`EBPF_OFFLOADING` env variable. This requires a re-deploy / restart of
relays to take effect which isn't quite as fast as toggling a feature
flag but much reliable and easier to maintain.
2025-04-07 03:31:22 +00:00
Thomas Eizinger
941ef6c668 feat(relay): introduce feature-flag for toggling eBPF program (#8650)
This PR implements a feature-flag in PostHog that we can use to toggle
the use of the eBPF data plane at runtime. At every tick of the
event-loop, the relay will compare the (cached) configuration of the
eBPF program with the (cached) value of the feature-flag. If they
differ, the flag will be updated and upon the next packet, the eBPF
program will act accordingly.

Feature-flags are re-evaluated every 5 minutes, meaning there is some
delay until this gets applied.

The default value of our all our feature-flags is `false`, meaning if
there is some problem with evaluating them, we'd turn the eBPF data
plane off. Performing routing in userspace is slower but it is a safer
default.

Resolves: #8548
2025-04-04 02:51:52 +00:00
Thomas Eizinger
3ce3c03291 fix(telemetry): introduce staging and prod PostHog projects (#8647)
As per PostHog's recommendation [0], we now use different projects to
manage the feature-flags. This allows us to turn feature flags in
staging or production on / off without affecting the other.

[0]: https://posthog.com/tutorials/multiple-environments
2025-04-04 01:56:28 +00:00
Thomas Eizinger
8ee1cb9e89 feat(telemetry): include environment in decide request (#8616)
This allows us to toggle feature-flags based on environments.
2025-04-03 11:25:03 +00:00
Thomas Eizinger
84a2c275ca build(rust): upgrade to Rust 1.85 and Edition 2024 (#8240)
Updates our codebase to the 2024 Edition. For highlights on what
changes, see the following blogpost:
https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html
2025-03-19 02:58:55 +00:00
Thomas Eizinger
e54a7c2d64 feat(connlib): regularly evaluate feature flags (#8467)
In order to be able to dynamically configure long-running applications
such as the Gateway via feature-flags, we need to regularly re-evaluate
them by sending another POST request to the `/decide` endpoint.

To do this without impacting anything else, we create a separate runtime
that is lazily initialised on first access and use that to run the async
code for connecting to the PostHog service. In addition to that, we also
spawn a task that re-evaluates the feature flags for the currently set
user in the Sentry context every 5 minutes.

Resolves: #8454

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-03-17 23:50:54 +00:00
Thomas Eizinger
d05226211b fix(connlib): don't respond to LLMNR queries with NXDOMAIN (#8426)
I suspect that one issue as part local discovery is that we respond to
LLMNR queries with NXDOMAIN if the domain isn't a resource. This is
probably wrong. LLMNR works over multicast so if a particular interface
can't respond to a query with records, it should probably not respond at
all.

Related: #8266
2025-03-13 20:36:01 +00:00
Thomas Eizinger
e32eeec78d fix(telemetry): correctly deserialise feature flags (#8425)
Our Posthog integration was so lenient in regards to errors that I
didn't even notice at all that we failed to deserialise them correctly.
In Posthog, I configured the feature flags with `kebab-case` but we
tried to deserialise them as `snake_case`.
2025-03-13 09:30:10 +00:00
Thomas Eizinger
c51488bda4 refactor(connlib): use RwLock for feature flags (#8397)
Most of the time, these flags are only read from and not written thus.
By using a read-lock, we make sure that even when we use feature-flags
from multiple threads, they don't cause any contention.
2025-03-10 06:10:51 +00:00
Thomas Eizinger
d46ce9ab94 chore(connlib): setup feature-flag infrastructure (#8382)
In order to more safely roll out certain changes, being able to
runtime-toggle features is crucial. For this purpose, we build a simple
integration with Posthog that allows us to evaluate feature flags based
on the Firezone ID of a Client or Gateway.

The feature flags are also set in a dedicated context for Sentry events.
This allows us to see, which feature flags were active when a certain
error is logged to Sentry.
2025-03-08 02:07:46 +00:00
Thomas Eizinger
72782b8389 fix(gui-client): update telemetry context on new session (#8152)
Every time we start a new session, our telemetry context potentially
changes, i.e. the user may sign into a new account. This should ensure
that both the IPC service and the GUI always use the most up-to-date
`account_slug` as part of Sentry events. In addition, this will also set
the `account_slug` for clients that just signed in. Previously, the
`account_slug` would only get populated on the next start of the client.
2025-02-17 03:29:08 +00:00
Thomas Eizinger
c5381b0e54 fix(telemetry): always clear previous Sentry session (#8075)
We have a bug in our Rust telemetry code where starting a new telemetry
session for an **unsupported** environment doesn't stop the previous one
if one already exists.

This results in very confusing Sentry issues that cannot be correlated
to our infrastructure.
2025-02-11 00:54:35 +00:00
Thomas Eizinger
c595bb872d chore(telemetry): use same DSN for GUI and IPC service (#7667)
The application-split itself doesn't really warrant having two different
Sentry projects.

1. The location of the panic / log already tells us, which component is
failing.
2. Both of the projects are built with Rust so the same "platform"
setting applies.
3. Reducing the number of Sentry projects makes things easier to manage.
4. The binaries are started as independent processes, so the two Sentry
contexts don't interfere.

What we should keep in mind is that one instance of an application will
now log into Sentry twice using the same DSN. I _think_ this means that
the number of sessions listed in Sentry will be double the number of
actual client-runs. The same is true for the Apple client though and
once we integrate Sentry for Android, the same will apply there so
relative to each other, those numbers still make sense.
2025-01-05 18:26:45 +00:00
Thomas Eizinger
4dea4049da fix(telemetry): remove need for tracking our own user state (#7561)
Previously, we needed to track our own user state in order to set the
whole thing in Sentry. That was necessary because Sentry didn't allow us
to _retrieve_ the current user of the scope but always required the full
user to be set. This was changed
https://github.com/getsentry/sentry-rust/pull/715, which allows us to
remove some of that code and hopefully mitigating any sort of lingering
state when it comes to telemetry sessions.
2024-12-20 04:27:34 +00:00
Thomas Eizinger
98be884c3a fix(telemetry): dispose previous session when starting new one (#7542)
For persistent applications like the IPC service, it is possible that
telemetry gets initialised with different parameters depending on what
the user logs in with. Currently, only the first one is persisted and
all consecutive ones are ignored, leading to events that may be wrongly
tagged for a certain user / environment.

To fix this, we only skip the init if we are still in the same
environment. Otherwise, the close the previous session and initialise a
new one.

Fixes: #7525.
2024-12-17 16:22:38 +00:00
Thomas Eizinger
87c3e4dd86 fix(telemetry): disable for unofficial environments (#7482)
On the one hand, learning about in which edgecases our software fails is
useful and thus having telemetry also active for self-hosted users is
beneficial. On the other hand, we have neither control nor a contact to
those self-hosted and whatever they are doing might spam our Sentry
account with errors that we can't do anything about.

To mitigate this, we disable telemetry for self-hosted users with the
next release.

Once we have more resources, we can consider enabling this again.
2024-12-11 19:03:48 +00:00
Thomas Eizinger
9b8e4d1764 chore(telemetry): remove outdated comments (#7483)
We are no longer using `ArcSwap` here.
2024-12-11 19:02:30 +00:00
Thomas Eizinger
dd6b52b236 chore(rust): share edition key via workspace table (#7451) 2024-12-03 00:28:06 +00:00
Thomas Eizinger
8bc1277c24 fix(telemetry): include span attributes in breadcrumbs (#7421)
This is another attempt at fixing #7386. Previous PR was #7379. The
difference is, this time it works! In the following screenshot,
`handle_input` is a currently active span.


![image](https://github.com/user-attachments/assets/0845d566-8ca7-4ba2-8786-9c5819cdfd48)

I had to make some patches to Sentry, most notably:

- https://github.com/getsentry/sentry-rust/pull/708
- https://github.com/getsentry/sentry-rust/pull/712

The way we configure Sentry is quite tricky:

First and foremost, we need to understand that the `tracing` adapter for
Sentry has a `span_filter` configuration. When a span gets filtered out
there, the rest of `sentry-tracing` never sees the data in that span.
Thus, in order to capture variables from spans, we need to have a fairly
generous span filter. In this PR, we change this span filter to include
all spans except those on TRACE level.

Secondly, by default, the Sentry SDK doesn't send any spans to the
backend, i.e. the sampling rate is 0. Previously, we set the sampling
rate to 1.0 because the `span_filter` was already filtering out all
non-telemetry spans. A telemetry span is a concept that we invented. It
is a span that gets sampled at _creation_ time with a probability of 1%.
This is useful because creating a lot of spans is also expensive, so we
don't want to do it e.g. on a per-packet basis. With just these
configuration options, we now have a problem: We don't want to submit
all spans to Sentry but we need the `span_filter` to allow all spans
otherwise we can't capture the contextual fields from the span in
breadcrumbs. Luckily, the Sentry SDK has another configuration option:
`traces_sampler`.

The `traces_sampler` gets to compute a sampling rate for each individual
span. This allows us to discard all spans from being sent to Sentry
unless they are `telemetry` spans.

Resolves: #7386.
2024-12-02 20:00:35 +00:00
Thomas Eizinger
973a806707 feat(relay): add Sentry crash reporting (#7406)
In addition to monitoring clients and gateways, it is also useful to
monitor relays in the same way. This gives us alerts on ERROR and WARN
messages logged by the relay as well as panics.
2024-11-28 21:53:21 +00:00
Thomas Eizinger
2c26fc9c0e ci: lint Rust dependencies using cargo deny (#7390)
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.

This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).

Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.

This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.

Resolves: #7241.
2024-11-22 00:17:28 +00:00
Thomas Eizinger
186c485280 revert: include span fields in breadcrumb messages (#7384)
Reverts #7379.

Unfortunately, this doesn't actually work because those fields are only
recorded as part of spans that get sampled, see
https://github.com/getsentry/sentry-rust/issues/617#issuecomment-2487058619.
If we were to start recording all spans, we'd have a massive overhead
and send lots of spans to Sentry.
2024-11-21 01:17:52 +00:00
Thomas Eizinger
244816d678 chore(telemetry): don't send sentry alerts in CI (#7383)
Sending Sentry alerts in CI unnecessarily consumes our quota.
2024-11-20 05:18:01 +00:00
Thomas Eizinger
b4ab569af3 feat(telemetry): include span fields in breadcrumb messages (#7379)
This switches our `sentry-tracing` dependency to a fork that includes
https://github.com/getsentry/sentry-rust/pull/708. Recording our span
fields with breadcrumbs is important to provide accurate context of the
message. Without the span fields, the messages give us a lot less
information.

Since the last release, the open issue on `flush` having a flipped
return value got fixed as well.
2024-11-19 18:39:45 +00:00
Thomas Eizinger
48ba2869a8 chore(rust): ban the use of .unwrap except in tests (#7319)
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.

Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.

Resolves: #7292.
2024-11-13 03:59:22 +00:00
Thomas Eizinger
3e18fa8ca2 chore(telemetry): misc. clean-up (#7326)
Bundles together several minor improvements around telemetry:

- Removes the obsolete "Firezone" context: This is now included in the
user context as of #7310.
- Entirely encapsulates `sentry` within the `telemetry` module
- Concludes sessions that were not explicitly closed as "abnormal"
2024-11-13 00:16:44 +00:00
Thomas Eizinger
488c599d5b chore(telemetry): capture Firezone ID and account in user ctx (#7310)
Sentry has a feature called the "User context" which allows us to assign
events to individual users. This in turn will give us statistics in
Sentry, how many users are affected by a certain issue.

Unfortunately, Sentry's user context cannot be built-up step-by-step but
has to be set as a whole. To achieve this, we need to slightly refactor
`Telemetry` to not be `clone`d and instead passed around by mutable
reference.

Resolves: #7248.
Related: https://github.com/getsentry/sentry-rust/issues/706.
2024-11-11 19:50:14 +00:00
Thomas Eizinger
e261cb3c27 chore: remove git_version! (#7270)
Reading the Git version requires the entire Git repository to be
present, including all tags. The tags are only created _after_ the
artifact is being built, when we publish the release. Therefore, these
tags are never included in the actual released binary.

For Sentry, we use the `CARGO_PKG_VERSION` variable instead. This
doesn't tell us whether somebody built a client from source and then
used it so there could be some confusion in Sentry events. It is quite
unlikely that this happens though so for the majority of Sentry alerts,
this will give us the correct version.

For the Android client, we also depend on the `GITHUB_SHA` env variable
at compile-time. We do the same thing for the GUI client here.

Resolves: #6925.
2024-11-07 22:56:17 +00:00
Thomas Eizinger
47e45a3cf3 chore(telemetry): improve telemetry spans and events (#7206)
DNS resolution is a critical part of `connlib`. If it is slow for
whatever reason, users will notice this. To make sure we notice as well,
we add `telemetry` spans to the client's and gateway's DNS resolution.
For the client, this applies to all DNS queries that we forward to the
upstream servers. For the gateway, this applies to all DNS resources.

In addition to those IO operations, we also instrument the
`match_resource_linear` function. This function operates in `O(n)` of
all defined DNS resources. It _should_ be fast enough to not create an
impact but it can't hurt to measure this regardless.

Lastly, we also instrument `refresh_translations` on the gateway.
Refreshing the DNS resolution of a DNS resource should really only
happen, when the previous IP addresses become stale yet the user is
still trying to send traffic to them. We don't actually have any data on
how often that happens. By instrumenting it, we can gather some of this
data.

To make sure that none of these telemetry events and spans hurt the
end-user performance, we introduce macros to `firezone-logging` that
sample the creation of these events and spans at a rate of 1%. I ran a
flamegraph and none of these even showed up. The most critical one here
is probably the `match_resource_linear` span because it happens on every
DNS query.

Resolves: #7198.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-11-06 01:17:57 +00:00
dependabot[bot]
a2828a217b build(deps): Bump thiserror from 1.0.64 to 1.0.68 in /rust (#7260)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.64 to
1.0.68.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/thiserror/releases">thiserror's
releases</a>.</em></p>
<blockquote>
<h2>1.0.68</h2>
<ul>
<li>Handle incomplete expressions more robustly in format arguments,
such as while code is being typed (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/341">#341</a>,
<a
href="https://redirect.github.com/dtolnay/thiserror/issues/344">#344</a>)</li>
</ul>
<h2>1.0.67</h2>
<ul>
<li>Improve expression syntax support inside format arguments (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/335">#335</a>,
<a
href="https://redirect.github.com/dtolnay/thiserror/issues/337">#337</a>,
<a
href="https://redirect.github.com/dtolnay/thiserror/issues/339">#339</a>,
<a
href="https://redirect.github.com/dtolnay/thiserror/issues/340">#340</a>)</li>
</ul>
<h2>1.0.66</h2>
<ul>
<li>Improve compile error on malformed format attribute (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/327">#327</a>)</li>
</ul>
<h2>1.0.65</h2>
<ul>
<li>Ensure OUT_DIR is left with deterministic contents after build
script execution (<a
href="https://redirect.github.com/dtolnay/thiserror/issues/325">#325</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="8d06fb5549"><code>8d06fb5</code></a>
Release 1.0.68</li>
<li><a
href="372fd8a71a"><code>372fd8a</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/344">#344</a>
from dtolnay/binop</li>
<li><a
href="08f89925bf"><code>08f8992</code></a>
Disregard equality binop in fallback parser</li>
<li><a
href="d2a823d2ae"><code>d2a823d</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/343">#343</a>
from dtolnay/unnamed</li>
<li><a
href="b3bf7a6f69"><code>b3bf7a6</code></a>
Add logic to determine whether unnamed fmt arguments are present</li>
<li><a
href="490f9c017b"><code>490f9c0</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/342">#342</a>
from dtolnay/synfull</li>
<li><a
href="7daf1b169d"><code>7daf1b1</code></a>
Defer is_syn_full() call until first expression</li>
<li><a
href="c92ac9940b"><code>c92ac99</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/thiserror/issues/341">#341</a>
from dtolnay/parsescan</li>
<li><a
href="40a53f7f33"><code>40a53f7</code></a>
Interleave Expr parsing and scanning better</li>
<li><a
href="925f2dde77"><code>925f2dd</code></a>
Release 1.0.67</li>
<li>Additional commits viewable in <a
href="https://github.com/dtolnay/thiserror/compare/1.0.64...1.0.68">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=thiserror&package-manager=cargo&previous-version=1.0.64&new-version=1.0.68)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-04 19:06:15 +00:00
Thomas Eizinger
5564e578fe fix(telemetry): flush sentry.io events in dedicated task (#7205)
`sentry`'s transport layer appears to be using blocking IO for flushing
events. Performing blocking IO within a future that is running on a
worker-thread of tokio causes this operation to hang and eventually
time-out after 5 seconds. As a result, many events - especially traces -
don't get flushed to sentry when an app is being shut down.

To fix this, we make `Telemetry::stop` an `async fn` and offload the
flushing to a task on tokio's thread-pool for blocking IO.
2024-11-01 15:52:09 +00:00
Reactor Scram
51250faa0d chore(telemetry): make the firezone device ID a context not a tag (#7179)
Closes #7175 

Also fixes a bug with the initialization order of Tokio and Sentry.

Previously:
1. Start Tokio, executor threads inherit main thread context
2. Load device ID and set it on the main telemetry hub

Now:
1. Load device ID and set it on the main telemetry hub
2. Start Tokio, executor threads inherit main thread context

The context and possibly tags didn't seem to propagate from the main hub
if we set them after the worker threads spawned.

Based on this understanding, the IPC service process is still wrong, but
a fix will have to wait, because telemetry in the IPC service is more
complicated than in the GUI process.

<img width="818" alt="image"
src="https://github.com/user-attachments/assets/9c9efec8-fc55-4863-99eb-5fe9ba5b36fa">
2024-10-30 21:27:17 +00:00
Thomas Eizinger
e0d82eef27 fix(connlib): correctly categorise CI environment in Sentry (#7173) 2024-10-30 14:11:06 +00:00
Thomas Eizinger
7037830b19 chore(connlib): submit DEBUG events as breadcrumbs (#7177)
This should give us much more context for a particular error without
having to bother a customer with sending us the logs / digging for them
ourselves in our staging or production environment.

Resolves: #7176.
2024-10-29 23:39:07 +00:00
Thomas Eizinger
62c29705cf chore(connlib): sort DSNs alphabetically (#7178) 2024-10-29 20:37:15 +00:00
Thomas Eizinger
82fcad0a3b refactor(rust): only send telemetry spans to Sentry (#7153)
With the introduction of the `tracing-sentry` integration in #7105, we
started sending tracing spans to Sentry. By default, all spans with
level INFO and above get sampled at the configured rate and sent to
Sentry.

This results in a lot of useless transaction in Sentry because we use
INFO level spans in multiple places in connlib to attach contextual
information like the current connection ID.

This PR introduces the concept of `telemetry` spans which - similar to
the `telemetry` log target in #7147 - qualifies a span for being sent to
Sentry. By convention, these are also defined as requiring the TRACE
level. This ensures we won't ever see them as part of regular log
output.
2024-10-24 20:25:26 +00:00
Thomas Eizinger
5cf105f073 chore(android): start telemetry together with connlib session (#7151)
As a first step for integration Sentry into the Android app, we launch
the Sentry Rust agent as soon as a `connlib` session starts up. At a
later point, we can also integrate Sentry into the Android app itself
using the Java / Kotlin SDK.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-10-24 20:03:06 +00:00
Thomas Eizinger
c12a02e348 chore(apple): start telemetry together with connlib session (#7152)
This starts up telemetry together with each `connlib` session. At a
later point, we can also integrate the native Swift SDK into the MacOS /
iOS app to catch non-connlib specific problems.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-10-24 19:59:52 +00:00
Thomas Eizinger
5f91259d31 chore(rust): capture backtraces for panics (#7133)
Sentry by default has an integration to capture stacktraces for panics,
we just need to enable it. Here is what this looks like:
https://firezone-inc.sentry.io/issues/6013299023

Resolves: #7132.
2024-10-24 14:18:40 +00:00
Thomas Eizinger
45a36ea190 chore: categorise docker-compose env in Sentry (#7128)
Resolves: #7125.
2024-10-24 14:08:31 +00:00
Thomas Eizinger
990324b2ec chore(rust): enable sentry-tracing integration (#7105)
Using the `sentry-tracing` integration, we can automatically capture
events based on what we log via `tracing`. The mapping is defined as
follows:

- ERROR: Gets captured as a fatal error
- WARN: Gets captured as a message
- INFO: Gets captured as a breadcrumb
- `_`: Does not get captured at all

If telemetry isn't active / configured, this integration does nothing.
It is therefore safe to just always enable it.
2024-10-22 23:23:49 +00:00
Thomas Eizinger
b7b7626cfa feat(gateway): add error reporting via Sentry (#7103)
Similar to the GUI and headless clients, adding error reporting via
Sentry should give us much better insight into how well gateways are
performing.

Resolves: #7099.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-10-22 20:40:28 +00:00
Reactor Scram
4bfdf9b20b chore(rust/gui-client): report account slug to Sentry (#7097)
Closes #7087

<img width="375" alt="image"
src="https://github.com/user-attachments/assets/7fcf0f08-019c-4e48-9c1b-f038638ce930">
2024-10-22 17:17:47 +00:00
Reactor Scram
03c0325a0b chore(rust/telemetry): remove a test that is flaky when Sentry is down (#7079)
They had an incident today that caused our CI to start failing
https://status.sentry.io/incidents/rqfy3wcbkgl5
e.g.
https://github.com/firezone/firezone/actions/runs/11372533569/job/31637085363
2024-10-17 01:21:47 +00:00
Reactor Scram
41635937c7 chore(rust/gtk-client): fix missing icon (#6958)
Also implement `set_tray_icon`
2024-10-08 20:13:59 +00:00
Thomas Eizinger
02b0e1dc8d chore: don't report authentication errors to sentry (#6948)
Do we want to track 401s in sentry? If we see a lot of them, something
is likely wrong but I guess there is some level of 401s that users will
just run into.

Is there a way of marking these as "might not be a really bad error"?

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-10-08 06:26:39 +00:00
Reactor Scram
b3d9cebe53 chore(rust/telemetry): add firezone ID (formerly device ID) to sentry as a tag (#6946)
This makes it easier to ignore random issues from my dev system.

Also added OS tag (`linux` or `windows`) since that doesn't seem to be a
default for Sentry.

```[tasklist]
- [ ] Bikeshed the name `firezone_id` since it'll be hard to change later
```

<img width="367" alt="image"
src="https://github.com/user-attachments/assets/2e936aea-5c36-4208-965a-c578ff8407b7">
2024-10-07 20:13:48 +00:00