Closes#7175
Also fixes a bug with the initialization order of Tokio and Sentry.
Previously:
1. Start Tokio, executor threads inherit main thread context
2. Load device ID and set it on the main telemetry hub
Now:
1. Load device ID and set it on the main telemetry hub
2. Start Tokio, executor threads inherit main thread context
The context and possibly tags didn't seem to propagate from the main hub
if we set them after the worker threads spawned.
Based on this understanding, the IPC service process is still wrong, but
a fix will have to wait, because telemetry in the IPC service is more
complicated than in the GUI process.
<img width="818" alt="image"
src="https://github.com/user-attachments/assets/9c9efec8-fc55-4863-99eb-5fe9ba5b36fa">
This should give us much more context for a particular error without
having to bother a customer with sending us the logs / digging for them
ourselves in our staging or production environment.
Resolves: #7176.
With the introduction of the `tracing-sentry` integration in #7105, we
started sending tracing spans to Sentry. By default, all spans with
level INFO and above get sampled at the configured rate and sent to
Sentry.
This results in a lot of useless transaction in Sentry because we use
INFO level spans in multiple places in connlib to attach contextual
information like the current connection ID.
This PR introduces the concept of `telemetry` spans which - similar to
the `telemetry` log target in #7147 - qualifies a span for being sent to
Sentry. By convention, these are also defined as requiring the TRACE
level. This ensures we won't ever see them as part of regular log
output.
As a first step for integration Sentry into the Android app, we launch
the Sentry Rust agent as soon as a `connlib` session starts up. At a
later point, we can also integrate Sentry into the Android app itself
using the Java / Kotlin SDK.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
This starts up telemetry together with each `connlib` session. At a
later point, we can also integrate the native Swift SDK into the MacOS /
iOS app to catch non-connlib specific problems.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Using the `sentry-tracing` integration, we can automatically capture
events based on what we log via `tracing`. The mapping is defined as
follows:
- ERROR: Gets captured as a fatal error
- WARN: Gets captured as a message
- INFO: Gets captured as a breadcrumb
- `_`: Does not get captured at all
If telemetry isn't active / configured, this integration does nothing.
It is therefore safe to just always enable it.
Similar to the GUI and headless clients, adding error reporting via
Sentry should give us much better insight into how well gateways are
performing.
Resolves: #7099.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Do we want to track 401s in sentry? If we see a lot of them, something
is likely wrong but I guess there is some level of 401s that users will
just run into.
Is there a way of marking these as "might not be a really bad error"?
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
This makes it easier to ignore random issues from my dev system.
Also added OS tag (`linux` or `windows`) since that doesn't seem to be a
default for Sentry.
```[tasklist]
- [ ] Bikeshed the name `firezone_id` since it'll be hard to change later
```
<img width="367" alt="image"
src="https://github.com/user-attachments/assets/2e936aea-5c36-4208-965a-c578ff8407b7">
Closes#6854
- Sets release version from the GUI Client / Headless Client version
instead of the `firezone-telemetry` version
- Set environment to "production" and "staging" for well-known API URLs,
and "self-hosted" for others, since environments in Sentry can't have
slashes in them
- Sets API URL as a tag
- Sets release to `unit test` for unit testing `firezone-telemetry`
itself, since it has no good version number
<img width="398" alt="image"
src="https://github.com/user-attachments/assets/86f71193-2511-45c1-8304-413db8e5ef90">
Refs #6138
Sentry is always enabled for now. In the near future we'll make it
opt-out per device and opt-in per org (see #6138 for details)
- Replaces the `crash_handling` module
- Catches panics in GUI process, tunnel daemon, and Headless Client
- Added a couple "breadcrumbs" to play with that feature
- User ID is not set yet
- Environment is set to the API URL, e.g. `wss://api.firezone.dev`
- Reports panics from the connlib async task
- Release should be automatically pulled from the Cargo version which we
automatically set in the version Makefile
Example screenshot of sentry.io with a caught panic:
<img width="861" alt="image"
src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">