Commit Graph

19 Commits

Author SHA1 Message Date
Reactor Scram
51250faa0d chore(telemetry): make the firezone device ID a context not a tag (#7179)
Closes #7175 

Also fixes a bug with the initialization order of Tokio and Sentry.

Previously:
1. Start Tokio, executor threads inherit main thread context
2. Load device ID and set it on the main telemetry hub

Now:
1. Load device ID and set it on the main telemetry hub
2. Start Tokio, executor threads inherit main thread context

The context and possibly tags didn't seem to propagate from the main hub
if we set them after the worker threads spawned.

Based on this understanding, the IPC service process is still wrong, but
a fix will have to wait, because telemetry in the IPC service is more
complicated than in the GUI process.

<img width="818" alt="image"
src="https://github.com/user-attachments/assets/9c9efec8-fc55-4863-99eb-5fe9ba5b36fa">
2024-10-30 21:27:17 +00:00
Thomas Eizinger
e0d82eef27 fix(connlib): correctly categorise CI environment in Sentry (#7173) 2024-10-30 14:11:06 +00:00
Thomas Eizinger
7037830b19 chore(connlib): submit DEBUG events as breadcrumbs (#7177)
This should give us much more context for a particular error without
having to bother a customer with sending us the logs / digging for them
ourselves in our staging or production environment.

Resolves: #7176.
2024-10-29 23:39:07 +00:00
Thomas Eizinger
62c29705cf chore(connlib): sort DSNs alphabetically (#7178) 2024-10-29 20:37:15 +00:00
Thomas Eizinger
82fcad0a3b refactor(rust): only send telemetry spans to Sentry (#7153)
With the introduction of the `tracing-sentry` integration in #7105, we
started sending tracing spans to Sentry. By default, all spans with
level INFO and above get sampled at the configured rate and sent to
Sentry.

This results in a lot of useless transaction in Sentry because we use
INFO level spans in multiple places in connlib to attach contextual
information like the current connection ID.

This PR introduces the concept of `telemetry` spans which - similar to
the `telemetry` log target in #7147 - qualifies a span for being sent to
Sentry. By convention, these are also defined as requiring the TRACE
level. This ensures we won't ever see them as part of regular log
output.
2024-10-24 20:25:26 +00:00
Thomas Eizinger
5cf105f073 chore(android): start telemetry together with connlib session (#7151)
As a first step for integration Sentry into the Android app, we launch
the Sentry Rust agent as soon as a `connlib` session starts up. At a
later point, we can also integrate Sentry into the Android app itself
using the Java / Kotlin SDK.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-10-24 20:03:06 +00:00
Thomas Eizinger
c12a02e348 chore(apple): start telemetry together with connlib session (#7152)
This starts up telemetry together with each `connlib` session. At a
later point, we can also integrate the native Swift SDK into the MacOS /
iOS app to catch non-connlib specific problems.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-10-24 19:59:52 +00:00
Thomas Eizinger
5f91259d31 chore(rust): capture backtraces for panics (#7133)
Sentry by default has an integration to capture stacktraces for panics,
we just need to enable it. Here is what this looks like:
https://firezone-inc.sentry.io/issues/6013299023

Resolves: #7132.
2024-10-24 14:18:40 +00:00
Thomas Eizinger
45a36ea190 chore: categorise docker-compose env in Sentry (#7128)
Resolves: #7125.
2024-10-24 14:08:31 +00:00
Thomas Eizinger
990324b2ec chore(rust): enable sentry-tracing integration (#7105)
Using the `sentry-tracing` integration, we can automatically capture
events based on what we log via `tracing`. The mapping is defined as
follows:

- ERROR: Gets captured as a fatal error
- WARN: Gets captured as a message
- INFO: Gets captured as a breadcrumb
- `_`: Does not get captured at all

If telemetry isn't active / configured, this integration does nothing.
It is therefore safe to just always enable it.
2024-10-22 23:23:49 +00:00
Thomas Eizinger
b7b7626cfa feat(gateway): add error reporting via Sentry (#7103)
Similar to the GUI and headless clients, adding error reporting via
Sentry should give us much better insight into how well gateways are
performing.

Resolves: #7099.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-10-22 20:40:28 +00:00
Reactor Scram
4bfdf9b20b chore(rust/gui-client): report account slug to Sentry (#7097)
Closes #7087

<img width="375" alt="image"
src="https://github.com/user-attachments/assets/7fcf0f08-019c-4e48-9c1b-f038638ce930">
2024-10-22 17:17:47 +00:00
Reactor Scram
03c0325a0b chore(rust/telemetry): remove a test that is flaky when Sentry is down (#7079)
They had an incident today that caused our CI to start failing
https://status.sentry.io/incidents/rqfy3wcbkgl5
e.g.
https://github.com/firezone/firezone/actions/runs/11372533569/job/31637085363
2024-10-17 01:21:47 +00:00
Reactor Scram
41635937c7 chore(rust/gtk-client): fix missing icon (#6958)
Also implement `set_tray_icon`
2024-10-08 20:13:59 +00:00
Thomas Eizinger
02b0e1dc8d chore: don't report authentication errors to sentry (#6948)
Do we want to track 401s in sentry? If we see a lot of them, something
is likely wrong but I guess there is some level of 401s that users will
just run into.

Is there a way of marking these as "might not be a really bad error"?

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-10-08 06:26:39 +00:00
Reactor Scram
b3d9cebe53 chore(rust/telemetry): add firezone ID (formerly device ID) to sentry as a tag (#6946)
This makes it easier to ignore random issues from my dev system.

Also added OS tag (`linux` or `windows`) since that doesn't seem to be a
default for Sentry.

```[tasklist]
- [ ] Bikeshed the name `firezone_id` since it'll be hard to change later
```

<img width="367" alt="image"
src="https://github.com/user-attachments/assets/2e936aea-5c36-4208-965a-c578ff8407b7">
2024-10-07 20:13:48 +00:00
Reactor Scram
01530aa934 fix(rust/telemetry): report correct environment when missing trailing slash (#6929)
Closes #6928

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-10-07 14:35:43 +00:00
Reactor Scram
d2a8155ba7 fix(rust/client): set sentry release version and environment correctly (#6855)
Closes #6854 


- Sets release version from the GUI Client / Headless Client version
instead of the `firezone-telemetry` version
- Set environment to "production" and "staging" for well-known API URLs,
and "self-hosted" for others, since environments in Sentry can't have
slashes in them
- Sets API URL as a tag
- Sets release to `unit test` for unit testing `firezone-telemetry`
itself, since it has no good version number

<img width="398" alt="image"
src="https://github.com/user-attachments/assets/86f71193-2511-45c1-8304-413db8e5ef90">
2024-09-30 16:24:39 +00:00
Reactor Scram
05a2b28d9f feat(rust/gui-client): add sentry.io error reporting (#6782)
Refs #6138 

Sentry is always enabled for now. In the near future we'll make it
opt-out per device and opt-in per org (see #6138 for details)

- Replaces the `crash_handling` module
- Catches panics in GUI process, tunnel daemon, and Headless Client
- Added a couple "breadcrumbs" to play with that feature
- User ID is not set yet
- Environment is set to the API URL, e.g. `wss://api.firezone.dev`
- Reports panics from the connlib async task
- Release should be automatically pulled from the Cargo version which we
automatically set in the version Makefile

Example screenshot of sentry.io with a caught panic:

<img width="861" alt="image"
src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">
2024-09-27 16:34:54 +00:00