This is another attempt at fixing #7386. Previous PR was #7379. The
difference is that this time it works! In the following screenshot,
`handle_input` is the currently active span.

I had to make some patches to Sentry, most notably:
- https://github.com/getsentry/sentry-rust/pull/708
- https://github.com/getsentry/sentry-rust/pull/712
The way we configure Sentry is quite tricky:
First and foremost, we need to understand that the `tracing` adapter for
Sentry has a `span_filter` configuration. When a span gets filtered out
there, the rest of `sentry-tracing` never sees the data in that span.
Thus, in order to capture variables from spans, we need a fairly
generous span filter. In this PR, we change the span filter to include
all spans except those at TRACE level.
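Roughly, the filter now looks like this (a sketch, assuming
`sentry-tracing`'s `span_filter` hook):

```rust
use tracing::Level;
use tracing_subscriber::prelude::*;

fn main() {
    // Sketch: only TRACE-level spans get dropped; everything else stays
    // visible to `sentry-tracing`, so its fields can land in breadcrumbs.
    let sentry_layer =
        sentry_tracing::layer().span_filter(|metadata| *metadata.level() != Level::TRACE);

    tracing_subscriber::registry().with(sentry_layer).init();
}
```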
Secondly, by default, the Sentry SDK doesn't send any spans to the
backend, i.e. the sampling rate is 0. Previously, we set the sampling
rate to 1.0 because the `span_filter` was already filtering out all
non-telemetry spans. A telemetry span is a concept that we invented. It
is a span that gets sampled at _creation_ time with a probability of 1%.
This is useful because creating a lot of spans is also expensive, so we
don't want to do it e.g. on a per-packet basis. With just these
configuration options, we now have a problem: We don't want to submit
all spans to Sentry, but we need the `span_filter` to allow all spans;
otherwise, we can't capture the spans' contextual fields in
breadcrumbs. Luckily, the Sentry SDK has another configuration option:
`traces_sampler`.
The `traces_sampler` gets to compute a sampling rate for each individual
span. This allows us to discard all spans from being sent to Sentry
unless they are `telemetry` spans.
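A minimal sketch of such a sampler; matching on the span name
`telemetry` is illustrative, the real check may differ:

```rust
use std::sync::Arc;

use sentry::{ClientOptions, TransactionContext};

// Sketch: keep every `telemetry` span, drop the rest.
fn client_options() -> ClientOptions {
    ClientOptions {
        traces_sampler: Some(Arc::new(|ctx: &TransactionContext| {
            if ctx.name() == "telemetry" {
                1.0
            } else {
                0.0
            }
        })),
        ..Default::default()
    }
}
```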
Resolves: #7386.
In addition to monitoring clients and gateways, it is also useful to
monitor relays in the same way. This gives us alerts on ERROR and WARN
messages logged by the relay as well as panics.
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
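A minimal, self-contained illustration of the failure mode (the
`v1`/`v2` modules stand in for two compiled copies of the same
dependency):

```rust
use std::any::Any;

mod v1 {
    pub struct Error;
}
mod v2 {
    pub struct Error;
}

fn main() {
    // One crate boxes the error using "version 1" of the dependency...
    let err: Box<dyn Any> = Box::new(v1::Error);

    // ...another crate downcasts with "version 2". The two `Error`
    // types are distinct to the compiler, so the downcast silently
    // returns `None`, even though both are "the same" type to a human.
    assert!(err.downcast_ref::<v2::Error>().is_none());
    assert!(err.downcast_ref::<v1::Error>().is_some());
}
```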
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses, and it
warns when a dependency is referenced from multiple crates without
being declared as a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
Resolves: #7241.
This switches our `sentry-tracing` dependency to a fork that includes
https://github.com/getsentry/sentry-rust/pull/708. Recording our span
fields with breadcrumbs is important to provide accurate context for the
message. Without the span fields, the messages give us a lot less
information.
Since the last release, the open issue about `flush` having a flipped
return value has been fixed as well.
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This actually turns
up quite a few results. In most cases, they are invariants that can't
actually be hit; for these, we change them to `Option`. In other cases,
they can actually be hit, for example when the user supplies an invalid
log-filter.
Activating this lint ensures the compiler will yell at us every time we
use `.unwrap`, making us double-check whether we do indeed want to panic
there.
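A small sketch of the lint in action (the log-filter scenario is
simplified for illustration):

```rust
#![deny(clippy::unwrap_used)]

fn main() {
    // A user-supplied value that may well be invalid.
    let filter: Result<u32, _> = "not-a-number".parse();

    // let level = filter.unwrap(); // rejected by `clippy::unwrap_used`

    // The lint forces a decision: document the invariant or handle it.
    match filter {
        Ok(level) => println!("log level {level}"),
        Err(e) => eprintln!("invalid log-filter: {e}"), // user error, handled
    }
}
```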
Resolves: #7292.
Bundles together several minor improvements around telemetry:
- Removes the obsolete "Firezone" context: This is now included in the
user context as of #7310.
- Entirely encapsulates `sentry` within the `telemetry` module
- Concludes sessions that were not explicitly closed as "abnormal"
Sentry has a feature called the "User context" which allows us to assign
events to individual users. This in turn gives us statistics in
Sentry on how many users are affected by a certain issue.
Unfortunately, Sentry's user context cannot be built up step-by-step but
has to be set as a whole. To achieve this, we need to slightly refactor
`Telemetry` so it is no longer `clone`d but instead passed around by
mutable reference.
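A sketch of how the user context gets applied in one go (field values
are illustrative):

```rust
use sentry::protocol::User;

// Sketch: the user context must be set as a whole, so `Telemetry`
// assembles the complete `User` first and applies it in one call.
fn set_user(id: String, email: Option<String>) {
    sentry::configure_scope(|scope| {
        scope.set_user(Some(User {
            id: Some(id),
            email,
            ..Default::default()
        }));
    });
}
```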
Resolves: #7248.
Related: https://github.com/getsentry/sentry-rust/issues/706.
Reading the Git version requires the entire Git repository to be
present, including all tags. The tags are only created _after_ the
artifact is built, when we publish the release. Therefore, these
tags are never included in the actual released binary.
For Sentry, we use the `CARGO_PKG_VERSION` variable instead. This
doesn't tell us whether somebody built a client from source and then
used it, so there could be some confusion in Sentry events. It is quite
unlikely that this happens, though, so for the majority of Sentry
alerts, this will give us the correct version.
For the Android client, we also depend on the `GITHUB_SHA` env variable
at compile-time. We do the same thing for the GUI client here.
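A sketch of the compile-time plumbing (combining version and SHA like
this is illustrative):

```rust
// Sketch: `CARGO_PKG_VERSION` is always available at compile time;
// `GITHUB_SHA` only exists when CI sets it.
pub fn release() -> String {
    match option_env!("GITHUB_SHA") {
        Some(sha) => format!("{}-{sha}", env!("CARGO_PKG_VERSION")),
        None => env!("CARGO_PKG_VERSION").to_owned(),
    }
}
```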
Resolves: #6925.
DNS resolution is a critical part of `connlib`. If it is slow for
whatever reason, users will notice this. To make sure we notice as well,
we add `telemetry` spans to the client's and gateway's DNS resolution.
For the client, this applies to all DNS queries that we forward to the
upstream servers. For the gateway, this applies to all DNS resources.
In addition to those IO operations, we also instrument the
`match_resource_linear` function. This function operates in `O(n)` in
the number of defined DNS resources. It _should_ be fast enough not to
create a noticeable impact, but it can't hurt to measure this regardless.
Lastly, we also instrument `refresh_translations` on the gateway.
Refreshing the DNS resolution of a DNS resource should really only
happen when the previous IP addresses become stale yet the user is
still trying to send traffic to them. We don't actually have any data on
how often that happens. By instrumenting it, we can gather some of this
data.
To make sure that none of these telemetry events and spans hurt the
end-user performance, we introduce macros to `firezone-logging` that
sample the creation of these events and spans at a rate of 1%. I ran a
flamegraph and none of these even showed up. The most critical one here
is probably the `match_resource_linear` span because it happens on every
DNS query.
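A sketch of the sampling idea behind these macros (the real
`firezone-logging` macros may differ):

```rust
use rand::Rng;
use tracing::{trace_span, Span};

// Sketch: only pay for span creation on 1% of calls so the hot path
// (e.g. per DNS query) stays cheap.
fn maybe_telemetry_span() -> Span {
    if rand::thread_rng().gen_bool(0.01) {
        trace_span!("telemetry")
    } else {
        Span::none()
    }
}

fn handle_dns_query() {
    let _guard = maybe_telemetry_span().entered(); // no-op for `Span::none()`
    // ... resolve the query ...
}
```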
Resolves: #7198.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
`sentry`'s transport layer appears to be using blocking IO for flushing
events. Performing blocking IO within a future that is running on a
tokio worker thread causes this operation to hang and eventually time
out after 5 seconds. As a result, many events - especially traces -
don't get flushed to Sentry when an app is being shut down.
To fix this, we make `Telemetry::stop` an `async fn` and offload the
flushing to a task on tokio's thread-pool for blocking IO.
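A sketch of the offloading (the 5-second timeout mirrors the one
mentioned above):

```rust
use std::time::Duration;

// Sketch: move the (blocking) flush off the async worker thread.
pub async fn stop() {
    let _ = tokio::task::spawn_blocking(|| {
        if let Some(client) = sentry::Hub::current().client() {
            client.flush(Some(Duration::from_secs(5)));
        }
    })
    .await;
}
```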
Closes #7175
Also fixes a bug with the initialization order of Tokio and Sentry.
Previously:
1. Start Tokio; executor threads inherit the main thread's (still empty) context
2. Load device ID and set it on the main telemetry hub
Now:
1. Load device ID and set it on the main telemetry hub
2. Start Tokio; executor threads inherit the main thread's (now populated) context
The context and possibly tags didn't seem to propagate from the main hub
if we set them after the worker threads spawned.
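A sketch of the corrected order (`load_device_id` is a hypothetical
helper for illustration):

```rust
fn load_device_id() -> String {
    "hypothetical-device-id".to_owned()
}

fn main() -> std::io::Result<()> {
    // 1. Populate the main hub's context first...
    sentry::configure_scope(|scope| scope.set_tag("device_id", load_device_id()));

    // 2. ...then start the runtime so worker threads inherit it.
    tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()?
        .block_on(async { /* run the app */ });

    Ok(())
}
```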
Based on this understanding, the IPC service process is still wrong, but
a fix will have to wait, because telemetry in the IPC service is more
complicated than in the GUI process.
<img width="818" alt="image"
src="https://github.com/user-attachments/assets/9c9efec8-fc55-4863-99eb-5fe9ba5b36fa">
This should give us much more context for a particular error without
having to bother a customer with sending us the logs / digging for them
ourselves in our staging or production environment.
Resolves: #7176.
With the introduction of the `tracing-sentry` integration in #7105, we
started sending tracing spans to Sentry. By default, all spans with
level INFO and above get sampled at the configured rate and sent to
Sentry.
This results in a lot of useless transactions in Sentry because we use
INFO level spans in multiple places in connlib to attach contextual
information like the current connection ID.
This PR introduces the concept of `telemetry` spans which - similar to
the `telemetry` log target in #7147 - qualify a span for being sent to
Sentry. By convention, these are also defined as requiring the TRACE
level. This ensures we won't ever see them as part of regular log
output.
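A sketch of the convention (target and span name are illustrative):

```rust
use tracing::trace_span;

// Sketch: `telemetry` spans are TRACE level, so they never show up in
// regular (INFO-and-above) log output but can still be sent to Sentry.
fn handle_input() {
    let _span = trace_span!(target: "telemetry", "handle_input").entered();
    // ... actual work ...
}
```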
As a first step for integrating Sentry into the Android app, we launch
the Sentry Rust agent as soon as a `connlib` session starts up. At a
later point, we can also integrate Sentry into the Android app itself
using the Java / Kotlin SDK.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
This starts up telemetry together with each `connlib` session. At a
later point, we can also integrate the native Swift SDK into the macOS /
iOS app to catch non-connlib specific problems.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Using the `sentry-tracing` integration, we can automatically capture
events based on what we log via `tracing`. The mapping is defined as
follows:
- ERROR: Gets captured as a fatal error
- WARN: Gets captured as a message
- INFO: Gets captured as a breadcrumb
- `_`: Does not get captured at all
If telemetry isn't active / configured, this integration does nothing.
It is therefore safe to just always enable it.
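A sketch of this mapping, assuming `sentry-tracing`'s `event_filter`
hook and its `EventFilter` variants:

```rust
use sentry_tracing::EventFilter;
use tracing::Level;
use tracing_subscriber::prelude::*;

fn main() {
    let sentry_layer = sentry_tracing::layer().event_filter(|metadata| match *metadata.level() {
        Level::ERROR => EventFilter::Exception, // captured as a (fatal) error
        Level::WARN => EventFilter::Event,      // captured as a message
        Level::INFO => EventFilter::Breadcrumb, // captured as a breadcrumb
        _ => EventFilter::Ignore,               // not captured at all
    });

    tracing_subscriber::registry().with(sentry_layer).init();
}
```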
Similar to the GUI and headless clients, adding error reporting via
Sentry should give us much better insight into how well gateways are
performing.
Resolves: #7099.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Do we want to track 401s in Sentry? If we see a lot of them, something
is likely wrong, but I guess there is some level of 401s that users will
just run into.
Is there a way of marking these as "might not be a really bad error"?
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
This makes it easier to ignore random issues from my dev system.
Also added OS tag (`linux` or `windows`) since that doesn't seem to be a
default for Sentry.
```[tasklist]
- [ ] Bikeshed the name `firezone_id` since it'll be hard to change later
```
<img width="367" alt="image"
src="https://github.com/user-attachments/assets/2e936aea-5c36-4208-965a-c578ff8407b7">
Closes #6854
- Sets release version from the GUI Client / Headless Client version
instead of the `firezone-telemetry` version
- Sets environment to "production" and "staging" for well-known API URLs,
and "self-hosted" for others, since environments in Sentry can't have
slashes in them (see the sketch below)
- Sets API URL as a tag
- Sets release to `unit test` for unit testing `firezone-telemetry`
itself, since it has no good version number
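A sketch of that environment mapping (the `contains("staging")` check
is hypothetical; only the production URL appears in this description):

```rust
// Sketch: map well-known API URLs to Sentry environments.
fn environment(api_url: &str) -> &'static str {
    match api_url {
        "wss://api.firezone.dev" => "production",
        url if url.contains("staging") => "staging",
        _ => "self-hosted",
    }
}
```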
<img width="398" alt="image"
src="https://github.com/user-attachments/assets/86f71193-2511-45c1-8304-413db8e5ef90">
Refs #6138
Sentry is always enabled for now. In the near future we'll make it
opt-out per device and opt-in per org (see #6138 for details)
- Replaces the `crash_handling` module
- Catches panics in GUI process, tunnel daemon, and Headless Client
- Adds a couple "breadcrumbs" to play with that feature (see the sketch
after this list)
- User ID is not set yet
- Environment is set to the API URL, e.g. `wss://api.firezone.dev`
- Reports panics from the connlib async task
- Release should be automatically pulled from the Cargo version which we
automatically set in the version Makefile
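For reference, adding such a breadcrumb is a one-liner (category and
message are made up for illustration):

```rust
use sentry::protocol::Breadcrumb;

// Sketch: one of the "breadcrumbs" mentioned above.
fn record_session_start() {
    sentry::add_breadcrumb(Breadcrumb {
        category: Some("session".into()),
        message: Some("connlib session started".into()),
        ..Default::default()
    });
}
```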
Example screenshot of sentry.io with a caught panic:
<img width="861" alt="image"
src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">