Commit Graph

93 Commits

Author SHA1 Message Date
Thomas Eizinger
3022c019e1 chore(connlib): set user.account_slug for Sentry logs (#10815)
By default, the Sentry SDK doesn't include custom user attributes when
it sends logs. To make viewing logs easier, we add the `account_slug`
attribute to all logs that are posted to Sentry.
2025-11-10 04:08:45 +00:00
Thomas Eizinger
f4216710e0 fix(telemetry): don't append duplicate attributes in Sentry log (#10819)
When we build the log message that is sent to Sentry, we append
several attributes to mimic the formatting that we get from
`tracing_subscriber::fmt`. To do that, we strip the span name from the
attribute, which can result in us processing the same attribute, such as
`cid`, twice: once from a span and once from the actual log message. To
avoid appending the same attribute twice, we check for its presence in
the attributes map first.

This avoids having messages in Sentry such as:

```
Sampled relay cid=c18e1da8-8ef8-4e11-a325-28d6b387d503 rid=3af15c76-9e84-46a6-90e1-63ecb2bc9f80 cid=c18e1da8-8ef8-4e11-a325-28d6b387d503
```
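
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual telemetry code: `format_log`, its parameters, and the `span:field` key format are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Appends `key=value` pairs to `message`, skipping keys that were already seen.
///
/// Hypothetical helper: `span_fields` keys are assumed to be prefixed with a
/// span name, e.g. `"connection:cid"`.
fn format_log(
    message: &str,
    event_fields: &[(&str, &str)],
    span_fields: &[(&str, &str)],
) -> String {
    let mut attributes: BTreeMap<String, String> = BTreeMap::new();
    let mut out = message.to_owned();

    for &(key, value) in event_fields.iter().chain(span_fields) {
        // Strip the (assumed) span-name prefix so `connection:cid` becomes `cid`.
        let key = key.rsplit(':').next().unwrap_or(key);

        // Only append an attribute the first time we encounter it.
        if attributes.contains_key(key) {
            continue;
        }

        attributes.insert(key.to_owned(), value.to_owned());
        out.push_str(&format!(" {key}={value}"));
    }

    out
}
```

With this sketch, a `cid` that appears both on the event and on a span is appended to the message only once.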
2025-11-10 01:42:01 +00:00
dependabot[bot]
1ac1bb044a build(deps): bump the sentry group in /rust with 2 updates (#10727)
Bumps the sentry group in /rust with 2 updates:
[sentry](https://github.com/getsentry/sentry-rust) and
[sentry-tracing](https://github.com/getsentry/sentry-rust).

Updates `sentry` from 0.42.0 to 0.43.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/getsentry/sentry-rust/releases">sentry's
releases</a>.</em></p>
<blockquote>
<h2>0.43.0</h2>
<h3>Breaking changes</h3>
<ul>
<li>ref(tracing): rework tracing to Sentry span name/op conversion (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/887">#887</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a>
<ul>
<li>The <code>tracing</code> integration now uses the tracing span name
as the Sentry span name by default.</li>
<li>Before this change, the span name would be set based on the
<code>tracing</code> span target
(<code>&lt;module&gt;::&lt;function&gt;</code> when using the
<code>tracing::instrument</code> macro).</li>
<li>The <code>tracing</code> integration now uses <code>&lt;span
target&gt;::&lt;span name&gt;</code> as the default Sentry span op (i.e.
<code>&lt;module&gt;::&lt;function&gt;</code> when using
<code>tracing::instrument</code>).</li>
<li>Before this change, the span op would be set based on the
<code>tracing</code> span name.</li>
<li>Read below to learn how to customize the span name and op.</li>
<li>When upgrading, please ensure to adapt any queries, metrics or
dashboards to use the new span names/ops.</li>
</ul>
</li>
<li>ref(tracing): use standard code attributes (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/899">#899</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a>
<ul>
<li>Logs now carry the attributes <code>code.module.name</code>,
<code>code.file.path</code> and <code>code.line.number</code>
standardized in OTEL to surface the respective information, in contrast
with the previously sent <code>tracing.module_path</code>,
<code>tracing.file</code> and <code>tracing.line</code>.</li>
</ul>
</li>
<li>fix(actix): capture only server errors (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/877">#877</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a>
<ul>
<li>The Actix integration now properly honors the
<code>capture_server_errors</code> option (enabled by default),
capturing errors returned by middleware only if they are server errors
(HTTP status code 5xx).</li>
<li>Previously, if a middleware were to process the request after the
Sentry middleware and return an error, our middleware would always
capture it and send it to Sentry, regardless of whether it was a client,
server or some other kind of error.</li>
<li>With this change, we capture errors returned by middleware only if
those errors can be classified as server errors.</li>
<li>There is no change in behavior when it comes to errors returned by
services, in which case the Sentry middleware captures server errors
exclusively.</li>
</ul>
</li>
<li>fix: send trace origin correctly (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/906">#906</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a>
<ul>
<li><code>TraceContext</code> now has an additional field
<code>origin</code>, used to report which integration created a
transaction.</li>
</ul>
</li>
</ul>
<h3>Behavioral changes</h3>
<ul>
<li>feat(tracing): send both breadcrumbs and logs by default (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/878">#878</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a>
<ul>
<li>If the <code>logs</code> feature flag is enabled, and
<code>enable_logs: true</code> is set on your client options, the
default Sentry <code>tracing</code> layer now sends logs for all events
at or above INFO.</li>
</ul>
</li>
</ul>
<h3>Features</h3>
<ul>
<li>
<p>ref(tracing): rework tracing to Sentry span name/op conversion (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/887">#887</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a></p>
<ul>
<li>Additional special fields have been added that allow overriding
certain data on the Sentry span:
<ul>
<li><code>sentry.op</code>: override the Sentry span op.</li>
<li><code>sentry.name</code>: override the Sentry span name.</li>
<li><code>sentry.trace</code>: given a string matching a valid
<code>sentry-trace</code> header (sent automatically by client SDKs),
continues the distributed trace instead of starting a new one. If the
value is not a valid <code>sentry-trace</code> header or a trace is
already started, this value is ignored.</li>
</ul>
</li>
<li><code>sentry.op</code> and <code>sentry.name</code> can also be
applied retroactively by declaring fields with value
<code>tracing::field::Empty</code> and then recorded using
<code>tracing::Span::record</code>.</li>
<li>Example usage:
<pre lang="rust"><code>#[tracing::instrument(skip_all, fields(
    sentry.op = &quot;http.server&quot;,
    sentry.name = &quot;GET /payments&quot;,
    sentry.trace = headers.get(&quot;sentry-trace&quot;).unwrap_or(&amp;&quot;&quot;.to_owned()),
))]
async fn handle_request(headers: std::collections::HashMap&lt;String, String&gt;) {
    // ...
}
</code></pre>
</li>
<li>Additional attributes are sent along with each span by default:
<ul>
<li><code>sentry.tracing.target</code>: corresponds to the
<code>tracing</code> span's <code>metadata.target()</code></li>
<li><code>code.module.name</code>, <code>code.file.path</code>,
<code>code.line.number</code></li>
</ul>
</li>
</ul>
</li>
<li>
<p>feat(core): add Response context (<a
href="https://redirect.github.com/getsentry/sentry-rust/pull/874">#874</a>)
by <a href="https://github.com/lcian"><code>@​lcian</code></a></p>
<ul>
<li>The <code>Response</code> context can now be attached to events, to
include information about HTTP responses such as headers, cookies and
status code.</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b08b24a057"><code>b08b24a</code></a>
release: 0.43.0</li>
<li><a
href="1c08ca8671"><code>1c08ca8</code></a>
ref(tracing): keep old span name as op instead of default (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/905">#905</a>)</li>
<li><a
href="75aff83c65"><code>75aff83</code></a>
fix(tracing): skip default span attributes when propagating to event (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/904">#904</a>)</li>
<li><a
href="6b61b31367"><code>6b61b31</code></a>
fix: send trace origin correctly (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/906">#906</a>)</li>
<li><a
href="75a8c03de7"><code>75a8c03</code></a>
ref(tracing): use standard code attributes (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/899">#899</a>)</li>
<li><a
href="bbd667ab00"><code>bbd667a</code></a>
meta: add pull request template (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/898">#898</a>)</li>
<li><a
href="5c8ab31b61"><code>5c8ab31</code></a>
ref(tracing): rework <code>tracing</code> to Sentry span name/op
conversion (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/887">#887</a>)</li>
<li><a
href="045c2e2fed"><code>045c2e2</code></a>
feat(tracing): send both breadcrumbs and logs by default (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/878">#878</a>)</li>
<li><a
href="a5932c0295"><code>a5932c0</code></a>
fix(transport): add rate limit for logs (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/894">#894</a>)</li>
<li><a
href="280dab99be"><code>280dab9</code></a>
build(deps): bump tracing-subscriber from 0.3.19 to 0.3.20 (<a
href="https://redirect.github.com/getsentry/sentry-rust/issues/891">#891</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/getsentry/sentry-rust/compare/0.42.0...0.43.0">compare
view</a></li>
</ul>
</details>
<br />

Updates `sentry-tracing` from 0.42.0 to 0.43.0
<br />



---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2025-11-02 22:31:43 +00:00
Thomas Eizinger
b11adfcfe4 feat(connlib): create flow on ICMP error "prohibited" (#10462)
In Firezone, a Client requests an "access authorization" for a Resource
on the fly when it sees the first packet for said Resource going through
the tunnel. If we don't have a connection to the Gateway yet, this is
also where we will establish a connection and create the WireGuard
tunnel.

In order for this to work, the access authorization state between the
Client and the Gateway MUST NOT get out of sync. If the Client thinks it
has access to a Resource, it will just route the traffic to the Gateway.
If the access authorization on the Gateway has expired or vanished
otherwise, the packets will be black-holed.

Starting with #9816, the Gateway sends ICMP errors back to the
application whenever it filters a packet. This can happen either because
the access authorization is gone or because the traffic wasn't allowed
by the specific filter rules on the Resource.

With this patch, the Client will attempt to create a new flow (i.e.
re-authorize traffic) for this Resource whenever it sees such an ICMP
error, thereby synchronizing the view of the world between Client and
Gateway should they ever get out of sync.
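
A minimal sketch of this re-authorization logic, assuming a heavily simplified Client state. `ResourceId`, `Action`, `ClientState` and all method names are hypothetical and not taken from connlib:

```rust
use std::collections::HashSet;

/// Hypothetical identifier for a Resource.
type ResourceId = u32;

/// What the Client decides to do next.
#[derive(Debug, PartialEq)]
enum Action {
    /// Route the packet to the Gateway through the tunnel.
    Forward,
    /// Ask for a (new) access authorization, i.e. create a flow.
    RequestFlow(ResourceId),
}

#[derive(Default)]
struct ClientState {
    /// Resources the Client believes it is authorized for.
    authorized: HashSet<ResourceId>,
}

impl ClientState {
    /// First packet for an unauthorized Resource triggers a new flow.
    fn on_outbound_packet(&mut self, resource: ResourceId) -> Action {
        if self.authorized.contains(&resource) {
            Action::Forward
        } else {
            Action::RequestFlow(resource)
        }
    }

    /// The authorization was granted.
    fn on_flow_created(&mut self, resource: ResourceId) {
        self.authorized.insert(resource);
    }

    /// The Gateway filtered our packet: our view is stale, so re-authorize.
    fn on_icmp_prohibited(&mut self, resource: ResourceId) -> Action {
        self.authorized.remove(&resource);
        Action::RequestFlow(resource)
    }
}
```

The sketch drops the stale authorization as soon as the ICMP error arrives; whether the real Client keeps forwarding packets while the new flow is being created is not specified here.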

Testing turned out to be a bit tricky. If we let the authorization on
the Gateway lapse naturally, the portal will also toggle the Resource off
and on on the Client, resulting in "flushing" the current
authorizations. Additionally, if the Client only had access to one
Resource, then the Gateway will gracefully close the connection, also
resulting in the Client creating a new flow for the next packet.

To actually trigger this new behaviour we need to:

- Access at least two resources via the same Gateway
- Directly send `reject_access` to the Gateway for this particular
resource

To achieve this, we dynamically eval some code on the API node and
instruct the Gateway channel to send `reject_access`. The connection
stays intact because there is still another active access authorization
but packets for the other resource are answered with ICMP errors.

To achieve a safe roll-out, the new behaviour is feature-flagged. In
order to still test it, we now also allow feature flags to be set via
env variables.

Resolves: #10074

---------

Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
2025-09-30 08:23:39 +00:00
Thomas Eizinger
aa68029a33 feat(gateway): use hickory resolver to resolve A/AAAA queries (#10373)
At present, the Gateway performs DNS resolution for A & AAAA queries via
`libc`. The `resolve` system call only provides us with the resolved IPs
but not any of the metadata around the query such as TTL. As a result,
we can only cache DNS queries for a static amount of time, currently
30s. It would be more correct to cache them for their TTL instead.
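
As a rough illustration of caching by TTL rather than for a fixed 30s, here is a minimal sketch using only the standard library. `DnsCache` and `CachedAnswer` are hypothetical names and not the Gateway's actual types; the TTL is assumed to be handed over by whatever resolver produced the answer.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// A cached DNS answer together with the moment it stops being valid.
struct CachedAnswer {
    ips: Vec<IpAddr>,
    expires_at: Instant,
}

/// Hypothetical cache keyed by domain name (illustration only).
#[derive(Default)]
struct DnsCache {
    entries: HashMap<String, CachedAnswer>,
}

impl DnsCache {
    /// Stores an answer for exactly as long as the record's TTL allows.
    fn insert(&mut self, domain: &str, ips: Vec<IpAddr>, ttl: Duration, now: Instant) {
        let expires_at = now + ttl;
        self.entries
            .insert(domain.to_owned(), CachedAnswer { ips, expires_at });
    }

    /// Returns the cached IPs, or `None` if the entry is missing or has expired.
    fn get(&self, domain: &str, now: Instant) -> Option<&[IpAddr]> {
        let answer = self.entries.get(domain)?;
        (now < answer.expires_at).then_some(answer.ips.as_slice())
    }
}
```

Passing `now` in explicitly keeps the sketch deterministic to test; a real event-loop would likely already have a notion of the current time.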

To do so, we re-introduce `hickory-resolver` to our codebase.
Deliberately, we only use it for resolving A and AAAA records on the
Gateway for now. DNS resolution for SRV & TXT records happens one layer
below and uses the same infrastructure as DNS resolution on the Client.

Merging this is difficult however because the Gateway still supports the
control protocol of 1.3.x clients. That one requires DNS resolution
prior to setting up the connection of DNS resources which means it needs
to happen in the event-loop of the Gateway binary and cannot be moved
into the `Tunnel` where DNS resolution for Client and SRV/TXT records
happens.

Once we can drop support for 1.3.x clients, the Gateway's event-loop
will simplify drastically which will allow us to refactor this to a more
unified approach of DNS resolution. Until then, we can at least fix the
hardcoded TTL by using `hickory-resolver` in the event-loop.

The functionality is guarded behind a feature-flag which - as usual - is
off by default (i.e. for as long as we haven't fetched the flags). The
feature flag is already configured to `true` for staging and production
so we can test the new behaviour.

Resolves: #8232
Related: #10385
2025-09-23 06:00:16 +00:00
Thomas Eizinger
8877f3d7c2 chore(telemetry): remove span name from attributes in Sentry (#10278)
Before sending logs to Sentry, we perform a pass over them to make them
somewhat look like the output of `tracing_subscriber::fmt`. In
particular, we trim the span name from fields in order to shorten the
message. In our logger config, we don't render the span name at all and
just append all fields at the end of the message.

Sentry supports filtering by field names but unfortunately, those cannot
contain a colon (`:`). Given that we already trim the span name in the
actual message, it also makes sense to remove the span name from the
actual attributes. That allows us to actually filter by these attributes
and has the additional advantage that fields from different spans with
the same name are merged. This is especially useful because we purposely
reuse names like `cid` to refer to the current connection from different
spans.
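
A minimal sketch of the renaming step, assuming attributes arrive as a map whose keys look like `span_name:field`. The function name, the map type and the "first value wins" rule on a clash are assumptions for illustration, not the actual telemetry code.

```rust
use std::collections::BTreeMap;

/// Drops a leading `span_name:` from every attribute key.
/// Keys without a colon are kept unchanged. If two keys collapse to the
/// same field name, the value seen first is kept (an assumption).
fn strip_span_names(attributes: BTreeMap<String, String>) -> BTreeMap<String, String> {
    let mut stripped = BTreeMap::new();

    for (key, value) in attributes {
        // Everything after the last colon is treated as the field name.
        let field = key
            .rsplit_once(':')
            .map_or(key.as_str(), |(_span, field)| field)
            .to_owned();

        stripped.entry(field).or_insert(value);
    }

    stripped
}
```

With this, `handle_input:cid` and a plain `cid` end up under the single key `cid`, which is what makes filtering by that name possible.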
2025-09-02 13:59:53 +00:00
Thomas Eizinger
e84bdc5566 refactor(connlib): periodically record queue depths (#10242)
Instead of recording the queue depths on every event-loop tick, we now
record them once a second by setting a Gauge. Not only is that a simpler
instrument to work with but it is significantly more performant. The
current version - when metrics are enabled - takes on quite a bit of CPU
time.
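
The idea of sampling once per interval instead of on every tick can be sketched as below. `PeriodicRecorder` is a hypothetical helper; the actual Gauge and the way the queue depths are read are left out.

```rust
use std::time::{Duration, Instant};

/// Decides whether enough time has passed to take another sample.
struct PeriodicRecorder {
    interval: Duration,
    last_recorded: Option<Instant>,
}

impl PeriodicRecorder {
    fn new(interval: Duration) -> Self {
        Self {
            interval,
            last_recorded: None,
        }
    }

    /// Returns `true` at most once per `interval`; the caller would set the
    /// gauge only when this returns `true`.
    fn should_record(&mut self, now: Instant) -> bool {
        match self.last_recorded {
            Some(last) if now.saturating_duration_since(last) < self.interval => false,
            _ => {
                self.last_recorded = Some(now);
                true
            }
        }
    }
}
```

Calling `should_record` on every event-loop tick is cheap: it is a single comparison unless the interval has elapsed.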

Resolves: #10237
2025-09-02 02:57:36 +00:00
Thomas Eizinger
90803d50b1 chore(telemetry): use Firezone-specific ingest hosts (#10271)
These give us more control over where this traffic goes. For example,
based on this, we will be able to exclude this traffic from the Internet
Resource.
2025-09-01 07:36:51 +00:00
Thomas Eizinger
9cddfe59fa fix(rust): don't require Internet on startup (#10264)
With the introduction of the pre-resolved Sentry host, all Firezone
clients now require Internet on startup. That is a significant usability
hit that we can easily fix by simply falling back to resolving the host
on-demand.
2025-09-01 01:31:05 +00:00
Thomas Eizinger
da802323e4 feat(telemetry): pre-resolve PostHog ingest host (#10207)
In order to effectively share the HTTP client for requests to PostHog,
we pre-resolve the IPs of the host and create a lazily initialised
`reqwest::Client` that gets shared between all analytics calls.
2025-08-22 13:19:53 +00:00
Thomas Eizinger
c70c88c856 build(deps): upgrade to opentelemetry 0.30 (#10239) 2025-08-21 22:47:39 +00:00
Thomas Eizinger
46afa52f78 feat(telemetry): pre-resolve Sentry ingest host (#10206)
Our Sentry client needs to resolve DNS before being able to send logs or
errors to the backend. Currently, this DNS resolution happens on-demand
as we don't take any control of the underlying HTTP client.

In addition, this will use HTTP/1.1 by default which isn't as efficient
as it could be, especially with concurrent requests.

Finally, if we ever decide to proxy all Sentry traffic through our
own domain, we have to take control of the underlying client anyway.

To resolve all of the above, we create a custom `TransportFactory` where
we reuse the existing `ReqwestHttpTransport` but provide an already
configured `reqwest::Client` that always uses HTTP/2 with a
pre-configured set of DNS records for the given ingest host.
2025-08-21 03:28:05 +00:00
Thomas Eizinger
4e11112d9b feat(connlib): improve throughput on higher latencies (#10231)
Turns out the multi-threaded access of the TUN device on the Gateway
causes packet reordering which makes the TCP congestion controller
throttle the connection. Additionally, the default TX queue length of a
TUN device on Linux is only 500 packets.

With just a single thread and an increased TX queue length, we get a
throughput performance of just over 1 GBit/s for a 20ms link between
Client and Gateway with basically no packet drops:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.79.130.70 port 49546 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   116 MBytes   977 Mbits/sec    0   6.40 MBytes       
[  5]   1.00-2.00   sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   2.00-3.00   sec   134 MBytes  1.13 Gbits/sec    0   6.40 MBytes       
[  5]   3.00-4.00   sec   136 MBytes  1.14 Gbits/sec   47   6.40 MBytes       
[  5]   4.00-5.00   sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   5.00-6.00   sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]   6.00-7.00   sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   7.00-8.00   sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]   8.00-9.00   sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]   9.00-10.00  sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]  10.00-11.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  11.00-12.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  12.00-13.00  sec   136 MBytes  1.14 Gbits/sec    0   6.40 MBytes       
[  5]  13.00-14.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  14.00-15.00  sec   140 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  15.00-16.00  sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]  16.00-17.00  sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes       
[  5]  17.00-18.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes       
[  5]  18.00-19.00  sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes       
[  5]  19.00-20.00  sec   136 MBytes  1.14 Gbits/sec    0   6.40 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  2.67 GBytes  1.15 Gbits/sec   47             sender
[  5]   0.00-20.02  sec  2.67 GBytes  1.15 Gbits/sec                  receiver

iperf Done.

```

For further debugging in the future, we are now recording the send and
receive queue depths of both the TUN device and the UDP sockets. Neither
of those turned out to be full in my testing which leads me to conclude that
it isn't any buffer inside Firezone that is too small here.

Related: #7452

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2025-08-20 23:08:56 +00:00
Thomas Eizinger
c8b01d9f43 fix(telemetry): timeout Sentry session shutdown within 1s (#10205) 2025-08-18 01:19:34 +00:00
Thomas Eizinger
1bdc5f0584 feat(telemetry): reuse connections to PostHog server (#10203) 2025-08-18 00:34:14 +00:00
Thomas Eizinger
ea6f1ce145 chore(telemetry): allow to dynamically change the log filter (#10065)
In addition to sending true/false for a feature-flag, PostHog also
allows us to send a payload with them. We can use this to carry the
log-filter we'd like to stream logs for. With this, we can dynamically
change which logs we are getting forwarded to Sentry.

Unfortunately, this cannot be done on a per-user basis, meaning we will
always have the same log filter for all users where the feature-flag is
enabled.
2025-08-02 10:23:35 +00:00
Thomas Eizinger
69f9a03ee8 refactor(connlib): simplify IpPacket struct (#9795)
With the removal of the NAT64/46 modules, we can now simplify the
internals of our `IpPacket` struct. The requirements for our `IpPacket`
struct are somewhat delicate.

On the one hand, we don't want to be overly restrictive in our parsing /
validation code because there is a lot of broken software out there that
doesn't necessarily follow RFCs. Hence, we want to be as lenient as
possible in what we accept.

On the other hand, we do need to verify certain aspects of the packet,
like the payload lengths. At the moment, we are somewhat too lenient
there which causes errors on the Gateway where we have to NAT or
otherwise manipulate the packets. See #9567 or #9552 for example.

To fix this, we make the parsing in the `IpPacket` constructor more
restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully
parse its headers and validate the payload lengths.

This parsing allows us to then rely on the integrity of the packet as
part of the implementation. This does create several code paths that can
in theory panic but in practice, should be impossible to hit. To ensure
that this does in fact not happen, we also tackle an issue that is long
overdue: Fuzzing.

Resolves: #6667 
Resolves: #9567
Resolves: #9552
2025-07-29 04:42:57 +00:00
Thomas Eizinger
dacc402721 chore(connlib): only log span field name into message (#9981)
When looking at logs, reducing noise is critical to make it easier to
spot important information. When sending logs to Sentry, we currently
append the fields of certain spans to the message to make the output similar
to that of `tracing_subscriber::fmt`.

The actual name of a field inside a span is separated from the span name
by a colon. For example, here is a log message as we see it in Sentry
today:

> handle_input:class=success response
handle_input:from=C1A0479AA153FACA0722A5DF76343CF2BEECB10E:3478
handle_input:method=binding handle_input:rtt=34.7479ms
handle_input:tid=BB30E859ED88FFDF0786B634 request=["Software(snownet;
session=BCA42EF159C794F41AE45BF5099E54D3A193A7184C4D2C3560C2FE49C4C6CFB7)"]
response=["Software(firezone-relay; rev=e4ba5a69)",
"XorMappedAddress(B824B4035A78A6B188EF38BE13AA3C1B1B1196D6:52625)"]

Really, what we would like to see is only this:

> class=success response
from=C1A0479AA153FACA0722A5DF76343CF2BEECB10E:3478 method=binding
rtt=34.7479ms tid=BB30E859ED88FFDF0786B634 request=["Software(snownet;
session=BCA42EF159C794F41AE45BF5099E54D3A193A7184C4D2C3560C2FE49C4C6CFB7)"]
response=["Software(firezone-relay; rev=e4ba5a69)",
"XorMappedAddress(B824B4035A78A6B188EF38BE13AA3C1B1B1196D6:52625)"]

The duplication of `handle_input:` is just noise. In our local log
output, we already strip the name of the span to make it easier to read.
Here we now also do the same for the logs reported to Sentry.
2025-07-24 01:37:43 +00:00
Thomas Eizinger
82c4c39436 chore(telemetry): don't start in local environment (#9905) 2025-07-18 14:28:55 +00:00
Thomas Eizinger
a6ffdd2654 feat(snownet): reduce rekey-attempt-time to 15s (#9891)
From Sentry reports and user-submitted logs, we know that it is possible
for Client and Gateway to de-sync in regards to what each other's public
key is. In such a scenario, ICE will succeed to make a connection but
`boringtun` will fail to handshake a tunnel. By default, `boringtun`
tries for 90s to handshake a session before it gives up and expires it.

In Firezone, the ICE agent takes care of establishing connectivity
whereas `boringtun` itself just encrypts and decrypts packets. As such,
if ICE is working, we know that packets aren't getting lost but instead,
there must be some other issue as to why we cannot establish a session.

To improve the UX in these error cases, we reduce the rekey-attempt-time
to 15s. This roughly matches our ICE timeout. Those 15s count from the
moment we send the first handshake which is just after ICE completes.
Thus we can be sure that after at most 15s, we either have a working
WireGuard session or the connection gets cleaned up.

Related: #9890
Related: #9850
2025-07-17 00:50:31 +00:00
Thomas Eizinger
f5425ac8e4 fix(snownet): fail connection on handshake decryption errors (#9850)
As per the WireGuard paper, `boringtun` tries to handshake with the
remote peer for 90s before it gives up. This timeout is important
because when a session is discarded due to e.g. missing replies,
WireGuard attempts to handshake a new session. Without this timeout, we
would then try to handshake a session forever.

Unfortunately, `boringtun` does not distinguish a missing handshake
response from a bad one. Decryption errors whilst decoding a handshake
response are simply passed up to the upper layer, in our case `snownet`.

I am not sure how we can actually fail to decrypt a handshake but the
pattern we are seeing in customer logs is that this happens over and
over again, so there is no point in having `boringtun` retry the
handshake. Therefore, we immediately fail the connection when this
happens.

Failed connections are immediately removed, triggering the Client to send a
new connection-intent to the portal. Such a new connection intent will
then sync-up the state between Client and Gateway so both of them use
the most recent public key.

Resolves: #9845
2025-07-14 13:22:23 +00:00
Thomas Eizinger
cecca37073 feat(gateway): allow exporting metrics to an OTEL collector (#9838)
As a first step in preparation for sending OTEL metrics from Clients and
Gateways to a cloud-hosted OTEL collector, we extend the CLI of the
Gateway with configuration options to provide a gRPC endpoint to an OTEL
collector.

If `FIREZONE_METRICS` is set to `otel-collector` and an endpoint is
configured via `OTLP_GRPC_ENDPOINT`, we will report our metrics to that
collector.
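
The condition described above can be sketched as a pure function over the two values. `MetricsExporter` and `select_exporter` are hypothetical names, and every other combination is collapsed into `Disabled` here, which is a simplification of whatever the Gateway's CLI actually supports.

```rust
/// Where metrics should go (illustrative; not the Gateway's real type).
#[derive(Debug, PartialEq)]
enum MetricsExporter {
    OtelCollector { grpc_endpoint: String },
    Disabled,
}

/// Picks an exporter from the values of `FIREZONE_METRICS` and
/// `OTLP_GRPC_ENDPOINT`, passed in as optional strings.
fn select_exporter(
    firezone_metrics: Option<&str>,
    otlp_grpc_endpoint: Option<&str>,
) -> MetricsExporter {
    match (firezone_metrics, otlp_grpc_endpoint) {
        // Both the mode and an endpoint are required to export.
        (Some("otel-collector"), Some(endpoint)) => MetricsExporter::OtelCollector {
            grpc_endpoint: endpoint.to_owned(),
        },
        _ => MetricsExporter::Disabled,
    }
}
```

The arguments could be fed from `std::env::var("FIREZONE_METRICS").ok().as_deref()` and the equivalent call for `OTLP_GRPC_ENDPOINT`.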

The future plan for extending this is such that if `FIREZONE_METRICS` is
set to `otel-collector` (which will likely be the default) and no
`OTLP_GRPC_ENDPOINT` is set, then we will use our own, hosted OTEL
collector and report metrics IF the `export-metrics` feature-flag is set
to `true`.

This integration is similar to what we have done with streaming logs to
Sentry. We can therefore enable it on a similar granularity as we do
with the logs and e.g. only enable it for the `firezone` account to
start with.

In the meantime, customers can already make use of those metrics if they'd
like by using the current integration.

Resolves: #1550
Related: #7419

---------

Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>
2025-07-14 03:54:38 +00:00
Thomas Eizinger
70e4b6572f chore(rust): log environment when updating feature flags (#9855)
It is useful to know which environment we've updated the feature-flags
for.
2025-07-13 17:27:10 +00:00
Thomas Eizinger
06f703a0b5 feat(telemetry): log use of map-enobufs-to-wouldblock (#9829)
In order to better track how well our `ENOBUFS` mitigation is working,
we should log the use of our feature flag to PostHog. This will give us
some stats on how often this is happening. That combined with the lack of
error reports should give us good confidence in permanently enabling
this behaviour.
2025-07-11 13:32:11 +00:00
Thomas Eizinger
a363f9e2fb chore: migrate service ID to hex-representation (#9836)
We aren't sending the OTEL metrics anywhere yet but it still makes sense
to also use the "newer" hex-representation of the Firezone ID here as
the service ID.
2025-07-11 12:03:50 +00:00
Thomas Eizinger
04499da11e feat(telemetry): grab env and distinct_id from Sentry session (#9801)
At present, our primary indicator as to whether telemetry is active is
whether we have a Sentry session. For our analytics events however, we
currently require passing in the Firezone ID and API url again. This
makes it difficult to send analytics events from areas of the code that
don't have this information available.

To still allow for that, we integrate the `analytics` module more
tightly with the Sentry session. This allows us to drop two parameters
from the `$identify` event and also means we now respect the
`NO_TELEMETRY` setting for these events except for `new_session`. This
event is sent regardless because it allows us to track how many on-prem
installations of Firezone are out there.
2025-07-10 20:05:08 +00:00
Thomas Eizinger
13c8c70750 fix(connlib): treat ENOBUFS as EWOULDBLOCK (#9798)
Socket APIs across operating systems vary in how they handle
back-pressure. In most cases, a non-blocking socket should return
`EWOULDBLOCK` when it cannot send a given datagram and would have to
block to wait for resources to free up.

It appears that macOS doesn't always behave like that. In particular, we
are seeing error logs from a few users where sending a datagram fails
with

> No buffer space available (os error 55)

Digging through `libc`, I've found that this error is known as `ENOBUFS`
[0].

There are reports on the Apple developer forum [1] that recommend
retrying when this error happens. It is however unclear as to whether it
is entirely safe to map this error to `EWOULDBLOCK`. Other non-blocking
event-loop implementations [2] appear to do that but we don't know
whether it is fully correct.

At present, Firezone's behaviour here is to drop the packet. This means
the host networking stack has to fall-back to running into a timeout and
re-send the packet. This very likely negatively impacts the UX for the
users hitting this.

In order to validate this assumption, we implement a feature-flag. This
allows us to ship this code but switch back to the old behaviour, should
it negatively impact how Firezone behaves. In particular, if the
assumption that mapping `ENOBUFS` to `EWOULDBLOCK` is safe turns out
wrong and `kqueue` does in fact not signal readiness when more buffers
are available, then we may have missing wake-ups which would lead to a
further delay in datagrams being sent.
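
A minimal sketch of the feature-flagged mapping, assuming the flag value is passed in as a plain `bool`. The constant is the macOS value taken from the "os error 55" message above; `map_enobufs` is a hypothetical helper name.

```rust
use std::io;

/// `ENOBUFS` as reported on macOS ("os error 55").
const ENOBUFS_MACOS: i32 = 55;

/// Maps `ENOBUFS` to `WouldBlock` when the feature flag is enabled.
/// With the flag off, or for any other error, the error is returned as-is.
fn map_enobufs(error: io::Error, flag_enabled: bool) -> io::Error {
    if flag_enabled && error.raw_os_error() == Some(ENOBUFS_MACOS) {
        return io::Error::from(io::ErrorKind::WouldBlock);
    }

    error
}
```

A caller would then treat the result like any other `WouldBlock`, i.e. keep the datagram and wait for the socket to become writable again.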

[0]:
8e6f36c6ba/src/unix/bsd/apple/mod.rs (L2998)
[1]: https://developer.apple.com/forums/thread/42334
[2]:
aac866f399/src/unix/stream.c (L820)
2025-07-10 17:51:16 +00:00
Thomas Eizinger
a6796fe8b2 fix(telemetry): always use hex-encoded ID as user ID (#9781)
We are currently in the process of transitioning the Firezone Clients
away from always hashing the ID before sending it to the portal. This
will make lookups and correlation of data between our systems much
easier.

The way we are performing this migration is that new installations of
Firezone will directly generate a 64 char hex-string as the Firezone ID.
If the ID looks like a UUID (which is the old format), we still hash it
and send it to the portal, otherwise we send it as-is.
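
The rule above can be sketched as follows. The UUID check is a hand-rolled shape test and the hash function is supplied by the caller so that the example needs no external crate; both are assumptions, and the real code may well use a UUID parser and a SHA-256 library instead.

```rust
/// Returns `true` if `id` has the textual shape of a hyphenated UUID
/// (8-4-4-4-12 hex digits).
fn looks_like_uuid(id: &str) -> bool {
    id.len() == 36
        && id.bytes().enumerate().all(|(i, b)| match i {
            8 | 13 | 18 | 23 => b == b'-',
            _ => b.is_ascii_hexdigit(),
        })
}

/// Old-format IDs (UUIDs) are hashed; new-format IDs are passed through.
/// `sha256_hex` is a hypothetical caller-supplied hashing function.
fn portal_id(id: &str, sha256_hex: impl Fn(&str) -> String) -> String {
    if looks_like_uuid(id) {
        sha256_hex(id)
    } else {
        id.to_owned()
    }
}
```

A 64 char hex-string never matches the 36-character UUID shape, so it is always sent as-is.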

Presently, the telemetry integrations with Sentry and PostHog do the
opposite. They always set the Firezone ID as-is and include an
`external_id` that is the hashed form if they detect that it is a UUID
(or in the case of PostHog, create an alias). It is much better to flip
this around and always set the hex-string as the user id. That way, we
can simply always filter by the `user.id` attribute in Sentry and always
refer to the ID that we are seeing in the portal.
2025-07-04 16:55:44 +00:00
Thomas Eizinger
8b001b3e8b refactor(telemetry): use atomics for feature-flags (#9783)
Feature flags may be accessed _very_ often such as on every log
statement with #9780. To make sure this is as performant as possible, we
move from an `RwLock` to atomic booleans with relaxed ordering.
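
A minimal sketch of what such a flag store can look like. The struct and the single `stream_logs` field are illustrative; the real set of flags is not shown here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical flag store; every flag defaults to `false`.
#[derive(Default)]
struct FeatureFlags {
    stream_logs: AtomicBool,
}

impl FeatureFlags {
    /// Cheap read that never blocks, suitable for hot paths such as logging.
    fn stream_logs(&self) -> bool {
        self.stream_logs.load(Ordering::Relaxed)
    }

    /// Called when the flags are re-evaluated.
    fn set_stream_logs(&self, enabled: bool) {
        self.stream_logs.store(enabled, Ordering::Relaxed);
    }
}
```

Relaxed ordering is enough in this sketch because each flag is an independent boolean and no other memory is published alongside it.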
2025-07-04 14:55:45 +00:00
Thomas Eizinger
ec2599d545 chore(rust): simplify stream logs feature (#9780)
Instead of conditionally enabling the `logs` feature in the Sentry
client, we always enable it and control via the `tracing` integration
which events should get forwarded to Sentry. The feature-flag check
accesses only shared-memory and is therefore really fast.

We already re-evaluate feature flags on a timer which means this boolean
will flip over automatically and logs will be streamed to Sentry.
2025-07-04 14:51:53 +00:00
Thomas Eizinger
4be8e5458a chore(telemetry): don't fmt fields from the log crate (#9774)
Those are internal to `tracing` and don't need to be formatted into the
message we send to Sentry.
2025-07-04 00:03:18 +00:00
Thomas Eizinger
3961f6e299 chore(rust): ignore parent_span_id (#9738)
This field is included in the tracing logs but doesn't need to be
included in our message formatting as it is just noise for us.
2025-07-01 14:12:06 +00:00
Thomas Eizinger
d5be185ae4 chore(rust): remove telemetry spans and events (#9634)
Originally, we introduced these to gather some data from logs / warnings
that we considered to be too spammy. We've since merged a
burst-protection that will at most submit the same event once every 5
minutes.

The data from the telemetry spans themselves has not been used at all.
2025-06-25 17:15:57 +00:00
Thomas Eizinger
3b972643b1 feat(rust): stream logs to Sentry when enabled in PostHog (#9635)
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.

In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:

- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and
afterwards, every minute. The feature-flag context itself is only
checked every 5 minutes though so it might take up to 5 minutes before
this takes effect.

The default value - like all feature flags - is `false`. Therefore, if
there is any issue with the PostHog service, we will fallback to the
previous behaviour where logs are simply stored locally.

Resolves: #9600
2025-06-25 16:14:14 +00:00
Thomas Eizinger
d376a122e4 feat(telemetry): send account_slug to PostHog (#9636)
In order to more easily target customers with certain feature flags, we
include the `account_slug` in the `$identify` event to PostHog. This
will allow us to create Cohorts in PostHog and enable / disable feature
flags for all installations of Firezone for a particular customer.
2025-06-24 09:00:24 +00:00
Thomas Eizinger
a91dda139f feat(connlib): only conditionally hash firezone ID (#9633)
A bit of legacy that we have inherited around our Firezone ID is that
the ID stored on the user's device is sha'd before being passed to the
portal as the "external ID". This makes it difficult to correlate IDs in
Sentry and PostHog with the data we have in the portal. For Sentry and
PostHog, we submit the raw UUID stored on the user's device.

As a first step in overcoming this, we embed an "external ID" in those
services as well IF the provided Firezone ID is a valid UUID. This will
allow us to immediately correlate those events.

As a second step, we automatically generate all new Firezone IDs for the
Windows and Linux Client as `hex(sha256(uuid))`. These won't parse as
valid UUIDs and therefore will be submitted as is to the portal.

As a third step, we update all documentation around generating Firezone
IDs to use `uuidgen | sha256` instead of just `uuidgen`. This is
effectively the equivalent of (2) but for the Headless Client and
Gateway where the Firezone ID can be configured via environment
variables.

Resolves: #9382

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-06-24 07:05:48 +00:00
Thomas Eizinger
43db1e63e2 chore(telemetry): rate limit identical events to 1 per 5min (#9551)
It is in the nature of our application that errors may occur in rapid
succession if anything in the packet processing path fails. Most of the
time, these repeated errors don't add any additional information so
reporting one of them to Sentry is more than enough.

To achieve this, we add a `before_send` callback that utilizes a
concurrent cache with an upper bound of 10000 items and a TTL of 5
minutes. In other words, if we have submitted an event to Sentry that
had the exact same message in the last 5 minutes, we will not send it.
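
The de-duplication logic can be sketched with the standard library alone. This is not `moka`: `EventDeduplicator` is a hypothetical stand-in that sweeps expired entries on every call and, once full, simply stops remembering new messages, both of which are simplifications.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Remembers when a message (by hash) was last sent.
struct EventDeduplicator {
    ttl: Duration,
    capacity: usize,
    last_sent: HashMap<u64, Instant>,
}

impl EventDeduplicator {
    fn new(ttl: Duration, capacity: usize) -> Self {
        Self {
            ttl,
            capacity,
            last_sent: HashMap::new(),
        }
    }

    /// Returns `true` if an event with this message should be sent now.
    fn should_send(&mut self, message: &str, now: Instant) -> bool {
        // Drop entries whose TTL has elapsed.
        let ttl = self.ttl;
        self.last_sent
            .retain(|_, sent_at| now.saturating_duration_since(*sent_at) < ttl);

        // Only the hash of the message is stored, not the message itself.
        let mut hasher = DefaultHasher::new();
        message.hash(&mut hasher);
        let key = hasher.finish();

        if self.last_sent.contains_key(&key) {
            return false;
        }

        // When the cache is full, the event is sent but not remembered.
        if self.last_sent.len() < self.capacity {
            self.last_sent.insert(key, now);
        }

        true
    }
}
```

A `before_send`-style callback would call `should_send` with the event's message and drop the event whenever it returns `false`.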

Internally, `moka` uses a concurrent hash map and therefore, the key is
hashed and not actually stored. Hash codes are u64, meaning the memory
footprint of this cache is only ~ 64kb (not accounting for constant
overhead of the cache internals).
2025-06-17 16:48:48 +00:00
Thomas Eizinger
182a560091 fix(telemetry): don't log events for local and CI env (#9492)
Avoids spamming PostHog with events from our CI or other instances of
the docker-compose setup.
2025-06-10 14:34:20 +00:00
Thomas Eizinger
365bc51ea9 build(deps): bump sentry to v0.38.1 (#9357)
Unfortunately, this pulls in a lot of dependencies that aren't actually
used due to a bug in `cargo`. See
https://github.com/getsentry/sentry-rust/issues/804#issuecomment-2929627500.
2025-06-06 11:25:01 +00:00
Thomas Eizinger
6ef079357c feat(connlib): add basic analytics about new sessions (#9379)
This PR adds basic analytics to `connlib` by sending two events to
PostHog:

1. `new_session` which is sent every time we establish a new session
with a Firezone backend. This could be our production or staging
instance but also a session to an on-premise installation of Firezone.
We include the API URL in the event payload to further distinguish
these.
2. `$identify` to link the client + version as well as the operating
system to the user. The user is identified by the Firezone ID.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-04 06:03:29 +00:00
Thomas Eizinger
85ab395276 chore(telemetry): flush before ending session (#9150)
I am not sure if this is currently breaking anything but it seems more
correct to flush all events first and then end the session.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2025-05-15 11:58:58 +00:00
Thomas Eizinger
37529803ce build(rust): bump otel ecosystem crates to 0.29 (#9029) 2025-05-05 12:33:07 +00:00
Thomas Eizinger
ea5709e8da chore(rust): initialise OTEL with useful metadata (#8945)
Once we start collecting metrics across various Clients and Gateways,
these metrics need to be tagged with the correct `service.name`,
`service.version` as well as an instance ID to differentiate metrics
from different instances.
2025-05-01 05:19:07 +00:00
Thomas Eizinger
07a82d2254 chore(relay): remove feature flag for eBPF TURN router (#8681)
The original idea of this feature flag was that we can easily disable
the eBPF router in case it causes issues in production. However,
toggling it on / off does not seem to work reliably.
Without an explicit toggle of the feature-flag, the eBPF program doesn't
seem to be loaded correctly. This uncertainty makes me not trust the
metrics that we are seeing because we don't know whether
all relays are really using the eBPF router to relay TURN traffic.

In order to draw truthful conclusions as to how much traffic we are
relaying via eBPF, this patch removes the feature flag again. As of
#8656, we can disable the eBPF program by not setting the
`EBPF_OFFLOADING` env variable. This requires a re-deploy / restart of
relays to take effect which isn't quite as fast as toggling a feature
flag but is much more reliable and easier to maintain.
2025-04-07 03:31:22 +00:00
Thomas Eizinger
941ef6c668 feat(relay): introduce feature-flag for toggling eBPF program (#8650)
This PR implements a feature-flag in PostHog that we can use to toggle
the use of the eBPF data plane at runtime. At every tick of the
event-loop, the relay will compare the (cached) configuration of the
eBPF program with the (cached) value of the feature-flag. If they
differ, the flag will be updated and upon the next packet, the eBPF
program will act accordingly.

Feature-flags are re-evaluated every 5 minutes, meaning there is some
delay until this gets applied.

The default value of all our feature-flags is `false`, meaning if
there is some problem with evaluating them, we'd turn the eBPF data
plane off. Performing routing in userspace is slower but it is a safer
default.

Resolves: #8548
2025-04-04 02:51:52 +00:00
Thomas Eizinger
3ce3c03291 fix(telemetry): introduce staging and prod PostHog projects (#8647)
As per PostHog's recommendation [0], we now use different projects to
manage the feature-flags. This allows us to turn feature flags in
staging or production on / off without affecting the other.

[0]: https://posthog.com/tutorials/multiple-environments
2025-04-04 01:56:28 +00:00
Thomas Eizinger
8ee1cb9e89 feat(telemetry): include environment in decide request (#8616)
This allows us to toggle feature-flags based on environments.
2025-04-03 11:25:03 +00:00
Thomas Eizinger
84a2c275ca build(rust): upgrade to Rust 1.85 and Edition 2024 (#8240)
Updates our codebase to the 2024 Edition. For highlights on what
changes, see the following blogpost:
https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html
2025-03-19 02:58:55 +00:00
Thomas Eizinger
e54a7c2d64 feat(connlib): regularly evaluate feature flags (#8467)
In order to be able to dynamically configure long-running applications
such as the Gateway via feature-flags, we need to regularly re-evaluate
them by sending another POST request to the `/decide` endpoint.

To do this without impacting anything else, we create a separate runtime
that is lazily initialised on first access and use that to run the async
code for connecting to the PostHog service. In addition to that, we also
spawn a task that re-evaluates the feature flags for the currently set
user in the Sentry context every 5 minutes.

Resolves: #8454

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-03-17 23:50:54 +00:00
Thomas Eizinger
d05226211b fix(connlib): don't respond to LLMNR queries with NXDOMAIN (#8426)
I suspect that one issue as part of local discovery is that we respond to
LLMNR queries with NXDOMAIN if the domain isn't a resource. This is
probably wrong. LLMNR works over multicast so if a particular interface
can't respond to a query with records, it should probably not respond at
all.

Related: #8266
2025-03-13 20:36:01 +00:00