One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
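As a minimal sketch of why this fails silently: the modules `dep_v1` and `dep_v2` below are hypothetical stand-ins for two versions of the same crate in the dependency graph. Both export a type with the same name, yet they are unrelated types, so the downcast simply returns `None`.

```rust
use std::any::Any;

// Hypothetical stand-ins for two semver-incompatible versions of one crate.
mod dep_v1 {
    pub struct Layer;
}
mod dep_v2 {
    pub struct Layer;
}

/// Code compiled against "v2" that tries to recover its own type from a trait object.
fn find_layer(value: &dyn Any) -> Option<&dep_v2::Layer> {
    value.downcast_ref::<dep_v2::Layer>()
}

fn main() {
    // We accidentally construct the type from the other version.
    let ours = dep_v1::Layer;
    let theirs = dep_v2::Layer;

    // This compiles fine, but the lookup fails at runtime without any error.
    assert!(find_layer(&ours).is_none());
    assert!(find_layer(&theirs).is_some());
}
```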
Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
Resolves: #7241.
This switches our `sentry-tracing` dependency to a fork that includes
https://github.com/getsentry/sentry-rust/pull/708. Recording our span
fields with breadcrumbs is important to provide accurate context of the
message. Without the span fields, the messages give us a lot less
information.
Since the last release, the open issue on `flush` having a flipped
return value got fixed as well.
In order to avoid processing of responses of relays that somehow got
altered on the network path, we now use the client's `password` as a
shared secret for the relay to also authenticate its responses. This
means that not all messages can be authenticated. In particular, BINDING
requests will still be unauthenticated.
Performing this validation now requires every component that crafts
input to the `Allocation` to include a valid `MessageIntegrity`
attribute. This is somewhat problematic for the regression tests of the
relay and the unit tests of `Allocation`. In both cases, we implement
workarounds so we don't have to actually compute a valid
`MessageIntegrity`. This is deemed acceptable because:
- Both of these are just tests.
- We do test the validation path using `tunnel_test` because there we
run an actual relay.
When debugging issues related to our TURN allocation code, we sometimes
only have the logs that code submitted to Sentry. As part of the event,
we submit the last 500 debug logs as breadcrumbs to give more context to
the error.
Unconditionally printing the attributes of each request-response pair
will help us more easily diagnose why certain errors happen.
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.20 to 4.5.21.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.21</h2>
<h2>[4.5.21] - 2024-11-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(parser)</em> Ensure defaults are filled in on error with
<code>ignore_errors(true)</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="03d722625a"><code>03d7226</code></a>
chore: Release</li>
<li><a
href="3df70fb2b6"><code>3df70fb</code></a>
docs: Update changelog</li>
<li><a
href="3266c36abf"><code>3266c36</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5691">#5691</a>
from epage/custom</li>
<li><a
href="951762db57"><code>951762d</code></a>
feat(complete): Allow any OsString-compatible type to be a
CompletionCandidate</li>
<li><a
href="bb6493e890"><code>bb6493e</code></a>
feat(complete): Offer - as a path option</li>
<li><a
href="27b348dbcb"><code>27b348d</code></a>
refactor(complete): Simplify ArgValueCandidates code</li>
<li><a
href="49b8108f8c"><code>49b8108</code></a>
feat(complete): Add PathCompleter</li>
<li><a
href="82a360aa54"><code>82a360a</code></a>
feat(complete): Add ArgValueCompleter</li>
<li><a
href="47aedc6906"><code>47aedc6</code></a>
fix(complete): Ensure paths are sorted</li>
<li><a
href="431e2bc931"><code>431e2bc</code></a>
test(complete): Ensure ArgValueCandidates get filtered</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.20...clap_complete-v4.5.21">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In the latest version, we added a warning log to str0m when the maximum
number of candidate pairs is exceeded:
https://github.com/algesten/str0m/pull/587.
We only ever add the candidates of a single relay to an agent (2
candidates), plus at most 2 server-reflexive candidates and at most 2
host candidates. Unless there is a bug like what we fixed in #7334,
exceeding the default number of candidate _pairs_ (100) should never
happen.
In case it does, the newly added `warn` log in `str0m` will trigger a
Sentry alert.
It was already a bit sus that we didn't receive as many errors in Sentry
from the IPC service as from the GUI client. Turns out that we forgot to
initialise our `sentry_layer` there. Additionally, we also didn't
initialise the `LogTracer`, meaning we didn't capture logs from the
`log` crate which is used by some of the dependencies, for example
`wintun`.
Due to https://github.com/getsentry/sentry-rust/issues/702, errors which
are embedded as `tracing::Value` unfortunately get silently discarded
when reported as part of Sentry "Event"s and not "Exception"s.
The design idea of these telemetry events is that they aren't fatal
errors so we don't need to treat them with the highest priority. They
may also appear quite often, so to save performance and bandwidth, we
sample them at a rate of 1% at creation time.
In order to not lose the context of these errors, we instead format them
into the message. This makes them completely identical to the `debug!`
logs which we have on every call-site of `telemetry_event!` which
prompted me to make that implicit as part of creating the
`telemetry_event!`.
Resolves: #7343.
Adding more context to these errors makes it easier to identify which
of the operations fails. In addition, we remove some usages of the "log
and return" anti-pattern to avoid duplicate reports of the same issue.
All our logic for handling errors is based on the error code. Even
though there should be a 1:1 mapping between error code and reason
phrase, I am seeing odd reports in Sentry for a case that we should be
handling but aren't.
I noticed that in case there is an error when reading from the TUN
device, we currently exit that thread and we don't have a mechanism at
the moment to restart it. Discarding the thread also means we can no
longer send new instances of `Tun` into it.
Instead of exiting the thread, we now just log the error and continue.
In case the error was caused by the FD being closed, we discard the
instance of `Tun` and wait for a new one.
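The loop described above could look roughly like the following sketch. `ReadError`, the scripted mock `Tun` and `run` are hypothetical names for illustration; the real code deals with actual IO errors and runs on a dedicated thread.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical read outcome; the real code inspects IO errors from the device.
enum ReadError {
    Transient(&'static str),
    Closed,
}

/// Mock device that replays a scripted sequence of read results.
struct Tun {
    script: VecDeque<Result<Vec<u8>, ReadError>>,
}

impl Tun {
    fn read(&mut self) -> Result<Vec<u8>, ReadError> {
        // Once the script is exhausted, pretend the FD was closed.
        self.script.pop_front().unwrap_or(Err(ReadError::Closed))
    }
}

/// Body of the reader loop: it only returns once no new `Tun` can arrive.
fn run(new_tuns: Receiver<Tun>) -> Vec<Vec<u8>> {
    let mut packets = Vec::new();
    // Blocks until a new device is sent; ends when all senders are dropped.
    for mut tun in new_tuns {
        loop {
            match tun.read() {
                Ok(packet) => packets.push(packet),
                // Log and continue instead of exiting.
                Err(ReadError::Transient(msg)) => eprintln!("Failed to read from TUN device: {msg}"),
                // Discard this instance and wait for the next one.
                Err(ReadError::Closed) => break,
            }
        }
    }
    packets
}

fn main() {
    let (tx, rx) = channel();
    let first = vec![Ok(vec![1]), Err(ReadError::Transient("temporary failure")), Ok(vec![2])];
    tx.send(Tun { script: VecDeque::from(first) }).unwrap();
    tx.send(Tun { script: VecDeque::from(vec![Ok(vec![3])]) }).unwrap();
    drop(tx);

    assert_eq!(run(rx), vec![vec![1], vec![2], vec![3]]);
}
```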
Previously, we printed only the size of each individual packet in the
`wire::net` logs. This makes it impossible to tell whether or not GRO
was used to receive this packet. The total number of bytes can still be
computed by calculating `num_packets * segment_size + trailing_bytes`.
Thus, the new log is strictly superior.
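To illustrate the arithmetic, here is a small sketch. The `GroBatch` struct is hypothetical and assumes that `num_packets` counts the full-size segments and `trailing_bytes` is the shorter last segment, which is one reading of the formula above.

```rust
/// Hypothetical summary of one GRO read, using the names from the log line.
struct GroBatch {
    num_packets: usize,
    segment_size: usize,
    trailing_bytes: usize,
}

impl GroBatch {
    /// Splits `total_len` bytes into full segments plus a shorter trailing one.
    /// Assumes `segment_size` is non-zero.
    fn from_len(total_len: usize, segment_size: usize) -> Self {
        Self {
            num_packets: total_len / segment_size,
            segment_size,
            trailing_bytes: total_len % segment_size,
        }
    }

    /// Recovers the total number of bytes from the logged fields.
    fn total_bytes(&self) -> usize {
        self.num_packets * self.segment_size + self.trailing_bytes
    }
}

fn main() {
    let batch = GroBatch::from_len(3_700, 1_200);

    assert_eq!(batch.num_packets, 3);
    assert_eq!(batch.trailing_bytes, 100);
    assert_eq!(batch.total_bytes(), 3_700);
}
```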
With the parallelisation of TUN and UDP operations, we lost
backpressure: Packets can now be read quicker from the UDP sockets than
they can be sent out the TUN device, causing packet loss in extremely
high-throughput situations.
To avoid this, we don't directly send packets into the channel to the
TUN device thread. This channel is bounded, meaning sending can fail if
reading UDP packets is faster than writing packets to the TUN device.
Due to GRO, we may read multiple UDP packets in one go, requiring us to
write multiple IP packets to the TUN device as part of a single
iteration in the event-loop. Thus, we cannot know how much space we
need in the channel for outgoing IP packets.
By introducing a dedicated buffer, we can temporarily hold on to all of
these packets and on the next call to `poll`, we flush them out into the
channel. If the channel is full, we will suspend and only continue once
there is space in the channel. This behaviour restores backpressure
because we won't read UDP packets from the socket unless we have space
to write the corresponding packet to the TUN device.
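The buffering scheme can be sketched with the standard library's bounded channel. `TunSender`, `push` and `flush` are hypothetical names, and the actual implementation is driven by `poll` rather than by a boolean return value.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};

type Packet = Vec<u8>;

/// Hypothetical wrapper around the bounded channel to the TUN device thread.
struct TunSender {
    buffer: VecDeque<Packet>,
    channel: SyncSender<Packet>,
}

impl TunSender {
    /// Queue all IP packets decoded from one (possibly GRO-batched) UDP read.
    fn push(&mut self, packets: impl IntoIterator<Item = Packet>) {
        self.buffer.extend(packets);
    }

    /// Move buffered packets into the channel.
    /// Returns `true` once the buffer is empty, i.e. when reading more UDP packets is fine.
    fn flush(&mut self) -> bool {
        while let Some(packet) = self.buffer.pop_front() {
            match self.channel.try_send(packet) {
                Ok(()) => {}
                Err(TrySendError::Full(packet)) => {
                    // Keep the order intact and try again on the next call.
                    self.buffer.push_front(packet);
                    return false;
                }
                Err(TrySendError::Disconnected(_)) => {
                    // The receiving side is gone, nothing can be delivered anymore.
                    self.buffer.clear();
                    return true;
                }
            }
        }
        true
    }
}

fn main() {
    let (tx, rx) = sync_channel(2);
    let mut sender = TunSender { buffer: VecDeque::new(), channel: tx };

    sender.push(vec![vec![1], vec![2], vec![3]]);
    assert!(!sender.flush()); // The channel holds 2 packets, 1 stays buffered.

    assert_eq!(rx.recv().unwrap(), vec![1]);
    assert!(sender.flush()); // Space freed up, the buffer drains.
}
```

Only reading from the UDP socket while `flush` reports an empty buffer is what ties the read rate to the write rate of the TUN device.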
UDP itself actually doesn't have any backpressure, instead the packets
will simply get dropped once the receive buffer overflows. The UDP
packets however carry encrypted IP packets, meaning whatever protocol
sits inside these packets will detect the packet loss and should
throttle their sending-pace accordingly.
Currently, some errors are double-logged when we show them to the user
because of the `tracing::error!` statements within the generation of the
user-friendly error message for the error dialog.
To get rid of these, we generalise the `show_error_dialog` function to
take just the message and move the generation of the message to a
function on the `Error` itself. This also allows us to split out a
separate error type that is only used for the elevation check, thereby
reducing the complexity of the other error enum.
I think I finally understood and correctly traced where the use of ANSI
escape codes came from. It turns out that the `with_ansi` switch on
`tracing_subscriber::fmt::Layer` is what you want to toggle. From there,
it trickles down to the `Writer` which we can then test for in our
`Format`.
Resolves: #7284.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
A TURN server that doesn't understand certain attributes should return
"Unknown attributes" as part of its response. Whilst we aim to be as
spec-compliant as possible, Firezone doesn't officially support other
TURN servers than our own relay.
If we encounter a TURN server that sends us an "Unknown attribute", we
now immediately fail this allocation and clear it as we cannot make any
more assumptions about what the connected relay actually supports.
Errors from the tunnel can potentially happen on a per-packet basis. In
order to not flood Sentry, reduce the log-level down to `debug` and only
report 1% of all errors. We did the same thing for the gateway in #7299.
In #7163, we introduced a shared cache of server-reflexive candidates
within a `snownet::Node`. What we unfortunately overlooked is that if a
node (i.e. a client or a gateway) is behind symmetric NAT, then we will
repeatedly create "new" server-reflexive candidates, thereby filling up
this cache.
This cache is used to initialise the agents with local candidates, which
manifests in us sending dozens if not hundreds of candidates to the
other party. Whilst not harmful in itself, it does create quite a lot of
spam. To fix this, we introduce a limit of only keeping around 1
server-reflexive candidate per IP version, i.e. only 1 IPv4 and 1 IPv6
address.
At present, `connlib` only supports a single egress interface meaning
for now, we are fine with making this assumption.
In case we encounter a new candidate of the same kind and same IP
version, we evict the old one and replace it with the new one. Thus, for
subsequent connections, only the new candidate is used.
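A minimal sketch of such a cache, assuming one slot per IP version. `ServerReflexiveCache` and its methods are hypothetical names and not the actual `snownet` types.

```rust
use std::net::SocketAddr;

/// Hypothetical cache that keeps at most one server-reflexive address per IP version.
#[derive(Default)]
struct ServerReflexiveCache {
    v4: Option<SocketAddr>,
    v6: Option<SocketAddr>,
}

impl ServerReflexiveCache {
    /// Stores `candidate` and returns the evicted address of the same IP version, if any.
    fn insert(&mut self, candidate: SocketAddr) -> Option<SocketAddr> {
        let slot = match candidate {
            SocketAddr::V4(_) => &mut self.v4,
            SocketAddr::V6(_) => &mut self.v6,
        };
        slot.replace(candidate)
    }

    /// The addresses used to initialise new agents.
    fn iter(&self) -> impl Iterator<Item = SocketAddr> {
        self.v4.into_iter().chain(self.v6)
    }
}

fn main() {
    let mut cache = ServerReflexiveCache::default();
    let old: SocketAddr = "203.0.113.1:1000".parse().expect("valid address");
    let new: SocketAddr = "203.0.113.1:2000".parse().expect("valid address");
    let v6: SocketAddr = "[2001:db8::1]:1000".parse().expect("valid address");

    assert_eq!(cache.insert(old), None);
    assert_eq!(cache.insert(new), Some(old)); // Same IP version: the old one is evicted.
    assert_eq!(cache.insert(v6), None);
    assert_eq!(cache.iter().collect::<Vec<_>>(), vec![new, v6]);
}
```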
This one has been lurking in the codebase for a while. Fortunately, it
is not very critical because invalidation of server-reflexive addresses
happens pretty rarely.
If we see these, something fishy is going on (see #7332), so we should
definitely know about these by recording Sentry events. These can
potentially be per packet so we only send a telemetry event which gets
sampled at a rate of 1%.
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.
Activating this lint ensures clippy will yell at us every time we use
`.unwrap()`, prompting us to double-check whether we do indeed want to panic here.
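For the case that can actually be hit, a sketch of the conversion might look like this. `Level` and `parse_level` are hypothetical and much simpler than a real log-filter; the point is only that invalid user input becomes an error instead of a panic.

```rust
#[derive(Debug, PartialEq)]
enum Level {
    Debug,
    Info,
    Warn,
}

/// Hypothetical parser for user-supplied input.
/// With the lint active, clippy would flag any `.unwrap()` in here.
#[warn(clippy::unwrap_used)]
fn parse_level(input: &str) -> Result<Level, String> {
    match input.trim().to_ascii_lowercase().as_str() {
        "debug" => Ok(Level::Debug),
        "info" => Ok(Level::Info),
        "warn" => Ok(Level::Warn),
        other => Err(format!("invalid log level: {other:?}")),
    }
}

fn main() {
    assert_eq!(parse_level("INFO"), Ok(Level::Info));

    // Invalid input is reported to the caller instead of crashing the program.
    assert!(parse_level("verbose").is_err());
}
```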
Resolves: #7292.
This ensures that we run prettier across all supported filetypes to check
for any formatting / style inconsistencies. Previously, it was only run
for files in the website/ directory using a deprecated pre-commit
plugin.
The benefit to keeping this in our pre-commit config is that devs can
optionally run these checks locally with `pre-commit run --config
.github/pre-commit-config.yaml`.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Bundles together several minor improvements around telemetry:
- Removes the obsolete "Firezone" context: This is now included in the
user context as of #7310.
- Entirely encapsulates `sentry` within the `telemetry` module
- Concludes sessions that were not explicitly closed as "abnormal"
All warnings trigger events in Sentry. This particular warning is of
no concern: it simply means that the user clicked on "Sign out" while we
were trying to set up the tunnel.
Resolves: #7250.
Currently, we don't report very detailed errors when we fail to parse
certain IP packets. With this patch, we use `Result` in more places and
also extend the validation of IP packets to:
a) enforce a length of at most 1280 bytes. This should already be the
case due to our MTU but bad things may happen if that is off for some
reason
b) validate the entire IP packet instead of just its header
Our logging library `tracing` supports structured logging. Structured
logging means we can include values within a `tracing::Event` without
having to immediately format it as a string. Processing these values -
such as errors - as their original type allows the various `tracing`
layers to capture and represent them as they see fit.
One of these layers is responsible for sending ERROR and WARN events to
Sentry, as part of which `std::error::Error` values get automatically
captured as so-called "sentry exceptions".
Unfortunately, there is a caveat: If an `std::error::Error` value is
included in an event that does not get mapped to an exception, the
`error` field is completely lost. See
https://github.com/getsentry/sentry-rust/issues/702 for details.
To work around this, we introduce an `err_with_sources` adapter that
formats an error and all its sources together into a string. For all
`tracing::debug!` statements, we then use this to report these errors.
It is really unfortunate that we have to do this and cannot use the same
mechanism, regardless of the log level. However, until this is fixed
upstream, this will do and gives us better information in the log
submitted to Sentry.
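A rough sketch of what such an adapter could do is shown below. Only the name `err_with_sources` comes from this PR; the signature, the `": "` separator and the two error types are assumptions for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Sketch: joins an error and all of its sources into a single string.
fn err_with_sources(e: &dyn Error) -> String {
    let mut out = e.to_string();
    let mut source = e.source();

    while let Some(s) = source {
        out.push_str(": ");
        out.push_str(&s.to_string());
        source = s.source();
    }

    out
}

// Two hypothetical error types forming a chain of length two.
#[derive(Debug)]
struct Inner;

impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection refused")
    }
}

impl Error for Inner {}

#[derive(Debug)]
struct Outer(Inner);

impl fmt::Display for Outer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to connect")
    }
}

impl Error for Outer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

fn main() {
    let error = Outer(Inner);

    // `to_string` alone would only yield "failed to connect".
    assert_eq!(err_with_sources(&error), "failed to connect: connection refused");
}
```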
With a retry-mechanism in place, there is no need to log a warning when
`connect_to_service` fails. Instead, we just log this on DEBUG and
continue trying. If it fails after all attempts, the entire function
will bail out and we will receive a Sentry event from error handling
higher up the callstack.
This switches our dependency on `boringtun` over to our fork at
https://github.com/firezone/boringtun. The idea of the fork is to
carefully patch only selected parts such that upstreaming things later is
still possible. The complete diff can be seen here:
https://github.com/cloudflare/boringtun/compare/master...firezone:boringtun:master
So far, the only patches in the fork are dependency bumps, linter fixes,
adjustments to log levels and the removal of panics when the destination
buffer is too small.
The `Server::new` function already returns a `Future`. Calling `.await`
on that within an `async` block is equivalent to just calling the `new`
function itself.
Within our test suite, we "spin" for several (simulated) seconds after
each state transition to allow for packets being sent between the
different nodes. The test suite simulates different latencies by
delaying the delivery of some of these packets.
`connlib` has several timers for sending packets, i.e. STUN bindings, WG
keep-alives etc. These timers never end so we cannot simply spin "until
we no longer want to send any packets". Currently, we simply hard-stop
after a few seconds and drop the remaining packets and move on to the
next state transition.
At present, this isn't an issue because only our ICE agent adheres to
the simulated time advancement. `boringtun` is still impure and thus we
usually don't get to see any of the WireGuard packets like keep-alives
and session timeouts etc in our tests. The STUN messages are pretty
resilient to retransmissions so the current packet drop doesn't matter.
In the process of adopting our boringtun fork
(https://github.com/firezone/boringtun) where we will eventually fix the
time impurity, dropping some of these packets caused problems.
To fix this, we now drain all remaining packets that are sitting in the
"yet-to-be-delivered" buffer. These packets are delivered to an "inbox"
that is per-host, meaning the host (i.e. client, gateway or relay) will
still perceive the incoming packet with the correct latency.
We extract this functionality from #7120 because it is generally useful.
Instead of collapsing multiple of these errors into one, we emit a
dedicated error message for each case. This will allow us to distinguish
them within Sentry events.
Windows has some funny behaviour where creating the deep-link server
sometimes fails and we have to try again. Currently, each of these
operations is logged as a warning when it would actually succeed later.
These create unnecessary Sentry alerts.
If we run out of attempts to create the deep-link server (currently 10),
the entire function fails which will be logged as an error further down.
The last 500 INFO and DEBUG logs will be captured as breadcrumbs
together with the event, meaning we still get to see those error
messages on why it failed to create the deep-link server.
Resolves: #7238.
"Just let it crash" is terrible advice for software that is shipped to
end users. Where possible, we should use proper error handling and only
fail the current function / task that is active, e.g. drop a particular
packet instead of failing all of connlib. We more or less already do
that.
Activating the clippy lint `unwrap_in_result` surfaced a few more places
where we panic despite being in a function that is fallible already.
These cases can easily be converted to not panic and return an error
instead.
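As an illustration of the kind of change this involves, consider the following sketch. `ip_version` is a hypothetical function and not taken from the codebase.

```rust
/// Hypothetical parser that reads the version nibble of an IP packet.
fn ip_version(packet: &[u8]) -> Result<u8, String> {
    // `packet.first().unwrap()` would panic on an empty packet even though
    // this function is already fallible. Returning an error is just as easy.
    let first = packet.first().ok_or("packet is empty")?;

    Ok(*first >> 4)
}

fn main() {
    let packets: [&[u8]; 3] = [&[0x45, 0x00], &[], &[0x60]];

    // A bad packet is dropped; the remaining ones are still processed.
    let versions: Vec<u8> = packets
        .iter()
        .filter_map(|p| ip_version(p).ok())
        .collect();

    assert_eq!(versions, vec![4, 6]);
}
```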
When `connlib` fails to establish a session, the GUI client currently
only captures the top-level error within `connect_to_firezone` because
it uses `.to_string()` for all errors. Unfortunately, that doesn't print
any of the sources of an error.
To conveniently capture all sources, we can use `anyhow` and its
alternate formatting using `format!("{e:#}")` (notice the `#`). Not all
errors within `connect_to_firezone` should be captured like this
however. Certain IO errors, in particular when trying to resolve the
domain of the portal, need to be captured separately because they may
resolve by themselves if we gain connectivity again. This is important,
otherwise we discard the user's token when they boot up a machine without
internet access yet Firezone is auto-starting.
To make this more ergonomic, we trim down `IpcServiceError` to two
variants: The IO variant we need to special-case and everything else.
This allows us to create `From` impls which "do the right thing" by
capturing more error information using `anyhow`'s alternate formatting.
Bumps [test-strategy](https://github.com/frozenlib/test-strategy) from
0.3.1 to 0.4.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c683eb3cf6"><code>c683eb3</code></a>
Version 0.4.0.</li>
<li><a
href="17706bcd1c"><code>17706bc</code></a>
Update MSRV to 1.70.0.</li>
<li><a
href="90a5efbf00"><code>90a5efb</code></a>
Update dependencies.</li>
<li><a
href="cff2ede71f"><code>cff2ede</code></a>
Changed the strategy generated by <code>#[filter(...)]</code> to reduce
`Too many local ...</li>
<li><a
href="34cc6d2545"><code>34cc6d2</code></a>
Update expected compile error message.</li>
<li><a
href="a4427e2d98"><code>a4427e2</code></a>
Update CI settings.</li>
<li><a
href="ecb7dbae04"><code>ecb7dba</code></a>
Clippy.</li>
<li><a
href="637f29e9c8"><code>637f29e</code></a>
Made it so an error occurs when an unsupported attribute is specified
for enu...</li>
<li><a
href="6d66057bb0"><code>6d66057</code></a>
Use <code>test</code> instead of <code>check</code> with <code>cargo
hack --rust-version</code>.</li>
<li><a
href="cee2ebbfe6"><code>cee2ebb</code></a>
Fix CI settings.</li>
<li>Additional commits viewable in <a
href="https://github.com/frozenlib/test-strategy/compare/v0.3.1...v0.4.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Within the Tauri client, we invoke commands from TypeScript on the Rust
side. These commands can sometimes fail, which is why these commands
return a `Result`.
Most of our commands actually only send messages through a channel to an
event-loop. This can only fail if the other side of the channel is
closed, which should(?) only happen if the program is shutting down or
some part of it crashed. Regardless, these errors can directly be
forwarded to the TypeScript code where they will get caught and logged
to the browser console.
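The failure mode can be sketched with a plain standard-library channel. `Request` and `sign_out` are hypothetical stand-ins for a command and its message, and the actual commands may well use a different channel type.

```rust
use std::sync::mpsc::{channel, Sender};

#[derive(Debug, PartialEq)]
enum Request {
    SignOut,
}

/// Hypothetical command: forwards a message to the event-loop.
/// The error is returned as a string so it can be handed to the caller as-is.
fn sign_out(tx: &Sender<Request>) -> Result<(), String> {
    tx.send(Request::SignOut).map_err(|e| e.to_string())
}

fn main() {
    let (tx, rx) = channel();

    // While the event-loop is alive, sending succeeds.
    assert!(sign_out(&tx).is_ok());
    assert_eq!(rx.recv(), Ok(Request::SignOut));

    // Once the receiving side is gone, the command fails.
    drop(rx);
    assert!(sign_out(&tx).is_err());
}
```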
In the future, we can install Sentry's TypeScript client in the GUI code
to automatically report errors on the TypeScript side too.
Resolves: #7256.