One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
Resolves: #7241.
In #7389, we started tagging the built images with the branch name in
addition to other tags such as `latest` and the version number. That
wasn't enough unfortunately because we also need to tag the merged
manifest image that bundles the different architectures together as only
that one actually gets pushed to the registry.
Our `docker-compose` file references the images using the `main` tag.
However, doing a `docker compose pull` fails because the `main` tag
doesn't exist for the client, gateway and relay images.
In order for this tag to exist, we need to instruct the
`docker/metadata-action` to generate a tag for the current branch.
Hi @firezone/engineering , this is the following of
https://github.com/firezone/firezone/pull/6649
I forgot that people can be member of multiple OUs, this PR aims to add
support for this.
Imagine I have this OU architecture in my google workspace:
```mermaid
flowchart TD
A[Employees] --> B[Engineering]
A --> C[HR]
B --> D[Devs]
B --> E[Ops]
D --> F{me}
```
Currently in Firezone, I will only be a member of the Firezone Group
`OU: Devs`.
With this PR: I will be a member of `OU: Devs`, `OU: Engineering` and
`OU: Employees`
Co-authored-by: Antoine <antoinelabarussias@gmail.com>
This switches our `sentry-tracing` dependency to a fork that includes
https://github.com/getsentry/sentry-rust/pull/708. Recording our span
fields with breadcrumbs is important to provide accurate context of the
message. Without the span fields, the messages give us a lot less
information.
Since the last release, the open issue on `flush` having a flipped
return value got fixed as well.
In order to avoid processing of responses of relays that somehow got
altered on the network path, we now use the client's `password` as a
shared secret for the relay to also authenticate its responses. This
means that not all message can be authenticated. In particular, BINDING
requests will still be unauthenticated.
Performing this validation now requires every component that crafts
input to the `Allocation` to include a valid `MessageIntegrity`
attribute. This is somewhat problematic for the regression tests of the
relay and the unit tests of `Allocation`. In both cases, we implement
workarounds so we don't have to actually compute a valid
`MessageIntegrity`. This is deemed acceptable because:
- Both of these are just tests.
- We do test the validation path using `tunnel_test` because there we
run an actual relay.
When debugging issues related to our TURN allocation code, we sometimes
only have the logs that code submitted to Sentry. As part of the event,
we submit the last 500 debug logs as breadcrumbs to give more context to
the error.
Unconditionally printing the attributes of each request-response pair
will help us in more easily diagnosing, why certain errors happen.
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.20 to 4.5.21.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.21</h2>
<h2>[4.5.21] - 2024-11-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(parser)</em> Ensure defaults are filled in on error with
<code>ignore_errors(true)</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.21] - 2024-11-13</h2>
<h3>Fixes</h3>
<ul>
<li><em>(parser)</em> Ensure defaults are filled in on error with
<code>ignore_errors(true)</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="03d722625a"><code>03d7226</code></a>
chore: Release</li>
<li><a
href="3df70fb2b6"><code>3df70fb</code></a>
docs: Update changelog</li>
<li><a
href="3266c36abf"><code>3266c36</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5691">#5691</a>
from epage/custom</li>
<li><a
href="951762db57"><code>951762d</code></a>
feat(complete): Allow any OsString-compatible type to be a
CompletionCandidate</li>
<li><a
href="bb6493e890"><code>bb6493e</code></a>
feat(complete): Offer - as a path option</li>
<li><a
href="27b348dbcb"><code>27b348d</code></a>
refactor(complete): Simplify ArgValueCandidates code</li>
<li><a
href="49b8108f8c"><code>49b8108</code></a>
feat(complete): Add PathCompleter</li>
<li><a
href="82a360aa54"><code>82a360a</code></a>
feat(complete): Add ArgValueCompleter</li>
<li><a
href="47aedc6906"><code>47aedc6</code></a>
fix(complete): Ensure paths are sorted</li>
<li><a
href="431e2bc931"><code>431e2bc</code></a>
test(complete): Ensure ArgValueCandidates get filtered</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.20...clap_complete-v4.5.21">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In the latest version, we added a warning log to str0m when the maximum
number of candidate pairs is exceeded:
https://github.com/algesten/str0m/pull/587.
We only ever add the candidates of a single relay to an agent (2
candidates), plus at most 2 server-reflexive candidates and at most 2
host candidates. Unless there is a bug like what we fixed in #7334,
exceeding the default number of candidate _pairs_ (100) should never
happen.
In case it does, the newly added `warn` log in `str0m` will trigger a
Sentry alert.
It was already a bit sus that we didn't receive as many errors in Sentry
from the IPC service as from the GUI client. Turns out that we forgot to
initialise our `sentry_layer` there. Additionally, we also didn't
initialise the `LogTracer`, meaning we didn't capture logs from the
`log` crate which is used by some of the dependencies, for example
`wintun`.
Explicitly creating the Sentry release allows us to associate the
commits since the last release with the new one. This might help us to
identify potential sources of regressions. For the current releases,
I've set them manually to ensure that this automation has something to
pick up on for the next release.
The releases will already exists prior to this because they are
automatically created when a client / gateway first logs in with a
certain version.
What this does it mark it as "finalized" and set the commit range
accordingly.
Resolves: #7358.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
We blew through our Edge Function invocation allotment (1M). Upon
investigating, it became clear the way we were doing caching previously
was for the app / page rendering. This is how Vercel
[instructs](https://vercel.com/docs/edge-network/caching#using-vercel-functions)
us to do it for Edge functions.
Why:
* The portal currently shows API clients in the Actors index list. Each
Actor in the list has a link to their own 'show' page. Prior to this
commit, selecting an API client from the list would result an error.
While API clients are technically an Actor, they aren't quite the same
as all other Actors because they are only used to configure the portal
for a given account. Because of this, they don't have the same
information to show as all other Actors. This commit sets the 'show' URL
for API clients to the 'settings' page to show the proper info for the
API client.
Fixes: #7370
Due to https://github.com/getsentry/sentry-rust/issues/702, errors which
are embedded as `tracing::Value` unfortunately get silently discarded
when reported as part of Sentry "Event"s and not "Exception"s.
The design idea of these telemetry events is that they aren't fatal
errors so we don't need to treat them with the highest priority. They
may also appear quite often, so to save performance and bandwidth, we
sample them at a rate of 1% at creation time.
In order to not lose the context of these errors, we instead format them
into the message. This makes them completely identical to the `debug!`
logs which we have on every call-site of `telemetry_event!` which
prompted me to make that implicit as part of creating the
`telemetry_event!`.
Resolves: #7343.
Adding more context to these errors makes it easier to identify, which
of the operations fails. In addition, we remove some usages of the "log
and return" anti-pattern to avoid duplicate reports of the same issue.
All our logic for handling errors is based on the error code. Even
though there should be a 1:1 mapping between error code and reason
phrase, I am seeing odd reports in Sentry for a case that we should be
handling but aren't.
I noticed that in case there is an error when reading from the TUN
device, we currently exit that thread and we don't have a mechanism at
the moment to restart it. Discarding the thread also means we can no
longer send new instances of `Tun` into it.
Instead of exiting the thread, we now just log the error and continue.
In case the error was caused by the FD being closed, we discard the
instance of `Tun` and wait for a new one.
Why:
* Recently we had an issue where a customer's payment info was
incorrectly entered, which caused the payment to not go through. When
something like this happens Stripe will send an update event with a
pending_update section (which we do not use currently). When the
customer fixes the payment info, and payment goes through we get another
update event with the correct subscription info, however, the previous
update (with the pending section) then gets expired and a
`pending_update_expired` event is sent to us. We had been inadvertantly
catching the event and updating the specified account with the info in
the event (which happened to be the previous state of the subscription)
Fixes: #7352
Previously, we printed only the size of each individual packet in the
`wire::net` logs. This makes it impossible to tell whether or not GRO
was used to receive this packet. The total number of bytes can still be
computed by calculating `num_packets * segment_size + trailing_bytes`.
Thus, the new log is strictly superior.
With the parallelisation of TUN and UDP operations, we lost
backpressure: Packets can now be read quicker from the UDP sockets than
they can be sent out the TUN device, causing packet loss in extremely
high-throughput situations.
To avoid this, we don't directly send packets into the channel to the
TUN device thread. This channel is bounded, meaning sending can fail if
reading UDP packets is faster than writing packets to the TUN device.
Due to GRO, we may read multiple UDP packets in one go, requiring us to
write multiple IP packets to the TUN device as part of a single
iteration in the event-loop. Thus, we cannot know, how much space we
need in the channel for outgoing IP packets.
By introducing a dedicated buffer, we can temporarily hold on to all of
these packets and on the next call to `poll`, we flush them out into the
channel. If the channel is full, we will suspend and only continue once
there is space in the channel. This behaviour restores backpressue
because we won't read UDP packets from the socket unless we have space
to write the corresponding packet to the TUN device.
UDP itself actually doesn't have any backpressure, instead the packets
will simply get dropped once the receive buffer overflows. The UDP
packets however carry encrypted IP packets, meaning whatever protocol
sits inside these packets will detect the packet loss and should
throttle their sending-pace accordingly.
Currently, some errors are double-logged when we show them to the user
because of the `tracing::error!` statements within the generation of the
user-friendly error message for the error dialog.
To get rid of these, we generalise the `show_error_dialog` function to
take just the message and move the generation of the message to a
function on the `Error` itself. This also allows us to split out a
separate error type that is only used for the elevation check, thereby
reducing the complexity of the other error enum.
I think I finally understood and correctly traced, where the use of ANSI
escape codes came from. It turns out, the `with_ansi` switch on
`tracing_subscriber::fmt::Layer` is what you want to toggle. From there,
it trickles down to the `Writer` which we can then test for in our
`Format`.
Resolves: #7284.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
A TURN server that doesn't understand certain attributes should return
"Unknown attributes" as part of its response. Whilst we aim to be as
spec-compliant as possible, Firezone doesn't officially support other
TURN servers than our own relay.
If we encounter a TURN server that sends us an "Unknown attribute", we
now immediately fail this allocation and clear it as we cannot make any
more assumptions about what the connected relay actually supports.
Errors from the tunnel can potentially happen on a per-packet basis. In
order to not flood Sentry, reduce the log-level down to `debug` and only
report 1% of all errors. We did the same thing for the gateway in #7299.
In #7163, we introduced a shared cache of server-reflexive candidates
within a `snownet::Node`. What we unfortunately overlooked is that if a
node (i.e. a client or a gateway) is behind symmetric NAT, then we will
repeatedly create "new" server-reflexive candiates, thereby filling up
this cache.
This cache is used to initialise the agents with local candidates, which
manifests in us sending dozens if not hundreds of candidates to the
other party. Whilst not harmful in itself, it does create quite a lot of
spam. To fix this, we introduce a limit of only keeping around 1
server-reflexive candidate per IP version, i.e. only 1 IPv4 and IPv6
address.
At present, `connlib` only supports a single egress interface meaning
for now, we are fine with making this assumption.
In case we encounter a new candidate of the same kind and same IP
version, we evict the old one and replace it with the new one. Thus, for
subsequent connections, only the new candidate is used.
In order to display better stacktraces when Firezone crashes, Sentry
needs debug symbols for our binaries. Debug symbols on Sentry are
retained for 90 days after they have been last used [0]. We can thus
simply upload them every time we build a binary on `main`.
For the moment, this only uploads them for the GUI client. Debug symbols
for the Android and Apple clients will be done in separate PRs.
[0]:
https://docs.sentry.io/platforms/native/data-management/debug-files/#retention-policy
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
This one has been lurking in the codebase for a while. Fortunately, it
is not very critical because invalidation of server-reflexive addresses
happens pretty rarely.
If we see these, something fishy is going on (see #7332), so we should
definitely know about these by recording Sentry events. These can
potentially be per packet so we only send a telemetry event which gets
sampled at a rate of 1%.
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.
Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.
Resolves: #7292.
This ensure that we run prettier across all supported filetypes to check
for any formatting / style inconsistencies. Previously, it was only run
for files in the website/ directory using a deprecated pre-commit
plugin.
The benefit to keeping this in our pre-commit config is that devs can
optionally run these checks locally with `pre-commit run --config
.github/pre-commit-config.yaml`.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>