Currently, the Gateway logs all errors that happen when the event-loop
exits on ERROR level. This creates Sentry alerts for things like
"Unauthorized" errors or "404 Not found".
That isn't useful to us. To mitigate this, we polish the code a bit to
only log an ERROR when we actually fail to setup something during
startup (like the TUN device). In all other cases, we now log a more
user-friendly message on INFO but still exit with the appropriate exit
code (0 on CTRL+C, 1 on any other error).
In order to release the new control protocol to users, we need to bump
the versions of the clients to 1.4.0. The portal has a version gate to
only select gateways with version >= 1.4.0 for clients >= 1.4.0. Thus,
bumping these versions can only happen once testing has completed and
the gateway has actually been released as 1.4.0.
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
## Context
The Gateway implements a stateful NAT that translates the destination IP
and source protocol of every packet that targets a DNS resource IP. This
is necessary because the IPs for DNS resources are generated on the
client without actually performing a DNS lookup, instead it always
generates 4 IPv4 and 4 IPv6 addresses. On the Gateway, these IPs are
then assigned in a round-robin fashion to the actual IPs that the domain
resolves to, necessitating a NAT64/46 translation in case a domain only
resolves to IPs of one family.
A domain may resolve to a set of IPs but not all of these IPs may be
routable. Whilst an arguably poor practise of the domain administrator,
routing problems can occur for all kinds of reasons and are well handled
on the wider Internet.
When an IP packet cannot be routed further, the current routing node
generates an ICMP error describing the routing failure and sends it back
to the original sender. ICMP is a layer 4 protocol itself, same as TCP
and UDP. As such, sending out a UDP packet may result in receiving an
ICMP response. In order to allow the sender to learn, which packet
failed to route, the ICMP error embeds parts of the original packet in
its payload [0] [1].
The Gateway's NAT table uses parts of the layer 4 protocol as part of
its key; the UDP and TCP source port and the ICMP echo request
identifier (further referred to as "source protocol"). An ICMP error
message doesn't have any of these, meaning the lookup in the NAT table
currently fails and the ICMP error is silently dropped.
A lot of software implements a happy-eyeballs approach and probs for
IPv6 and IPv4 connectivity simulataneously. The absence of the ICMP
errors confuses that algorithm as it detects the packet loss and starts
retransmits instead of giving up.
## Solution
Upon receiving an ICMP error on the Gateway, we now extract the
partially embedded packet in the ICMP error payload. We use the
destination IP and source protocol of _that_ packet for the lookup in
the NAT table. This returns us the original (client-assigned)
destination IP and source protocol. In order for the Gateway's NAT to be
transparent, we need to patch the packet embedded in the ICMP error to
use the original destination and source protocol. We also have to
account for the fact that the original packet may have been translated
with NAT64/46 and translate it back. Finally, we generate an ICMP error
with the appropriate code and embed the patched packet in its payload.
## Test implementation
To test that this works for all kind of combinations, we extend
`tunnel_test` to sample a list of unreachable IPs from all IPs sampled
for DNS resources. Upon receiving a packet for one of these IPs, the
Gateway will send an ICMP error back instead of invoking its regular
echo reply logic. On the client-side, upon receiving an ICMP error, we
extract the originally failed packet from the body and treat it as a
successful response.
This may seem a bit hacky at first but is actually how operating systems
would treat ICMP errors as well. For example, a `TcpSocket::connect`
call (triggering a TCP SYN packet) may fail with an IO error if we
receive an ICMP error packet. Thus, in a way, the original packet got
answered, just not with what we expected.
In addition, by treating these ICMP errors as responses to the original
packet, we automatically perform other assertions on them, like ensuring
that they come from the right IP address, that there are no unexpected
packets etc.
## Test alternatives
It is tricky to solve this in other ways in the test suite because at
the time of generating a packet for a DNS resource, we don't know the
actual IP that is being targeted by a certain proxy IP unless we'd start
reimplementing the round-robin algorithm employed by the Gateway. To
"test" the transparency of the NAT, we'd like to avoid knowing about
these implementation details in the test.
## Future work
In this PR, we currently only deal with "Destination Unreachable" ICMP
errors. There are other ICMP messages such as ICMPv6's `PacketTooBig` or
`ParameterProblem`. We should eventually handle these as well. They are
being deferred because translating those between the different IP
versions is only partially implemented and would thus require more work.
The most pressing need is to translate destination unreachable errors to
enable happy-eyeballs algorithms to work correctly.
Resolves: #5614.
Resolves: #6371.
[0]: https://www.rfc-editor.org/rfc/rfc792
[1]: https://www.rfc-editor.org/rfc/rfc4443#section-3.1
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
Resolves: #7241.
In #7389, we started tagging the built images with the branch name in
addition to other tags such as `latest` and the version number. That
wasn't enough unfortunately because we also need to tag the merged
manifest image that bundles the different architectures together as only
that one actually gets pushed to the registry.
Our `docker-compose` file references the images using the `main` tag.
However, doing a `docker compose pull` fails because the `main` tag
doesn't exist for the client, gateway and relay images.
In order for this tag to exist, we need to instruct the
`docker/metadata-action` to generate a tag for the current branch.
Explicitly creating the Sentry release allows us to associate the
commits since the last release with the new one. This might help us to
identify potential sources of regressions. For the current releases,
I've set them manually to ensure that this automation has something to
pick up on for the next release.
The releases will already exists prior to this because they are
automatically created when a client / gateway first logs in with a
certain version.
What this does it mark it as "finalized" and set the commit range
accordingly.
Resolves: #7358.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
In order to display better stacktraces when Firezone crashes, Sentry
needs debug symbols for our binaries. Debug symbols on Sentry are
retained for 90 days after they have been last used [0]. We can thus
simply upload them every time we build a binary on `main`.
For the moment, this only uploads them for the GUI client. Debug symbols
for the Android and Apple clients will be done in separate PRs.
[0]:
https://docs.sentry.io/platforms/native/data-management/debug-files/#retention-policy
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
This ensure that we run prettier across all supported filetypes to check
for any formatting / style inconsistencies. Previously, it was only run
for files in the website/ directory using a deprecated pre-commit
plugin.
The benefit to keeping this in our pre-commit config is that devs can
optionally run these checks locally with `pre-commit run --config
.github/pre-commit-config.yaml`.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Refs #6145
This bundles aarch64 and x86_64 RPMs in CI and CD.
We'll need a 2nd PR to add everything to the changelog and knowledge
base, after the first release with RPMs is cut.
In the Rust code, we use `git describe` to determine the current version
of the code. This only works if tags are actually checked out. To save
time, the `actions/checkout` action by default only does a shallow-clone
of depth 1 without any tags. Due to that, all events in Sentry just show
up as a commit hash.
Closes#7171
If the assets aren't bundled, Tauri will warn about it in `tracing`,
that will get sent to Sentry, and then it will be interpreted as an
error.
Timeline to prove that this fixes the false positive error in Sentry,
all times UTC on October 29th:
- 21:01:26 - Most recent events in Sentry as of 21:20:19
- 21:11:09 - Restarted CI while CD is quiet
- 21:14:01 - First smoke test begins
- 21:19:39 - Last smoke test ends
If edns0 doesn't work correctly DNS servers might respond with messages
bigger than our maximum udp size.
In that case we need to truncate those messages when forwarding the
respond back to the interface and expect the OS to retry with TCP.
Otherwise we aren't able to allocate a packet big enough for this.
Fixes#7121
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
In order to release #6941, we need to bump the gateway's version to
1.4.0. The portal has a version gate that only allows connection clients
which have version >= 1.4.0. Thus, in order to test #6941 on staging,
the version must not yet be bumped and is thus split out into this PR.
Closes#4883
Refs #7005
Adds support for Ubuntu 24.04, drops support for Ubuntu 20.04
Known issues:
- On Ubuntu 22.04, sometimes GNOME shows the wrong tray icon
- On Ubuntu 24.04, the first time you open the tray menu, GNOME takes a
long time to open the menu.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Currently, tests only send ICMP packets back and forth, to expand our
coverage and later on permit us cover filters and resource picking this
PR implements sending UDP and TCP packets as part of that logic too.
To make this PR simpler in this stage TCP packets don't track an actual
TCP connection, just that they are forwarded back and forth, this will
be fixed in a future PR by emulating TCP sockets.
We also unify how we handle CIDR/DNS/Non Resources to reduce the number
of transitions.
Fixes#7003
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
At present, `connlib` only supports DNS over UDP on port 53. Responses
over UDP are size-constrained on the IP MTU and thus, not all DNS
responses fit into a UDP packet. RFC9210 therefore mandates that all DNS
resolvers must also support DNS over TCP to overcome this limitation
[0].
Handling UDP packets is easy, handling TCP streams is more difficult
because we need to effectively implement a valid TCP state machine.
Building on top of a lot of earlier work (linked in issue), this is
relatively easy because we can now simply import
`dns_over_tcp::{Client,Server}` which do the heavy lifting of sending
and receiving the correct packets for us.
The main aspects of the integration that are worth pointing out are:
- We can handle at most 10 concurrent DNS TCP connections _per defined
resolver_. The assumption here is that most applications will first
query for DNS records over UDP and only fall back to TCP if the response
is truncated. Additionally, we assume that clients will close the TCP
connections once they no longer need it.
- Errors on the TCP stream to an upstream resolver result in `SERVFAIL`
responses to the client.
- All TCP connections to upstream resolvers get reset when we roam, all
currently ongoing queries will be answered with `SERVFAIL`.
- Upon network reset (i.e. roaming), we also re-allocate new local ports
for all TCP sockets, similar to our UDP sockets.
Resolves: #6140.
[0]: https://www.ietf.org/rfc/rfc9210.html#section-3-5
Since we've added these tests, `connlib`'s test coverage has increased
significantly to the point where we don't need all of them anymore.
Especially pretty much everything in regards to relays is unnecessary to
be tested using docker.
These integration tests are sometimes flaky due to docker not starting
or images failing to pull. Thus, having fewer of them is better because
it increases CI reliability. Also, there are only so many jobs that
GitHub will execute in parallel so having less jobs is better for that
too.
Resolves: #6451.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Refs #6927
This PR creates a GTK+ event loop, a blank window, and the tray menu. It
connects to the IPC service, you can sign in and everything, but the
About window, Settings window, and Welcome window aren't implemented.
We build a deb package in CI but it isn't pushed to the draft releases
in CD yet.

Pros over Iced:
- More mature
- Easy integration with `tray-icon`
- Small binaries (< 1 MB for this example)
Cons:
- GTK 3.x is abandoned as of March. GTK 4 isn't packaged for Ubuntu
20.04.
- Widgets might be hard to use
- Hard to set up on Windows, only using this for Linux for now
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
`SENTRY_ENVIRONMENT` is only read at run time, not at build time, and we
override the environment in our code anyway, so this env var was doing
nothing.
Our tests are pretty fast now, meaning we can afford running more
permutations. This makes it less likely to encounter flakes in the
"coverage" tests where we grep for certain log lines to ensure that the
tests hit certain code paths.
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Closes#6854
- Sets release version from the GUI Client / Headless Client version
instead of the `firezone-telemetry` version
- Set environment to "production" and "staging" for well-known API URLs,
and "self-hosted" for others, since environments in Sentry can't have
slashes in them
- Sets API URL as a tag
- Sets release to `unit test` for unit testing `firezone-telemetry`
itself, since it has no good version number
<img width="398" alt="image"
src="https://github.com/user-attachments/assets/86f71193-2511-45c1-8304-413db8e5ef90">
Refs #6138
Sentry is always enabled for now. In the near future we'll make it
opt-out per device and opt-in per org (see #6138 for details)
- Replaces the `crash_handling` module
- Catches panics in GUI process, tunnel daemon, and Headless Client
- Added a couple "breadcrumbs" to play with that feature
- User ID is not set yet
- Environment is set to the API URL, e.g. `wss://api.firezone.dev`
- Reports panics from the connlib async task
- Release should be automatically pulled from the Cargo version which we
automatically set in the version Makefile
Example screenshot of sentry.io with a caught panic:
<img width="861" alt="image"
src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">
In the past, we struggled a lot of the reproducibility of `tunnel_test`
failures because our input state and transition strategies were not
deterministic. In the end, we found out that it was due to the iteration
order of `HashMap`s.
To make sure this doesn't regress, we added a check to CI at the time
that compares the debug output of all regression seeds against a 2nd run
and ensures they are the same. That is overall a bit wonky.
We can do better by simple sampling a value from the strategy twice from
a test runner with the same seed. If the strategy is deterministic,
those need to be the same. We still rely on the debug output being
identical because:
a. Deriving `PartialEq` on everything is somewhat cumbersome
b. We actually care about the iteration order which a fancy `PartialEq`
implementation might ignore