This turns out to break things because we can no longer associate a
working but outdated IP with the DNS resource. Putting this up here in
case we want to merge a fix before we decide on a different one.
Reverts: #5435.
Extracted from https://github.com/firezone/firezone/pull/5426
- Replace `new` and `new_for_test` for IPC servers with `enum ServiceId`
- Rename `debug_command_setup` to `setup_stdout_logging`
It turned out there is no clever way to hide other platforms from
`cargo-mutants`, I thought I had such a way
Whenever we resolve a domain name to real IPs, we assign one proxy IP
per resolved IP. In case the DNS records for that domain actually
changed, we only appended the new proxy IPs to the list we assigned to
that domain.
If a domain no longer resolves to a certain IP, we should clear the
assigned proxy IP and stop returning in DNS responses. To achieve this,
we first remove all proxy IPs from our mapping of IP -> domain and then
add all _current_ proxy IPs back to the map.
When a user sends the first packet to a resource, we generate a
"connection intent" and consult the portal, which gateway to use for
this resource. This process is throttled to only generate a new intent
every 2s.
Once we know, which gateway to use for a certain resource, we initiate a
connection via snownet. This involves an OFFER-ANSWER handshake with the
gateway. A connection for which we have sent an offer and have not yet
received an answer is what we call a "pending connection".
In case the connection setup takes longer than 2s, we will generate
another connection intent which can point to the same gateway that we
are currently setting up a connection with.
Currently, encountering a "pending connection" during another connection
setup is treated as an error which results in some state being
cleaned-up / removed. This is where the bug surfaces: If we remove the
state for a resource as a result of a 2nd connection intent and then
receive the response of the first one, we will be left with no state
that knows about this resource.
We fix this by refactoring `create_or_reuse_connection` to be atomic in
regards to its state changes: All checks that fail the function are
moved to the top which means there is no state to clean up in case of an
error. Additionally, we model the case of a "pending connection" using
an `Option` to not flood the logs with "pending connection" warnings as
those are expected during normal operation.
Fixes: #5385
As part of #4994, the IP translation and mangling of packets to and from
DNS resources is moved to the gateway. This PR represents the
"gateway-half" of the required changes.
Eventually, the client will send a list of proxy IPs that it assigned
for a certain DNS resource. The gateway assigns each proxy IP to a real
IP and mangles outgoing and incoming traffic accordingly. There are a
number of things that we need to take care of as part of that:
- We need to implement NAT to correctly route traffic. Our NAT table
maps from source port* and destination IP to an assigned port* and real
IP. We say port* because that is only true for UDP and TCP. For ICMP, we
use the identifier.
- We need to translate between IPv4 and IPv6 in case a DNS resource e.g.
only resolves to IPv6 addresses but the client gave out an IPv4 proxy
address to the application. This translation is was added in #5364 and
is now being used here.
This PR is backwards-compatible because currently, clients don't send
any IPs to the gateway. No proxy IPs means we cannot do any translation
and thus, packets are simply routed through as is which is what the
current clients expect.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
When we attempt to establish a connection to a gateway for a DNS
resource, the gateway must resolve the requested domain name before it
can accept the connection. Currently, this timeout is set to 60s which
is much longer than the client's connection timeout.
DNS resolution is typically a very fast protocol so reducing this
timeout to 5s should be safe. In addition, we add a compile-time
assertion that this timeout must be less than the client's connection
timeout.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
I tried to run the GUI client on my system but I think my glibc version
is too recent (2.38) and thus, it crashes after clicking on "Login".
These changes to the Nix script are necessary to at least build the
client.
You still can generate a link that will inject a text as long as it has
`@` in it - there is no good ways to validate emails other than just
check for that. The only *reliable* ways to fix that is to either remove
that text (making users more confused) or only show it if identity was
found (leaking the fact of it's existence).
Closes#4998
```[tasklist]
### Before merging
- [x] (failed) Figure out how to reconnect Firezone in Android
- [ ] How should the instructions for ChromeOS go? I assume it's a little different from Android
- [ ] Grep for TODOs in all user guides
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
We seem to be hitting `assert_receive`-style much more frequently after
"upgrading" to Enterprise Cloud (our credits expired, I was able to
renew them).
This updates the global timeout to 500ms for `assert_receive` to reduce
the likelihood `assert_push` and friends will time out on slow GH
runners.
E.g.
https://github.com/firezone/firezone/actions/runs/9556532328/job/26341986456
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Since we've decoupled the Gateway version and portal version, this fixes
an issue deploying to production where we override the Gateway binary
download version with the `TF_VAR_image_tag`, which no longer points to
a valid released binary.
Now, it will fallback to `latest`, which will download the latest
version of the published Gateway to use with the production deploy,
which is what we will expect our customers to be running as well.
This is a funny one. `cargo test -p firezone-headless-client -p
firezone-gui-client` actually passes, because the GUI client uses the
pipes feature, and Cargo apparently just does one build for both
packages. But if you build the headless Client by itself, it fails to
build.
I think this caused `cargo-mutants` to consider all its headless Client
mutants to be unviable, and so it didn't show coverage for that package.
This was needed to work around an issue with installing systemd Gateways
from our Terraform examples. Now that the publish workflow is fixed this
is no longer necessary.
Part of a yak shave to profile startup time for reducing it on Windows
#5026
Median of 3 runs:
- Windows 11 aarch64 Parallels VM - 4.8 s
- Windows 11 x86_64 laptop - 3.1 s (I thought it used to be slower)
- Windows Server 2022 VM - 22.2 s
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Currently, our CI tests that the latest released client and gateway are
compatible with the current portal. To allow for smooth upgrades of
deployed infrastructure, we also need to test that any changes we are
making to the gateway are compatible with the latest release of the
client. This allows customers to upgrade their gateways ahead of time
before we publish updates of the clients.
This PR adds a matrix to the compatibility tests to ensure just that.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
## The problem
To find the correct peer for a given resource we keep a map of
`resource_id -> gateway_id` in the client state called
`resources_gateways`.
For CIDR resource connlib when sees a packet it does the following
steps:
1. Find the packet's corresponding resource
2. Find the resource corresponding gateway
3. Find the peer corresponding to the gateway, if none, request
access/connection
The problem was that when roaming, we didn't cleanup the map between
`resource_id -> gateway_id` so if after disconnecting with a gateway we
created a new connection due to a another resource, in step 3, connlib
would find a connected gateway and not request access.
This would cause the client to send unallowed packets to the gateway.
## Steps to reproduce
1. Open the client
2. Ping a CIDR resource on a gateway
3. roam and wait until disconnection
4. Ping a different resource on the same gateway
5. Ping the same CIDR resource as in step 2
This will result in no reply for step 5
## The fix
Cleanup the `resource -> gateway` map after disconnecting with a
gateway.
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Because of https://github.com/vercel/next.js/issues/66891, we need
custom middleware to populate the version in multiple places of the
`destination` URL for redirect artifact permalinks.
Closes#5042
Smoke test plan:
- Install on a before-Firezone VM
- Confirm logs default to `str0m=warn,info`
- Set log filter to `debug` in GUI
- Restart IPC service
- Confirm logs are `debug`
- Clear settings back to default
- Restart IPC service
- Confirm logs are `str0m=warn,info`
Directions to apply new log level:
1. Put the new log filter in
2. Click "Apply"
3. Quit Firezone Client
4. Right-click on the Start Menu and click "Terminal (Admin)" to open a
Powershell prompt
5. Run `Restart-Service -Name FirezoneClientIpcService` (on Linux, `sudo
systemctl restart firezone-client-ipc.service`)
6. Re-open Firezone Client
```[tasklist]
- [x] Log the log filter maybe
- [x] Use `atomicwrites` to write the file
- [x] (cancelled) ~~Make the GUI write the file on boot if it's not there (saves a step when upgrading from older versions)~~
- [x] Windows smoke test
- [x] Fix permissions on `/var/lib/dev.firezone.client/config`
- [x] Fix Linux IPC service not loading the log filter file
- [x] Linux smoke test
- [ ] Make sure it's okay that users in `firezone-client` can change the device ID
- [ ] Update user guides to include restarting the computer or IPC service after updating the log level?
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#4765
It turns out that if I don't join the worker thread explicitly it messes
up wintun a lot. I wonder if I should report that as a bug or what. It's
kind of our fault for keeping a handle to the `Session` alive in the
thread.
```[tasklist]
- [x] Move the debug command from `gui-client` to `headless-client`
- [x] Move the initialization out of `firezone-tunnel`, revert the `pub` changes, use `anyhow::Context`
```
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Bumps [tokio-tungstenite](https://github.com/snapview/tokio-tungstenite)
from 0.21.0 to 0.23.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/snapview/tokio-tungstenite/blob/master/CHANGELOG.md">tokio-tungstenite's
changelog</a>.</em></p>
<blockquote>
<h1>0.23.0</h1>
<ul>
<li>Update <code>tungstenite</code> to <code>0.23.0</code>.</li>
<li>Disable default features on TLS crates.</li>
</ul>
<h1>0.22.0</h1>
<ul>
<li>Update TLS dependencies.</li>
<li><del>Update <code>tungstenite</code> to match
<code>0.22.0</code>.</del></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/snapview/tokio-tungstenite/commits">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>