Closes #5464
These were silently broken: the export produced an empty zip and the
test passed anyway. This PR makes the test fail if the zip wasn't fully
exported, and then fixes the export.
In the gateway's NAT table, we first try to use the source port of the
packet we want to translate as the external port. This keeps the port
mapping consistent between NAT sessions in the majority of cases. If
that port is taken, we iterate through two chained `Range`s that
together cycle through the entire port range.
[`RangeFrom`](https://doc.rust-lang.org/std/ops/struct.RangeFrom.html)
has somewhat unexpected behaviour with exhausted ranges: it panics (in
debug builds) when it has to step past the maximum value of its integer
type. To avoid this, we explicitly end the first range at `u16::MAX`,
which makes it an empty range in case the source port is `u16::MAX`.
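A minimal sketch of the port selection described above, assuming the set of taken ports is available as a `HashSet<u16>`. The helpers `candidate_ports` and `pick_external_port` are hypothetical names, not the gateway's actual functions; the point is the shape of the chained iterator.

```rust
use std::collections::HashSet;

/// Hypothetical helper: external ports to try, in order of preference.
fn candidate_ports(src_port: u16) -> impl Iterator<Item = u16> {
    // Prefer the packet's own source port so the mapping stays stable.
    std::iter::once(src_port)
        // Exclusive end at `u16::MAX`: this range is empty when
        // `src_port == u16::MAX`. An open-ended `src_port..` would instead
        // overflow when stepping past `u16::MAX` (a panic in debug builds).
        // It re-yields `src_port` first, which is harmless: it is taken.
        .chain(src_port..u16::MAX)
        // Wrap around to the ports below the source port.
        .chain(0..src_port)
}

/// Hypothetical helper: the first candidate that is not already in use.
fn pick_external_port(src_port: u16, taken: &HashSet<u16>) -> Option<u16> {
    candidate_ports(src_port).find(|port| !taken.contains(port))
}
```

With this shape, `u16::MAX` itself is only ever handed out when it is the packet's own source port, since both fallback ranges exclude it.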
Without this, a client older than 1.1.0 connecting to a gateway newer
than 1.1.0 (i.e. current main) produces lots of very strange logs like:
> Assigned translation proxy_ip=X.X.X.X real_ip=X.X.X.X
where both X.X.X.X are the same IP.
Currently, we always emit a connection intent whenever we see a DNS
query for a domain of one of our DNS resources. However, especially for
wildcard DNS resources, we are very likely already connected to the
corresponding gateway. In that case, sending a connection intent
triggers another handshake with the portal only to learn (surprise!)
that we should reuse the connection we already have to that gateway.
We can short-circuit this by checking whether we are already connected
to the gateway for this resource and, if so, directly requesting access
for the domain name in question. We reuse the same event here as we do
for refreshing DNS resources. At a later stage, we should rename this
event to make its purpose clearer.
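The short-circuit can be sketched as a single decision on each DNS query for a resource. Everything here is hypothetical: the ID aliases, `Client`, and especially `Event::RequestAccess`, which stands in for the existing "refresh DNS resources" event whose real name is not given above.

```rust
use std::collections::HashMap;

// Hypothetical identifiers; the real client uses its own ID types.
type ResourceId = u32;
type GatewayId = u32;

#[derive(Debug, PartialEq)]
enum Event {
    /// Ask the portal which gateway should serve this resource.
    ConnectionIntent { resource: ResourceId },
    /// Hypothetical stand-in for the event reused from DNS-resource refresh:
    /// request access for `domain` over an existing gateway connection.
    RequestAccess { gateway: GatewayId, resource: ResourceId, domain: String },
}

struct Client {
    /// Gateway each resource is already connected through.
    resource_to_gateway: HashMap<ResourceId, GatewayId>,
}

impl Client {
    fn on_dns_query_for_resource(&self, resource: ResourceId, domain: &str) -> Event {
        match self.resource_to_gateway.get(&resource) {
            // Already connected: skip the portal round-trip entirely.
            Some(&gateway) => Event::RequestAccess {
                gateway,
                resource,
                domain: domain.to_owned(),
            },
            // Not connected yet: fall back to a regular connection intent.
            None => Event::ConnectionIntent { resource },
        }
    }
}
```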
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
This turns out to break things because we can no longer associate a
working but outdated IP with the DNS resource. Putting this up in case
we want to merge this revert before we decide on a different fix.
Reverts: #5435.
Extracted from https://github.com/firezone/firezone/pull/5426
- Replace `new` and `new_for_test` for IPC servers with `enum ServiceId`
- Rename `debug_command_setup` to `setup_stdout_logging`
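A rough sketch of the constructor pattern the first bullet describes: one `new` taking an enum instead of two constructors. The variants `Prod` and `Test(u32)`, the `Server` type, and the name strings are all hypothetical placeholders; the actual `ServiceId` variants are not listed here.

```rust
/// Hypothetical variants: which IPC service instance a server binds to.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ServiceId {
    /// The production service.
    Prod,
    /// An isolated instance for tests, distinguished by a number.
    Test(u32),
}

/// Hypothetical IPC server that only records the name it would listen on.
struct Server {
    name: String,
}

impl Server {
    /// A single constructor replaces separate `new` / `new_for_test` functions.
    fn new(id: ServiceId) -> Self {
        let name = match id {
            ServiceId::Prod => "ipc_service".to_owned(),
            ServiceId::Test(n) => format!("ipc_service_test_{n}"),
        };
        Server { name }
    }
}
```

The enum makes the test path an ordinary value instead of a separate code path, so production and tests share one constructor.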
It turned out there is no clever way to hide other platforms from
`cargo-mutants`; I thought I had found one.
Whenever we resolve a domain name to real IPs, we assign one proxy IP
per resolved IP. If the DNS records for that domain changed, we only
appended the new proxy IPs to the list assigned to that domain and never
removed stale ones.
If a domain no longer resolves to a certain IP, we should clear the
assigned proxy IP and stop returning it in DNS responses. To achieve
this, we first remove all proxy IPs of that domain from our mapping of
IP -> domain and then add all _current_ proxy IPs back to the map.
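A minimal sketch of the remove-then-re-add update, assuming the state is two plain `HashMap`s (proxy IP -> domain, domain -> proxy IPs). `DnsState` and `set_proxy_ips` are hypothetical names, and proxy IP allocation itself is left to the caller.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

#[derive(Default)]
struct DnsState {
    /// Proxy IP -> domain it was assigned for.
    ip_to_domain: HashMap<IpAddr, String>,
    /// Domain -> proxy IPs currently handed out for it.
    domain_to_ips: HashMap<String, Vec<IpAddr>>,
}

impl DnsState {
    /// Replaces (rather than appends to) the proxy IPs of `domain`.
    fn set_proxy_ips(&mut self, domain: &str, current: Vec<IpAddr>) {
        // 1. Forget every proxy IP previously assigned to this domain.
        if let Some(previous) = self.domain_to_ips.remove(domain) {
            for ip in previous {
                self.ip_to_domain.remove(&ip);
            }
        }
        // 2. Add back only the proxy IPs for the current DNS records.
        for ip in &current {
            self.ip_to_domain.insert(*ip, domain.to_owned());
        }
        self.domain_to_ips.insert(domain.to_owned(), current);
    }
}
```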
When a user sends the first packet to a resource, we generate a
"connection intent" and ask the portal which gateway to use for this
resource. This process is throttled to generate at most one new intent
every 2s.
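One way to express such a throttle, assuming it is tracked per resource and that the caller passes in the current time (both assumptions; `IntentThrottle` is a hypothetical name):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimum time between two connection intents (2s, as described above).
const INTENT_THROTTLE: Duration = Duration::from_secs(2);

// Hypothetical identifier type.
type ResourceId = u32;

#[derive(Default)]
struct IntentThrottle {
    /// When we last sent an intent for each resource.
    last_sent: HashMap<ResourceId, Instant>,
}

impl IntentThrottle {
    /// Returns `true` (and records `now`) if a new intent may be sent.
    fn should_send(&mut self, resource: ResourceId, now: Instant) -> bool {
        if let Some(&last) = self.last_sent.get(&resource) {
            if now.duration_since(last) < INTENT_THROTTLE {
                return false;
            }
        }
        self.last_sent.insert(resource, now);
        true
    }
}
```

Passing `now` explicitly keeps the logic deterministic and easy to test.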
Once we know which gateway to use for a certain resource, we initiate a
connection via snownet. This involves an OFFER-ANSWER handshake with the
gateway. A connection for which we have sent an offer but not yet
received an answer is what we call a "pending connection".
In case the connection setup takes longer than 2s, we will generate
another connection intent which can point to the same gateway that we
are currently setting up a connection with.
Currently, encountering a "pending connection" during another connection
setup is treated as an error, which results in some state being cleaned
up / removed. This is where the bug surfaces: If we remove the state for
a resource as a result of a 2nd connection intent and then receive the
response to the first one, we are left with no state that knows about
this resource.
We fix this by refactoring `create_or_reuse_connection` to be atomic
with regard to its state changes: All checks that can fail the function
are moved to the top, which means there is no state to clean up in case
of an error. Additionally, we model the case of a "pending connection"
using an `Option` so we don't flood the logs with "pending connection"
warnings, as those are expected during normal operation.
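The "checks first, mutate last" structure can be sketched as follows. This is not the real `create_or_reuse_connection`: the state fields, the single `UnknownResource` error, and the `Connection` enum are simplifying assumptions chosen to show why a second intent no longer wipes state.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical identifier types.
type ResourceId = u32;
type GatewayId = u32;

#[derive(Debug, PartialEq)]
enum Error {
    UnknownResource,
}

#[derive(Debug, PartialEq)]
enum Connection {
    Reused(GatewayId),
    New(GatewayId),
}

#[derive(Default)]
struct ClientState {
    known_resources: HashSet<ResourceId>,
    /// Gateways we sent an OFFER to but have not received an ANSWER from.
    pending: HashSet<GatewayId>,
    connected: HashSet<GatewayId>,
    resource_to_gateway: HashMap<ResourceId, GatewayId>,
}

impl ClientState {
    fn create_or_reuse_connection(
        &mut self,
        resource: ResourceId,
        gateway: GatewayId,
    ) -> Result<Option<Connection>, Error> {
        // All checks that can fail come first: no state has been touched yet.
        if !self.known_resources.contains(&resource) {
            return Err(Error::UnknownResource);
        }
        // A pending connection is expected (e.g. a 2nd intent after 2s):
        // not an error, and nothing gets cleaned up.
        if self.pending.contains(&gateway) {
            return Ok(None);
        }
        // Only now do we mutate state.
        self.resource_to_gateway.insert(resource, gateway);
        if self.connected.contains(&gateway) {
            return Ok(Some(Connection::Reused(gateway)));
        }
        self.pending.insert(gateway);
        Ok(Some(Connection::New(gateway)))
    }
}
```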
Fixes: #5385
As part of #4994, the IP translation and mangling of packets to and from
DNS resources is moved to the gateway. This PR represents the
"gateway-half" of the required changes.
Eventually, the client will send a list of proxy IPs that it assigned
for a certain DNS resource. The gateway assigns each proxy IP to a real
IP and mangles outgoing and incoming traffic accordingly. There are a
number of things that we need to take care of as part of that:
- We need to implement NAT to correctly route traffic. Our NAT table
maps from source port* and destination IP to an assigned port* and real
IP. We say port* because that is only true for UDP and TCP. For ICMP, we
use the identifier.
- We need to translate between IPv4 and IPv6 in case a DNS resource,
e.g., only resolves to IPv6 addresses but the client handed out an IPv4
proxy address to the application. This translation was added in #5364
and is now used here.
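A minimal sketch of the NAT table shape from the first bullet, keyed on port* and destination IP. The `Protocol` enum, `NatTable`, and the two-map layout (one per direction) are assumptions for illustration; the actual gateway types may differ.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// The "port*" of a packet: TCP/UDP use ports, ICMP uses the identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Protocol {
    Tcp(u16),
    Udp(u16),
    Icmp(u16),
}

#[derive(Default)]
struct NatTable {
    /// (source port*, proxy IP) -> (assigned port*, real IP)
    outbound: HashMap<(Protocol, IpAddr), (Protocol, IpAddr)>,
    /// Reverse direction for replies from the real IP.
    inbound: HashMap<(Protocol, IpAddr), (Protocol, IpAddr)>,
}

impl NatTable {
    fn insert(&mut self, src: Protocol, proxy_ip: IpAddr, assigned: Protocol, real_ip: IpAddr) {
        self.outbound.insert((src, proxy_ip), (assigned, real_ip));
        self.inbound.insert((assigned, real_ip), (src, proxy_ip));
    }

    fn translate_outgoing(&self, src: Protocol, proxy_ip: IpAddr) -> Option<(Protocol, IpAddr)> {
        self.outbound.get(&(src, proxy_ip)).copied()
    }

    fn translate_incoming(&self, assigned: Protocol, real_ip: IpAddr) -> Option<(Protocol, IpAddr)> {
        self.inbound.get(&(assigned, real_ip)).copied()
    }
}
```

Because `IpAddr` covers both families, an IPv4 proxy IP can map to an IPv6 real IP; the packet-level IPv4/IPv6 translation itself is out of scope here.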
This PR is backwards-compatible because currently, clients don't send
any IPs to the gateway. No proxy IPs means we cannot do any translation
and thus, packets are simply routed through as-is, which is what the
current clients expect.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
When we attempt to establish a connection to a gateway for a DNS
resource, the gateway must resolve the requested domain name before it
can accept the connection. Currently, this timeout is set to 60s, which
is much longer than the client's connection timeout.
DNS resolution is typically very fast, so reducing this timeout to 5s
should be safe. In addition, we add a compile-time assertion that this
timeout must be less than the client's connection timeout.
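A compile-time check like this can be written as a `const` assertion. The constant names are hypothetical, and the 10s client timeout is a placeholder, not the client's real value.

```rust
use std::time::Duration;

/// Client-side connection timeout (hypothetical placeholder value).
const CLIENT_CONNECTION_TIMEOUT: Duration = Duration::from_secs(10);
/// Gateway-side DNS resolution timeout (5s, per this PR).
const DNS_RESOLUTION_TIMEOUT: Duration = Duration::from_secs(5);

// Fails to compile if the gateway could still be resolving after the
// client has already given up. `Duration`'s comparison operators are not
// `const`, so compare the `const fn` millisecond values instead.
const _: () = assert!(DNS_RESOLUTION_TIMEOUT.as_millis() < CLIENT_CONNECTION_TIMEOUT.as_millis());
```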
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
I tried to run the GUI client on my system, but it crashes after
clicking "Login", I think because my glibc version (2.38) is too recent.
These changes to the Nix script are necessary to at least build the
client.
You can still generate a link that injects text as long as it contains
`@`: there is no good way to validate emails other than checking for
that. The only *reliable* ways to fix this are to either remove that
text (making users more confused) or only show it if an identity was
found (leaking the fact of its existence).
Closes #4998
```[tasklist]
### Before merging
- [x] (failed) Figure out how to reconnect Firezone in Android
- [ ] How should the instructions for ChromeOS go? I assume it's a little different from Android
- [ ] Grep for TODOs in all user guides
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
We seem to be hitting `assert_receive`-style timeouts much more
frequently after "upgrading" to Enterprise Cloud (our credits expired; I
was able to renew them).
This sets the global `assert_receive` timeout to 500ms to reduce the
likelihood that `assert_push` and friends time out on slow GH runners.
E.g.
https://github.com/firezone/firezone/actions/runs/9556532328/job/26341986456
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Since we've decoupled the Gateway version from the portal version, this
fixes an issue when deploying to production where we override the
Gateway binary download version with `TF_VAR_image_tag`, which no longer
points to a valid released binary.
Now, it falls back to `latest`, which downloads the latest published
Gateway version for the production deploy. This is also what we expect
our customers to be running.
This is a funny one. `cargo test -p firezone-headless-client -p
firezone-gui-client` actually passes, because the GUI client uses the
pipes feature, and Cargo apparently does a single build for both
packages. But if you build the headless Client by itself, it fails to
build.
I think this caused `cargo-mutants` to consider all headless Client
mutants unviable, so it didn't show coverage for that package.
This was needed to work around an issue with installing systemd Gateways
from our Terraform examples. Now that the publish workflow is fixed this
is no longer necessary.
Part of a yak shave to profile startup time, with the goal of reducing
it on Windows (#5026).
Median of 3 runs:
- Windows 11 aarch64 Parallels VM - 4.8 s
- Windows 11 x86_64 laptop - 3.1 s (I thought it used to be slower)
- Windows Server 2022 VM - 22.2 s
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>