In `connlib`, traffic is sent through sockets via one of three ways:
1. Direct p2p traffic between clients and gateways: For these, we always
explicitly set the source IP (and thus interface).
2. UDP traffic to the relays: For these, we let the OS pick an
appropriate source interface.
3. WebSocket traffic over TCP to the portal: For this too, we let the OS
pick the source interface.
For (2) and (3), it is possible to run into routing loops, depending on
the routes that we have configured on the TUN device.
In Linux, we can prevent routing loops by marking a socket [0] and
repeating the mark when we add routes [1]. Packets sent via a marked
socket won't be routed by a rule that contains this mark. On Android, we
can do something similar by "protecting" a socket via a syscall on the
Java side [2].
On Windows, routing works slightly different. There, the source
interface is determined based on a computed metric [3] [4]. To prevent
routing loops on Windows, we thus need to find the "next best" interface
after our TUN interface. We can achieve this with a combination of
several syscalls:
1. List all interfaces on the machine
2. Ask Windows for the best route on each interface, except our TUN
interface.
3. Sort by Windows' routing metric and pick the lowest one (lower is
better).
Thanks to the abstraction of `SocketFactory` that we already previously
introduced, Integrating this into `connlib` isn't too difficult:
1. For TCP sockets, we simply resolve the best route after creating the
socket and then bind it to that local interface. That way, all packets
will always going via that interface, regardless of which routes are
present on our TUN interface.
2. UDP is connection-less so we need to decide per-packet, which
interface to use. "Pick the best interface for me" is modelled in
`connlib` via the `DatagramOut::src` field being `None`.
- To ensure those packets don't cause a routing loop, we introduce a
"source IP resolver" for our `UdpSocket`. This function gets called
every time we need to send a packet without a source IP.
- For improved performance, we cache these results. The Windows client
uses this source IP resolver to use the above devised strategy to find a
suitable source IP.
- In case the source IP resolution fails, we don't send the packet. This
is important, otherwise, the kernel might choose our TUN interface again
and trigger a routing loop.
The last remark to make here is that this also works for connection
roaming. The TCP socket gets thrown away when we reconnect to the
portal. Thus, the new socket will pick the new best interface as it is
re-created. The UDP sockets also get thrown away as part of roaming.
That clears the above cache which is what we want: Upon roaming, the
best interface for a given destination IP will likely have changed.
[0]:
59014a9622/rust/headless-client/src/linux.rs (L19-L29)
[1]:
59014a9622/rust/bin-shared/src/tun_device_manager/linux.rs (L204-L224)
[2]:
59014a9622/rust/connlib/clients/android/src/lib.rs (L535-L549)
[3]:
https://learn.microsoft.com/en-us/previous-versions/technet-magazine/cc137807(v=msdn.10)?redirectedfrom=MSDN
[4]:
https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metricFixes: #5955.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.8.0 to 1.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>1.10.0</h2>
<h2>Deprecations</h2>
<p>This release deprecates and renames the following functions:</p>
<ul>
<li><code>Builder::from_rfc4122_timestamp</code> ->
<code>Builder::from_gregorian_timestamp</code></li>
<li><code>Builder::from_sorted_rfc4122_timestamp</code> ->
<code>Builder::from_sorted_gregorian_timestamp</code></li>
<li><code>Timestamp::from_rfc4122</code> ->
<code>Timestamp::from_gregorian</code></li>
<li><code>Timestamp::to_rfc4122</code> ->
<code>Timestamp::to_gregorian</code></li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>Use const identifier in uuid macro by <a
href="https://github.com/Vrajs16"><code>@Vrajs16</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/764">uuid-rs/uuid#764</a></li>
<li>Rename most methods referring to RFC4122 by <a
href="https://github.com/Mikopet"><code>@Mikopet</code></a> / <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/765">uuid-rs/uuid#765</a></li>
<li>prepare for 1.10.0 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/766">uuid-rs/uuid#766</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/Vrajs16"><code>@Vrajs16</code></a> made
their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/764">uuid-rs/uuid#764</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.9.1...1.10.0">https://github.com/uuid-rs/uuid/compare/1.9.1...1.10.0</a></p>
<h2>1.9.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Add an example of generating bulk v7 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/761">uuid-rs/uuid#761</a></li>
<li>Avoid taking the shared lock when getting usable bits in
Uuid::now_v7 by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/762">uuid-rs/uuid#762</a></li>
<li>Prepare for 1.9.1 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/763">uuid-rs/uuid#763</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.9.0...1.9.1">https://github.com/uuid-rs/uuid/compare/1.9.0...1.9.1</a></p>
<h2>1.9.0</h2>
<h2><code>Uuid::now_v7()</code> is guaranteed to be monotonic</h2>
<p>Before this release, <code>Uuid::now_v7()</code> would only use the
millisecond-precision timestamp for ordering. It now also uses a global
42-bit counter that's re-initialized each millisecond so that the
following will always pass:</p>
<pre lang="rust"><code>let a = Uuid::now_v7();
let b = Uuid::now_v7();
<p>assert!(a < b);<br />
</code></pre></p>
<h2>What's Changed</h2>
<ul>
<li>Add a get_node_id method for v1 and v6 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/748">uuid-rs/uuid#748</a></li>
<li>Update atomic and zerocopy to latest by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/750">uuid-rs/uuid#750</a></li>
<li>Add repository field to uuid-macro-internal crate by <a
href="https://github.com/paolobarbolini"><code>@paolobarbolini</code></a>
in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/752">uuid-rs/uuid#752</a></li>
<li>update docs to updated RFC (from 4122 to 9562) by <a
href="https://github.com/Mikopet"><code>@Mikopet</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/753">uuid-rs/uuid#753</a></li>
<li>Support counters in v7 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/755">uuid-rs/uuid#755</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/paolobarbolini"><code>@paolobarbolini</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/752">uuid-rs/uuid#752</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4b4c590ae3"><code>4b4c590</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/766">#766</a> from
uuid-rs/cargo/1.10.0</li>
<li><a
href="68eff32640"><code>68eff32</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/765">#765</a> from
uuid-rs/chore/time-fn-deprecations</li>
<li><a
href="3d5384da4b"><code>3d5384d</code></a>
update docs and deprecation messages for timestamp fns</li>
<li><a
href="de50f2091f"><code>de50f20</code></a>
renaming rfc4122 functions</li>
<li><a
href="4a8841792a"><code>4a88417</code></a>
prepare for 1.10.0 release</li>
<li><a
href="66b4fcef14"><code>66b4fce</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/764">#764</a> from
Vrajs16/main</li>
<li><a
href="8896e26c42"><code>8896e26</code></a>
Use expr instead of ident</li>
<li><a
href="09973d6aff"><code>09973d6</code></a>
Added changes</li>
<li><a
href="6edf3e8cd5"><code>6edf3e8</code></a>
Use const identifer in uuid macro</li>
<li><a
href="36e6f573aa"><code>36e6f57</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/763">#763</a> from
uuid-rs/cargo/1.9.1</li>
<li>Additional commits viewable in <a
href="https://github.com/uuid-rs/uuid/compare/1.8.0...1.10.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The different implementations of `Tun` are the last platform-specific
code within `firezone-tunnel`. By introducing a dedicated crate and a
`Tun` trait, we can move this code into (platform-specific) leaf crates:
- `connlib-client-android`
- `connlib-client-apple`
- `firezone-bin-shared`
Related: #4473.
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
We almost never `Debug`-print our IDs. Except in the proptests where the
test runner prints them. To allow for better use of full-text search,
apply the same formatting that we have for the `Display` output to the
`Debug` output as well.
In preparation for #2667, we add an `internet` variant to our list of
possible resource types. This is backwards-compatible with existing
clients and ensures that, once the portal starts sending Internet
resources to clients, they won't fail to deserialise these messages.
The portal will have a version check to not send this to older clients
anyway but the sooner we can land this, the better. It simplifies the
initial development as we start preparing for the next client release.
Adding new fields to a JSON message is always backwards-compatible so we
can extend this later with whatever we need.
Currently, the relationship between gateways, sites and resources is
modeled in an ad-hoc fashion within `tunnel_test`. The correct
relationship is:
- The portal knows about all sites.
- A resource can only be added for an existing site.
- One or more gateways belong to a single site.
To express this relationship in `tunnel_test`, we first sample between 1
and 3 sites. Then we sample between 1 and 3 gateways and assign them a
site each. When adding new resources, we sample a site that the resource
belongs to. Upon a connection intent, we sample a gateway from all
gateways that belong to the site that the resource is defined in.
In addition, this patch-set removes multi-site resources from the
`tunnel_test`. As far as connlib's routing logic is concerned, we route
packets to a resource on a selected gateway. How the portal selected the
site of the gateway doesn't matter to connlib and thus doesn't need to
be covered in these tests.
Extracted from #5840
Some cleanup on generating IPs and improve performance of picking a host
within an IP range by doing some math instead of iterating through the
ip range.
Connlib's routing logic and networking code is entirely platform
agnostic. The only platform-specific bit is how we interact with the TUN
device. From connlib's perspective though, all it needs is an interface
for reading and writing. How the device gets initialised and updated is
client-business.
For the most part, this is the same on all platforms: We call callbacks
and the client updates the state accordingly. The only annoying bit here
is that Android recreates the TUN interface on every update and thus our
old file descriptor is invalid. The current design works around this by
returning the new file descriptor on Android. This is a problematic
design for several reasons:
- It forces the callback handler to finish synchronously, and halting
connlib until this is complete.
- The synchronous nature also means we cannot replace the callbacks with
events as events don't have a return value.
To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves
the business of how the `Tun` device is created up to the client. The
clients are already platform-specific so this makes sense. In a future
iteration, we can move all the various `Tun` implementations all the way
up to the client-specific crates, thus co-locating the platform-specific
code.
Initialising `Tun` from the outside surfaces another issue: The routes
are still set via the `Tun` handle on Windows. To fix this, we introduce
a `make_tun` function on `TunDeviceManager` in order for it to remember
the interface index on Windows and being able to move the setting of
routes to `TunDeviceManager`.
This simplifies several of connlib's APIs which are now infallible.
Resolves: #4473.
---------
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
Currently, `tunnel_test` exercises a lot of code paths within connlib
already by adding & removing resources, roaming the client and sending
ICMP packets. Yet, it does all of this with just a single gateway
whereas in production, we are very likely using more than one gateway.
To capture these other code-paths, we now sample between 1 and 3
gateways and randomly assign the added resources to one of them, which
makes us hit the codepaths that select between different gateways.
Most importantly, the reference implementation has barely any knowledge
about those individual connections. Instead, it is implementation in
terms of connectivity to resources.
The `TunDeviceManager` is a component that the leaf-nodes of our
dependency tree need: the binaries. Thus, it is misplaced in the
`connlib-shared` crate which is at the very bottom of the dependency
tree.
This is necessary to allow the `TunDeviceManager` to actually construct
a `Tun` (which currently lives in `firezone-tunnel`).
Related: #5839.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Currently, the logging of fields in spans for encapsulate and
decapsulate operations is a bit inconsistent between client and gateway.
Logging the `from` field for every message is actually quite redundant
because most of these logs are emitted within `snownet`'s `Allocation`
which can add its own span to indicate, which relay we are talking to.
For most other operations, it is much more useful to log the connection
ID instead of IPs.
This should make the logs a bit more succinct.
We have several representations of `ResourceDescription` within connlib.
The ones within the `callbacks` module are meant for _presentation_ to
the clients and thus contain additional information like the site
status.
The `From` impls deleted within the PR are only used within tests. We
can rewrite those tests by asserting on the presented data instead.
This is better because it means information about resources only flows
in one direction: From connlib to the clients.
We are referencing the `tokio` dependency a lot and it makes sense to
ensure that version is tracked only once across the whole workspace.
Extracted out of #5797.
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
Closes#5601
It looks like we can hit 100+ Mbps in theory. This covers Wintun, Tokio,
and Windows OS overhead. It doesn't cover the cryptography or anything
in connlib itself.
The code is kinda messy but I'm not sure how to clean it up so I'll just
leave it for review.
This test should fail if there's any regressions in #5598.
It fails if any packet is dropped or if the speed is under 100 Mbps
```[tasklist]
### Tasks
- [x] Use `ip_packet::make`
- [x] Switch to `cargo bench`
- [x] Extract windows ARM PR
- [x] Clean up wintun.dll install code
- [x] Re-request review
```
With the introduction of a routing table in #5786, we can very easily
introduce an additional relay to `tunnel_test`. In production, we are
always given two relays and thus, this mimics the production setup more
closely.
This is a bit of a hack because features should never change behaviour.
Unfortunately, we can't use `cfg(test)` here because the proptests live
in a different crate and thus for the tests, we import the crate using
`cfg(not(test))`.
Our `proptest` feature is really only meant to be activated during
testing so I think this is fine for now.
The benefit is that the test logs are much more terse because proptest
will shrink the IDs to `0`, `1` etc. With the upcoming addition of
multiple gateways and multiple relays, we will have a lot more IDs in
the logs. Thus, it is important that they stay legible.
Currently, `tunnel_test` uses a rather naive approach when dispatching
`Transmit`s. In particular, it checks client, gateway and relay
separately whether they "want" a certain packet. In a real network,
these packets are routed based on their IP.
To mimic something similar, we introduce a `Host` abstraction that wraps
each component: client, gateway and relay. Additionally, we introduce a
`RoutingTable` where we can add and remove hosts. With these things in
place, routing a `Transmit` is as easy as looking up the destination IP
in the routing table and dispatching to the corresponding host.
Our hosts are type-safe: client, gateway and relay have different types.
Thus, we abstract over them using a `HostId` in order to know, which
host a certain message is for. Following these patches, we can easily
introduce multiple gateways and relays to this test by simply making
more entries in this routing table. This will increase the test coverage
of connlib.
Lastly, this patch massively increases the performance of `tunnel_test`.
It turns out that previously, we spent a lot of CPU cycles accessing
"random" IPs from very large iterators. With this patch, we take a
limited range of 100 IPs that we sample from, thus drastically
increasing performance of this test. The configured 1000 testcases
execute in 3s on my machine now (with opt-level 1 which is what we use
in CI).
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Closes#5052
On my dev VMs:
- systemd-resolved = 15 ms to flush
- Windows = 600 ms to flush
I tested with the headless Clients on Linux and Windows and it fixes the
issue. On Windows I didn't replicate the issue with the GUI Client, on
Linux this patch also fixes it for the GUI Client.
The [HTTP 1.1 RFC](https://datatracker.ietf.org/doc/html/rfc2616) states
that HTTP headers should be US-ASCII. This is not the case when the
macOS Client is run from a host that has a non-English language selected
as its system default due to the way we build the user agent.
This PR fixes that by normalizing how we build the user agent by more
granularly selecting which fields compose it, and not just relying on
OS-provided version strings that may contain non-ASCII characters.
fixes https://github.com/firezone/firezone/issues/5467
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Currently, connlib re-initialises the TUN device on Linux every time its
configuration gets updated such as when roaming from one network to
another. This is unnecessary. Instead, we can adopt the same approach as
already used on MacOS, iOS and Windows and only initialise it if it
doesn't exist yet.
Doing so surfaces an interesting bug. Currently, attempting to
re-initialise the TUN device fails with a warning:
> connlib_client_shared::eventloop: Failed to set interface on tunnel:
Resource busy (os error 16)
See
https://github.com/firezone/firezone/actions/runs/9656570163/job/26634409346#step:7:103
for an example. As a consequence, we never actually trigger the
`on_set_interface_config` callback and thus never actually set the new
IPs on the TUN device.
Now that we _are_ calling this callback, we execute
`TunDeviceManager::set_ips` which first clears all IPs from the device
and then attaches the new ones. A consequence of this is that the Linux
kernel will clear all routes associated with the device. This clashes
with an optimisation we have in `TunDeviceManager` where we remember the
previously set routes and don't set new ones if they are the same.
This `HashSet` needs to be cleared upon setting new IPs in order to
actually set the new routes correctly afterwards. Without that, we stop
receiving traffic on the TUN device.
Logging the routes in the span and in an event creates duplicate
information so we remove the former. Additionally, we add a debug log in
case we short-circuit the function.
All allowed IPs can be a fair few which clutters the log. Remove the
`HashSet` from the error and also remove the stuttering; the error
already says "Packet not allowed".
This PR is the "client-side" of things for #4994. Up until now, when a
user wanted to connect to a DNS resource, we would establish a
connection to the gateway and pass along the domain we are trying to
access. The gateway would resolve that domain and send the response back
to the client, allowing them to finally send a DNS response.
Now, we instantly assign and respond with 4x A and 4x AAAA records to
any query for one of our DNS resources. Upon the first IP packet for one
of these "proxy IPs", we select a gateway, establish a connection and
send our proxy IPs along. The gateway then performs the necessary
mangling and NATing of all packets. See #5354 for details.
Resolves: #4994.
Resolves: #5491.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Closes#5481
With this, I can connect to the staging portal without a build.rs or any
extra env var setup
<img width="387" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/9c080b36-3a76-49c7-b706-20723697edc7">
```[tasklist]
### Next steps
- [x] Split out a refactor PR for `ConnectArgs` (#5488)
- [x] Try doing this for other Clients
- [x] Check Gateway
- [x] Check Tauri Client
- [x] Change to `app_version`
- [x] Open for review
- [ ] Use `option_env` so that `FIREZONE_PACKAGE_VERSION` can still override the Cargo.toml version for local testing
- [ ] Check Android Client
- [ ] Check Apple Client
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
When a user sends the first packet to a resource, we generate a
"connection intent" and consult the portal, which gateway to use for
this resource. This process is throttled to only generate a new intent
every 2s.
Once we know, which gateway to use for a certain resource, we initiate a
connection via snownet. This involves an OFFER-ANSWER handshake with the
gateway. A connection for which we have sent an offer and have not yet
received an answer is what we call a "pending connection".
In case the connection setup takes longer than 2s, we will generate
another connection intent which can point to the same gateway that we
are currently setting up a connection with.
Currently, encountering a "pending connection" during another connection
setup is treated as an error which results in some state being
cleaned-up / removed. This is where the bug surfaces: If we remove the
state for a resource as a result of a 2nd connection intent and then
receive the response of the first one, we will be left with no state
that knows about this resource.
We fix this by refactoring `create_or_reuse_connection` to be atomic in
regards to its state changes: All checks that fail the function are
moved to the top which means there is no state to clean up in case of an
error. Additionally, we model the case of a "pending connection" using
an `Option` to not flood the logs with "pending connection" warnings as
those are expected during normal operation.
Fixes: #5385
As part of #4994, the IP translation and mangling of packets to and from
DNS resources is moved to the gateway. This PR represents the
"gateway-half" of the required changes.
Eventually, the client will send a list of proxy IPs that it assigned
for a certain DNS resource. The gateway assigns each proxy IP to a real
IP and mangles outgoing and incoming traffic accordingly. There are a
number of things that we need to take care of as part of that:
- We need to implement NAT to correctly route traffic. Our NAT table
maps from source port* and destination IP to an assigned port* and real
IP. We say port* because that is only true for UDP and TCP. For ICMP, we
use the identifier.
- We need to translate between IPv4 and IPv6 in case a DNS resource e.g.
only resolves to IPv6 addresses but the client gave out an IPv4 proxy
address to the application. This translation is was added in #5364 and
is now being used here.
This PR is backwards-compatible because currently, clients don't send
any IPs to the gateway. No proxy IPs means we cannot do any translation
and thus, packets are simply routed through as is which is what the
current clients expect.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Currently, we use `sample::Index` and `sample::Selector` to
deterministically select parts of our state. Originally, this was done
because I did not yet fully understand, how `proptest-state-machine`
works.
The available transitions are always sampled from the current state,
meaning we can directly use `sample::select` to pick an element like an
IP address from a list. This has several advantages:
- The transitions are more readable when debug-printed because they now
contain the actual data that is being used.
- I _think_ this results in better shrinking because `sample::select`
will perform a binary search for the problematic value.
- We can more easily implement transitions that _remove_ state.
Currently, we cannot remove things from the `ReferenceState` because the
system-under-test would also have to index into the `ReferenceState` as
part of executing its transition. By directly embedding all necessary
information in the transition, this is much simpler.
- Removes version numbers from infra components (elixir/relay)
- Removes version bumping from Rust workspace members that don't get
published
- Splits release publishing into `gateway-`, `headless-client-`, and
`gui-client-`
- Removes auto-deploying new infrastructure when a release is published.
Use the Deploy Production workflow instead.
Fixes#4397
To make proptests efficient, it is important to generate the set of
possible test cases algorithmically instead of filtering through
randomly generated values.
This PR makes the strategies for upstream DNS servers and IP networks
more efficient by removing the filtering.
Currently, `tunnel_test` only tests DNS resources with fully-qualified
domain names. Firezone also supports wildcard domains in the forms of
`*.example.com` and `?.example.com`.
To include these in the tests, we generate a bunch of DNS records that
include various subdomains for such wildcard DNS resources.
When sampling DNS queries, we already take them from the pool of global
DNS records which now also includes these subdomains, thus nothing else
needed to be changed to support testing these resources.
Refs #3636 (This pays down some of the technical debt from Linux DNS)
Refs #4473 (This partially fulfills it)
Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory)
As of dd6421:
- On both Linux and Windows, DNS control and IP setting (i.e.
`on_set_interface_config`) both move to the Client
- On Windows, route setting stays in `tun_windows.rs`. Route setting in
Windows requires us to know the interface index, which we don't know in
the Client code. If we could pass opaque platform-specific data between
the tunnel and the Client it would be easy.
- On Linux, route setting moves to the Client and Gateway, which
completely removes the `worker` task in `tun_linux.rs`
- Notifying systemd that we're ready moves up to the headless Client /
IPC service
```[tasklist]
### Before merging / notes
- [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device)
- [x] Fix Windows Clients
- [x] Fix Gateway
- [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068)
- [x] De-dupe consts
- [ ] ~~Add DNS control test~~ (failed)
- [ ] Smoke test Linux
- [ ] Smoke test Windows
```
Currently, `tunnel_test` only sends ICMPs to CIDR resources. We also
want to test certain properties in regards to DNS resources. In
particular, we want to test:
- Given a DNS resource, can we query it for an IP?
- Can we send an ICMP packet to the resolved IP?
- Is the mapping of proxy IP to upstream IP stable?
To achieve this, we sample a list of `IpAddr` whenever we add a DNS
resource to the state. We also add the transition
`SendQueryToDnsResource`. As the name suggests, this one simulates a DNS
query coming from the system for one of our resources. We simulate A and
AAAA queries and take note of the addresses that connlib returns to us
for the queries.
Lastly, as part of `SendICMPPacketToResource`, we now may also sample
from a list of IPs that connlib gave us for a domain and send an ICMP
packet to that one.
There is one caveat in this test that I'd like to point out: At the
moment, the exact mapping of proxy IP to real IP is an implementation
detail of connlib. As a result, I don't know which proxy IP I need to
use in order to ping a particular "real" IP. This presents an issue in
the assertions: Upon the first ICMP packet, I cannot assert what the
expected destination is. Instead, I need to "remember" it. In case we
send another ICMP packet to the same resource and happen to sample the
same proxy IP, we can then assert that the mapping did not change.
This is similar to #4097 and #4585 but for the entire `ClientState` and
`GatewayState`. We also do it in the context of a property-based test
with the vision that we can deterministically explore a large space of
state transitions and see where our main property breaks: Being able to
send an ICMP packet from the client to the gateway.
In other words, we now correctly pass all the `Transmit`s back and forth
between the components as if they would receive it from the network. Due
to the nature of property-based tests, this already exercises a very
large input space. For example, if the client does not have an IPv6
socket and the gateway doesn't have an IPv4 socket, this test already
checks whether we then correctly fall back to using a relay (because the
allocation we make on the relay is the only network path where the STUN
requests pass through).
What this does not (yet) do is set up a proper network topology. The
`dispatch_transmit` function will happily "route" a `Transmit` from e.g.
the client to the gateway even if they are in different subnets. In
other words, these tests assume that the actual network itself works and
we can exchange UDP packets between the components.
For now, we only send ICMPs to CIDR resources. As a next step, we can
extend this to DNS resources by sending DNS queries for our DNS
resources and then sending an ICMP to the resolved IP.