This reverts commit d8f25f9bf8.
#6506 broke the Clients and I guess I didn't do any manual smoke test,
so I didn't catch it.
I have leads for a permanent fix in #6551 but I don't want to leave
`main` broken since it will screw up bisects.
Supersedes #5913
This required a big refactor because `HANDLE` is no longer `Send` and
was never supposed to be.
So we add a worker thread for listening to DNS changes, since that
requires us to hold a `HANDLE` across `await` points and I couldn't find
any simpler way to do it.
I could add integration tests for this in a future PR that prove the
notifiers work by poking the registry or setting DNS servers and seeing
if we pick up the changes on time. But setting DNS servers without the
tunnel up may be tricky, so I left it out of scope for this PR.
```[tasklist]
- [x] Fix force-kill bug
```
The output of `git describe` always refers to the last tag that it can
find. This leads to confusing versions being printed such as:
```
2024-08-19T00:24:08.983891Z INFO firezone_headless_client: arch="x86_64" git_version="gateway-1.1.5-30-gf82fee162-modified"
```
Note that this is code running in the headless-client and it refers to
the gateway tag. Whilst not wrong from git's PoV, it is certainly
confusing.
We can fix this by providing a glob-pattern to `git describe` via
`--match`. This makes git ignore any other tags and print a version
identifier that refers to the current program:
```
2024-08-19T00:39:48.634191Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.1.7-31-ga08a3411d-modified"
```
Parsing the current Git version within `firezone-bin-shared` means this
crate (and all its dependents) need to be rebuilt everytime one makes a
commit, even if none of the code actually changes.
To avoid this whilst still allowing `firezone-bin-shared` to export a
useful, shared function, we export a macro instead that can be called
from the respective crates that need the GIT version. This means only
those binaries will be marked as dirty and rebuilds of e.g. unit tests
don't need to rebuild these workspace crates.
Now that we have the `bin-shared` crate, it is easy to move the
health-check functionality into there. That allows us to get rid of a
crate which makes navigating the workspace a bit easier.
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that allows us to bundle all the things together, how we want to
do logging.
This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.
Log filters can be overridden, i.e. a 2nd filter that matches the exact
same scope overrides a previous one. Thus, with this design it is still
possible to activate certain logs at runtime, even if they have silenced
by default.
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The `firezone-bin-shared` crate is meant to house non-tunnel related
things. That allows it to compile in parallel to everything else. It
currently only depends on `connlib-shared` to access the `DEFAULT_MTU`
constant. We can remove that by requiring the MTU as a ctor parameter of
`TunDeviceManager`.
A longer write-up of the intended dependency structure is in #4470.
When forwarding DNS queries, we need to remember the original source
socket in order to send the response back. Previously, this mapping was
indexed by the DNS query ID. As it turns out, at least Windows doesn't
have a global DNS query ID counter and may reuse them across different
DNS servers. If that happens and two of these queries overlap, then we
match the wrong responses together.
In the best case, this produces bad DNS results on the client. In the
worst case, those queries were for DNS servers with different IP
versions in which case we triggered a panic in connlib further down the
stack where we created the IP packet for the response.
To fix this, we first and foremost remove the explicit `panic!` from the
`make::` functions in `ip-packet`. Originally, these functions were only
used in tests but we started to use them in production code too and
unfortunately forgot about this panic. By introducing a `Result`, all
call-sites are made aware that this can fail.
Second, we fix the actual indexing into the data structure for forwarded
DNS queries to also include the DNS server's socket. This ensures we
don't treat the DNS query IDs as globally unique.
Third, we replace the panicking path in
`try_handle_forwarded_dns_response` with a log statement, meaning if the
above assumption turns out wrong for some reason, we still don't panic
and simply don't handle the packet.
Currently, `connlib` depends on `hickory-resolver` to perform DNS
queries for non-resources. This is unnecessary. Instead of buffering the
original UDP DNS query, consulting hickory to resolve the name and
mapping the response back, we can simply take the UDP payload and send
it via our protected socket directly to the original upstream DNS
server.
This ensures `connlib` is as transparent as possible for DNS queries for
non-resources. Additionally, it removes a lot of error handling and
other cruft that we currently have to perform because we are using
hickory. For example, hickory will automatically retry a DNS query after
a certain timeout. However, the OS / client talking to `connlib` will
also retry after a certain timeout because it is making DNS queries over
an unreliable transport (UDP). It is thus unnecessary for us to do that
internally.
To correctly test this change, our test-suite needed some refactoring.
Specifically, DNS servers are now modelled as dedicated `Host`s that can
receive (UDP) traffic.
Lastly, we can remove our dependency on `hickory-proto` and
`hickory-resolver` everywhere and only use `domain` for parsing DNS
messages.
Resolves: #6141.
Related: #6033.
Related: #4800. (Impossible to happen with this design)
Closes#5063, supersedes #5850
Other refactors and changes made as part of this:
- Adds the ability to disable DNS control on Windows
- Removes the spooky-action-at-a-distance `from_env` functions that used
to be buried in `tunnel`
- `FIREZONE_DNS_CONTROL` is now a regular `clap` argument again
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#5846
Will be moved down to the IPC service eventually.
The goal for connection roaming is not for totally transparent "Change
Wi-Fi networks without dropping SSH" handoffs, but just for Firezone to
re-connect itself as quickly as possible so that everything above us can
re-connect as quickly as it times out, and won't be hung up with a
broken tunnel.
In `connlib`, traffic is sent through sockets via one of three ways:
1. Direct p2p traffic between clients and gateways: For these, we always
explicitly set the source IP (and thus interface).
2. UDP traffic to the relays: For these, we let the OS pick an
appropriate source interface.
3. WebSocket traffic over TCP to the portal: For this too, we let the OS
pick the source interface.
For (2) and (3), it is possible to run into routing loops, depending on
the routes that we have configured on the TUN device.
In Linux, we can prevent routing loops by marking a socket [0] and
repeating the mark when we add routes [1]. Packets sent via a marked
socket won't be routed by a rule that contains this mark. On Android, we
can do something similar by "protecting" a socket via a syscall on the
Java side [2].
On Windows, routing works slightly different. There, the source
interface is determined based on a computed metric [3] [4]. To prevent
routing loops on Windows, we thus need to find the "next best" interface
after our TUN interface. We can achieve this with a combination of
several syscalls:
1. List all interfaces on the machine
2. Ask Windows for the best route on each interface, except our TUN
interface.
3. Sort by Windows' routing metric and pick the lowest one (lower is
better).
Thanks to the abstraction of `SocketFactory` that we already previously
introduced, Integrating this into `connlib` isn't too difficult:
1. For TCP sockets, we simply resolve the best route after creating the
socket and then bind it to that local interface. That way, all packets
will always going via that interface, regardless of which routes are
present on our TUN interface.
2. UDP is connection-less so we need to decide per-packet, which
interface to use. "Pick the best interface for me" is modelled in
`connlib` via the `DatagramOut::src` field being `None`.
- To ensure those packets don't cause a routing loop, we introduce a
"source IP resolver" for our `UdpSocket`. This function gets called
every time we need to send a packet without a source IP.
- For improved performance, we cache these results. The Windows client
uses this source IP resolver to use the above devised strategy to find a
suitable source IP.
- In case the source IP resolution fails, we don't send the packet. This
is important, otherwise, the kernel might choose our TUN interface again
and trigger a routing loop.
The last remark to make here is that this also works for connection
roaming. The TCP socket gets thrown away when we reconnect to the
portal. Thus, the new socket will pick the new best interface as it is
re-created. The UDP sockets also get thrown away as part of roaming.
That clears the above cache which is what we want: Upon roaming, the
best interface for a given destination IP will likely have changed.
[0]:
59014a9622/rust/headless-client/src/linux.rs (L19-L29)
[1]:
59014a9622/rust/bin-shared/src/tun_device_manager/linux.rs (L204-L224)
[2]:
59014a9622/rust/connlib/clients/android/src/lib.rs (L535-L549)
[3]:
https://learn.microsoft.com/en-us/previous-versions/technet-magazine/cc137807(v=msdn.10)?redirectedfrom=MSDN
[4]:
https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metricFixes: #5955.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Windows may delete the default route during roaming. To prevent this
from causing problems, we make `set_routes` add all routes regardless of
the previously stored ones. The known routes are only used to compute,
what routes are to be removed.
For Linux we do the same to make it consistent across platforms.
This also give us the chance to not clear the cache when ips are set,
since now all routes are always added, meaning they will be always
re-added when roaming.
Overall, this more closely aligns Linux and Windows with how Firezone
works on Apple and Android. There, we always remove all routes and set
new ones. Removing routes happens very rarely (only when CIDR resources
are deactivated), thus, not removing all and re-adding the routes is
still deemed to be worth it.
With the new implementation, this is guaranteed to always make the new
routes take effect and at the same time be idempotent.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Note that for GUI Clients, listening is still done by the GUI process,
not the IPC service.
Yak shave towards #5846. This allows for faster dev cycles since I won't
have to compile all the GUI stuff.
Some changes in here were extracted from other draft PRs.
Changes:
- Remove `thiserror` that was never matched on
- Don't return the DNS resolvers from the notifier directly, just send a
notification and allow the caller to check the resolvers itself if
needed
- Rename `DnsListener` to `DnsNotifier`
- Rename `Worker` to `NetworkNotifier`
- remove `unwrap_or_default` when getting resolvers. I don't know why
it's there, if there's a good reason then it should be handled inside
the function, not in the caller
```[tasklist]
### Tasks
- [x] Rename `*Listener` to `*Notifier`
- [x] (not needed) ~~Support `/etc/resolv.conf` DNS control method too?~~
```
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.8.0 to 1.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>1.10.0</h2>
<h2>Deprecations</h2>
<p>This release deprecates and renames the following functions:</p>
<ul>
<li><code>Builder::from_rfc4122_timestamp</code> ->
<code>Builder::from_gregorian_timestamp</code></li>
<li><code>Builder::from_sorted_rfc4122_timestamp</code> ->
<code>Builder::from_sorted_gregorian_timestamp</code></li>
<li><code>Timestamp::from_rfc4122</code> ->
<code>Timestamp::from_gregorian</code></li>
<li><code>Timestamp::to_rfc4122</code> ->
<code>Timestamp::to_gregorian</code></li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>Use const identifier in uuid macro by <a
href="https://github.com/Vrajs16"><code>@Vrajs16</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/764">uuid-rs/uuid#764</a></li>
<li>Rename most methods referring to RFC4122 by <a
href="https://github.com/Mikopet"><code>@Mikopet</code></a> / <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/765">uuid-rs/uuid#765</a></li>
<li>prepare for 1.10.0 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/766">uuid-rs/uuid#766</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/Vrajs16"><code>@Vrajs16</code></a> made
their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/764">uuid-rs/uuid#764</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.9.1...1.10.0">https://github.com/uuid-rs/uuid/compare/1.9.1...1.10.0</a></p>
<h2>1.9.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Add an example of generating bulk v7 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/761">uuid-rs/uuid#761</a></li>
<li>Avoid taking the shared lock when getting usable bits in
Uuid::now_v7 by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/762">uuid-rs/uuid#762</a></li>
<li>Prepare for 1.9.1 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/763">uuid-rs/uuid#763</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.9.0...1.9.1">https://github.com/uuid-rs/uuid/compare/1.9.0...1.9.1</a></p>
<h2>1.9.0</h2>
<h2><code>Uuid::now_v7()</code> is guaranteed to be monotonic</h2>
<p>Before this release, <code>Uuid::now_v7()</code> would only use the
millisecond-precision timestamp for ordering. It now also uses a global
42-bit counter that's re-initialized each millisecond so that the
following will always pass:</p>
<pre lang="rust"><code>let a = Uuid::now_v7();
let b = Uuid::now_v7();
<p>assert!(a < b);<br />
</code></pre></p>
<h2>What's Changed</h2>
<ul>
<li>Add a get_node_id method for v1 and v6 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/748">uuid-rs/uuid#748</a></li>
<li>Update atomic and zerocopy to latest by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/750">uuid-rs/uuid#750</a></li>
<li>Add repository field to uuid-macro-internal crate by <a
href="https://github.com/paolobarbolini"><code>@paolobarbolini</code></a>
in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/752">uuid-rs/uuid#752</a></li>
<li>update docs to updated RFC (from 4122 to 9562) by <a
href="https://github.com/Mikopet"><code>@Mikopet</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/753">uuid-rs/uuid#753</a></li>
<li>Support counters in v7 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/755">uuid-rs/uuid#755</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/paolobarbolini"><code>@paolobarbolini</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/752">uuid-rs/uuid#752</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4b4c590ae3"><code>4b4c590</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/766">#766</a> from
uuid-rs/cargo/1.10.0</li>
<li><a
href="68eff32640"><code>68eff32</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/765">#765</a> from
uuid-rs/chore/time-fn-deprecations</li>
<li><a
href="3d5384da4b"><code>3d5384d</code></a>
update docs and deprecation messages for timestamp fns</li>
<li><a
href="de50f2091f"><code>de50f20</code></a>
renaming rfc4122 functions</li>
<li><a
href="4a8841792a"><code>4a88417</code></a>
prepare for 1.10.0 release</li>
<li><a
href="66b4fcef14"><code>66b4fce</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/764">#764</a> from
Vrajs16/main</li>
<li><a
href="8896e26c42"><code>8896e26</code></a>
Use expr instead of ident</li>
<li><a
href="09973d6aff"><code>09973d6</code></a>
Added changes</li>
<li><a
href="6edf3e8cd5"><code>6edf3e8</code></a>
Use const identifer in uuid macro</li>
<li><a
href="36e6f573aa"><code>36e6f57</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/763">#763</a> from
uuid-rs/cargo/1.9.1</li>
<li>Additional commits viewable in <a
href="https://github.com/uuid-rs/uuid/compare/1.8.0...1.10.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The different implementations of `Tun` are the last platform-specific
code within `firezone-tunnel`. By introducing a dedicated crate and a
`Tun` trait, we can move this code into (platform-specific) leaf crates:
- `connlib-client-android`
- `connlib-client-apple`
- `firezone-bin-shared`
Related: #4473.
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
Currently, only connlib's UDP sockets for sending and receiving STUN &
WireGuard traffic are protected from routing loops. This is was done via
the `Sockets::with_protect` function. Connlib has additional sockets
though:
- A TCP socket to the portal.
- UDP & TCP sockets for DNS resolution via hickory.
Both of these can incur routing loops on certain platforms which becomes
evident as we try to implement #2667.
To fix this, we generalise the idea of "protecting" a socket via a
`SocketFactory` abstraction. By allowing the different platforms to
provide a specialised `SocketFactory`, anything Linux-based can give
special treatment to the socket before handing it to connlib.
As an additional benefit, this allows us to remove the `Sockets`
abstraction from connlib's API again because we can now initialise it
internally via the provided `SocketFactory` for UDP sockets.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Connlib's routing logic and networking code is entirely platform
agnostic. The only platform-specific bit is how we interact with the TUN
device. From connlib's perspective though, all it needs is an interface
for reading and writing. How the device gets initialised and updated is
client-business.
For the most part, this is the same on all platforms: We call callbacks
and the client updates the state accordingly. The only annoying bit here
is that Android recreates the TUN interface on every update and thus our
old file descriptor is invalid. The current design works around this by
returning the new file descriptor on Android. This is a problematic
design for several reasons:
- It forces the callback handler to finish synchronously, and halting
connlib until this is complete.
- The synchronous nature also means we cannot replace the callbacks with
events as events don't have a return value.
To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves
the business of how the `Tun` device is created up to the client. The
clients are already platform-specific so this makes sense. In a future
iteration, we can move all the various `Tun` implementations all the way
up to the client-specific crates, thus co-locating the platform-specific
code.
Initialising `Tun` from the outside surfaces another issue: The routes
are still set via the `Tun` handle on Windows. To fix this, we introduce
a `make_tun` function on `TunDeviceManager` in order for it to remember
the interface index on Windows and being able to move the setting of
routes to `TunDeviceManager`.
This simplifies several of connlib's APIs which are now infallible.
Resolves: #4473.
---------
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
The `TunDeviceManager` is a component that the leaf-nodes of our
dependency tree need: the binaries. Thus, it is misplaced in the
`connlib-shared` crate which is at the very bottom of the dependency
tree.
This is necessary to allow the `TunDeviceManager` to actually construct
a `Tun` (which currently lives in `firezone-tunnel`).
Related: #5839.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>