Commit Graph

32 Commits

Author SHA1 Message Date
Thomas Eizinger
7213eb823d fix(rust): fallback to CARGO_PKG_VERSION if git is unavailable (#7188)
When building inside a docker container, like we do for the
headless-client and gateway, the `.git` directory is not available.
Thus, determining what our current version is fails and gets reported as
"unknown". We are now also using this for Sentry which is not very
helpful if all errors are categorised under the same version.

In case somebody builds a gateway / client from source, we will have the
full version available. Most users will use our docker containers
though, meaning the reported version will only ever be that of a full
release.
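
A minimal sketch of the fallback described above. The function name
`resolve_version` and its parameters are hypothetical and not taken from
the codebase; `pkg_version` stands in for `env!("CARGO_PKG_VERSION")`:

```rust
/// Picks the version string to report.
///
/// `git_describe` is `None` (or blank) when the `.git` directory is
/// unavailable, e.g. inside a Docker build.
fn resolve_version<'a>(git_describe: Option<&'a str>, pkg_version: &'a str) -> &'a str {
    match git_describe.map(str::trim) {
        // A usable Git-derived version always wins.
        Some(version) if !version.is_empty() => version,
        // Otherwise report the crate version instead of "unknown".
        _ => pkg_version,
    }
}
```

With this shape, error reports from container builds are at least grouped
by release version.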

Resolves: #7184.
2024-10-30 17:42:44 +00:00
Thomas Eizinger
73eebd2c4d refactor(rust): consistently record errors as tracing::Value (#7104)
Our logging library, `tracing`, supports structured logging. This is
useful because it preserves more than just the string representation
of a value and thus allows the active logging backend(s) to capture more
information for a particular value.

In the case of errors, this is especially useful because it allows us to
capture the sources of a particular error.

Unfortunately, recording an error as a tracing value is a bit cumbersome
because `tracing::Value` is only implemented for `&dyn
std::error::Error`. Casting an error to this is quite verbose. To make
it easier, we introduce two utility functions in `firezone-logging`:

- `std_dyn_err`
- `anyhow_dyn_err`

Tracking errors as correct `tracing::Value`s will be especially helpful
once we enable Sentry's `tracing` integration:
https://docs.rs/sentry-tracing/latest/sentry_tracing/#tracking-errors
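
A std-only sketch of the idea. The signature of `std_dyn_err` shown here
is an assumption about what such a helper could look like, and
`error_chain` plus the two error types are hypothetical; they only
illustrate what a backend gains from receiving `&dyn Error` instead of a
string:

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct Inner;

impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection reset")
    }
}

impl Error for Inner {}

#[derive(Debug)]
struct Outer(Inner);

impl fmt::Display for Outer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to send packet")
    }
}

impl Error for Outer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Hides the verbose cast to a trait object behind a function call.
fn std_dyn_err(e: &(impl Error + 'static)) -> &(dyn Error + 'static) {
    e
}

/// What a logging backend can do with the trait object: walk all sources.
fn error_chain(e: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = vec![e.to_string()];
    let mut current = e.source();
    while let Some(source) = current {
        chain.push(source.to_string());
        current = source.source();
    }
    chain
}
```

Formatting the error with `%` or `{}` would only ever yield the first
entry of that chain.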
2024-10-22 04:46:26 +00:00
Reactor Scram
9b93fc2a2c fix(rust/client/windows): set our DNS resolvers on our interface (#6931)
Closes #6777
2024-10-07 15:03:22 +00:00
Thomas Eizinger
4ae29c604c fix(windows): only consider online adapters (#6810)
When deciding which interface we are going to use for connecting to the
portal API, we need to filter through all adapters on Windows and
exclude our own TUN adapter to avoid routing loops. In addition, we also
need to filter for only online adapters, otherwise we might pick one
that is not actually routable.

Resolves: #6802.
2024-09-25 21:19:15 +00:00
Thomas Eizinger
a9f515a453 chore(rust): use #[expect] instead of #[allow] (#6692)
The `expect` attribute is similar to `allow` in that it will silence a
particular lint. Unlike `allow`, however, `expect` will complain (via the
`unfulfilled_lint_expectations` lint) as soon as the lint is no longer
emitted. This ensures we don't end up with stale `allow` attributes in
our codebase. Additionally, it provides a way of adding a `reason` to
document why the lint is being suppressed.
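
For illustration, a small sketch. The function and the reason text are
made up for this example, and `#[expect]` needs Rust 1.81 or later:

```rust
/// `flags` is deliberately unused, so `unused_variables` fires and the
/// expectation is fulfilled. Once `flags` is actually read, the compiler
/// reports the now-stale `#[expect]` instead of staying silent.
#[expect(unused_variables, reason = "`flags` is reserved for a later revision")]
fn encode(payload: &[u8], flags: u8) -> Vec<u8> {
    payload.to_vec()
}
```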
2024-09-16 13:51:12 +00:00
Reactor Scram
54b6222722 fix(client/windows): set MTU even if IPv6 is disabled (#6681)
Refs #6547, this fixes a similar error message but it's not the same
exact issue.

When IPv6 is disabled on a system, our call to set the MTU was failing
with error code 0x80070490. This patch allows some of the MTU-related
syscalls to fail with a warning log.

To replicate the issue, run this command to set a registry value to
disable IPv6, then reboot the system:

`reg add
"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"
/v DisabledComponents /t REG_DWORD /d 255 /f`

```[tasklist]
- [x] Update changelog
- [x] Apply PR feedback
```

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-09-13 17:43:21 +00:00
Reactor Scram
5a44151bba test(bin-shared): improve network notifier test (#6676)
On Windows, the network notifier always notifies once at startup. We
make the DNS notifier and Linux match this behavior, and we assert it in
the unit test.

Part of a yak shave towards removing Tauri.
2024-09-13 14:53:13 +00:00
Reactor Scram
ece8f7a5b7 test(rust/client): add unit test for DNS and network change notifiers (#6635)
Part of a yak shave towards removing Tauri.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-09-12 15:59:13 +00:00
Reactor Scram
f507a01f9f fix(windows): prevent routing loops for TCP connections (#6584)
In #6032, we attempted to fix routing loops for Windows and did so
successfully for UDP packets. For TCP sockets, we believed that binding
the socket to an interface is enough to prevent routing loops. This
assumption is wrong.

> On Windows, a call to bind() affects card selection only incoming
traffic, not outgoing traffic.
>
> Thus, on a client running in a multi-homed system (i.e., more than one
interface card), it's the network stack that selects the card to use,
and it makes its selection based solely on the destination IP, which in
turn is based on the routing table. A call to bind() will not affect the
choice of the card in any way.

On most of our testing machines, this problem didn't surface but it
turns out that on some machines, especially those with WiFi cards, there
is a conflict between the routes added on the system. In particular, with
the Internet resource active, we also add a catch-all route that we
_want_ to have the highest priority, i.e. Windows SHOULD send all traffic
to our TUN device, except for traffic that we generate ourselves, like
TCP connections to the portal or UDP packets sent to gateways, relays or
DNS servers.

It appears that on some systems, mostly with Ethernet adapters, Windows
picks the "correct" interface for our socket and sends traffic via that
but on other systems, it doesn't. TCP sockets are only used for the
WebSocket connection to the portal. Without that one, Firezone
completely stops working because we can't send any control messages.

To reliably fix this issue, we need to add a dedicated route for the
target IP of each TCP socket that is more specific than the Internet
resource route (`0.0.0.0/0`) but otherwise identical. We do this as part
of creating a new TCP socket. This route is for the _default_ interface
and thus, doesn't get automatically removed when Firezone exits.

We implement a RAII guard that attempts to drop the route on a
best-effort basis. Despite this RAII guard, this route can linger around
in case Firezone is being forced to exit or exits in otherwise unclean
ways. To avoid lingering routes, we always delete all routing table
entries matching the IP of the portal just before we are about to add
one.
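
The guard-plus-cleanup idea can be sketched as follows. This is a
simplified model, not the actual implementation: `RoutingTable` is a
hypothetical in-memory stand-in for the Windows routing table and
`RouteGuard` is an illustrative name:

```rust
use std::cell::RefCell;
use std::net::IpAddr;
use std::rc::Rc;

/// Stand-in for the OS routing table; real code would call Windows APIs.
type RoutingTable = Rc<RefCell<Vec<IpAddr>>>;

/// RAII guard: the host route exists for as long as this value is alive.
struct RouteGuard {
    table: RoutingTable,
    dst: IpAddr,
}

impl RouteGuard {
    fn add(table: &RoutingTable, dst: IpAddr) -> Self {
        // Delete entries that an earlier, unclean exit may have left behind.
        table.borrow_mut().retain(|ip| *ip != dst);
        table.borrow_mut().push(dst);

        RouteGuard {
            table: Rc::clone(table),
            dst,
        }
    }
}

impl Drop for RouteGuard {
    fn drop(&mut self) {
        // Best effort: nothing to report if the route is already gone.
        self.table.borrow_mut().retain(|ip| *ip != self.dst);
    }
}
```

Because `Drop` does not run on a forced exit, the cleanup in `add` is what
keeps stale routes from piling up.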

Fixes: #6591.

[0]:
https://forums.codeguru.com/showthread.php?487139-Socket-binding-with-routing-table&s=a31637836c1bf7f0bc71c1955e47bdf9&p=1891235#post1891235

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Foo Bar <foo@bar.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
2024-09-05 06:17:28 +00:00
Reactor Scram
df3067a8f8 chore(rust/windows): more detailed error log for wintun::Adapter::create (#6596)
Without this we don't log the `std::io::ErrorKind`, which is useful to
know.

Refs #6547
2024-09-05 02:43:25 +00:00
Thomas Eizinger
e3688a475e refactor(connlib): only buffer 1 unsent packet if socket is busy (#6563)
Currently, we buffer UDP packets whenever the socket is busy and try to
flush them out at a later point. This requires allocations and is tricky
to get right.

In order to solve both of these problems, we refactor `snownet` to
return us an `EncryptedPacket` instead of a `Transmit`. An
`EncryptedPacket` is an indirection-abstraction that can be turned into
a `Transmit` given an `EncryptBuffer`. This combination of types allows
us to hold on to the `EncryptedPacket` (which does not contain any
references itself) in the `io` component whilst we are waiting for the
socket to be ready to send again.

This means we will immediately suspend the event loop in case the socket
is no longer ready for sending and resend the datagram in the
`EncryptBuffer` once we get re-polled.
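
A simplified sketch of the single-slot approach. `MockSocket` and `Io` are
hypothetical stand-ins, wakers are omitted, and the packet owns its bytes
here, whereas the text above describes resolving an `EncryptedPacket`
against an `EncryptBuffer`:

```rust
use std::task::Poll;

/// Stand-in for a UDP socket that may not be ready to send.
struct MockSocket {
    ready: bool,
    sent: Vec<Vec<u8>>,
}

/// Holds no references, so it can be parked across polls.
struct EncryptedPacket(Vec<u8>);

struct Io {
    socket: MockSocket,
    /// At most one packet is ever buffered.
    pending: Option<EncryptedPacket>,
}

impl Io {
    /// Returns `Pending` while the socket is busy. The event loop must not
    /// produce further packets until this returns `Ready`.
    fn poll_send_ready(&mut self) -> Poll<()> {
        if !self.socket.ready {
            return Poll::Pending;
        }
        // Flush the parked packet first, if there is one.
        if let Some(packet) = self.pending.take() {
            self.socket.sent.push(packet.0);
        }
        Poll::Ready(())
    }

    fn send(&mut self, packet: EncryptedPacket) {
        debug_assert!(self.pending.is_none(), "call poll_send_ready first");
        if self.socket.ready {
            self.socket.sent.push(packet.0);
        } else {
            self.pending = Some(packet);
        }
    }
}
```

Since the loop suspends as soon as one packet is parked, no queue and no
per-packet allocation for buffering is needed.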
2024-09-04 16:59:33 +00:00
Reactor Scram
d7810ef9c0 chore(rust/gui-client/windows): update windows to 0.58 (#6565)
Updates `windows` crates to 0.58 without the bug in #6551.

Supersedes #6556.

The bug was calling `try_send()?` on an MPSC channel of capacity 1,
which would bail out of the worker thread if we got 2 DNS change
notifications faster than the controller task / thread could process the
first one.
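
The failure mode can be reproduced with the standard library alone. This
sketch uses `std::sync::mpsc::sync_channel` as a stand-in for whichever
channel type the code actually uses, and `notify` shows one possible way
to tolerate a full channel, not necessarily the fix applied in this PR:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Sends two notifications without the receiver draining the channel in
/// between and returns the result of the second `try_send`.
fn second_try_send() -> Result<(), TrySendError<()>> {
    let (tx, _rx) = sync_channel::<()>(1);
    tx.try_send(()).expect("first notification fits into the buffer");
    tx.try_send(())
}

/// A full channel means a notification is already queued, so dropping the
/// duplicate loses nothing. Only a disconnected receiver is fatal.
fn notify(result: Result<(), TrySendError<()>>) -> Result<(), &'static str> {
    match result {
        Ok(()) | Err(TrySendError::Full(())) => Ok(()),
        Err(TrySendError::Disconnected(())) => Err("receiver dropped"),
    }
}
```

Applying `?` directly to the result of `second_try_send` would turn the
harmless `Full` case into an early return from the worker.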
2024-09-03 04:18:46 +00:00
Reactor Scram
1505b699e5 fix(client/windows): Revert "chore(rust/gui-client/windows): update windows to 0.58 (#6506)" (#6555)
This reverts commit d8f25f9bf8.

#6506 broke the Clients and I guess I didn't do any manual smoke test,
so I didn't catch it.

I have leads for a permanent fix in #6551 but I don't want to leave
`main` broken since it will screw up bisects.
2024-09-02 20:25:10 +00:00
Reactor Scram
d8f25f9bf8 chore(rust/gui-client/windows): update windows to 0.58 (#6506)
Supersedes #5913

This required a big refactor because `HANDLE` is no longer `Send` and
was never supposed to be.

So we add a worker thread for listening to DNS changes, since that
requires us to hold a `HANDLE` across `await` points and I couldn't find
any simpler way to do it.

I could add integration tests for this in a future PR that prove the
notifiers work by poking the registry or setting DNS servers and seeing
if we pick up the changes on time. But setting DNS servers without the
tunnel up may be tricky, so I left it out of scope for this PR.

```[tasklist]
- [x] Fix force-kill bug
```
2024-09-02 18:00:45 +00:00
Thomas Eizinger
4c30d78cda fix: refer to correct tag in git-version (#6334)
The output of `git describe` always refers to the last tag that it can
find. This leads to confusing versions being printed such as:

```
2024-08-19T00:24:08.983891Z  INFO firezone_headless_client: arch="x86_64" git_version="gateway-1.1.5-30-gf82fee162-modified"
```

Note that this is code running in the headless-client and it refers to
the gateway tag. Whilst not wrong from git's PoV, it is certainly
confusing.

We can fix this by providing a glob-pattern to `git describe` via
`--match`. This makes git ignore any other tags and print a version
identifier that refers to the current program:

```
2024-08-19T00:39:48.634191Z  INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.1.7-31-ga08a3411d-modified"
```
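
A sketch of how such an invocation could be assembled. Only `--match` is
confirmed by the text above; the function name `describe_args` and the
other flags (`--always`, `--dirty=-modified`) are assumptions made for
this example:

```rust
/// Builds the arguments for `git describe`, restricted to the tags of one
/// component, e.g. `headless-client-*`.
fn describe_args(component: &str) -> Vec<String> {
    vec![
        "describe".to_owned(),
        "--always".to_owned(),
        "--dirty=-modified".to_owned(),
        "--match".to_owned(),
        // Glob pattern: tags of other components are ignored.
        format!("{component}-*"),
    ]
}
```

The result can be passed to
`std::process::Command::new("git").args(describe_args("headless-client"))`.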
2024-08-19 22:42:15 +00:00
Thomas Eizinger
c51cf096ae build(rust): avoid unnecessary rebuilds (#6321)
Parsing the current Git version within `firezone-bin-shared` means this
crate (and all its dependents) needs to be rebuilt every time one makes a
commit, even if none of the code actually changes.

To avoid this whilst still allowing `firezone-bin-shared` to export a
useful, shared function, we export a macro instead that can be called
from the respective crates that need the Git version. This means only
those binaries will be marked as dirty and rebuilds of e.g. unit tests
don't need to rebuild these workspace crates.
2024-08-16 15:30:04 +00:00
Thomas Eizinger
0abbf6bba9 refactor(rust): inline http-health-check crate into bin-shared (#6258)
Now that we have the `bin-shared` crate, it is easy to move the
health-check functionality into there. That allows us to get rid of a
crate which makes navigating the workspace a bit easier.
2024-08-12 16:44:52 +00:00
Thomas Eizinger
bed625a312 chore(rust): make logging more ergonomic (#6237)
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that allows us to bundle all the things together, how we want to
do logging.

This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.

Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.

Log filters can be overridden, i.e. a 2nd filter that matches the exact
same scope overrides a previous one. Thus, with this design it is still
possible to activate certain logs at runtime, even if they have been
silenced by default.
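
The override rule can be modelled in a few lines. This is a deliberately
simplified model of directive resolution, not how `tracing-subscriber`
implements it, and `resolve` is a hypothetical helper:

```rust
use std::collections::HashMap;

/// Maps each scope ("" is the global default) to its level. Directives are
/// applied left to right, so a later one for the same scope wins.
fn resolve(directives: &str) -> HashMap<String, String> {
    let mut levels = HashMap::new();

    for directive in directives.split(',').map(str::trim).filter(|d| !d.is_empty()) {
        let (scope, level) = match directive.split_once('=') {
            Some((scope, level)) => (scope, level),
            // A bare level such as `debug` applies globally.
            None => ("", directive),
        };
        levels.insert(scope.to_owned(), level.to_owned());
    }

    levels
}
```

Appending a user-supplied filter to a default of `netlink_proto=off`
therefore lets `netlink_proto=trace` re-enable those logs at runtime.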

I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-10 05:17:03 +00:00
Thomas Eizinger
a87728b791 chore: remove connlib-shared dependency from bin-shared (#6229)
The `firezone-bin-shared` crate is meant to house non-tunnel related
things. That allows it to compile in parallel to everything else. It
currently only depends on `connlib-shared` to access the `DEFAULT_MTU`
constant. We can remove that by requiring the MTU as a ctor parameter of
`TunDeviceManager`.

A longer write-up of the intended dependency structure is in #4470.
2024-08-10 03:58:10 +00:00
Reactor Scram
68d934ee59 refactor(headless-client): remove unnecessary layering (#6211)
Refs #5754

The IPC service is still layered, but moving it around is more difficult
than moving the headless Client.
2024-08-09 14:10:21 +00:00
Reactor Scram
5eb2bba47b feat(headless-client): use systemd-resolved DNS control by default (#6163)
Closes #5063, supersedes #5850 

Other refactors and changes made as part of this:

- Adds the ability to disable DNS control on Windows
- Removes the spooky-action-at-a-distance `from_env` functions that used
to be buried in `tunnel`
- `FIREZONE_DNS_CONTROL` is now a regular `clap` argument again

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-06 18:16:51 +00:00
Gabi
5841f297a5 fix(gateway): prevent routing loops (#6096)
In some weird conditions there might be routing loops in the gateway
too, so this fixes it and it doesn't do any harm.

Could be the cause behind [these
logs](https://github.com/firezone/firezone/issues/6067#issuecomment-2259081958)
2024-07-30 22:29:38 +00:00
Reactor Scram
e6cbb5fa8a feat(gui-client/linux): network roaming (#5978)
Closes #5846 

Will be moved down to the IPC service eventually.

The goal for connection roaming is not for totally transparent "Change
Wi-Fi networks without dropping SSH" handoffs, but just for Firezone to
re-connect itself as quickly as possible so that everything above us can
re-connect as quickly as it times out, and won't be hung up with a
broken tunnel.
2024-07-30 16:01:45 +00:00
Gabi
c3a45f53df fix(connlib): prevent routing loops on windows (#6032)
In `connlib`, traffic is sent through sockets via one of three ways:

1. Direct p2p traffic between clients and gateways: For these, we always
explicitly set the source IP (and thus interface).
2. UDP traffic to the relays: For these, we let the OS pick an
appropriate source interface.
3. WebSocket traffic over TCP to the portal: For this too, we let the OS
pick the source interface.

For (2) and (3), it is possible to run into routing loops, depending on
the routes that we have configured on the TUN device.

In Linux, we can prevent routing loops by marking a socket [0] and
repeating the mark when we add routes [1]. Packets sent via a marked
socket won't be routed by a rule that contains this mark. On Android, we
can do something similar by "protecting" a socket via a syscall on the
Java side [2].

On Windows, routing works slightly differently. There, the source
interface is determined based on a computed metric [3] [4]. To prevent
routing loops on Windows, we thus need to find the "next best" interface
after our TUN interface. We can achieve this with a combination of
several syscalls:

1. List all interfaces on the machine
2. Ask Windows for the best route on each interface, except our TUN
interface.
3. Sort by Windows' routing metric and pick the lowest one (lower is
better).
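
The selection step above can be sketched as follows. The `Adapter` struct
and its fields are hypothetical; the real code obtains this information
from Windows syscalls:

```rust
/// Hypothetical summary of the best route on one interface.
#[derive(Debug, Clone, PartialEq)]
struct Adapter {
    index: u32,
    metric: u32,
}

/// Picks the "next best" interface: not our TUN device, lowest metric.
fn next_best(adapters: &[Adapter], tun_index: u32) -> Option<&Adapter> {
    adapters
        .iter()
        .filter(|adapter| adapter.index != tun_index)
        .min_by_key(|adapter| adapter.metric)
}
```

Returning `None` when no other interface exists lets the caller refuse to
send rather than fall back to the TUN device.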

Thanks to the abstraction of `SocketFactory` that we already previously
introduced, integrating this into `connlib` isn't too difficult:

1. For TCP sockets, we simply resolve the best route after creating the
socket and then bind it to that local interface. That way, all packets
will always go via that interface, regardless of which routes are
present on our TUN interface.
2. UDP is connection-less so we need to decide per-packet, which
interface to use. "Pick the best interface for me" is modelled in
`connlib` via the `DatagramOut::src` field being `None`.
- To ensure those packets don't cause a routing loop, we introduce a
"source IP resolver" for our `UdpSocket`. This function gets called
every time we need to send a packet without a source IP.
- For improved performance, we cache these results. The Windows client
uses this source IP resolver to use the above devised strategy to find a
suitable source IP.
- In case the source IP resolution fails, we don't send the packet. This
is important, otherwise, the kernel might choose our TUN interface again
and trigger a routing loop.
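
A sketch of the UDP side under stated assumptions: `UdpSender` is a
hypothetical type, the resolver is a plain closure, and "sending" just
records the chosen source and destination:

```rust
use std::collections::HashMap;
use std::net::IpAddr;

struct UdpSender<R> {
    resolver: R,
    /// Destination IP -> previously resolved source IP.
    cache: HashMap<IpAddr, IpAddr>,
    sent: Vec<(IpAddr, IpAddr)>,
}

impl<R: FnMut(IpAddr) -> Option<IpAddr>> UdpSender<R> {
    fn new(resolver: R) -> Self {
        UdpSender {
            resolver,
            cache: HashMap::new(),
            sent: Vec::new(),
        }
    }

    /// Returns `false` if the packet was dropped because no source IP
    /// could be determined.
    fn send(&mut self, src: Option<IpAddr>, dst: IpAddr) -> bool {
        let src = match src.or_else(|| self.cache.get(&dst).copied()) {
            Some(src) => src,
            None => match (self.resolver)(dst) {
                Some(resolved) => {
                    self.cache.insert(dst, resolved);
                    resolved
                }
                // Never let the OS choose: it might pick the TUN device.
                None => return false,
            },
        };

        self.sent.push((src, dst));
        true
    }
}
```

Dropping the whole `UdpSender` on roaming also drops the cache, which
matches the behaviour described for connection roaming.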

The last remark to make here is that this also works for connection
roaming. The TCP socket gets thrown away when we reconnect to the
portal. Thus, the new socket will pick the new best interface as it is
re-created. The UDP sockets also get thrown away as part of roaming.
That clears the above cache which is what we want: Upon roaming, the
best interface for a given destination IP will likely have changed.

[0]:
59014a9622/rust/headless-client/src/linux.rs (L19-L29)
[1]:
59014a9622/rust/bin-shared/src/tun_device_manager/linux.rs (L204-L224)
[2]:
59014a9622/rust/connlib/clients/android/src/lib.rs (L535-L549)
[3]:
https://learn.microsoft.com/en-us/previous-versions/technet-magazine/cc137807(v=msdn.10)?redirectedfrom=MSDN
[4]:
https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metric

Fixes: #5955.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-07-29 22:25:42 +00:00
Reactor Scram
05e3a38701 refactor(bin-shared): remove CommonArgs (#6068)
Closes #6025

It was only used in the Gateway, so we inline it there and remove `clap`
as a dep for ~~that crate~~ `bin-shared`
2024-07-26 21:48:09 +00:00
Gabi
a39b853bc1 fix(windows,linux): ensure set_routes is idempotent (#6051)
Windows may delete the default route during roaming. To prevent this
from causing problems, we make `set_routes` add all routes regardless of
the previously stored ones. The known routes are only used to compute
which routes are to be removed.

For Linux we do the same to make it consistent across platforms.

This also gives us the chance to not clear the cache when IPs are set,
since now all routes are always added, meaning they will always be
re-added when roaming.

Overall, this more closely aligns Linux and Windows with how Firezone
works on Apple and Android. There, we always remove all routes and set
new ones. Removing routes happens very rarely (only when CIDR resources
are deactivated), thus, not removing all and re-adding the routes is
still deemed to be worth it.

With the new implementation, this is guaranteed to always make the new
routes take effect and at the same time be idempotent.
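
A minimal model of that behaviour. `Routes` and its fields are
hypothetical, with a `HashSet` standing in for the OS routing table:

```rust
use std::collections::HashSet;

/// `os_table` models what the OS currently has; `known` is what we
/// believe we have set.
#[derive(Default)]
struct Routes {
    os_table: HashSet<String>,
    known: HashSet<String>,
}

impl Routes {
    fn set_routes(&mut self, new: HashSet<String>) {
        // `known` only decides what needs to be removed ...
        for stale in self.known.difference(&new) {
            self.os_table.remove(stale);
        }
        // ... while every new route is (re-)added unconditionally, even if
        // we think it is already present.
        for route in &new {
            self.os_table.insert(route.clone());
        }
        self.known = new;
    }
}
```

Calling `set_routes` twice with the same input leaves the table unchanged,
and a route the OS deleted behind our back is restored on the next call.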

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-07-26 05:13:58 +00:00
Reactor Scram
cc1478adc2 feat(headless-client/windows): add DNS change / network change listening to the Headless Client (#6022)
Note that for GUI Clients, listening is still done by the GUI process,
not the IPC service.

Yak shave towards #5846. This allows for faster dev cycles since I won't
have to compile all the GUI stuff.

Some changes in here were extracted from other draft PRs.

Changes:
- Remove `thiserror` that was never matched on
- Don't return the DNS resolvers from the notifier directly, just send a
notification and allow the caller to check the resolvers itself if
needed
- Rename `DnsListener` to `DnsNotifier`
- Rename `Worker` to `NetworkNotifier`
- remove `unwrap_or_default` when getting resolvers. I don't know why
it's there, if there's a good reason then it should be handled inside
the function, not in the caller

```[tasklist]
### Tasks
- [x] Rename `*Listener` to `*Notifier`
- [x] (not needed) ~~Support `/etc/resolv.conf` DNS control method too?~~
```
2024-07-25 15:45:22 +00:00
Reactor Scram
82b8de4c9c refactor(client/windows): de-dupe wintun.dll (#6020)
Closes #5977

Refactored some other stuff to make this work

Also removed a redundant impl of `ensure_dll` in a benchmark
2024-07-25 14:28:35 +00:00
Thomas Eizinger
50d6b865a1 refactor(connlib): move Tun implementations out of firezone-tunnel (#5903)
The different implementations of `Tun` are the last platform-specific
code within `firezone-tunnel`. By introducing a dedicated crate and a
`Tun` trait, we can move this code into (platform-specific) leaf crates:

- `connlib-client-android`
- `connlib-client-apple`
- `firezone-bin-shared`

Related: #4473.

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-07-24 01:10:50 +00:00
Gabi
5b0aaa6f81 fix(connlib): protect all sockets from routing loops (#5797)
Currently, only connlib's UDP sockets for sending and receiving STUN &
WireGuard traffic are protected from routing loops. This was done via
the `Sockets::with_protect` function. Connlib has additional sockets
though:

- A TCP socket to the portal.
- UDP & TCP sockets for DNS resolution via hickory.

Both of these can incur routing loops on certain platforms which becomes
evident as we try to implement #2667.

To fix this, we generalise the idea of "protecting" a socket via a
`SocketFactory` abstraction. By allowing the different platforms to
provide a specialised `SocketFactory`, anything Linux-based can give
special treatment to the socket before handing it to connlib.

As an additional benefit, this allows us to remove the `Sockets`
abstraction from connlib's API again because we can now initialise it
internally via the provided `SocketFactory` for UDP sockets.
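
The shape of such an abstraction can be sketched like this. `FakeSocket`,
the alias below and all function names are hypothetical and do not mirror
connlib's actual `SocketFactory` definition:

```rust
use std::io;
use std::net::SocketAddr;

/// Stand-in for a real socket; `protected` models e.g. a socket mark.
#[derive(Debug)]
struct FakeSocket {
    local: SocketAddr,
    protected: bool,
}

/// Platform-agnostic code only sees "something that makes sockets".
type SocketFactory = dyn Fn(SocketAddr) -> io::Result<FakeSocket>;

/// Default factory: no special treatment.
fn plain(local: SocketAddr) -> io::Result<FakeSocket> {
    Ok(FakeSocket {
        local,
        protected: false,
    })
}

/// Platform-specific factory: treats the socket before handing it over.
fn protected(local: SocketAddr) -> io::Result<FakeSocket> {
    let mut socket = plain(local)?;
    socket.protected = true;
    Ok(socket)
}

/// Every socket opened here goes through the provided factory, so TCP and
/// DNS sockets get the same treatment as the UDP ones.
fn open_portal_socket(factory: &SocketFactory) -> io::Result<FakeSocket> {
    factory(SocketAddr::from(([0, 0, 0, 0], 0)))
}
```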

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-07-16 00:40:05 +00:00
Thomas Eizinger
a4a8221b8b refactor(connlib): explicitly initialise Tun (#5839)
Connlib's routing logic and networking code is entirely platform
agnostic. The only platform-specific bit is how we interact with the TUN
device. From connlib's perspective though, all it needs is an interface
for reading and writing. How the device gets initialised and updated is
client-business.

For the most part, this is the same on all platforms: We call callbacks
and the client updates the state accordingly. The only annoying bit here
is that Android recreates the TUN interface on every update and thus our
old file descriptor is invalid. The current design works around this by
returning the new file descriptor on Android. This is a problematic
design for several reasons:

- It forces the callback handler to finish synchronously, halting
connlib until this is complete.
- The synchronous nature also means we cannot replace the callbacks with
events as events don't have a return value.

To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves
the business of how the `Tun` device is created up to the client. The
clients are already platform-specific so this makes sense. In a future
iteration, we can move all the various `Tun` implementations all the way
up to the client-specific crates, thus co-locating the platform-specific
code.

Initialising `Tun` from the outside surfaces another issue: The routes
are still set via the `Tun` handle on Windows. To fix this, we introduce
a `make_tun` function on `TunDeviceManager` so that it can remember
the interface index on Windows, which in turn allows us to move the
setting of routes to `TunDeviceManager`.

This simplifies several of connlib's APIs which are now infallible.

Resolves: #4473.

---------

Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
2024-07-12 23:54:15 +00:00
Thomas Eizinger
960ce80680 refactor(connlib): move TunDeviceManager into firezone-bin-shared (#5843)
The `TunDeviceManager` is a component that the leaf-nodes of our
dependency tree need: the binaries. Thus, it is misplaced in the
`connlib-shared` crate which is at the very bottom of the dependency
tree.

This is necessary to allow the `TunDeviceManager` to actually construct
a `Tun` (which currently lives in `firezone-tunnel`).

Related: #5839.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-07-11 23:42:33 +00:00