Currently, only connlib's UDP sockets for sending and receiving STUN &
WireGuard traffic are protected from routing loops. This is was done via
the `Sockets::with_protect` function. Connlib has additional sockets
though:
- A TCP socket to the portal.
- UDP & TCP sockets for DNS resolution via hickory.
Both of these can incur routing loops on certain platforms which becomes
evident as we try to implement #2667.
To fix this, we generalise the idea of "protecting" a socket via a
`SocketFactory` abstraction. By allowing the different platforms to
provide a specialised `SocketFactory`, anything Linux-based can give
special treatment to the socket before handing it to connlib.
As an additional benefit, this allows us to remove the `Sockets`
abstraction from connlib's API again because we can now initialise it
internally via the provided `SocketFactory` for UDP sockets.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Connlib's routing logic and networking code is entirely platform
agnostic. The only platform-specific bit is how we interact with the TUN
device. From connlib's perspective though, all it needs is an interface
for reading and writing. How the device gets initialised and updated is
client-business.
For the most part, this is the same on all platforms: We call callbacks
and the client updates the state accordingly. The only annoying bit here
is that Android recreates the TUN interface on every update and thus our
old file descriptor is invalid. The current design works around this by
returning the new file descriptor on Android. This is a problematic
design for several reasons:
- It forces the callback handler to finish synchronously, and halting
connlib until this is complete.
- The synchronous nature also means we cannot replace the callbacks with
events as events don't have a return value.
To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves
the business of how the `Tun` device is created up to the client. The
clients are already platform-specific so this makes sense. In a future
iteration, we can move all the various `Tun` implementations all the way
up to the client-specific crates, thus co-locating the platform-specific
code.
Initialising `Tun` from the outside surfaces another issue: The routes
are still set via the `Tun` handle on Windows. To fix this, we introduce
a `make_tun` function on `TunDeviceManager` in order for it to remember
the interface index on Windows and being able to move the setting of
routes to `TunDeviceManager`.
This simplifies several of connlib's APIs which are now infallible.
Resolves: #4473.
---------
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
The `TunDeviceManager` is a component that the leaf-nodes of our
dependency tree need: the binaries. Thus, it is misplaced in the
`connlib-shared` crate which is at the very bottom of the dependency
tree.
This is necessary to allow the `TunDeviceManager` to actually construct
a `Tun` (which currently lives in `firezone-tunnel`).
Related: #5839.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
This also improves some function names (i.e. don't say `windows_` when
we're already in `windows.rs`) and adds comments justifying why some
functions with only one call site are split out
I started this intending to use it to practice the sans-I/O style. It
didn't come up but I did get rid of that `spawn`
We are referencing the `tokio` dependency a lot and it makes sense to
ensure that version is tracked only once across the whole workspace.
Extracted out of #5797.
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
The Arc+Notify thing was always overkill, I just thought it was useful
early on. With the IPC change it's easier to just use the existing MPSC
channel
Also removing `TunnelReady` and assuming that the tunnel is ready
whenever connlib sends us the first Resource list
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Left over from #5789
This removes SIGHUP for the IPC service, which doesn't handle it anyway,
so it removes a code path that would just panic.
```[tasklist]
### Tasks
- [ ] Can we test this at all?
```
Closes#5789
The SIGTERM catching would have helped debug #5790
```[tasklist]
### Tasks
- [x] catch SIGTERM and log when systemd shuts us down gracefully
- [x] Log architecture at startup
```
I had to change the smoke test because it had a couple issues:
- The IPC socket had the wrong permissions because I didn't realize you
can tell `su` / `sudo` / `runuser` to set a group in addition to setting
a user
- It had a hard-coded timer of 12 seconds, and one time the test failed
because the IPC service exited before the GUI finished loading. So I
changed it so the IPC service in smoke test mode will wait forever for
exactly one client, then quit
```[tasklist]
### Tasks
- [x] Run `chown` in the Ubuntu smoke test
```
PR #5700 had a typo in it. I didn't notice that these match arms use
`|`, so I accidentally flush the DNS for an event that doesn't need it.
Only `OnUpdateResources` should flush DNS.
```[tasklist]
### Tasks
- [x] Check the GUI saves its settings file
- [x] Check the IPC service writes the device ID to disk
- [x] Check the GUI writes a log file (skipped - we already check if the exported zip has any files in it)
- [x] Run the crash file through `minidump-stackwalk`
- [x] Reach feature parity with the original smoke tests
- [x] Ready for review
- [x] Finish #5452
- [ ] Start on #5453
```
Closes#5052
On my dev VMs:
- systemd-resolved = 15 ms to flush
- Windows = 600 ms to flush
I tested with the headless Clients on Linux and Windows and it fixes the
issue. On Windows I didn't replicate the issue with the GUI Client, on
Linux this patch also fixes it for the GUI Client.
We added this to diagnose a hang in the IPC service, #5441. That hang,
to the best of our knowledge, was caused by a deadlock which we fixed in
#5571. So the heartbeat task just adds a lot of noise to the stdout
which is annoying for debugging and won't be used in production logs.
The system uptime measuring is still useful, so we now log that just
once when logging starts, next to the git version and log directives.
If we see this pattern in either process' logs, we know something is
suspicious:
- Log file ends without a clean shutdown message
- Next log file starts with a high system uptime
Updates should always result in a clean shutdown message, and a sudden
power loss (mains power outage, or laptop battery dying) would result in
the system uptime being low for the 2nd log file.
This does the same thing as #5621 without removing the library, since it
will now compile against whatever version of `windows` we need
We could do the same with `hostname`, either vendor or ask upstream to
bump deps, and then `windows` 0.52.0 should be gone.
```[tasklist]
### Tasks
- [x] Remove macOS code and shrink everything
```
Move almost half the lines of code into `ipc_service` so that it's
separate from the headless Client code
Moves the standalone / headless Client code into `standalone`
Currently, connlib re-initialises the TUN device on Linux every time its
configuration gets updated such as when roaming from one network to
another. This is unnecessary. Instead, we can adopt the same approach as
already used on MacOS, iOS and Windows and only initialise it if it
doesn't exist yet.
Doing so surfaces an interesting bug. Currently, attempting to
re-initialise the TUN device fails with a warning:
> connlib_client_shared::eventloop: Failed to set interface on tunnel:
Resource busy (os error 16)
See
https://github.com/firezone/firezone/actions/runs/9656570163/job/26634409346#step:7:103
for an example. As a consequence, we never actually trigger the
`on_set_interface_config` callback and thus never actually set the new
IPs on the TUN device.
Now that we _are_ calling this callback, we execute
`TunDeviceManager::set_ips` which first clears all IPs from the device
and then attaches the new ones. A consequence of this is that the Linux
kernel will clear all routes associated with the device. This clashes
with an optimisation we have in `TunDeviceManager` where we remember the
previously set routes and don't set new ones if they are the same.
This `HashSet` needs to be cleared upon setting new IPs in order to
actually set the new routes correctly afterwards. Without that, we stop
receiving traffic on the TUN device.
Closes#5450
Now the entire `Handler::run` function is allowed to fail, similar to a
web request handler failing in a web server.
Previously we only allowed the Handler to fail if it was idle, waiting
on incoming IPC requests. Now it can fail even if it's working with
connlib and about to send over IPC.
I replicated this on my Windows 11 VM in Parallels and the fix works
fine there. Should be the same bug and same fix in Linux.
Closes#5481
With this, I can connect to the staging portal without a build.rs or any
extra env var setup
<img width="387" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/9c080b36-3a76-49c7-b706-20723697edc7">
```[tasklist]
### Next steps
- [x] Split out a refactor PR for `ConnectArgs` (#5488)
- [x] Try doing this for other Clients
- [x] Check Gateway
- [x] Check Tauri Client
- [x] Change to `app_version`
- [x] Open for review
- [ ] Use `option_env` so that `FIREZONE_PACKAGE_VERSION` can still override the Cargo.toml version for local testing
- [ ] Check Android Client
- [ ] Check Apple Client
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
This is extracted from #5487 since I needed to add an 8th parameter and
Clippy said 8 is too many.
Refs #2986
Stepping stone towards using the Builder pattern. There's only a few
Clients so this has 80% of the advantage for 20% of the effort
Extracted from https://github.com/firezone/firezone/pull/5426
- Replace `new` and `new_for_test` for IPC servers with `enum ServiceId`
- Rename `debug_command_setup` to `setup_stdout_logging`
It turned out there is no clever way to hide other platforms from
`cargo-mutants`, I thought I had such a way
This is a funny one. `cargo test -p firezone-headless-client -p
firezone-gui-client` actually passes, because the GUI client uses the
pipes feature, and Cargo apparently just does one build for both
packages. But if you build the headless Client by itself, it fails to
build.
I think this caused `cargo-mutants` to consider all its headless Client
mutants to be unviable, and so it didn't show coverage for that package.
Part of a yak shave to profile startup time for reducing it on Windows
#5026
Median of 3 runs:
- Windows 11 aarch64 Parallels VM - 4.8 s
- Windows 11 x86_64 laptop - 3.1 s (I thought it used to be slower)
- Windows Server 2022 VM - 22.2 s
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Closes#5042
Smoke test plan:
- Install on a before-Firezone VM
- Confirm logs default to `str0m=warn,info`
- Set log filter to `debug` in GUI
- Restart IPC service
- Confirm logs are `debug`
- Clear settings back to default
- Restart IPC service
- Confirm logs are `str0m=warn,info`
Directions to apply new log level:
1. Put the new log filter in
2. Click "Apply"
3. Quit Firezone Client
4. Right-click on the Start Menu and click "Terminal (Admin)" to open a
Powershell prompt
5. Run `Restart-Service -Name FirezoneClientIpcService` (on Linux, `sudo
systemctl restart firezone-client-ipc.service`)
6. Re-open Firezone Client
```[tasklist]
- [x] Log the log filter maybe
- [x] Use `atomicwrites` to write the file
- [x] (cancelled) ~~Make the GUI write the file on boot if it's not there (saves a step when upgrading from older versions)~~
- [x] Windows smoke test
- [x] Fix permissions on `/var/lib/dev.firezone.client/config`
- [x] Fix Linux IPC service not loading the log filter file
- [x] Linux smoke test
- [ ] Make sure it's okay that users in `firezone-client` can change the device ID
- [ ] Update user guides to include restarting the computer or IPC service after updating the log level?
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
- Removes version numbers from infra components (elixir/relay)
- Removes version bumping from Rust workspace members that don't get
published
- Splits release publishing into `gateway-`, `headless-client-`, and
`gui-client-`
- Removes auto-deploying new infrastructure when a release is published.
Use the Deploy Production workflow instead.
Fixes#4397
Closes#3567 (again)
Closes#5214
Ready for review
```[tasklist]
### Before merging
- [x] The IPC service should report system uptime when it starts. This will tell us whether the computer was rebooted or just the IPC service itself was upgraded / rebooted.
- [x] The IPC service should report the PID of itself and the GUI if possible
- [x] The GUI should report the PID of the IPC service if possible
- [x] Extra logging between `GIT_VERSION = ` and the token loading log line, especially right before and right after the critical Tauri launching step
- [x] If a 2nd GUI or IPC service runs and exits due to single-instance, it must log that
- [x] Remove redundant DNS deactivation when IPC service starts (I think conectado noticed this in another PR)
- [x] Manually test that the GUI logs something on clean shutdown
- [x] Logarithmic heartbeat?
- [x] If possible, log monotonic time somewhere so NTP syncs don't make the logs unreadable (uptime in the heartbeat should be monotonic, mostly)
- [x] Apply the same logging fix to the IPC service
- [x] Ensure log zips include GUI crash dumps
- [x] ~~Fix #5042~~ (that's a separate issue, I don't want to drag this PR out)
- [x] Test IPC service restart (logs as a stop event)
- [x] Test IPC service stop
- [x] Test IPC service logs during system suspend (Not logged, maybe because we aren't subscribed to power events)
- [x] Test IPC service logs during system reboot (Logged as shutdown, we exit gracefully)
- [x] Test IPC service logs during system shut down (Logged as a suspend)
- [x] Test IPC service upgrade (Logged as a stop)
- [x] Log unhandled events from the Windows service controller (Power events like suspend and resume are logged and not handled)
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#5143
The initial half-second backoff should typically be enough, and if the
user is manually re-opening the GUI after a GUI crash, I don't think
they'll notice. If they do, they can open the GUI again and it should
all work.
Most of these were in `known_dirs.rs` because it's platform-specific and
`cargo-mutants` wasn't ignoring other platforms correctly.
Using `cargo mutants -p firezone-gui-client -p firezone-headless-client`
176 / 236 mutants missed before
155 / 206 mutants missed after
Refs #3636 (This pays down some of the technical debt from Linux DNS)
Refs #4473 (This partially fulfills it)
Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory)
As of dd6421:
- On both Linux and Windows, DNS control and IP setting (i.e.
`on_set_interface_config`) both move to the Client
- On Windows, route setting stays in `tun_windows.rs`. Route setting in
Windows requires us to know the interface index, which we don't know in
the Client code. If we could pass opaque platform-specific data between
the tunnel and the Client it would be easy.
- On Linux, route setting moves to the Client and Gateway, which
completely removes the `worker` task in `tun_linux.rs`
- Notifying systemd that we're ready moves up to the headless Client /
IPC service
```[tasklist]
### Before merging / notes
- [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device)
- [x] Fix Windows Clients
- [x] Fix Gateway
- [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068)
- [x] De-dupe consts
- [ ] ~~Add DNS control test~~ (failed)
- [ ] Smoke test Linux
- [ ] Smoke test Windows
```
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.116 to
1.0.117.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/serde-rs/json/releases">serde_json's
releases</a>.</em></p>
<blockquote>
<h2>v1.0.117</h2>
<ul>
<li>Resolve unexpected_cfgs warning (<a
href="https://redirect.github.com/serde-rs/json/issues/1130">#1130</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0ae247ca63"><code>0ae247c</code></a>
Release 1.0.117</li>
<li><a
href="4517c7a2d9"><code>4517c7a</code></a>
PartialEq is not implemented between Value and 128-bit ints</li>
<li><a
href="fdf99c7c38"><code>fdf99c7</code></a>
Combine number PartialEq tests</li>
<li><a
href="b4fc2451d7"><code>b4fc245</code></a>
Merge pull request <a
href="https://redirect.github.com/serde-rs/json/issues/1130">#1130</a>
from serde-rs/checkcfg</li>
<li><a
href="98f1a247de"><code>98f1a24</code></a>
Resolve unexpected_cfgs warning</li>
<li>See full diff in <a
href="https://github.com/serde-rs/json/compare/v1.0.116...v1.0.117">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>