Part of a yak shave to profile startup time for reducing it on Windows
#5026
Median of 3 runs:
- Windows 11 aarch64 Parallels VM - 4.8 s
- Windows 11 x86_64 laptop - 3.1 s (I thought it used to be slower)
- Windows Server 2022 VM - 22.2 s
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Closes#5042
Smoke test plan:
- Install on a before-Firezone VM
- Confirm logs default to `str0m=warn,info`
- Set log filter to `debug` in GUI
- Restart IPC service
- Confirm logs are `debug`
- Clear settings back to default
- Restart IPC service
- Confirm logs are `str0m=warn,info`
Directions to apply new log level:
1. Put the new log filter in
2. Click "Apply"
3. Quit Firezone Client
4. Right-click on the Start Menu and click "Terminal (Admin)" to open a
Powershell prompt
5. Run `Restart-Service -Name FirezoneClientIpcService` (on Linux, `sudo
systemctl restart firezone-client-ipc.service`)
6. Re-open Firezone Client
```[tasklist]
- [x] Log the log filter maybe
- [x] Use `atomicwrites` to write the file
- [x] (cancelled) ~~Make the GUI write the file on boot if it's not there (saves a step when upgrading from older versions)~~
- [x] Windows smoke test
- [x] Fix permissions on `/var/lib/dev.firezone.client/config`
- [x] Fix Linux IPC service not loading the log filter file
- [x] Linux smoke test
- [ ] Make sure it's okay that users in `firezone-client` can change the device ID
- [ ] Update user guides to include restarting the computer or IPC service after updating the log level?
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#3567 (again)
Closes#5214
Ready for review
```[tasklist]
### Before merging
- [x] The IPC service should report system uptime when it starts. This will tell us whether the computer was rebooted or just the IPC service itself was upgraded / rebooted.
- [x] The IPC service should report the PID of itself and the GUI if possible
- [x] The GUI should report the PID of the IPC service if possible
- [x] Extra logging between `GIT_VERSION = ` and the token loading log line, especially right before and right after the critical Tauri launching step
- [x] If a 2nd GUI or IPC service runs and exits due to single-instance, it must log that
- [x] Remove redundant DNS deactivation when IPC service starts (I think conectado noticed this in another PR)
- [x] Manually test that the GUI logs something on clean shutdown
- [x] Logarithmic heartbeat?
- [x] If possible, log monotonic time somewhere so NTP syncs don't make the logs unreadable (uptime in the heartbeat should be monotonic, mostly)
- [x] Apply the same logging fix to the IPC service
- [x] Ensure log zips include GUI crash dumps
- [x] ~~Fix #5042~~ (that's a separate issue, I don't want to drag this PR out)
- [x] Test IPC service restart (logs as a stop event)
- [x] Test IPC service stop
- [x] Test IPC service logs during system suspend (Not logged, maybe because we aren't subscribed to power events)
- [x] Test IPC service logs during system reboot (Logged as shutdown, we exit gracefully)
- [x] Test IPC service logs during system shut down (Logged as a suspend)
- [x] Test IPC service upgrade (Logged as a stop)
- [x] Log unhandled events from the Windows service controller (Power events like suspend and resume are logged and not handled)
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#5143
The initial half-second backoff should typically be enough, and if the
user is manually re-opening the GUI after a GUI crash, I don't think
they'll notice. If they do, they can open the GUI again and it should
all work.
Most of these were in `known_dirs.rs` because it's platform-specific and
`cargo-mutants` wasn't ignoring other platforms correctly.
Using `cargo mutants -p firezone-gui-client -p firezone-headless-client`
176 / 236 mutants missed before
155 / 206 mutants missed after
Refs #3636 (This pays down some of the technical debt from Linux DNS)
Refs #4473 (This partially fulfills it)
Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory)
As of dd6421:
- On both Linux and Windows, DNS control and IP setting (i.e.
`on_set_interface_config`) both move to the Client
- On Windows, route setting stays in `tun_windows.rs`. Route setting in
Windows requires us to know the interface index, which we don't know in
the Client code. If we could pass opaque platform-specific data between
the tunnel and the Client it would be easy.
- On Linux, route setting moves to the Client and Gateway, which
completely removes the `worker` task in `tun_linux.rs`
- Notifying systemd that we're ready moves up to the headless Client /
IPC service
```[tasklist]
### Before merging / notes
- [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device)
- [x] Fix Windows Clients
- [x] Fix Gateway
- [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068)
- [x] De-dupe consts
- [ ] ~~Add DNS control test~~ (failed)
- [ ] Smoke test Linux
- [ ] Smoke test Windows
```
Refs #5022
The debug IPC service has been useful on Windows, and since there is
more refactoring to do, I want it on Linux too.
With this you can just do `sudo -E target/debug/firezone-client-ipc
debug-ipc-service` and it will launch an IPC service without messing
with systemd or installing anything. (Assuming the directory for the
socket is created)
```[tasklist]
### Before merging
- [ ] Check for regressions in Windows
- [ ] Check for regressions in Linux
```
Closes#4995Closes#4925Closes#4997Closes#5047
Supersedes #4965 and #5004.
NOT changing:
- Page description for other Clients. That is still "Firezone
Documentation"
Need these Clients:
- Windows GUI
- Linux headless
- Linux GUI
to have these things documented: (with exact terms)
- Prerequisites
- Installation
- Usage
- Signing in
- Accessing a Resource
- Signing out
- Quitting
- Upgrading
- Diagnostic logs
- Uninstalling
- Troubleshooting
- DNS not reverted after exit
- DNS Resource not accessible
- Known issues
```[tasklist]
### Before merging
- [x] Test Windows GUI instructions
- [x] Add troubleshooting for #5027
- [x] Fill in troubleshooting sections
- [x] Test Linux GUI instructions
- [x] Linux headless - Make sure SIGTERM or Ctrl+C or whatever reverts resolv.conf
- [x] Test Linux Headless instructions
- [x] Page descriptions should be "How to install and use the Firezone $OS $UI client."
- [x] ~~Linux headless - Confirm behaviors and default values of all env vars~~ (skipping - The ones that are used are exercised)
- [x] Grep for TODOs
- [x] Change "un-install" to "uninstall"
- [x] Capitalize "Client" where needed
- [x] Change "IPC service" to "Tunnel service" or something
- [x] Change "SplitDNS" to "Split DNS"
- [ ] Wait for next Client release to be cut
```
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Ready for review.
Closes#3712.
Supersedes #4940.
Refs #4963.
I haven't figured out if it needs any new automated tests (unit,
integration, etc.) but the code itself is ready for review. There is
more refactoring that could be done, or could be left for later.
```[tasklist]
- [x] Move wintun setup from GUI to IPC service / headless client
- [x] Make sure the device ID is in a sensible place
- [x] Export IPC service logs in the zips
- [x] Test GUI + SC IPC service on Windows (f4db808919a passed)
- [x] Make sure IPC service does not busy-loop
- [x] Test un-install checklist for Windows
- [x] Test upgrade checklist for Windows
- [x] Test GUI + systemd IPC service on Linux (c4ab7e7 passed)
- [x] Test upgrade checklist for Linux
- [x] Test un-install checklist for Linux
- [x] Make sure the IPC service logs out and deactivates DNS control if the GUI crashes
- [x] Test network changing
- [x] (it's intended behavior) ~~Look into spurious `on_update_resources` (fad86babd7)~~
- [x] ~~Test max partition time on offline laptop~~ (I ended up just setting a 30-day default in the code)
- [x] Make sure headless Client does not busy-loop
- [x] Test standalone headless on Linux
- [ ] Add unit / integration tests
- [ ] Think about security a bit #3971
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
This PR introduces site's `Status`. That's used to report to the client
the status, either, unknown, online or offline, mostly as a hint to
users as what's wrong with a connection.
This are the criteria for an online or offline resource
* If all sites related to a resource are offline the resource is
considered offline, since there's no gateway that can respond to that
resource's connection
* If any site is online the resource is online, since that same peer can
be used to reach that resource
* Any other case is unknown
Right now resources are single site so it doesn't matter too much but
tracking online/offline per-site instead of per-gateway or resource
seems like the better long-term solution.
The way to "find out" the site's status is:
* If a response to a connection details is offline, all sites related to
that resource must be offline otherwise there would've been a gateway in
the response
* At the point we connect to a gateway, the site that corresponds to
that gateway must be online
* When a connection to a peer stops it's considered unknown again
Fixes#4738
Closes#4907
They're still accepted, but the binary entirely determines the behavior.
This makes the code for CLI parsing and token handling simpler with
fewer branches, so it's easier to be sure it's correct.
Replaces #4942 which isn't doing what I intended anymore.
Closes#4899
This has a known gap where theoretically the GUI could sign in while the
service is hung in startup, and then the service would wipe out the
GUI's DNS rules.
The workaround for that would be to restart the GUI, but in practice I
think the gap will not be hit, and it will go away once #3712 is done
anyway.
I tested it manually once using the reproduction steps from #4899 and it
worked.
```[tasklist]
### Before merging
- [x] Make sure the service auto-starts
- [x] Make the process idle and report its status to Windows properly using https://github.com/mullvad/windows-service-rs
- [x] DRY log dir code
- [x] Figure out where service logs will go and how the GUI will zip them
- [x] Make sure the service gets a shut down signal from Windows (this is hard to catch in the Tauri GUI)
- [x] Make sure the service restarts when Firezone is updated
- [x] Make sure the service is stopped and un-installed when Firezone is un-installed
- [x] Add test to install the MSI and check that the service runs
- [x] (will move to another PR) ~~Clean up function names~~
- [x] Make sure the Linux GUI was not broken by refactoring
```
```[tasklist]
# Before merging
- [x] Add CI test to check that the Unix domain socket is owned by `root:firezone` (#4832 will do this)
```
This allows the GUI (running as a normal user who belongs to the
`firezone` group) to read back the connlib logs and export them in the
zip file.
<img width="716" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/59cb7cc5-fd6a-4b27-a311-1b9c56b7b23e">
Make the GUI use systemd-resolved to retrieve the system's resolvers.
This allows the IPC service to set up sentinels for those resolvers and
control the system's DNS.
Closes#3812
This aligns some of the internal names with #4531, but it shouldn't
break the externally-visible things like package names or permalinks.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#4270
Refs #3713
Refs #3782
It sort-of works, but many features are missing and it needs a refactor.
```[tasklist]
- [ ] Break `imp_linux.rs` into modules
- [ ] Get rid of `try_send` and panics where possible
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
For tests it doesn't hurt, but this will be used as a template for the
systemd service we ship to production, and that can't have the ID there.
So I'm also cleaning up a few other problems I noticed:
- I wanted to split the service files as part of #4531, so that the GUI
Client and headless Client can have separate sandbox rules. e.g, the
headless Client won't be allowed to create Unix domain sockets
- I'm punting more things to systemd, which allows us to tighten down
the sandbox further, e.g. creating `/var/lib/dev.firezone.client` and
`/run/dev.firezone.client` for us
- Closes#4461
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Calling `std::process::exit` won't let the DNS deactivation code runs.
For some control methods (systemd-resolved) this doesn't matter. For
etc-resolvconf and Windows, we are responsible for cleaning up DNS.
```[tasklist]
- [x] Replicate the issue
- [x] Fix it
- [x] Remove the fault injection code
```
Closes#4784
Closes#4682Closes#4691
```[tasklist]
# Before merging
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [ ] Wait for those browsers tests to get fixed
- [ ] *All* compatibility tests must pass on this branch
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
```[tasklist]
# Before merging
- [x] Remove file extension `.txt`
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [x] *all* compatibility tests must be green on this branch
```
Closes#4664Closes#4665
~~The compatibility tests are expected to fail until the next release is
cut, for the same reasons as in #4686~~
The compatibility test must be handled somehow, otherwise it'll turn
main red.
`linux-group` was moved out of integration / compatibility testing, but
the DNS tests do need the whole Docker + portal setup, so that one can't
move.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Unfortunately I couldn't make it only happen once. This helps with
debugging service accounts, where DNS bugs look the same as forgetting
to enable a policy.
Closes#4657
```[tasklist]
### Before merging
- [x] Update KB
```
Maybe not a feature since Linux IPC isn't available to users yet?
I think it's okay if the new `linux-group` test fails in compatibility,
since it wasn't implemented at all back then.
Closes#4659Closes#4660
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Closes#4655
This should be more clear since "daemon", like "tunnel", could mean a
variety of things. The IPC thing is the distinct part for this
subcommand, and I didn't want to call it "server" and confuse it with a
web server. "service" hopefully evokes "systemd service" and "Windows
service", something that provides a service locally.
If not it could always be something longer
Unfortunately I had to keep `linux-client` to get the compatibility
tests to pass. #4578 aims to remove that package.
Please add to this list if you think of anything:
```[tasklist]
# Things that may break that CI/CD won't catch
- [ ] Github release artifacts
- [ ] Knowledge base
- [ ] Docker images
- [ ] Docker containers
- [ ] Existing `linux-client` users
- [ ] Anything that downloads ghcr artifacts
- [ ] Nix (Not sure if it's built in CI. It had a merge conflict)
```
Refs #4515, and #3712, #3782
I think this is what Thomas and I agreed on in Slack / Github
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Fixes#2363
* Rename `relay` package to `firezone-relay` so that binaries outputted
match the `firezone-*` cli naming scheme
* Rename `firezone-headless-client` package to `firezone-linux-client`
for consistency
* Add READMEs for user-facing CLI components (there will also be docs
later)