Refs #6927
This PR creates a GTK+ event loop, a blank window, and the tray menu. It
connects to the IPC service, you can sign in and everything, but the
About window, Settings window, and Welcome window aren't implemented.
We build a deb package in CI but it isn't pushed to the draft releases
in CD yet.

Pros over Iced:
- More mature
- Easy integration with `tray-icon`
- Small binaries (< 1 MB for this example)
Cons:
- GTK 3.x is abandoned as of March. GTK 4 isn't packaged for Ubuntu
20.04.
- Widgets might be hard to use
- Hard to set up on Windows, only using this for Linux for now
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Our tests are pretty fast now, meaning we can afford running more
permutations. This makes it less likely to encounter flakes in the
"coverage" tests where we grep for certain log lines to ensure that the
tests hit certain code paths.
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Refs #6138
Sentry is always enabled for now. In the near future we'll make it
opt-out per device and opt-in per org (see #6138 for details)
- Replaces the `crash_handling` module
- Catches panics in GUI process, tunnel daemon, and Headless Client
- Added a couple "breadcrumbs" to play with that feature
- User ID is not set yet
- Environment is set to the API URL, e.g. `wss://api.firezone.dev`
- Reports panics from the connlib async task
- Release should be automatically pulled from the Cargo version which we
automatically set in the version Makefile
Example screenshot of sentry.io with a caught panic:
<img width="861" alt="image"
src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">
In the past, we struggled a lot of the reproducibility of `tunnel_test`
failures because our input state and transition strategies were not
deterministic. In the end, we found out that it was due to the iteration
order of `HashMap`s.
To make sure this doesn't regress, we added a check to CI at the time
that compares the debug output of all regression seeds against a 2nd run
and ensures they are the same. That is overall a bit wonky.
We can do better by simple sampling a value from the strategy twice from
a test runner with the same seed. If the strategy is deterministic,
those need to be the same. We still rely on the debug output being
identical because:
a. Deriving `PartialEq` on everything is somewhat cumbersome
b. We actually care about the iteration order which a fancy `PartialEq`
implementation might ignore
This happens because the smoke test is stubbed out for release builds,
so any `use` statements that are only used in the smoke tests will cause
a warning in `--release` builds, including when we make release bundles.
Currently, we have two structs for representing IP packets: `IpPacket`
and `MutableIpPacket`. As the name suggests, they mostly differ in
mutability. This design was originally inspired by the `pnet_packet`
crate which we based our `IpPacket` on. With subsequent iterations, we
added more and more functionality onto our `IpPacket`, like NAT64 &
NAT46 translation. As a result of that, the `MutableIpPacket` is no
longer directly based on `pnet_packet` but instead just keeps an
internal buffer.
This duplication can be resolved by merging the two structs into a
single `IpPacket`. We do this by first replacing all usages of
`IpPacket` with `MutableIpPacket`, deleting `IpPacket` and renaming
`MutableIpPacket` to `IpPacket`. The final design now has different
`self`-receivers: Some functions take `&self`, some `&mut self` and some
consume the packet using `self`.
This results in a more ergonomic usage of `IpPacket` across the codebase
and deletes a fair bit of code. It also takes us one step closer towards
using `etherparse` for all our IP packet interaction-needs. Lastly, I am
currently exploring a performance-optimisation idea that stack-allocates
all IP packets and for that, the current split between `IpPacket` and
`MutableIpPacket` does not really work.
Related: #6366.
This PR introduces the `etherparse` dependency for parsing and
generating IP packets.
Using `etherparse`, we can implement the NAT46 & NAT64 implementations
for the gateway more elegantly because it allows us to parse the IP and
protocol headers into a static and much richer representation. The
conversion to the IPv4/IPv6 equivalent is then just a question of
transforming one data structure into another and writing it to the
correct place in the buffer.
We extract this functionality into dedicated `nat64` and `nat46`
modules.
Furthermore, we implement the various functions in `ip_packet::make`
using `etherparse` too. Following that, we also overhaul the NAT
translation tests that we have in `ip_packet::proptests`. Those now use
the more low-level `consume_to_ipX` APIs which makes the tests more
ergonomic to write.
In the future, we should upstream `Ipv4HeaderSliceMut` and
`Ipv6HeaderSliceMut` to `etherparse`.
Moving all of this functionality to `etherparse` will make it easier to
write tests that involve more IP packets as well as customise the
behaviour of our NAT.
Related: #5614.
Related: #6371.
Related: #6353.
`install-action` uses `cargo-binstall` as a fallback. That binary
contacts GitHub which may run into rate-limit without being
authenticated. In that case, we will install manually which takes very
long.
Resolves: #6374.
It happened in the past that we screwed up the `preconditions` of the
state machine test such that no more transitions were sampled that
actually send packets. To protect against this, we use the newly
introduced logs and grep for certain transitions.
In the future, we can consider emitting a more structured output, like
writing all testcases to a DB and run more complex queries against it to
ensure that certain cases are covered.
For quite a while now, we have been making extensive use of
property-based testing to ensure `connlib` works as intended. The idea
of proptests is that - given a certain seed - we deterministically
sample test inputs and assert properties on a given function.
If the test fails, `proptest` prints the seed which can then be added to
a regressions file to iterate on the test case and fix it. It is quite
obvious that non-determinism in how the test input gets generated is no
bueno and reduces the value we get out of these tests a fair bit.
The `HashMap` and `HashSet` data structures are known to be
non-deterministic in their iteration order. This causes non-determinism
during the input generation because we make use of a lot of maps and
sets to gradually build up the test input. We fix all uses of `HashMap`
and `HashSet` by replacing them with `BTreeMap` and `BTreeSet`.
To ensure this doesn't regress, we refactor `tunnel_test` to not make
use of proptest's macros and instead, we initialise and run the test
ourselves. This allows us to dump the sampled state and transitions into
a file per test run. In CI, we then run a 2nd iteration of all
regression tests and compare the sampled state and transitions with the
previous run. They must match byte-for-byte.
Finally, to discourage use of non-deterministic iteration, we ban the
use of the iteration functions on `HashMap` and `HashSet` across the
codebase. This doesn't catch iteration in a `for`-loop but it is better
than not linting against it at all.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Currently, `tunnel_test` executes all actions within the same `Instant`,
i.e. time is never advanced by itself. The difficulty with advancing
time compared to other actions like sending packets is that all
time-related actions "overlap". In other words, all timers within
connlib advance at the same time. This makes it difficult to model the
expected behaviour after a certain amount of time has passed as we'd
effectively need to model all timers and their relation to particular
actions (like resending of connection intents or STUN requests).
Instead of only advancing time by itself, we can model some aspect of it
by introducing latency on network messages. This allows us to define a
range of an "acceptable" network latency within everything is expected
to work.
Whilst this doesn't cover all failure cases, it gives us a solid
foundation of parameters within which we should not expect any
operational problems.
This version was a few months old and started throwing errors about
features that stabilized since then.
e.g.
https://github.com/firezone/firezone/actions/runs/10011089436/job/27673759249
```
error[E0658]: use of unstable library feature 'proc_macro_byte_character'
--> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proc-macro2-1.0.86/src/wrapper.rs:871:21
|
871 | proc_macro::Literal::byte_character(byte)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #115268 <https://github.com/rust-lang/rust/issues/115268> for more information
= help: add `#![feature(proc_macro_byte_character)]` to the crate attributes to enable
= note: this compiler was built on 2024-03-25; consider upgrading it if it is out of date
error[E0658]: use of unstable library feature 'proc_macro_c_str_literals'
--> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proc-macro2-1.0.86/src/wrapper.rs:898:21
|
898 | proc_macro::Literal::c_string(string)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #119750 <https://github.com/rust-lang/rust/issues/119750> for more information
= help: add `#![feature(proc_macro_c_str_literals)]` to the crate attributes to enable
= note: this compiler was built on 2024-03-25; consider upgrading it if it is out of date
For more information about this error, try `rustc --explain E0658`.
error: could not compile `proc-macro2` (lib) due to 2 previous errors
```
Our Rust CI runs various jobs in different configurations of packages
and / or features. Currently, only the clippy job denies warnings which
makes it possible that some code still generates warnings under
particular configurations.
To ensure we always fail on warnings, we set a global env var to deny
warnings for all Rust CI jobs.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#5810
```[tasklist]
### Tasks
- [x] Try not to set the icon every time we change Resources
- [x] Get production icons
- [x] Add changelog comment
- [x] Add CI stress test that sets the icon 10,000 times
- [x] Open for review
- [x] Repair changelog
- [ ] Merge
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes#5601
It looks like we can hit 100+ Mbps in theory. This covers Wintun, Tokio,
and Windows OS overhead. It doesn't cover the cryptography or anything
in connlib itself.
The code is kinda messy but I'm not sure how to clean it up so I'll just
leave it for review.
This test should fail if there's any regressions in #5598.
It fails if any packet is dropped or if the speed is under 100 Mbps
```[tasklist]
### Tasks
- [x] Use `ip_packet::make`
- [x] Switch to `cargo bench`
- [x] Extract windows ARM PR
- [x] Clean up wintun.dll install code
- [x] Re-request review
```
```[tasklist]
### Tasks
- [x] Check the GUI saves its settings file
- [x] Check the IPC service writes the device ID to disk
- [x] Check the GUI writes a log file (skipped - we already check if the exported zip has any files in it)
- [x] Run the crash file through `minidump-stackwalk`
- [x] Reach feature parity with the original smoke tests
- [x] Ready for review
- [x] Finish #5452
- [ ] Start on #5453
```
Currently, the smoke tests rebuild the `dump_syms` and
`minidump-stackwalk` tools from scratch every time which is slow,
especially on Windows.
We can speed this up by utilising the `taiki-e/install-action` GitHub
action which discovers and downloads the latest binary releases of those
projects and installs them into $PATH.
I think those binaries might also be cached as part of the Rust cache
action (https://github.com/Swatinem/rust-cache) so the visible speed-up
is only within a few seconds and comes from the binaries not being
re-built inside the script.
Caching those binaries on Github still requires us to build them at
least once and also rebuild them in case the cache gets invalidated.
Hence I still think this is a good idea on its own.
When the `tunnel_test` fails, it generates a lot of output because it
keeps printing the backtrace over and over. This makes it difficult to
access the input seed to the test. Copying this seed into a local
environment is the first step in debugging this, at which point the
backtrace can be enabled locally.
We also disable the `verbose: 1` config option. Users can always set
that using the `PROPTEST_VERBOSE` env variable.
With an increased number of tests that make use of `proptest`, executing
`cargo test` (almost 6 minutes on `main` currently:
https://github.com/firezone/firezone/actions/runs/9278428694/job/25529425068#step:5:1407).
By compiling them with optimisations, we can drastically cut down the
execution time with only little penality in compilation speed as those
should be cached in CI.
This is similar to #4097 and #4585 but for the entire `ClientState` and
`GatewayState`. We also do it in the context of a property-based test
with the vision that we can deterministically explore a large space of
state transitions and see where our main property breaks: Being able to
send an ICMP packet from the client to the gateway.
In other words, we now correctly pass all the `Transmit`s back and forth
between the components as if they would receive it from the network. Due
to the nature of property-based tests, this already exercises a very
large input space. For example, if the client does not have an IPv6
socket and the gateway doesn't have an IPv4 socket, this test already
checks whether we then correctly fall back to using a relay (because the
allocation we make on the relay is the only network path where the STUN
requests pass through).
What this does not (yet) do is set up a proper network topology. The
`dispatch_transmit` function will happily "route" a `Transmit` from e.g.
the client to the gateway even if they are in different subnets. In
other words, these tests assume that the actual network itself works and
we can exchange UDP packets between the components.
For now, we only send ICMPs to CIDR resources. As a next step, we can
extend this to DNS resources by sending DNS queries for our DNS
resources and then sending an ICMP to the resolved IP.
It typically takes about 1 minute to run in CI. We don't have any leads
on fixing this issue, and it may be a regression in a recent release of
WebView2. https://github.com/firezone/firezone/pull/4935
This will fix an issue with `linux-group` and `token-path` that happens
when I try to split up the binaries.
```[tasklist]
### Before merging
- [x] Fix linux-group. That stub-ipc-client command doesn't even exist anymore
```
```[tasklist]
# Before merging
- [x] Remove file extension `.txt`
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [x] *all* compatibility tests must be green on this branch
```
Closes#4664Closes#4665
~~The compatibility tests are expected to fail until the next release is
cut, for the same reasons as in #4686~~
The compatibility test must be handled somehow, otherwise it'll turn
main red.
`linux-group` was moved out of integration / compatibility testing, but
the DNS tests do need the whole Docker + portal setup, so that one can't
move.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Closes#4669
This should stop the problem of `linux-group` failing because of trying
to test an older release that doesn't have the right CLI features
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
(After GA)
This adds a unit test for the Unix domain sockets that I intend to use
for process splitting on Linux.
The length-prefixed encoding and decoding are copied from `subzone`, but
most of that code will not be re-used since it's Windows-specific and
also specific to a Chromium-like process model, which won't work for
Firezone.
This catches two of the mutants, according to `cargo-mutants`.
~~Unfortunately since `cargo test` runs in one process, it's
all-or-nothing for sudo, this will run all unit tests as sudo.~~
(This explanation is not exactly correct, `cargo test` does run _a_
subprocess, but still, there is no way to request sudo or non-sudo
runners for specific tests, since it's just an environment variable, and
since many tests run in parallel in different threads of the same
process.)
Here it is passing in Linux:
https://github.com/firezone/firezone/actions/runs/8382799272/job/22957555987#step:5:3160
And Windows:
https://github.com/firezone/firezone/actions/runs/8382799272/job/22957558003#step:5:1006
```[tasklist]
### Before merging
- [x] Try `#[ignore]` attribute
- [x] Fail gracefully if `sudo` isn't available
```
Closes#3699 if successful
Ref #3972
I don't understand why it started working. There's at least 3
possibilities:
- Some unrelated change in the last few weeks fixed it (Maybe bumping
Tauri to 1.6.1? https://github.com/firezone/firezone/pull/3881)
- It was a bug in the Github CI runner image that they fixed
- It's an awful race condition and adding `tracing::debug!` fixed it
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
- Runs release asset builds simultaneously with `deploy-staging`. Those
don't depend on each other.
- Prevents running some build workflows in CD because they're run
already in the PR and in the merge group, and the risk of semantic
conflict is negligible
- Run `release` assets in staging
- Adds `compatibility_tests`: **To successfully introduce a breaking
change in the control / data plane APIs, you must now "Merge as
Administrator"**
- Since `CI` is no longer run on `main`, caching needed to be refactored
to make sense again
- Since `CI` is no longer run on `main`, the Elixir
`migrations_and_seeds_test` had to be rewritten. This now tests
migrations using `git checkout` instead of importing `main`'s DB dump.
- Move tauri builds to its own workflow so we can trigger Linux and
Windows builds manually on an adhoc basis like we do for the Swift and
Kotlin builds
- Add a new `hotfix` workflow that will run `compatibility_tests` with
the latest published images
- Add `workflow_dispatch` to trigger `CD` manually for testing purposes
(cc @ReactorScram)
Refs #3995
Closes#3815
Changes that are breaking (but these aren't in production so it should
be okay)
- Windows, renaming `device_id.json` to `firezone-id.json` to match the
rest of the code
- Linux GUI, storing the firezone-id under `/var/lib` instead of under
`$HOME`
- Linux GUI, bails out if not run with `sudo --preserve-env` by
detecting `$HOME == root` or `$USER != root`
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Builds off #3905 and uses the GH actions cache for tauri builds in order
to get around the `crate-type` problem sccache has with Tauri apps.
Fixes#3456
(Waiting on #3721)
Ubuntu is headless by default and needs `xvfb` to run Tauri in CI, hence
the difference.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
This may cause conflicts with all my other PRs but it has to happen.
```[tasklist]
- [ ] Update test names in branch protection (I don't think I have perms for this)
```
This prevents duplication for different Tauri jobs like building the
release packages vs testing a debug build with mock keyring.
```[tasklist]
- [ ] Fix branch protection rules for changed tests
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>