Considered using Elixir and Rust to write the tests.
For Elixir, `wallaby` doesn't seem to have a way to attach to an
existing `chromium` instance, launching it each time, which makes it
hard to coordinate with the relay restart.
For Rust we considered `thirtyfour` which would be very nice since we
could test both firefox and chrome but each time it connects to the
instance it launches a new session making it hard to test the DNS cache
behavior.
We also considered `chrome_headless` for Rust it needs a small patch to
prevent it from closing the browser after `Drop` but it still presents a
problem, since it has no easy way to retrieve if loading a page has
succeeded. There are some workarounds such as retrieving the title that
we could have used but after some testing they are quite finnicky and we
don't want that for CI.
So I ended up settling for TypeScript but I'm open to other options, or
a fix for the previous ones!
There are some modifications still incoming for this PR, around the test
name and that sleep in the middle of the test doesn't look good so I
will probably add some retries, but the gist is here, will keep it in
draft until we expect it to be passing.
So feel free to do some initial reviews.
Note: the number of lines changed is greatly exaggerated by
`package.lock`
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
This still relies on #4613 until we have #4634.
Resolves: #4548.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
```[tasklist]
# Before merging
- [x] Remove file extension `.txt`
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [x] *all* compatibility tests must be green on this branch
```
Closes#4664Closes#4665
~~The compatibility tests are expected to fail until the next release is
cut, for the same reasons as in #4686~~
The compatibility test must be handled somehow, otherwise it'll turn
main red.
`linux-group` was moved out of integration / compatibility testing, but
the DNS tests do need the whole Docker + portal setup, so that one can't
move.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Closes#4669
This should stop the problem of `linux-group` failing because of trying
to test an older release that doesn't have the right CLI features
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Refs #4669. That issue will be for fixing and re-enabling the test.
This is only needed for Linux IPC which isn't in production yet, so it's
easier to disable first and debug second
Refs #4513
The next step after this is to use this to test security in the Linux
IPC code, it should reject any IPC commands from users not in the
`firezone` group.
refs #4602
- Removes `debug` stage building of `arm` and `arm64` binaries and
images (PRs only) -- these just get thrown away since we only test in CI
with `amd64`
- Removes `perf` builds for snownet-tests and http-test-server
- `base-base*` jobs are expected to fail since these changes haven't hit
`main` yet
-
This changes our required checks, so after approval I'll need to update
those.
Fixes the autolabeler so that changelog generation and edit process is
much less time-consuming
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Upon receiving a SIGTERM, we immediately disconnect from the websocket
connection to the portal and set a flag that we are shutting down.
Once we are disconnected from the portal and no longer have an active
allocations, we exit with 0. A repeated SIGTERM signal will interrupt
this process and force the relay to shutdown.
Disconnecting from the portal will (eventually) trigger a message to
clients and gateways that this relay should no longer be used. Thus,
depending on the timeout our supervisor has configured after sending
SIGTERM, the relay will continue all TURN operations until the number of
allocations drops to 0.
Currently, we also allow clients to make new allocations and refreshing
existing allocations. In the future, it may make sense to implement a
dedicated status code and refuse `ALLOCATE` and `REFRESH` messages
whilst we are shutting down.
Related: #4548.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
As we've learned more about how we can test for increased coverage and
certain failure scenarios, I think continuing down this path is a losing
battle.
Apple is the only platform we can't theoretically test in GitHub
actions, and we may be able to accomplish that with #4375. With #4506 in
progress, I think we can get decent coverage with a mix of CI
integration tests and portal-stubbed clients in CI.
If we can stub out the control plane I/O we can test clients in CI.
Unfortunately I had to keep `linux-client` to get the compatibility
tests to pass. #4578 aims to remove that package.
Please add to this list if you think of anything:
```[tasklist]
# Things that may break that CI/CD won't catch
- [ ] Github release artifacts
- [ ] Knowledge base
- [ ] Docker images
- [ ] Docker containers
- [ ] Existing `linux-client` users
- [ ] Anything that downloads ghcr artifacts
- [ ] Nix (Not sure if it's built in CI. It had a merge conflict)
```
Refs #4515, and #3712, #3782
I think this is what Thomas and I agreed on in Slack / Github
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
(After GA)
This adds a unit test for the Unix domain sockets that I intend to use
for process splitting on Linux.
The length-prefixed encoding and decoding are copied from `subzone`, but
most of that code will not be re-used since it's Windows-specific and
also specific to a Chromium-like process model, which won't work for
Firezone.
This is part of a yak shave towards CI testing of #3812
Moving the DNS control method out of `docker-compose.yml` and up to the
integration tests themselves allows us to test these scenarios:
- `systemd-resolved`
- `etc-resolv-conf`
- `systemd-resolved` but we're in a container where that won't work, so
we should gracefully degrade to just allowing IP/CIDR resources
This catches two of the mutants, according to `cargo-mutants`.
~~Unfortunately since `cargo test` runs in one process, it's
all-or-nothing for sudo, this will run all unit tests as sudo.~~
(This explanation is not exactly correct, `cargo test` does run _a_
subprocess, but still, there is no way to request sudo or non-sudo
runners for specific tests, since it's just an environment variable, and
since many tests run in parallel in different threads of the same
process.)
Here it is passing in Linux:
https://github.com/firezone/firezone/actions/runs/8382799272/job/22957555987#step:5:3160
And Windows:
https://github.com/firezone/firezone/actions/runs/8382799272/job/22957558003#step:5:1006
```[tasklist]
### Before merging
- [x] Try `#[ignore]` attribute
- [x] Fail gracefully if `sudo` isn't available
```
This adds an integration test that downloads a 10MB file from a server
and simulates the client roaming to another network while the download
is active.
We use a DNS resource for this to ensure it also doesn't take too long
in that case. DNS resources are what most users will be using and we
clear some internal DNS caches on connection failures. Hence, using a
DNS resource here is a somewhat roundabout way to test that we aren't
failing and re-establishing the connection but migrate it to a new
network path.
You know what I want, when I'm waiting 15-60 minutes on a CI job?
I want a stringly-typed language
I want the compiler to do
as
little
work
as
possible
If there even _is_ a compile step. Cause I love waiting and squinting at
underscores.
I thought this was going to use `cargo-deb` but it was actually easy
with the Tauri deb bundling we already use.
```[tasklist]
### Before merging
- [x] Make sure every file in the Tauri deb is also in our deb (e.g. icons)
```
AppImages won't work with process splitting. (#3713)
As far as I can tell, they just produce one binary. Internally they use
FUSE or something to mount a squashfs image, but that image won't be
able to hook into systemd and run with root permissions and everything.
I don't think it's practical, and Tauri's AppImage bundling doesn't have
the features for it.
Even their deb bundler doesn't have any way to specify a path for a
daemon to be installed. The sidecar feature only seems intended for the
GUI app to call, not anything else on the system.
(There is such a thing as installing AppImages, but I don't think it's
worth pursuing - We should just do debs)
Closes#3699 if successful
Ref #3972
I don't understand why it started working. There's at least 3
possibilities:
- Some unrelated change in the last few weeks fixed it (Maybe bumping
Tauri to 1.6.1? https://github.com/firezone/firezone/pull/3881)
- It was a bug in the Github CI runner image that they fixed
- It's an awful race condition and adding `tracing::debug!` fixed it
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
On the domain side this PR extends `Domain.Repo` with filtering,
pagination, and ordering, along with some convention changes are
removing the code that is not needed since we have the filtering now.
This required to touch pretty much all contexts and code, but I went
through all public functions and added missing tests to make sure
nothing will be broken.
On the web side I've introduced a `<.live_table />` which is as close as
possible to being a drop-in replacement for the regular `<.table />`
(but requires to structure the LiveView module differently due to
assigns anyways). I've updated all the listing tables to use it.