Commit Graph

58 Commits

Author SHA1 Message Date
Reactor Scram
869dcfa02f fix(linux-client): forbid passing the token as a CLI arg (#4683)
Closes #4682 
Closes #4691 

```[tasklist]
# Before merging
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [ ] Wait for those browsers tests to get fixed
- [ ] *All* compatibility tests must pass on this branch
```

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-24 14:09:08 +00:00
Gabi
adc0bb73f7 test(client): add reconnection tests from a client using a headless browser (#4569)
Considered using Elixir and Rust to write the tests.

For Elixir, `wallaby` doesn't seem to have a way to attach to an
existing `chromium` instance, launching it each time, which makes it
hard to coordinate with the relay restart.

For Rust we considered `thirtyfour` which would be very nice since we
could test both firefox and chrome but each time it connects to the
instance it launches a new session making it hard to test the DNS cache
behavior.

We also considered `chrome_headless` for Rust it needs a small patch to
prevent it from closing the browser after `Drop` but it still presents a
problem, since it has no easy way to retrieve if loading a page has
succeeded. There are some workarounds such as retrieving the title that
we could have used but after some testing they are quite finnicky and we
don't want that for CI.

So I ended up settling for TypeScript but I'm open to other options, or
a fix for the previous ones!

There are some modifications still incoming for this PR, around the test
name and that sleep in the middle of the test doesn't look good so I
will probably add some retries, but the gist is here, will keep it in
draft until we expect it to be passing.

So feel free to do some initial reviews.

Note: the number of lines changed is greatly exaggerated by
`package.lock`

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-20 06:57:07 +00:00
Thomas Eizinger
51089b89e7 feat(connlib): smoothly migrate relayed connections (#4568)
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.

This still relies on #4613 until we have #4634.

Resolves: #4548.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-20 06:16:35 +00:00
Reactor Scram
7081c71c10 chore(linux-client): allow custom token path (#4666)
```[tasklist]
# Before merging
- [x] Remove file extension `.txt`
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [x] *all* compatibility tests must be green on this branch
```

Closes #4664 
Closes #4665 

~~The compatibility tests are expected to fail until the next release is
cut, for the same reasons as in #4686~~

The compatibility test must be handled somehow, otherwise it'll turn
main red.
`linux-group` was moved out of integration / compatibility testing, but
the DNS tests do need the whole Docker + portal setup, so that one can't
move.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-19 18:50:24 +00:00
Reactor Scram
bc22fb2bf2 test(linux-client): move linux-group test out of integration tests (#4692)
Closes #4669 

This should stop the problem of `linux-group` failing because of trying
to test an older release that doesn't have the right CLI features

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-19 02:52:31 +00:00
Thomas Eizinger
4972e49b34 ci: run assertions inside docker container (#4680)
As part of #4568, we are adding a 2nd relay which showed some
short-comings of the current process state assertions because they were
running outside the docker containers, thus listing all relays as soon
as there are multiple.
2024-04-18 23:48:42 +00:00
Reactor Scram
926ffe6f07 test(linux-client): fix linux-group integration test (#4671)
Closes #4669 
(Once I figure out the cause and then fix it)
2024-04-18 14:05:24 +00:00
Reactor Scram
e7a4a83e3d chore(linux): only allow IPC connections from members of the firezone group (#4628)
```[tasklist]
### Before merging
- [x] Update KB
```

Maybe not a feature since Linux IPC isn't available to users yet?

I think it's okay if the new `linux-group` test fails in compatibility,
since it wasn't implemented at all back then.

Closes #4659
Closes #4660

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-17 21:42:29 +00:00
Reactor Scram
2f6f2ef260 test(linux-client): check if we can add the user to a group in a CI test (#4600)
Refs #4513

The next step after this is to use this to test security in the Linux
IPC code, it should reject any IPC commands from users not in the
`firezone` group.
2024-04-17 20:40:27 +00:00
Reactor Scram
1f2821415f chore(linux): ask systemd to limit our privileges (#4630)
Should drop our `systemd-analyze security` level from 9.7 to about 2.5.
We could go a little further, but it would take a lot more effort, and
this is a good starting point.

```[tasklist]
# Before review
- [x] Remove unused trap function in Bash
- [x] Remove `systemd-analyze` call
```
2024-04-17 16:11:29 +00:00
Reactor Scram
cdf2bc8838 refactor(test): use 'set -euox' instead of manual echos (#4637)
I wasn't aware of `set x` when I wrote this, and it looks good in the
other test scripts.

I'm not sourcing `lib.sh` yet, because I don't happen to need any
functions from it. I have other draft PRs that will probably end up
using it.
2024-04-16 17:36:43 +00:00
Jamil
05386b8b4b chore(ci): Use netstat instead of ss for release image tests (#4640)
Fixes #4636
2024-04-16 11:14:52 -06:00
Reactor Scram
7bc1d51b0f test(linux-client): separate the token from the systemd unit file (#4626)
This is needed so that we can auto-update the systemd unit file, either
manually, or with a package manager like `apt`. We don't want users
cut-and-pasting these together on every update, and we don't want
machines doing it. Making the file updatable means we can make security
fixes to it easily.
2024-04-15 20:38:49 +00:00
Thomas Eizinger
be1a719e2c chore(relay): perform graceful shutdown upon receiving SIGTERM (#4552)
Upon receiving a SIGTERM, we immediately disconnect from the websocket
connection to the portal and set a flag that we are shutting down.

Once we are disconnected from the portal and no longer have an active
allocations, we exit with 0. A repeated SIGTERM signal will interrupt
this process and force the relay to shutdown.

Disconnecting from the portal will (eventually) trigger a message to
clients and gateways that this relay should no longer be used. Thus,
depending on the timeout our supervisor has configured after sending
SIGTERM, the relay will continue all TURN operations until the number of
allocations drops to 0.

Currently, we also allow clients to make new allocations and refreshing
existing allocations. In the future, it may make sense to implement a
dedicated status code and refuse `ALLOCATE` and `REFRESH` messages
whilst we are shutting down.

Related: #4548.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-12 08:45:08 +00:00
Thomas Eizinger
26494b0e34 ci: reduce duplication in integration tests (#4583)
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-11 23:01:12 +00:00
Reactor Scram
3a67eacfbe refactor(linux-client): replace client-tunnel with headless-client which is the same thing (#4516)
Unfortunately I had to keep `linux-client` to get the compatibility
tests to pass. #4578 aims to remove that package.

Please add to this list if you think of anything:

```[tasklist]
# Things that may break that CI/CD won't catch
- [ ] Github release artifacts
- [ ] Knowledge base 
- [ ] Docker images
- [ ] Docker containers
- [ ] Existing `linux-client` users
- [ ] Anything that downloads ghcr artifacts
- [ ] Nix (Not sure if it's built in CI. It had a merge conflict)
```

Refs #4515, and #3712, #3782

I think this is what Thomas and I agreed on in Slack / Github

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-10 22:01:55 +00:00
Jamil
7d88e28872 chore(ci): Configure relay with new IP on restart tests (#4571)
See https://firezonehq.slack.com/archives/C0575SD66E5/p1712726575563089
2024-04-10 08:45:38 +00:00
Thomas Eizinger
8d49452668 ci: assert that nothing busy loops after the perf tests (#4546)
The clients, gateway and relay all employ an internal design that is
based on an eventloop. This gives us a lot of control in how various IO
components interact with each other. Great control also comes with a
source of bugs, the latest of which made the relay busy-loop once it
started relaying some traffic.

Eventloops are notoriously hard to unit-test because they compose
various IO bits together. Instead of writing unit tests, we can go and
assert the process state after the performance tests. Those generate a
fair bit of load on all our components but after that, they should
suspend.

The most effective tests survive even large refactorings and for that,
they need to be coded against a stable API / property. Asserting that
the process sleeps when it is idle from an application PoV is such a
property.

Related: #4511.
2024-04-09 07:09:50 +00:00
Jamil
09532ea845 chore(ci): Add portal and relay downtime DNS resource tests (#4517)
Tests that DNS still works in the client with established connections
after the portal and/or relay go down.
2024-04-08 09:43:59 +00:00
Reactor Scram
74a81b2a56 test(gui-client): unit test for Linux IPC (#4277)
(After GA)

This adds a unit test for the Unix domain sockets that I intend to use
for process splitting on Linux.

The length-prefixed encoding and decoding are copied from `subzone`, but
most of that code will not be re-used since it's Windows-specific and
also specific to a Chromium-like process model, which won't work for
Firezone.
2024-04-02 19:34:24 +00:00
Reactor Scram
1e4ed7bad6 refactor(ci): move DNS control method up to docker-compose.yml (#4341)
This is part of a yak shave towards CI testing of #3812 

Moving the DNS control method out of `docker-compose.yml` and up to the
integration tests themselves allows us to test these scenarios:

- `systemd-resolved`
- `etc-resolv-conf`
- `systemd-resolved` but we're in a container where that won't work, so
we should gracefully degrade to just allowing IP/CIDR resources
2024-04-02 17:11:29 +00:00
Gabi
24e0641871 chore: set rust log level to info for gateways and client (#4319)
- [x] Updated log level string for client and gateways to info or higher
- [x] Update logs to hide DNS information

I also removed `hickory_resolve` errors which could contain sensitive
info from our general error and hide the logs that specifically relates
to them.

@bmanifold double checking that the log levels in the gateway's `*.tf`
files are just used for our own gateways.

Also, the relays still have `debug`, since only we see that I think that
makes sense but double checking with @jamilbk

Fixes: #3618.

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-27 01:39:12 +00:00
Thomas Eizinger
18033eafec ci: ensure roaming between networks doesn't abort file download (#4213)
This adds an integration test that downloads a 10MB file from a server
and simulates the client roaming to another network while the download
is active.

We use a DNS resource for this to ensure it also doesn't take too long
in that case. DNS resources are what most users will be using and we
clear some internal DNS caches on connection failures. Hence, using a
DNS resource here is a somewhat roundabout way to test that we aren't
failing and re-establishing the connection but migrate it to a new
network path.
2024-03-26 05:44:59 +00:00
Reactor Scram
7fece80006 refactor(gui-client): refuse to ever be elevated on Linux (#4232)
Running as sudo / root causes a lot of problems for GUI programs, so
we're unwinding that. In this case we can go back to using Tauri's "open
URL" function, which is great.

Closes #4103
Refs #3713
Affects #3972 - I was finally able to debug it because it came up
constantly during this PR
2024-03-21 14:42:48 +00:00
Reactor Scram
ada9d896cf chore(gui-client): remove unused env var (#4234)
Must have been a hack to run the smoke test in CI, and was never
actually hooked up.
2024-03-20 22:20:29 +00:00
Reactor Scram
e05cbbe0a0 build(gui-client/linux): include an empty firezone-tunnel binary with the Tauri deb package (#4220)
I thought this was going to use `cargo-deb` but it was actually easy
with the Tauri deb bundling we already use.

```[tasklist]
### Before merging
- [x] Make sure every file in the Tauri deb is also in our deb (e.g. icons)
```
2024-03-20 14:11:41 +00:00
Reactor Scram
651ea3ae00 build(gui-client/linux): make sure debug symbols get uploaded for the Linux GUI client (#4217)
- Split up CI artifacts into "exe", "pkg", and "syms" so it's easy to
check they're being uploaded. This shouldn't affect published artifacts
- Set `strip = "none"` which seems to be necessary to get the debug
symbols in Linux, although they still end up in the exe and not the dwp
file 🤔 don't know why
- Test Linux stacktrace in CI

Stacktrace examples:
- On Linux we at least get function names, but we aren't getting line
numbers for some reason
https://github.com/firezone/firezone/actions/runs/8350493514/job/22857032124#step:10:268
- On Windows we also get line numbers, as before
https://github.com/firezone/firezone/actions/runs/8350493514/job/22857033367#step:11:351

I didn't test downloading the files and doing a stacktrace locally, but
I have batched that up for whenever I do a big manual test of the
CD-produced release artifacts:
https://github.com/firezone/firezone/issues/3887
2024-03-19 22:18:03 +00:00
Reactor Scram
3ced2b3a20 ci(linux): fix incorrect use of grep -v (#4151)
I think I used `-v` wrong here. I meant it to negate a regular grep, but
it will probably always return true since it negates the matching, not
the exit code.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-15 19:06:57 +00:00
Thomas Eizinger
62e082d47a refactor(connlib): make {Client,Gateway}State SANS-IO (#4096)
Resolves: #3929.
2024-03-14 23:44:36 +00:00
Reactor Scram
52cde610e1 feat(linux): make deep link auth work (#4102)
Right now it only works on my dev VM, not on my test VMs, due to #4053
and #4103, but it passes tests and should be safe to merge.

There's one doc fix and one script fix which are unrelated and could be
their own PRs, but they'd be tiny, so I left them in here.

Ref #4106 and #3713 for the plan to fix all this by splitting the tunnel
process off so that the GUI runs as a normal user.
2024-03-13 18:11:04 +00:00
Jamil
574585d146 chore(ci): Add debug/ and perf/ prefix to some images (#4104)
Followup from #4100:


- Add `perf/relay` and `debug/relay` etc data plane images in
`firezone-staging`.
- The `perf` images are `debug` stage images and have tooling installed,
but use release binaries.
- The `debug` images are `debug` binaries inside `debug` images
- `firezone-prod` contains only release binaries -- these image names
haven't changed
2024-03-12 20:27:32 +00:00
Jamil
391150f0e1 chore(ci): Fix new issues in cd.yml (#4085)
Fixes some issues encountered after the merge of #4049 

- Fix performance tests to only run using base_ref and head_ref to avoid
dependence on `main`
- Fixes some typos
- Prevents a catch-22 condition where breaking compatibility meant we
wouldn't be able to deploy production
2024-03-12 02:06:19 +00:00
Reactor Scram
7211e88338 feat(linux-client): generate firezone-id (device ID) automatically if it's not provided at launch (#3920)
Closes #3815 

Changes that are breaking (but these aren't in production so it should
be okay)

- Windows, renaming `device_id.json` to `firezone-id.json` to match the
rest of the code
- Linux GUI, storing the firezone-id under `/var/lib` instead of under
`$HOME`
- Linux GUI, bails out if not run with `sudo --preserve-env` by
detecting `$HOME == root` or `$USER != root`

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-08 16:13:59 +00:00
Reactor Scram
f11edae097 test(windows): run the smoke test twice to shake out issues in the script (#3889)
I may use this for #3867 , or I may do that in-process, either way this
could be handy.
2024-03-04 23:15:58 +00:00
Reactor Scram
789d2160de refactor(linux): make a place for reverting /etc/resolv.conf (#3822)
Makes progress towards #3817

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-01 19:09:01 +00:00
Reactor Scram
c5aab6ebba test(windows client): Add stack trace printing to smoke test (#3813)
Closes #3798 

- After the crash test, the Windows smoke test runs `minidump-stackwalk`
to print a stack trace:
https://github.com/firezone/firezone/actions/runs/8100801373/job/22139592883#step:11:770
- This acts as runnable documentation for getting stack traces on
Windows, and it should flunk the test if anything in the crash handling
to stack trace pipeline is broken
- I also updated the comment in the code since the minidump PR I was
waiting on was put into their newest release
2024-02-29 20:41:35 +00:00
Reactor Scram
ca95dcdf77 feat(gui-client): make all modules Linux-friendly (#3737)
Splits up platform-specific modules into `linux.rs` and `windows.rs`
submodules, or `imp` modules within the parent module, so that we can
compile all the platform-independent code (e.g. Tauri calls,
`run_controller`) for Linux.

This will show the Tauri GUI on Linux:


![image](https://github.com/firezone/firezone/assets/13400041/320a4b91-1a37-48dd-94b9-cc280cc09854)

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-02-28 22:09:56 +00:00
Jamil
2ed6b3d07f chore(connlib): Tune log filters to enable debug in dev and info for gateway deployments (#3788)
Refs #3618

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-27 23:35:08 +00:00
Reactor Scram
7825710a69 refactor(GUI clients): extract known_dirs module (#3734)
The CI tests aren't running for Linux just yet.
This organizes the well-known directories used on Linux and Windows for
logs, config, etc., and adds them to the (unused) Linux smoke test

Waiting on #3727

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-02-23 17:31:35 +00:00
Reactor Scram
90b2bdb9b1 test(windows): make sure files are written to the right paths during smoke tests (#3727)
I will need to set up the same paths for Linux, (#3734) and I want an
automated test to make sure everything gets into the right directories.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-02-23 16:14:20 +00:00
Jamil
56e9e5e68a feat(ci): Test that relay restarts don't break existing connected entities (#3671)
~~Highlights the issue hypothesized in #3666~~

This tests that restarting a Relay won't cause sustained downtime.

Sleeps have been removed as they shouldn't necessary -- removing them
will better catch race conditions.
2024-02-23 01:06:54 +00:00
Reactor Scram
c09ba0889d refactor(ci): extract scripts for GUI client smoke tests (#3724)
(Waiting on #3721)
Ubuntu is headless by default and needs `xvfb` to run Tauri in CI, hence
the difference.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-02-22 21:15:59 +00:00
Jamil
3bd7dc504e fix(ci): Fix flaky iperf3 "Bad file descriptor" (#3731)
- Lower UDP bandwidth to 50M -- this fixes intermittent file descriptor
issues because we overload iperf3 for more than 5 seconds
- Simplify iperf3 to the minimum set that makes tests reliable
2024-02-22 19:57:22 +00:00
Jamil
5bd717b877 fix(ci): Use workflow id to fetch perf results (#3710) 2024-02-20 19:40:16 -08:00
Jamil
63cdd09a01 refactor(ci): Merge perf results into one comment (#3707)
One comment vs eight, need I say more?
2024-02-20 18:17:48 -08:00
Jamil
19a7bac4ae chore(ci): enforce shellscript formatting and style (#3679)
Noticed that we all have different styles of writing scripts :-).

This PR adds linting to our shell scripts to standardize on formatting,
catch common issues and/or possible security bugs.

For editor setup:
- Ensure [`shellcheck`](https://github.com/koalaman/shellcheck) and
[`shfmt`](https://github.com/mvdan/sh) are in your `PATH`
- Configure `shfmt` with indentation of `4`, otherwise it uses tabs by
default.
[Here](https://github.com/jamilbk/nvim/blob/master/init.vim#L159) is how
you can do that with Vim and
[here](https://marketplace.visualstudio.com/items?itemName=mkhl.shfmt)
is how for VScode.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Brian Manifold <bmanifold@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Dryga <andrew@dryga.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
2024-02-21 01:01:32 +00:00
Jamil
2d208b1991 fix(ci): Fix js typo (#3704)
More fixes from the perf test refactor
2024-02-20 16:38:05 -08:00
Jamil
0598ca55c3 fix(ci): Fix result overwrite (#3700)
Buttoning up fixes from #3695
2024-02-20 15:46:59 -08:00
Jamil
7ff40b82ed fix(ci): Run each perf test in its own matrix job (#3695)
The iperf3 server sometimes hangs, or takes a while to startup.

Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so they each run in their matrix job. This
ensures each performance test will run on a separate VM, unaffected by
previous test runs to eliminate the effect any residual network buffer
state can have on a particular test.

It also makes sure the server is listening with a `healthcheck`.
2024-02-20 22:44:20 +00:00
Gabi
3d3e737ba3 refactor(connlib): replace webrtc-rs with snownet (#3391)
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

Resolves: #3377.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-20 06:56:31 +00:00