Commit Graph

236 Commits

Author SHA1 Message Date
Jamil
d656cd54f6 chore: remove test lib bash sourcing from customer-run scripts (#4753)
Didn't catch this in code review. These are run on customer's systems
and can't possibly source our shared script.
2024-04-23 19:04:02 +00:00
Gabi
adc0bb73f7 test(client): add reconnection tests from a client using a headless browser (#4569)
Considered using Elixir and Rust to write the tests.

For Elixir, `wallaby` doesn't seem to have a way to attach to an
existing `chromium` instance, launching it each time, which makes it
hard to coordinate with the relay restart.

For Rust, we considered `thirtyfour`, which would be very nice since we
could test both Firefox and Chrome, but each time it connects to the
instance it launches a new session, making it hard to test the DNS cache
behavior.

We also considered `chrome_headless` for Rust. It needs a small patch to
prevent it from closing the browser after `Drop`, but it still presents a
problem, since it has no easy way to check whether loading a page has
succeeded. There are some workarounds, such as retrieving the title, that
we could have used, but after some testing they are quite finicky and we
don't want that for CI.

So I ended up settling for TypeScript but I'm open to other options, or
a fix for the previous ones!

There are some modifications still incoming for this PR around the test
name, and the sleep in the middle of the test doesn't look good, so I
will probably add some retries. The gist is here, though; I will keep it
in draft until we expect it to pass.

So feel free to do some initial reviews.

Note: the number of lines changed is greatly exaggerated by
`package.lock`

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-20 06:57:07 +00:00
Thomas Eizinger
51089b89e7 feat(connlib): smoothly migrate relayed connections (#4568)
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
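
A rough sketch of the bookkeeping described above, not connlib's actual code. It assumes we can compare the set of relays we currently hold allocations on against the set the portal reports as connected; `RelayId` and `diff_relays` are hypothetical names, and the real shape of the `relays_presence` message is not shown in this log.

```rust
use std::collections::BTreeSet;

/// Hypothetical relay identifier; the real type is not shown in this log.
type RelayId = u32;

/// Returns `(to_invalidate, to_allocate)`:
/// - relays we know about that the portal no longer lists (invalidate their candidates)
/// - relays the portal lists that we don't know yet (make allocations on them)
fn diff_relays(
    known: &BTreeSet<RelayId>,
    connected: &BTreeSet<RelayId>,
) -> (Vec<RelayId>, Vec<RelayId>) {
    let to_invalidate = known.difference(connected).copied().collect();
    let to_allocate = connected.difference(known).copied().collect();
    (to_invalidate, to_allocate)
}
```

Relays present in both sets are left untouched, so connections through them don't need to migrate.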

This still relies on #4613 until we have #4634.

Resolves: #4548.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-20 06:16:35 +00:00
Reactor Scram
7081c71c10 chore(linux-client): allow custom token path (#4666)
```[tasklist]
# Before merging
- [x] Remove file extension `.txt`
- [x] Wait for `linux-group` test to go green on `main` (#4692)
- [x] *all* compatibility tests must be green on this branch
```

Closes #4664 
Closes #4665 

~~The compatibility tests are expected to fail until the next release is
cut, for the same reasons as in #4686~~

The compatibility test must be handled somehow, otherwise it'll turn
main red.
`linux-group` was moved out of integration / compatibility testing, but
the DNS tests do need the whole Docker + portal setup, so that one can't
move.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-19 18:50:24 +00:00
Reactor Scram
bc22fb2bf2 test(linux-client): move linux-group test out of integration tests (#4692)
Closes #4669 

This should stop the problem of `linux-group` failing because of trying
to test an older release that doesn't have the right CLI features

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-19 02:52:31 +00:00
Thomas Eizinger
4972e49b34 ci: run assertions inside docker container (#4680)
As part of #4568, we are adding a 2nd relay, which exposed some
shortcomings of the current process state assertions: because they were
running outside the docker containers, they listed all relays as soon
as there were multiple.
2024-04-18 23:48:42 +00:00
Reactor Scram
926ffe6f07 test(linux-client): fix linux-group integration test (#4671)
Closes #4669 
(Once I figure out the cause and then fix it)
2024-04-18 14:05:24 +00:00
Reactor Scram
e7a4a83e3d chore(linux): only allow IPC connections from members of the firezone group (#4628)
```[tasklist]
### Before merging
- [x] Update KB
```

Maybe not a feature since Linux IPC isn't available to users yet?

I think it's okay if the new `linux-group` test fails in compatibility,
since it wasn't implemented at all back then.

Closes #4659
Closes #4660

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-17 21:42:29 +00:00
Reactor Scram
2f6f2ef260 test(linux-client): check if we can add the user to a group in a CI test (#4600)
Refs #4513

The next step after this is to use this to test security in the Linux
IPC code, it should reject any IPC commands from users not in the
`firezone` group.
2024-04-17 20:40:27 +00:00
Reactor Scram
1f2821415f chore(linux): ask systemd to limit our privileges (#4630)
Should drop our `systemd-analyze security` level from 9.7 to about 2.5.
We could go a little further, but it would take a lot more effort, and
this is a good starting point.

```[tasklist]
# Before review
- [x] Remove unused trap function in Bash
- [x] Remove `systemd-analyze` call
```
2024-04-17 16:11:29 +00:00
Reactor Scram
cdf2bc8838 refactor(test): use 'set -euox' instead of manual echos (#4637)
I wasn't aware of `set -x` when I wrote this, and it looks good in the
other test scripts.

I'm not sourcing `lib.sh` yet, because I don't happen to need any
functions from it. I have other draft PRs that will probably end up
using it.
2024-04-16 17:36:43 +00:00
Jamil
05386b8b4b chore(ci): Use netstat instead of ss for release image tests (#4640)
Fixes #4636
2024-04-16 11:14:52 -06:00
Reactor Scram
7bc1d51b0f test(linux-client): separate the token from the systemd unit file (#4626)
This is needed so that we can auto-update the systemd unit file, either
manually, or with a package manager like `apt`. We don't want users
cut-and-pasting these together on every update, and we don't want
machines doing it. Making the file updatable means we can make security
fixes to it easily.
2024-04-15 20:38:49 +00:00
Thomas Eizinger
be1a719e2c chore(relay): perform graceful shutdown upon receiving SIGTERM (#4552)
Upon receiving a SIGTERM, we immediately disconnect from the websocket
connection to the portal and set a flag that we are shutting down.

Once we are disconnected from the portal and no longer have any active
allocations, we exit with 0. A repeated SIGTERM signal will interrupt
this process and force the relay to shut down.

Disconnecting from the portal will (eventually) trigger a message to
clients and gateways that this relay should no longer be used. Thus,
depending on the timeout our supervisor has configured after sending
SIGTERM, the relay will continue all TURN operations until the number of
allocations drops to 0.

Currently, we also allow clients to make new allocations and refresh
existing allocations. In the future, it may make sense to implement a
dedicated status code and refuse `ALLOCATE` and `REFRESH` messages
whilst we are shutting down.
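
The shutdown flow above can be sketched as a small state machine. This is an illustration only: `Lifecycle`, `on_sigterm` and `poll` are hypothetical names rather than the relay's real types, and the exit code `1` for a forced shutdown is an assumption (the text only specifies `0` for the graceful path).

```rust
/// Hypothetical model of the relay's shutdown flag; not the relay's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lifecycle {
    Running,
    /// First SIGTERM received: disconnect from the portal, keep serving TURN.
    Draining,
    /// Time to exit the process with the given exit code.
    Exit(i32),
}

/// Advance the lifecycle when a SIGTERM arrives.
fn on_sigterm(state: Lifecycle) -> Lifecycle {
    match state {
        Lifecycle::Running => Lifecycle::Draining,
        // A repeated SIGTERM interrupts the drain and forces shutdown.
        // The non-zero exit code is an assumption made for this sketch.
        Lifecycle::Draining => Lifecycle::Exit(1),
        exit => exit,
    }
}

/// Re-evaluate after every event: exit cleanly once the drain is complete.
fn poll(state: Lifecycle, portal_connected: bool, active_allocations: usize) -> Lifecycle {
    match state {
        Lifecycle::Draining if !portal_connected && active_allocations == 0 => Lifecycle::Exit(0),
        other => other,
    }
}
```

While `Draining`, nothing here refuses new or refreshed allocations, which matches the current behaviour described above.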

Related: #4548.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-12 08:45:08 +00:00
Thomas Eizinger
26494b0e34 ci: reduce duplication in integration tests (#4583)
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-11 23:01:12 +00:00
Jamil
6720ab5bc1 chore(clients): Bump Apple to 1.0.2; Android 1.0.1 (#4590)
CI won't pass for these builds without these bumps because the versions
are already published.
2024-04-11 22:34:17 +00:00
Jamil
539431d9a3 chore(ci): Allow versioning components separately (#4493)
Since we already have apps published, we need the ability to decouple
the versions of components from each other so that we can run CI and
publish them independently.

This is the first step. The next step would be decoupling releases so
that they're for individual components.

refs #4397
2024-04-11 13:38:03 +00:00
Reactor Scram
3a67eacfbe refactor(linux-client): replace client-tunnel with headless-client which is the same thing (#4516)
Unfortunately I had to keep `linux-client` to get the compatibility
tests to pass. #4578 aims to remove that package.

Please add to this list if you think of anything:

```[tasklist]
# Things that may break that CI/CD won't catch
- [ ] Github release artifacts
- [ ] Knowledge base 
- [ ] Docker images
- [ ] Docker containers
- [ ] Existing `linux-client` users
- [ ] Anything that downloads ghcr artifacts
- [ ] Nix (Not sure if it's built in CI. It had a merge conflict)
```

Refs #4515, and #3712, #3782

I think this is what Thomas and I agreed on in Slack / Github

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-10 22:01:55 +00:00
Jamil
7d88e28872 chore(ci): Configure relay with new IP on restart tests (#4571)
See https://firezonehq.slack.com/archives/C0575SD66E5/p1712726575563089
2024-04-10 08:45:38 +00:00
Thomas Eizinger
8d49452668 ci: assert that nothing busy loops after the perf tests (#4546)
The clients, gateway and relay all employ an internal design that is
based on an eventloop. This gives us a lot of control in how various IO
components interact with each other. Great control also comes with a
source of bugs, the latest of which made the relay busy-loop once it
started relaying some traffic.

Eventloops are notoriously hard to unit-test because they compose
various IO bits together. Instead of writing unit tests, we can go and
assert the process state after the performance tests. Those generate a
fair bit of load on all our components but after that, they should
suspend.

The most effective tests survive even large refactorings and for that,
they need to be coded against a stable API / property. Asserting that
the process sleeps when it is idle from an application PoV is such a
property.
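
This log doesn't show how the CI assertion is implemented. One way to check the "sleeps when idle" property on Linux is to read the state field from `/proc/<pid>/stat`; the sketch below only does the parsing, and the function names are made up for illustration.

```rust
/// Extract the one-letter process state from a Linux `/proc/<pid>/stat` line.
/// The command name is wrapped in parentheses and may itself contain spaces
/// or parentheses, so we look at the first field after the last `)`.
fn parse_proc_state(stat_line: &str) -> Option<char> {
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    after_comm.split_whitespace().next()?.chars().next()
}

/// `S` is interruptible sleep, i.e. the process is idle and waiting for an event.
/// A busy-looping process would typically show up as `R` (running) instead.
fn is_sleeping(stat_line: &str) -> bool {
    parse_proc_state(stat_line) == Some('S')
}
```

A caller would read the line with `std::fs::read_to_string(format!("/proc/{pid}/stat"))` after the perf tests and fail the job if `is_sleeping` returns `false`.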

Related: #4511.
2024-04-09 07:09:50 +00:00
Thomas Eizinger
3951bafb60 chore(nix): add Rust nightly dev-shell and cargo-udeps (#4474)
2024-04-08 12:06:01 +00:00
Jamil
09532ea845 chore(ci): Add portal and relay downtime DNS resource tests (#4517)
Tests that DNS still works in the client with established connections
after the portal and/or relay go down.
2024-04-08 09:43:59 +00:00
Reactor Scram
74a81b2a56 test(gui-client): unit test for Linux IPC (#4277)
(After GA)

This adds a unit test for the Unix domain sockets that I intend to use
for process splitting on Linux.

The length-prefixed encoding and decoding are copied from `subzone`, but
most of that code will not be re-used since it's Windows-specific and
also specific to a Chromium-like process model, which won't work for
Firezone.
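
For readers unfamiliar with length-prefixed framing, here is a minimal sketch of the idea. It is not the code copied from `subzone`: the 4-byte little-endian prefix and the function names are assumptions chosen for illustration.

```rust
use std::convert::TryFrom;

/// Prefix `payload` with its length as a 4-byte little-endian integer.
/// Returns `None` if the payload does not fit in a `u32`.
fn encode_frame(payload: &[u8]) -> Option<Vec<u8>> {
    let len = u32::try_from(payload.len()).ok()?;
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&len.to_le_bytes());
    frame.extend_from_slice(payload);
    Some(frame)
}

/// Try to split one complete frame off the front of `buf`.
/// Returns the payload and the remaining bytes, or `None` if more data is needed.
fn decode_frame(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let (header, rest) = buf.split_at(4);
    let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Because a Unix domain stream socket has no message boundaries, the reader keeps appending to a buffer and calls `decode_frame` until it returns `None`.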
2024-04-02 19:34:24 +00:00
Reactor Scram
1e4ed7bad6 refactor(ci): move DNS control method up to docker-compose.yml (#4341)
This is part of a yak shave towards CI testing of #3812 

Moving the DNS control method out of `docker-compose.yml` and up to the
integration tests themselves allows us to test these scenarios:

- `systemd-resolved`
- `etc-resolv-conf`
- `systemd-resolved` but we're in a container where that won't work, so
we should gracefully degrade to just allowing IP/CIDR resources
2024-04-02 17:11:29 +00:00
Reactor Scram
023c885967 refactor(linux-client): extract all code to firezone-client-tunnel (#4448)
Refs #3713 

With this, the deb package for the Linux GUI Client contains a build of
the Linux CLI Client, at `/usr/bin/firezone-client-tunnel`. Future PRs
can add IPC to the code.

There is also a Windows stub, since Windows will eventually need a
tunnel process and a CLI Client.

In the future we might need to move or rename things, since the CLI
Clients and tunnel binaries for both Linux and Windows may all share
code or at least architecture. For now there is a slight duplication
with this being built as both "Firezone Client Tunnel" and "Firezone
Linux Client"
2024-04-02 16:59:29 +00:00
Jamil
7c369e5b39 fix(gateway): Fix systemd gateway install script (#4407)
On some OSes (Debian 12) the script fails to get the correct version to
download (likely because of `sed` version), so this simplifies things a
bit.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-03-31 15:56:24 +00:00
Jamil
c30138b38e chore(connlib): Remove atomicwrites and tokio::fs from apple compile path (#4395)
Fixes #4377 


Manually verified by running `nm` on the resulting binaries. I'll open
another PR to handle #4393

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-29 21:01:53 +00:00
Jamil
16337d57f3 refactor(connlib): Reduce log noisiness for GA (#4381)
Fixes #4380 
Fixes #4379
2024-03-28 20:51:59 +00:00
Gabi
24e0641871 chore: set rust log level to info for gateways and client (#4319)
- [x] Updated log level string for client and gateways to info or higher
- [x] Update logs to hide DNS information

I also removed `hickory_resolve` errors, which could contain sensitive
info, from our general error and hid the logs that specifically relate
to them.

@bmanifold double checking that the log levels in the gateway's `*.tf`
files are just used for our own gateways.

Also, the relays still have `debug`. Since only we see that, I think that
makes sense, but double checking with @jamilbk

Fixes: #3618.

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-27 01:39:12 +00:00
Thomas Eizinger
18033eafec ci: ensure roaming between networks doesn't abort file download (#4213)
This adds an integration test that downloads a 10MB file from a server
and simulates the client roaming to another network while the download
is active.

We use a DNS resource for this to ensure it also doesn't take too long
in that case. DNS resources are what most users will be using and we
clear some internal DNS caches on connection failures. Hence, using a
DNS resource here is a somewhat roundabout way to test that we aren't
failing and re-establishing the connection but migrate it to a new
network path.
2024-03-26 05:44:59 +00:00
Reactor Scram
64f0427ef4 ci(gui-client): hide the Linux GUI deb since it's not ready yet (#4258)
It's still in the CI artifacts for easy testing, but there's no point
letting users see it since it's in the middle of the process split
re-architect
2024-03-21 23:49:34 +00:00
Reactor Scram
7fece80006 refactor(gui-client): refuse to ever be elevated on Linux (#4232)
Running as sudo / root causes a lot of problems for GUI programs, so
we're unwinding that. In this case we can go back to using Tauri's "open
URL" function, which is great.

Closes #4103
Refs #3713
Affects #3972 - I was finally able to debug it because it came up
constantly during this PR
2024-03-21 14:42:48 +00:00
Reactor Scram
b0904e382a chore: add crate for privileged Linux tunnel process (#4229)
Refs #3713 

```[tasklist]
### Before merging
- [ ] Is 'firezone-client-tunnel' okay for the binary name?
- [ ] Using a library and building it as two binaries is correct, right? `cargo run -p firezone-client-tunnel` takes 1 second. `cargo run -p firezone-gui-client --bin firezone-client-tunnel` takes 1m42s because it builds all the GUI deps.
```
2024-03-21 14:06:55 +00:00
Reactor Scram
ada9d896cf chore(gui-client): remove unused env var (#4234)
Must have been a hack to run the smoke test in CI, and was never
actually hooked up.
2024-03-20 22:20:29 +00:00
Reactor Scram
e05cbbe0a0 build(gui-client/linux): include an empty firezone-tunnel binary with the Tauri deb package (#4220)
I thought this was going to use `cargo-deb` but it was actually easy
with the Tauri deb bundling we already use.

```[tasklist]
### Before merging
- [x] Make sure every file in the Tauri deb is also in our deb (e.g. icons)
```
2024-03-20 14:11:41 +00:00
Reactor Scram
651ea3ae00 build(gui-client/linux): make sure debug symbols get uploaded for the Linux GUI client (#4217)
- Split up CI artifacts into "exe", "pkg", and "syms" so it's easy to
check they're being uploaded. This shouldn't affect published artifacts
- Set `strip = "none"` which seems to be necessary to get the debug
symbols in Linux, although they still end up in the exe and not the dwp
file 🤔 don't know why
- Test Linux stacktrace in CI

Stacktrace examples:
- On Linux we at least get function names, but we aren't getting line
numbers for some reason
https://github.com/firezone/firezone/actions/runs/8350493514/job/22857032124#step:10:268
- On Windows we also get line numbers, as before
https://github.com/firezone/firezone/actions/runs/8350493514/job/22857033367#step:11:351

I didn't test downloading the files and doing a stacktrace locally, but
I have batched that up for whenever I do a big manual test of the
CD-produced release artifacts:
https://github.com/firezone/firezone/issues/3887
2024-03-19 22:18:03 +00:00
Reactor Scram
74026d8b13 build(gui-client): disable AppImage bundling (#4216)
AppImages won't work with process splitting. (#3713)

As far as I can tell, they just produce one binary. Internally they use
FUSE or something to mount a squashfs image, but that image won't be
able to hook into systemd and run with root permissions and everything.
I don't think it's practical, and Tauri's AppImage bundling doesn't have
the features for it.

Even their deb bundler doesn't have any way to specify a path for a
daemon to be installed. The sidecar feature only seems intended for the
GUI app to call, not anything else on the system.

(There is such a thing as installing AppImages, but I don't think it's
worth pursuing - We should just do debs)
2024-03-19 17:26:25 +00:00
Reactor Scram
3ced2b3a20 ci(linux): fix incorrect use of grep -v (#4151)
I think I used `-v` wrong here. I meant it to negate a regular grep, but
it will probably always return true since it negates the matching, not
the exit code.
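
To make the pitfall concrete, here is a small model of `grep -v` semantics in Rust. It is a simplification that uses plain substring matching instead of regular expressions, and `grep_v` is a made-up helper, but the exit-status rule it models is grep's: success means at least one line was selected.

```rust
/// Model of `grep -v PATTERN`: select the lines that do NOT contain `pattern`.
/// Like grep, "success" means at least one line was selected.
fn grep_v<'a>(input: &'a str, pattern: &str) -> (Vec<&'a str>, bool) {
    let selected: Vec<&str> = input
        .lines()
        .filter(|line| !line.contains(pattern))
        .collect();
    let success = !selected.is_empty();
    (selected, success)
}
```

With input `"ok\nERROR boom\nok"` and pattern `"ERROR"`, the call still reports success because the two `ok` lines are selected. It only fails when every line matches, which is why using `-v` as "assert the pattern is absent" passes almost all the time.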

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-15 19:06:57 +00:00
Thomas Eizinger
62e082d47a refactor(connlib): make {Client,Gateway}State SANS-IO (#4096)
Resolves: #3929.
2024-03-14 23:44:36 +00:00
Reactor Scram
52cde610e1 feat(linux): make deep link auth work (#4102)
Right now it only works on my dev VM, not on my test VMs, due to #4053
and #4103, but it passes tests and should be safe to merge.

There's one doc fix and one script fix which are unrelated and could be
their own PRs, but they'd be tiny, so I left them in here.

Ref #4106 and #3713 for the plan to fix all this by splitting the tunnel
process off so that the GUI runs as a normal user.
2024-03-13 18:11:04 +00:00
Jamil
574585d146 chore(ci): Add debug/ and perf/ prefix to some images (#4104)
Followup from #4100:


- Add `perf/relay` and `debug/relay` etc data plane images in
`firezone-staging`.
- The `perf` images are `debug` stage images and have tooling installed,
but use release binaries.
- The `debug` images are `debug` binaries inside `debug` images
- `firezone-prod` contains only release binaries -- these image names
haven't changed
2024-03-12 20:27:32 +00:00
Jamil
391150f0e1 chore(ci): Fix new issues in cd.yml (#4085)
Fixes some issues encountered after the merge of #4049 

- Fix performance tests to only run using base_ref and head_ref to avoid
dependence on `main`
- Fixes some typos
- Prevents a catch-22 condition where breaking compatibility meant we
wouldn't be able to deploy production
2024-03-12 02:06:19 +00:00
Thomas Eizinger
ea53ae7a55 feat(snownet): timeout connections if we don't receive a candidate within 10s (#3790)
Previously, we had a dedicated timer for this within the tunnel
implementation. Now that we have control over the internals of our
connection via `snownet`, we can timeout the connection if we don't
receive a candidate from the remote within 10s.
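
A minimal sketch of what such a check could look like in the sans-IO style, where the caller passes in the current time. Only the 10s value comes from the text above; `PendingConnection` and its fields are hypothetical and are not `snownet`'s real types.

```rust
use std::time::{Duration, Instant};

/// How long we wait for the first remote candidate (10s, per the text above).
const CANDIDATE_TIMEOUT: Duration = Duration::from_secs(10);

/// Hypothetical connection state; `snownet`'s real types are not shown here.
struct PendingConnection {
    created_at: Instant,
    has_remote_candidate: bool,
}

impl PendingConnection {
    /// Sans-IO style: the caller passes in `now` instead of us reading a clock,
    /// which makes the timeout trivial to unit-test.
    fn is_timed_out(&self, now: Instant) -> bool {
        !self.has_remote_candidate
            && now.duration_since(self.created_at) >= CANDIDATE_TIMEOUT
    }
}
```

Keeping the clock outside the state means a test can jump 10 seconds ahead by adding a `Duration` to an `Instant`, without sleeping.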
2024-03-09 08:03:57 +00:00
Reactor Scram
7211e88338 feat(linux-client): generate firezone-id (device ID) automatically if it's not provided at launch (#3920)
Closes #3815 

Changes that are breaking (but these aren't in production so it should
be okay)

- Windows, renaming `device_id.json` to `firezone-id.json` to match the
rest of the code
- Linux GUI, storing the firezone-id under `/var/lib` instead of under
`$HOME`
- Linux GUI, bails out if not run with `sudo --preserve-env` by
detecting `$HOME == root` or `$USER != root`

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-08 16:13:59 +00:00
Jamil
f358f824a1 chore(devops): Make some vars optional in systemd install script (#4017)
2024-03-06 17:18:25 -08:00
Jamil
92261be9e0 chore(devops): Use separate script to install systemd gateway (#4016)
This saves us from backslash escape hell when trying to expose this
script in different contexts.

Needed as a pre-req to #4011
2024-03-06 17:04:22 -08:00
Jamil
9cab250696 chore(windows): Sign internal exe using beforeBundleCommand (#3994)
Refs #3230 

It looks like we need to sign the internal exe before it gets bundled
too. We can use `beforeBundleCommand` to do so.

Soon, Tauri should have native support for this exact scenario:
https://github.com/tauri-apps/tauri/pull/8718
2024-03-06 16:00:54 +00:00
Jamil
19e833262f chore(windows): Sign windows exe too (#3992)
Fixes #3230
2024-03-05 22:35:24 -08:00
Reactor Scram
f11edae097 test(windows): run the smoke test twice to shake out issues in the script (#3889)
I may use this for #3867 , or I may do that in-process, either way this
could be handy.
2024-03-04 23:15:58 +00:00
Reactor Scram
789d2160de refactor(linux): make a place for reverting /etc/resolv.conf (#3822)
Makes progress towards #3817

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-01 19:09:01 +00:00