Commit Graph

28 Commits

Author SHA1 Message Date
Thomas Eizinger
d26df944c0 ci: reference GitHub actions by hash (#7724)
To improve supply-chain security, reference all GitHub actions using the
hash of the released tag. GitHub recommends to do this for third-party
actions
(https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions#using-third-party-actions).
In order to make our CI more deterministic, I opted to do it for all our
actions. This means any change to our workflow configuration requires a
source code change and thus passing CI on our end.

Dependabot will automatically issue PRs for these actions and update the
comment with the new version next to them.

Resolves: #2497.
2025-01-12 17:35:52 +00:00
Jamil
6f7f6a4f34 style: Enforce code style across all supported languages using Prettier (#7322)
This ensure that we run prettier across all supported filetypes to check
for any formatting / style inconsistencies. Previously, it was only run
for files in the website/ directory using a deprecated pre-commit
plugin.

The benefit to keeping this in our pre-commit config is that devs can
optionally run these checks locally with `pre-commit run --config
.github/pre-commit-config.yaml`.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-11-13 00:19:15 +00:00
Thomas Eizinger
9de1119b69 feat(connlib): support DNS over TCP (#6944)
At present, `connlib` only supports DNS over UDP on port 53. Responses
over UDP are size-constrained on the IP MTU and thus, not all DNS
responses fit into a UDP packet. RFC9210 therefore mandates that all DNS
resolvers must also support DNS over TCP to overcome this limitation
[0].

Handling UDP packets is easy, handling TCP streams is more difficult
because we need to effectively implement a valid TCP state machine.

Building on top of a lot of earlier work (linked in issue), this is
relatively easy because we can now simply import
`dns_over_tcp::{Client,Server}` which do the heavy lifting of sending
and receiving the correct packets for us.

The main aspects of the integration that are worth pointing out are:

- We can handle at most 10 concurrent DNS TCP connections _per defined
resolver_. The assumption here is that most applications will first
query for DNS records over UDP and only fall back to TCP if the response
is truncated. Additionally, we assume that clients will close the TCP
connections once they no longer need it.
- Errors on the TCP stream to an upstream resolver result in `SERVFAIL`
responses to the client.
- All TCP connections to upstream resolvers get reset when we roam, all
currently ongoing queries will be answered with `SERVFAIL`.
- Upon network reset (i.e. roaming), we also re-allocate new local ports
for all TCP sockets, similar to our UDP sockets.

Resolves: #6140.

[0]: https://www.ietf.org/rfc/rfc9210.html#section-3-5
2024-10-18 03:40:50 +00:00
Thomas Eizinger
650e31c784 ci: remove outdated integration tests (#6922)
Since we've added these tests, `connlib`'s test coverage has increased
significantly to the point where we don't need all of them anymore.
Especially pretty much everything in regards to relays is unnecessary to
be tested using docker.

These integration tests are sometimes flaky due to docker not starting
or images failing to pull. Thus, having fewer of them is better because
it increases CI reliability. Also, there are only so many jobs that
GitHub will execute in parallel so having less jobs is better for that
too.

Resolves: #6451.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-10-08 16:39:18 +00:00
Thomas Eizinger
35017537c7 feat(gateway): allow out-of-order allow_access requests (#6403)
Currently, the gateway requires a strict ordering of first receiving a
`request_connection` message, following by multiple `allow_access`
messages. Additionally, access can be granted as part of the initial
`request_connection` message too.

This isn't an ideal design. Setting up a new connection is infallible,
all we need to do is send our ICE credentials back to the client.
However, untangling that will require a bit more effort.

Starting with #6335, following this strict order on the client is a more
difficult. Whilst we can send them in order, it is harder to maintain
those ordering guarantees across all our systems.

To avoid this, we change the gateway to perform an upsert for its local
ACLs for a client. In case that an `allow_access` call would somehow get
to the gateway earlier, we can simply already create the `Peer` and only
set up the actual connection later.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-08-28 13:10:06 +00:00
Jamil
84a981f668 refactor(ci): Remove browser-based integration tests (#6435)
Fixes a new issue with puppeteer, chromium 128, and Alpine 3.20 that's
causing failing browser tests.

See more: https://github.com/puppeteer/puppeteer/issues/12189

Failure:

https://github.com/firezone/firezone/actions/runs/10549430305/job/29224528663?pr=6391

Unfortunately, puppeteer's embedded browser doesn't seem to want to run
in Alpine:


https://github.com/firezone/firezone/actions/runs/10563167497/job/29265175731?pr=6435#step:6:56


Fixing this is proving very difficult since we can't seem to use
puppeteer with the latest Alpine images, so I questioned the need to
have these in at all. These tests were added at a time where the DNS
mappings were brittle, so we wanted to verify that relayed and direct
connections held up as we deployed.

This is no longer the case, and we also now have much more unit test
coverage around these things, so given the pain of maintaining these
(and the lack of a current solution to the above), they are removed.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-08-26 20:01:00 +00:00
Jamil
0c6cd4a804 fix(ci): Add http test server image specifiers to CI (#6208)
- Adds `http_test_server_image` to inputs so that it gets set properly
for CI (`debug`) and CD (`perf`)
- Updates `dev` -> `debug` in docker-compose.yml to fix pulls
- Fixes issue with seeds and relevant docs from #6205
2024-08-07 12:15:00 -07:00
Thomas Eizinger
5687befc9d ci: use correct service name in docker-compose.yml (#6055)
The compose service I defined is called `otel` not `otlp`. With this fix
in place, the relay successfully connects to the OTLP exporter.

it is worthwhile noting that the connection to the OTLP exporter itself
is not critical for relay operation. Even if it fails, it won't affect
the actual data plane. I do think it makes sense to still have a working
OTLP exporter in the compose definition. As it makes it easier to test
whether the ingestion of metrics and traces works as expected.
2024-07-27 02:48:08 +00:00
Reactor Scram
6862213cc2 fix(headless-client/linux): only notify systemd that we're up after Resources are available (#6026)
Closes #5912

Before this, I had the `--exit` CLI flag and the `sd_notify` call
hanging off the wrong callback.
2024-07-26 18:53:08 +00:00
Jamil
7034344334 ci: Sleep 3 seconds after upping services for integration tests (#6019)
Fixes #4921
2024-07-24 15:49:39 +00:00
Jamil
a45acc04db fix(connlib): set default firezone_tunnel log level from trace to debug for development and some ci (#5411)
"Encapsulated packet" is now spamming dev clients, so this level is
changed to `debug` by default in dev builds.

```
2024-06-17 14:04:15.419  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.419  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.420  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.420  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.420  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.420  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.421  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.421  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.422  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.422  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.422  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
2024-06-17 14:04:15.423  6911-7520  connlib                 dev.firezone.android                 V  firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet
```
2024-06-18 04:48:52 +00:00
Jamil
83340b9252 ci: Don't run browser tests on release images (#4722)
Fixes https://github.com/firezone/firezone/actions/runs/8763390111
2024-04-20 00:37:12 -07:00
Gabi
adc0bb73f7 test(client): add reconnection tests from a client using a headless browser (#4569)
Considered using Elixir and Rust to write the tests.

For Elixir, `wallaby` doesn't seem to have a way to attach to an
existing `chromium` instance, launching it each time, which makes it
hard to coordinate with the relay restart.

For Rust we considered `thirtyfour` which would be very nice since we
could test both firefox and chrome but each time it connects to the
instance it launches a new session making it hard to test the DNS cache
behavior.

We also considered `chrome_headless` for Rust it needs a small patch to
prevent it from closing the browser after `Drop` but it still presents a
problem, since it has no easy way to retrieve if loading a page has
succeeded. There are some workarounds such as retrieving the title that
we could have used but after some testing they are quite finnicky and we
don't want that for CI.

So I ended up settling for TypeScript but I'm open to other options, or
a fix for the previous ones!

There are some modifications still incoming for this PR, around the test
name and that sleep in the middle of the test doesn't look good so I
will probably add some retries, but the gist is here, will keep it in
draft until we expect it to be passing.

So feel free to do some initial reviews.

Note: the number of lines changed is greatly exaggerated by
`package.lock`

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-20 06:57:07 +00:00
Thomas Eizinger
51089b89e7 feat(connlib): smoothly migrate relayed connections (#4568)
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.

This still relies on #4613 until we have #4634.

Resolves: #4548.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-20 06:16:35 +00:00
Reactor Scram
bc22fb2bf2 test(linux-client): move linux-group test out of integration tests (#4692)
Closes #4669 

This should stop the problem of `linux-group` failing because of trying
to test an older release that doesn't have the right CLI features

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-04-19 02:52:31 +00:00
Reactor Scram
68016a8a56 test(linux-client): disable failing test (#4689) 2024-04-18 19:40:06 +00:00
Reactor Scram
926ffe6f07 test(linux-client): fix linux-group integration test (#4671)
Closes #4669 
(Once I figure out the cause and then fix it)
2024-04-18 14:05:24 +00:00
Reactor Scram
6da6fc8569 test(linux-client): temporarily disable failing linux-group integration test (#4670)
Refs #4669. That issue will be for fixing and re-enabling the test.

This is only needed for Linux IPC which isn't in production yet, so it's
easier to disable first and debug second
2024-04-17 23:48:22 +00:00
Reactor Scram
2f6f2ef260 test(linux-client): check if we can add the user to a group in a CI test (#4600)
Refs #4513

The next step after this is to use this to test security in the Linux
IPC code, it should reject any IPC commands from users not in the
`firezone` group.
2024-04-17 20:40:27 +00:00
Reactor Scram
c01c3c1dd8 test(integration): remove redundant integration-test- prefix (#4601)
They all have the same prefix anyway, and it uses up real estate in the
CI page

**After**
<img width="311" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/8028f9bf-5c13-4170-9e01-06bfd393751c">

**Before**
<img width="292" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/8cabf67e-6be2-4719-b06f-4a76cf5c8111">
2024-04-12 18:15:11 +00:00
Thomas Eizinger
be1a719e2c chore(relay): perform graceful shutdown upon receiving SIGTERM (#4552)
Upon receiving a SIGTERM, we immediately disconnect from the websocket
connection to the portal and set a flag that we are shutting down.

Once we are disconnected from the portal and no longer have an active
allocations, we exit with 0. A repeated SIGTERM signal will interrupt
this process and force the relay to shutdown.

Disconnecting from the portal will (eventually) trigger a message to
clients and gateways that this relay should no longer be used. Thus,
depending on the timeout our supervisor has configured after sending
SIGTERM, the relay will continue all TURN operations until the number of
allocations drops to 0.

Currently, we also allow clients to make new allocations and refreshing
existing allocations. In the future, it may make sense to implement a
dedicated status code and refuse `ALLOCATE` and `REFRESH` messages
whilst we are shutting down.

Related: #4548.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-12 08:45:08 +00:00
Jamil
09532ea845 chore(ci): Add portal and relay downtime DNS resource tests (#4517)
Tests that DNS still works in the client with established connections
after the portal and/or relay go down.
2024-04-08 09:43:59 +00:00
Reactor Scram
1e4ed7bad6 refactor(ci): move DNS control method up to docker-compose.yml (#4341)
This is part of a yak shave towards CI testing of #3812 

Moving the DNS control method out of `docker-compose.yml` and up to the
integration tests themselves allows us to test these scenarios:

- `systemd-resolved`
- `etc-resolv-conf`
- `systemd-resolved` but we're in a container where that won't work, so
we should gracefully degrade to just allowing IP/CIDR resources
2024-04-02 17:11:29 +00:00
Thomas Eizinger
18033eafec ci: ensure roaming between networks doesn't abort file download (#4213)
This adds an integration test that downloads a 10MB file from a server
and simulates the client roaming to another network while the download
is active.

We use a DNS resource for this to ensure it also doesn't take too long
in that case. DNS resources are what most users will be using and we
clear some internal DNS caches on connection failures. Hence, using a
DNS resource here is a somewhat roundabout way to test that we aren't
failing and re-establishing the connection but migrate it to a new
network path.
2024-03-26 05:44:59 +00:00
Andrew Dryga
a85b9ab185 chore(infra): Deploy domain app on a separate instance and enable background jobs on it (#4160)
Closes #3801
2024-03-16 08:58:20 -06:00
Jamil
574585d146 chore(ci): Add debug/ and perf/ prefix to some images (#4104)
Followup from #4100:


- Add `perf/relay` and `debug/relay` etc data plane images in
`firezone-staging`.
- The `perf` images are `debug` stage images and have tooling installed,
but use release binaries.
- The `debug` images are `debug` binaries inside `debug` images
- `firezone-prod` contains only release binaries -- these image names
haven't changed
2024-03-12 20:27:32 +00:00
Jamil
391150f0e1 chore(ci): Fix new issues in cd.yml (#4085)
Fixes some issues encountered after the merge of #4049 

- Fix performance tests to only run using base_ref and head_ref to avoid
dependence on `main`
- Fixes some typos
- Prevents a catch-22 condition where breaking compatibility meant we
wouldn't be able to deploy production
2024-03-12 02:06:19 +00:00
Jamil
6575e0ca26 chore(ci): Refactor CI to use prod images in staging and prevent accidental hotfix breakages (#4049)
- Runs release asset builds simultaneously with `deploy-staging`. Those
don't depend on each other.
- Prevents running some build workflows in CD because they're run
already in the PR and in the merge group, and the risk of semantic
conflict is negligible
- Run `release` assets in staging
- Adds `compatibility_tests`: **To successfully introduce a breaking
change in the control / data plane APIs, you must now "Merge as
Administrator"**
- Since `CI` is no longer run on `main`, caching needed to be refactored
to make sense again
- Since `CI` is no longer run on `main`, the Elixir
`migrations_and_seeds_test` had to be rewritten. This now tests
migrations using `git checkout` instead of importing `main`'s DB dump.
- Move tauri builds to its own workflow so we can trigger Linux and
Windows builds manually on an adhoc basis like we do for the Swift and
Kotlin builds
- Add a new `hotfix` workflow that will run `compatibility_tests` with
the latest published images
- Add `workflow_dispatch` to trigger `CD` manually for testing purposes
(cc @ReactorScram)


Refs #3995
2024-03-11 20:01:34 +00:00