firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-03-22 11:41:52 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	d26df944c0	ci: reference GitHub actions by hash (#7724) To improve supply-chain security, reference all GitHub actions using the hash of the released tag. GitHub recommends doing this for third-party actions (https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions#using-third-party-actions). To make our CI more deterministic, I opted to do it for all our actions. This means any change to our workflow configuration requires a source code change and thus passing CI on our end. Dependabot will automatically issue PRs for these actions and update the comment with the new version next to them. Resolves: #2497.	2025-01-12 17:35:52 +00:00
Jamil	6f7f6a4f34	style: Enforce code style across all supported languages using Prettier (#7322) This ensures that we run prettier across all supported filetypes to check for any formatting / style inconsistencies. Previously, it was only run for files in the website/ directory using a deprecated pre-commit plugin. The benefit of keeping this in our pre-commit config is that devs can optionally run these checks locally with `pre-commit run --config .github/pre-commit-config.yaml`. --------- Signed-off-by: Jamil <jamilbk@users.noreply.github.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-11-13 00:19:15 +00:00
Thomas Eizinger	9de1119b69	feat(connlib): support DNS over TCP (#6944) At present, `connlib` only supports DNS over UDP on port 53. Responses over UDP are size-constrained by the IP MTU, and thus not all DNS responses fit into a UDP packet. RFC9210 therefore mandates that all DNS resolvers must also support DNS over TCP to overcome this limitation [0]. Handling UDP packets is easy; handling TCP streams is more difficult because we need to effectively implement a valid TCP state machine. Building on top of a lot of earlier work (linked in issue), this is relatively easy because we can now simply import `dns_over_tcp::{Client,Server}`, which do the heavy lifting of sending and receiving the correct packets for us. The main aspects of the integration that are worth pointing out are: - We can handle at most 10 concurrent DNS TCP connections _per defined resolver_. The assumption here is that most applications will first query for DNS records over UDP and only fall back to TCP if the response is truncated. Additionally, we assume that clients will close the TCP connections once they no longer need them. - Errors on the TCP stream to an upstream resolver result in `SERVFAIL` responses to the client. - All TCP connections to upstream resolvers get reset when we roam; all currently ongoing queries will be answered with `SERVFAIL`. - Upon network reset (i.e. roaming), we also re-allocate new local ports for all TCP sockets, similar to our UDP sockets. Resolves: #6140. [0]: https://www.ietf.org/rfc/rfc9210.html#section-3-5	2024-10-18 03:40:50 +00:00
Thomas Eizinger	650e31c784	ci: remove outdated integration tests (#6922) Since we added these tests, `connlib`'s test coverage has increased significantly, to the point where we don't need all of them anymore. In particular, almost everything related to relays no longer needs to be tested using docker. These integration tests are sometimes flaky due to docker not starting or images failing to pull. Thus, having fewer of them is better because it increases CI reliability. Also, GitHub will only execute so many jobs in parallel, so having fewer jobs helps with that too. Resolves: #6451. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2024-10-08 16:39:18 +00:00
Thomas Eizinger	35017537c7	feat(gateway): allow out-of-order `allow_access` requests (#6403) Currently, the gateway requires a strict ordering: it must first receive a `request_connection` message, followed by multiple `allow_access` messages. Additionally, access can be granted as part of the initial `request_connection` message too. This isn't an ideal design. Setting up a new connection is infallible: all we need to do is send our ICE credentials back to the client. However, untangling that will require a bit more effort. Starting with #6335, following this strict order on the client is more difficult. Whilst we can send them in order, it is harder to maintain those ordering guarantees across all our systems. To avoid this, we change the gateway to perform an upsert for its local ACLs for a client. If an `allow_access` call somehow reaches the gateway earlier, we can simply create the `Peer` right away and only set up the actual connection later. --------- Signed-off-by: Jamil <jamilbk@users.noreply.github.com> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-08-28 13:10:06 +00:00
Jamil	84a981f668	refactor(ci): Remove browser-based integration tests (#6435 ) Fixes a new issue with puppeteer, chromium 128, and Alpine 3.20 that's causing failing browser tests. See more: https://github.com/puppeteer/puppeteer/issues/12189 Failure: https://github.com/firezone/firezone/actions/runs/10549430305/job/29224528663?pr=6391 Unfortunately, puppeteer's embedded browser doesn't seem to want to run in Alpine: https://github.com/firezone/firezone/actions/runs/10563167497/job/29265175731?pr=6435#step:6:56 Fixing this is proving very difficult since we can't seem to use puppeteer with the latest Alpine images, so I questioned the need to have these in at all. These tests were added at a time where the DNS mappings were brittle, so we wanted to verify that relayed and direct connections held up as we deployed. This is no longer the case, and we also now have much more unit test coverage around these things, so given the pain of maintaining these (and the lack of a current solution to the above), they are removed. --------- Signed-off-by: Jamil <jamilbk@users.noreply.github.com>	2024-08-26 20:01:00 +00:00
Jamil	0c6cd4a804	fix(ci): Add http test server image specifiers to CI (#6208 ) - Adds `http_test_server_image` to inputs so that it gets set properly for CI (`debug`) and CD (`perf`) - Updates `dev` -> `debug` in docker-compose.yml to fix pulls - Fixes issue with seeds and relevant docs from #6205	2024-08-07 12:15:00 -07:00
Thomas Eizinger	5687befc9d	ci: use correct service name in `docker-compose.yml` (#6055) The compose service I defined is called `otel`, not `otlp`. With this fix in place, the relay successfully connects to the OTLP exporter. It is worth noting that the connection to the OTLP exporter itself is not critical for relay operation. Even if it fails, it won't affect the actual data plane. I do think it makes sense to still have a working OTLP exporter in the compose definition, as it makes it easier to test whether the ingestion of metrics and traces works as expected.	2024-07-27 02:48:08 +00:00
Reactor Scram	6862213cc2	fix(headless-client/linux): only notify `systemd` that we're up after Resources are available (#6026 ) Closes #5912 Before this, I had the `--exit` CLI flag and the `sd_notify` call hanging off the wrong callback.	2024-07-26 18:53:08 +00:00
Jamil	7034344334	ci: Sleep 3 seconds after upping services for integration tests (#6019 ) Fixes #4921	2024-07-24 15:49:39 +00:00
Jamil	a45acc04db	fix(connlib): set default `firezone_tunnel` log level from `trace` to `debug` for development and some `ci` (#5411 ) "Encapsulated packet" is now spamming dev clients, so this level is changed to `debug` by default in dev builds. ``` 2024-06-17 14:04:15.419 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.419 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.420 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.420 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.420 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.420 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.421 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.421 6911-7520 connlib dev.firezone.android V 
firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.422 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.422 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.422 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet 2024-06-17 14:04:15.423 6911-7520 connlib dev.firezone.android V firezone_tunnel::client: s0_name: encapsulates0_target=firezone_tunnel::clients0_file=connlib/tunnel/src/client.rss0_line=441s0_dst=fd00:2021:1111:8000::2Encapsulated packet ```	2024-06-18 04:48:52 +00:00
Jamil	83340b9252	ci: Don't run browser tests on release images (#4722 ) Fixes https://github.com/firezone/firezone/actions/runs/8763390111	2024-04-20 00:37:12 -07:00
Gabi	adc0bb73f7	test(client): add reconnection tests from a client using a headless browser (#4569) Considered using Elixir and Rust to write the tests. For Elixir, `wallaby` doesn't seem to have a way to attach to an existing `chromium` instance, launching it each time, which makes it hard to coordinate with the relay restart. For Rust, we considered `thirtyfour`, which would be very nice since we could test both firefox and chrome, but each time it connects to the instance it launches a new session, making it hard to test the DNS cache behavior. We also considered `chrome_headless` for Rust; it needs a small patch to prevent it from closing the browser after `Drop`, but it still presents a problem, since it has no easy way to determine whether loading a page has succeeded. There are some workarounds, such as retrieving the title, that we could have used, but after some testing they are quite finicky and we don't want that for CI. So I ended up settling for TypeScript, but I'm open to other options, or a fix for the previous ones! There are some modifications still incoming for this PR around the test name, and that sleep in the middle of the test doesn't look good, so I will probably add some retries, but the gist is here. I will keep it in draft until we expect it to be passing, so feel free to do some initial reviews. Note: the number of lines changed is greatly exaggerated by `package.lock` --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-04-20 06:57:07 +00:00
Thomas Eizinger	51089b89e7	feat(connlib): smoothly migrate relayed connections (#4568 ) Whenever we receive a `relays_presence` message from the portal, we invalidate the candidates of all now disconnected relays and make allocations on the new ones. This triggers signalling of new candidates to the remote party and migrates the connection to the newly nominated socket. This still relies on #4613 until we have #4634. Resolves: #4548. --------- Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-04-20 06:16:35 +00:00
Reactor Scram	bc22fb2bf2	test(linux-client): move linux-group test out of integration tests (#4692 ) Closes #4669 This should stop the problem of `linux-group` failing because of trying to test an older release that doesn't have the right CLI features --------- Co-authored-by: Jamil <jamilbk@users.noreply.github.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-04-19 02:52:31 +00:00
Reactor Scram	68016a8a56	test(linux-client): disable failing test (#4689 )	2024-04-18 19:40:06 +00:00
Reactor Scram	926ffe6f07	test(linux-client): fix linux-group integration test (#4671 ) Closes #4669 (Once I figure out the cause and then fix it)	2024-04-18 14:05:24 +00:00
Reactor Scram	6da6fc8569	test(linux-client): temporarily disable failing linux-group integration test (#4670 ) Refs #4669. That issue will be for fixing and re-enabling the test. This is only needed for Linux IPC which isn't in production yet, so it's easier to disable first and debug second	2024-04-17 23:48:22 +00:00
Reactor Scram	2f6f2ef260	test(linux-client): check if we can add the user to a group in a CI test (#4600 ) Refs #4513 The next step after this is to use this to test security in the Linux IPC code, it should reject any IPC commands from users not in the `firezone` group.	2024-04-17 20:40:27 +00:00
Reactor Scram	c01c3c1dd8	test(integration): remove redundant `integration-test-` prefix (#4601) They all have the same prefix anyway, and it uses up real estate in the CI page.	2024-04-12 18:15:11 +00:00
Thomas Eizinger	be1a719e2c	chore(relay): perform graceful shutdown upon receiving SIGTERM (#4552) Upon receiving a SIGTERM, we immediately disconnect from the websocket connection to the portal and set a flag that we are shutting down. Once we are disconnected from the portal and no longer have any active allocations, we exit with 0. A repeated SIGTERM signal will interrupt this process and force the relay to shut down. Disconnecting from the portal will (eventually) trigger a message to clients and gateways that this relay should no longer be used. Thus, depending on the timeout our supervisor has configured after sending SIGTERM, the relay will continue all TURN operations until the number of allocations drops to 0. Currently, we also allow clients to make new allocations and refresh existing allocations. In the future, it may make sense to implement a dedicated status code and refuse `ALLOCATE` and `REFRESH` messages whilst we are shutting down. Related: #4548. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-04-12 08:45:08 +00:00
Jamil	09532ea845	chore(ci): Add portal and relay downtime DNS resource tests (#4517 ) Tests that DNS still works in the client with established connections after the portal and/or relay go down.	2024-04-08 09:43:59 +00:00
Reactor Scram	1e4ed7bad6	refactor(ci): move DNS control method up to docker-compose.yml (#4341 ) This is part of a yak shave towards CI testing of #3812 Moving the DNS control method out of `docker-compose.yml` and up to the integration tests themselves allows us to test these scenarios: - `systemd-resolved` - `etc-resolv-conf` - `systemd-resolved` but we're in a container where that won't work, so we should gracefully degrade to just allowing IP/CIDR resources	2024-04-02 17:11:29 +00:00
Thomas Eizinger	18033eafec	ci: ensure roaming between networks doesn't abort file download (#4213 ) This adds an integration test that downloads a 10MB file from a server and simulates the client roaming to another network while the download is active. We use a DNS resource for this to ensure it also doesn't take too long in that case. DNS resources are what most users will be using and we clear some internal DNS caches on connection failures. Hence, using a DNS resource here is a somewhat roundabout way to test that we aren't failing and re-establishing the connection but migrate it to a new network path.	2024-03-26 05:44:59 +00:00
Andrew Dryga	a85b9ab185	chore(infra): Deploy domain app on a separate instance and enable background jobs on it (#4160 ) Closes #3801	2024-03-16 08:58:20 -06:00
Jamil	574585d146	chore(ci): Add debug/ and perf/ prefix to some images (#4104 ) Followup from #4100: - Add `perf/relay` and `debug/relay` etc data plane images in `firezone-staging`. - The `perf` images are `debug` stage images and have tooling installed, but use release binaries. - The `debug` images are `debug` binaries inside `debug` images - `firezone-prod` contains only release binaries -- these image names haven't changed	2024-03-12 20:27:32 +00:00
Jamil	391150f0e1	chore(ci): Fix new issues in cd.yml (#4085 ) Fixes some issues encountered after the merge of #4049 - Fix performance tests to only run using base_ref and head_ref to avoid dependence on `main` - Fixes some typos - Prevents a catch-22 condition where breaking compatibility meant we wouldn't be able to deploy production	2024-03-12 02:06:19 +00:00
Jamil	6575e0ca26	chore(ci): Refactor CI to use prod images in staging and prevent accidental hotfix breakages (#4049 ) - Runs release asset builds simultaneously with `deploy-staging`. Those don't depend on each other. - Prevents running some build workflows in CD because they're run already in the PR and in the merge group, and the risk of semantic conflict is negligible - Run `release` assets in staging - Adds `compatibility_tests`: To successfully introduce a breaking change in the control / data plane APIs, you must now "Merge as Administrator" - Since `CI` is no longer run on `main`, caching needed to be refactored to make sense again - Since `CI` is no longer run on `main`, the Elixir `migrations_and_seeds_test` had to be rewritten. This now tests migrations using `git checkout` instead of importing `main`'s DB dump. - Move tauri builds to its own workflow so we can trigger Linux and Windows builds manually on an adhoc basis like we do for the Swift and Kotlin builds - Add a new `hotfix` workflow that will run `compatibility_tests` with the latest published images - Add `workflow_dispatch` to trigger `CD` manually for testing purposes (cc @ReactorScram) Refs #3995	2024-03-11 20:01:34 +00:00

28 Commits
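
The upsert behaviour described in commit 35017537c7 (#6403) can be illustrated with a small sketch. This is not the gateway's actual code (which lives in the Rust codebase): the `Gateway` and `Peer` classes, their fields, and the string identifiers below are hypothetical, chosen only to show why an upsert makes the order of `allow_access` and `request_connection` irrelevant.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Peer:
    """Hypothetical per-client state: local ACLs plus an optional connection."""

    allowed_resources: Set[str] = field(default_factory=set)
    # Stays None until a request_connection message has been handled.
    ice_credentials: Optional[str] = None

    @property
    def connected(self) -> bool:
        return self.ice_credentials is not None


class Gateway:
    """Hypothetical gateway that tolerates out-of-order messages."""

    def __init__(self) -> None:
        self.peers: Dict[str, Peer] = {}

    def _upsert(self, client_id: str) -> Peer:
        # Create the peer the first time we hear about the client,
        # regardless of which message type arrives first.
        return self.peers.setdefault(client_id, Peer())

    def allow_access(self, client_id: str, resource: str) -> None:
        # Records the ACL even if no connection has been set up yet.
        self._upsert(client_id).allowed_resources.add(resource)

    def request_connection(self, client_id: str, ice_credentials: str) -> None:
        # Attaches the connection to a peer that may already hold ACLs.
        self._upsert(client_id).ice_credentials = ice_credentials
```

With this shape, calling `allow_access` before `request_connection` leaves a `Peer` that already holds its ACLs and simply reports `connected` as false until the connection message arrives.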