Setting `fail-fast: false` unsurprisingly makes our CI fail pretty
slowly. This is especially noticeable in the merge queue, where a
long-running job can hold up the entire queue even though a different
job has already failed and the PR is never going to make it in anyway.
To avoid this scenario, we set `fail-fast: true` whenever we are in the
merge queue.
Network flow logs are a common feature of VPNs. Due to the nature of a
shared exit node, it is of great interest to a network analyst which
TCP connections are being routed through the tunnel, who initiates
them, how long they last, and how much traffic is sent across them.
With this PR, the Firezone Gateway gains the ability to detect the
TCP and UDP flows that are being routed through it. The information we
want to attach to these flows is spread out over several layers of the
packet handling code. To simplify the implementation and not complicate
the APIs unnecessarily, we chose to rely on TLS (thread-local storage)
for gathering all the necessary data as a packet gets passed through the
various layers. When using a const initializer, the overhead of a TLS
variable over an actual local variable is basically zero. The entire
routing state of the Gateway is also never sent across any threads,
making TLS variables a particularly good choice for this problem.
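As a rough illustration of the pattern (the names below are hypothetical, not the actual Gateway internals, and the log call assumes the `tracing` crate, which the `RUST_LOG` directive suggests), a thread-local cell with a `const` initializer can collect flow metadata as a packet moves through the layers:

```rust
use std::cell::RefCell;

/// Hypothetical flow metadata collected while a packet traverses the layers.
struct FlowInfo {
    client_id: Option<String>,
    resource_id: Option<String>,
    bytes: u64,
}

thread_local! {
    // With a `const` initializer, access needs no lazy-initialisation check,
    // so the overhead over a plain local variable is essentially zero.
    static CURRENT_FLOW: RefCell<FlowInfo> = const {
        RefCell::new(FlowInfo {
            client_id: None,
            resource_id: None,
            bytes: 0,
        })
    };
}

/// Called from whichever layer knows how many bytes the packet carried.
fn record_bytes(n: u64) {
    CURRENT_FLOW.with_borrow_mut(|flow| flow.bytes += n);
}

/// Emits the gathered data; only visible with `RUST_LOG=flow_logs=trace`.
fn emit_flow_log() {
    CURRENT_FLOW.with_borrow(|flow| {
        tracing::trace!(target: "flow_logs", bytes = flow.bytes, "flow observed");
    });
}
```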
In its MVP form, the detected flows are only emitted on stdout, and
only if `flow_logs=trace` is set via `RUST_LOG`. Early adopters
of this feature are encouraged to enable these logs as described and
then ingest the Gateway's logs into the SIEM of their choice for further
analysis.
Related: #8353
In Firezone, a Client requests an "access authorization" for a Resource
on the fly when it sees the first packet for said Resource going through
the tunnel. If we don't have a connection to the Gateway yet, this is
also where we will establish a connection and create the WireGuard
tunnel.
In order for this to work, the access authorization state between the
Client and the Gateway MUST NOT get out of sync. If the Client thinks it
has access to a Resource, it will just route the traffic to the Gateway.
If the access authorization on the Gateway has expired or vanished
otherwise, the packets will be black-holed.
Starting with #9816, the Gateway sends ICMP errors back to the
application whenever it filters a packet. This can happen either because
the access authorization is gone or because the traffic wasn't allowed
by the specific filter rules on the Resource.
With this patch, the Client will attempt to create a new flow (i.e.
re-authorize traffic) for this Resource whenever it sees such an ICMP
error, thereby acting as a way of synchronizing the view of the world
between Client and Gateway should they ever get out of sync.
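A minimal sketch of the idea, with hypothetical types rather than the actual connlib state: when an ICMP error maps back to a Resource we believe we are authorized for, drop the cached authorization and queue a new flow request.

```rust
use std::collections::HashSet;

// Hypothetical identifier; the real client state is more involved.
type ResourceId = u64;

struct ClientState {
    authorized: HashSet<ResourceId>,
    pending_flows: Vec<ResourceId>,
}

impl ClientState {
    /// Called when an ICMP error from the Gateway maps back to a Resource.
    fn on_icmp_error(&mut self, resource: ResourceId) {
        // The Gateway filtered our packet, so our view of the authorization
        // is stale: forget it and request a new flow (re-authorize).
        if self.authorized.remove(&resource) {
            self.pending_flows.push(resource);
        }
    }
}
```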
Testing turned out to be a bit tricky. If we let the authorization on
the Gateway lapse naturally, the portal will also toggle the Resource
off and on on the Client, resulting in "flushing" the current
authorizations. Additionally, if the Client had access to only one
Resource, the Gateway will gracefully close the connection, also
resulting in the Client creating a new flow for the next packet.
To actually trigger this new behaviour we need to:
- Access at least two resources via the same Gateway
- Directly send `reject_access` to the Gateway for this particular
resource
To achieve this, we dynamically eval some code on the API node and
instruct the Gateway channel to send `reject_access`. The connection
stays intact because there is still another active access authorization
but packets for the other resource are answered with ICMP errors.
To achieve a safe roll-out, the new behaviour is feature-flagged. In
order to still test it, we now also allow feature flags to be set via
env variables.
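For illustration, reading such a flag from the environment could look roughly like this (the variable name is made up for the example):

```rust
/// Returns `true` if the (hypothetical) flag is enabled via the environment.
fn icmp_reauth_enabled() -> bool {
    std::env::var("FIREZONE_FEATURE_ICMP_REAUTH")
        .map(|v| matches!(v.as_str(), "1" | "true" | "TRUE"))
        .unwrap_or(false)
}
```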
Resolves: #10074
---------
Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
As it turns out, the flaky test was caused by a bug in the eBPF kernel where we read the old channel data header from the wrong offset. This made us essentially read garbage data for the channel number, causing us to:
a. Compute a bad checksum
b. Send the packet on a completely wrong channel
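For reference, a TURN channel-data message starts with a 2-byte channel number followed by a 2-byte length (RFC 8656). The sketch below parses that header from the start of the payload, which is what the offset computation has to land on (buffer handling is simplified; this is plain Rust, not the eBPF code itself):

```rust
/// Channel number and payload length of a TURN channel-data message (RFC 8656).
fn parse_channel_data_header(payload: &[u8]) -> Option<(u16, u16)> {
    let number = u16::from_be_bytes([*payload.get(0)?, *payload.get(1)?]);
    let length = u16::from_be_bytes([*payload.get(2)?, *payload.get(3)?]);

    // Allowed channel numbers are in the range 0x4000..=0x4FFF.
    if !(0x4000..=0x4FFF).contains(&number) {
        return None;
    }

    Some((number, length))
}
```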
The reason this caused a flaky test is that it requires one side to pick IPv4 to talk to the relay and the other side IPv6. The happy-eyeballs approach of the `allocation` module made that non-deterministic, only exposing this bug occasionally.
To ensure this kind of issue is detected earlier in the future, I am adding an additional CI step that checks all packets emitted by the eBPF kernel for checksum errors.
Fixes: #10404
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
The default send and receive buffer sizes on Linux are too small (only
~200 KB). Checking `nstat` after an iperf run revealed that the number
of dropped packets in the first interval directly correlates with the
number of receive buffer errors reported by `nstat`.
We already try to increase the send and receive buffer sizes for our UDP
socket but unfortunately, we cannot increase them beyond what the system
limits them to. To work around this, we try to set `rmem_max` and
`wmem_max` during startup of the Linux headless client and Gateway. This
behaviour can be disabled by setting `FIREZONE_NO_INC_BUF=true`.
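A rough sketch of the idea, assuming the documented sysctl paths (the target size is an assumption for the example; the real code is more careful about errors and permissions):

```rust
use std::fs;

/// Assumption for the example: bump the limits to 8 MiB.
const WANTED_BYTES: u64 = 8 * 1024 * 1024;

/// Best-effort bump of the kernel's socket buffer limits.
fn raise_buffer_limits() -> std::io::Result<()> {
    // Respect the opt-out described above.
    if std::env::var("FIREZONE_NO_INC_BUF").as_deref() == Ok("true") {
        return Ok(());
    }

    for path in ["/proc/sys/net/core/rmem_max", "/proc/sys/net/core/wmem_max"] {
        let current: u64 = fs::read_to_string(path)?.trim().parse().unwrap_or(0);

        // Only ever raise the limit, never lower it.
        if current < WANTED_BYTES {
            fs::write(path, WANTED_BYTES.to_string())?;
        }
    }

    Ok(())
}
```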
This doesn't work in Docker unfortunately, so we set the values manually
in the CI perf tests and verify after the test that we didn't encounter
any send and receive buffer errors.
It is yet to be determined how we should deal with this problem for all
the GUI clients; #10350 is the issue tracking that.
Unfortunately, this doesn't fix all packet drops during the first iperf
interval. With this PR, we now see packet drops on the interface itself.
Now that we have a more realistic network setup in our compose file, we
can extend our router containers to apply the latency on the network
path. This means any use of the compose file includes latency by default,
simplifying our CI setup. It also allows us to restart containers
without having to re-apply the latency which is useful during
performance testing.
Currently, the eBPF module can translate from channel data messages to
UDP packets and vice versa. It can even do that across IP stacks, i.e.
translate from an IPv6 UDP packet to an IPv4 channel data message.
What it cannot do is handle packets addressed to itself. This can happen
if both Client and Gateway pick the same relay to make an allocation. When
exchanging candidates, ICE will then form pairs between both relay
candidates, essentially requiring the relay to loop packets back to
itself.
In eBPF, we cannot do that. When sending a packet back out with
`XDP_TX`, it will actually go out on the wire without any additional
check whether it is addressed to our own IP.
Properly handling this in eBPF (by comparing the destination IP to our
public IP) adds more cases we need to handle. The current module
structure where everything is one file makes this quite hard to
understand, which is why I opted to create four sub-modules:
- `from_ipv4_channel`
- `from_ipv4_udp`
- `from_ipv6_channel`
- `from_ipv6_udp`
For traffic arriving via a data-channel, it is possible that we also
need to send it back out via a data-channel if the peer address we are
sending to is the relay itself. Therefore, the `from_ipX_channel`
modules have four sub-modules:
- `to_ipv4_channel`
- `to_ipv4_udp`
- `to_ipv6_channel`
- `to_ipv6_udp`
For the traffic arriving on an allocation port (`from_ipX_udp`), we
always map to a data-channel and therefore can never get into a routing
loop, resulting in only two modules:
- `to_ipv4_channel`
- `to_ipv6_channel`
The actual implementation of the new code paths is rather simple and
mostly copied from the existing ones. For half of them, we don't need to
make any adjustments to the buffer size (i.e. IPv4 channel to IPv4
channel). For the other half, we need to adjust for the difference in
the IP header size.
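A simplified sketch of the routing decision that the split into sub-modules encodes (the names `Forward` and `forward_from_channel` are hypothetical, and the real eBPF code works directly on the packet buffer rather than on these types):

```rust
use std::net::IpAddr;

// Hypothetical description of where a decapsulated packet has to go next.
enum Forward {
    /// Peer is a remote socket: emit a plain UDP packet.
    Udp { dst: IpAddr },
    /// Peer is this relay itself: re-wrap into a channel-data message
    /// instead of sending the packet back out on the wire.
    Channel { dst: IpAddr, channel: u16 },
}

fn forward_from_channel(peer: IpAddr, our_ips: &[IpAddr], channel_to_self: u16) -> Forward {
    if our_ips.contains(&peer) {
        // Loopback case: Client and Gateway both picked this relay, so the
        // "peer" of the channel is one of our own allocations.
        Forward::Channel { dst: peer, channel: channel_to_self }
    } else {
        Forward::Udp { dst: peer }
    }
}
```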
To test these changes, we add a new integration test that makes use of
the new docker-compose setup added in #10301 and configures masquerading
for both Client and Gateway. To make this more useful, we also remove
the `direct-` prefix from all tests as the test script itself no longer
makes any decisions as to whether it is operating over a direct or
relayed connection.
Resolves: #7518
Currently, the setup we have in docker-compose does not reflect
real-world scenarios very well because most components share the same
subnet. In reality, Clients, Gateways, relays and the backend are all in
separate subnets, connected via multiple routers on the Internet.
The current setup makes it hard to properly test relayed connections. To
fix this, we move all components into their own subnet with a dedicated
router container that performs source and destination NAT and acts as a
firewall that blocks inbound traffic to the client and gateway
containers.
This setup will allow us to more easily test #10286 which requires port
randomization for outgoing traffic on the Client and Gateway side.
Initially, we added the graceful shutdown functionality to the relay to
better deal with deploys and achieve as little downtime as possible.
With the split of app and infrastructure that we now have, this
functionality is no longer necessary because portal deploys don't touch
the relay infra at all.
Thus, we can remove this functionality, which will actually speed up
deploys of the relays as systemd no longer has to time out after sending
the SIGTERM to the binary.
Ubuntu 22.04 is over 3 years old and therefore ships with quite an old
kernel. Our production VMs (for relays) all run Ubuntu 24.04 so it makes
sense to build and test them on the same kernel / OS release. For
consistency reasons, we therefore bump all runners to 24.04.
In CI, eBPF in driver mode actually functions just fine with no changes
to our existing tests, given we apply a few workarounds and bugfixes:
- The interface learning mechanism had two flaws: (1) it only learned
per-CPU, which meant the risk of a missing entry grew with the core count
of the relay host, and (2) it did not filter for unicast IPs, so it
picked up broadcast and link-local addresses, causing cross-relay paths
to fail occasionally.
- The `relay-relay` candidate pair where the two relays are the same relay
causes packet drops / loops in the Docker bridge setup, and possibly in
GCP too. I'm not sure this is a valid path that solves a real
connectivity issue in the wild. I can understand relay-relay paths where
the two relays are different hosts and the client and gateway both talk
over their TURN channel to each other (i.e. WireGuard is blocked in each
of their networks), but I can't think of an advantage for a relay-relay
candidate where the traffic simply hairpins (or is dropped) off the
nearest switch. This is now detected with a new `PacketLoop` error
that triggers whenever `source_ip == dest_ip` (see the sketch after this
list).
- The relays in CI need a common next hop to talk to for the MAC address
swapping to work. A simple router service is added which functions as a
basic L3 router (no NAT) and provides that next hop.
- The `veth` driver has some peculiar requirements to allow it to
function with XDP_TX. If you send a packet out of one interface of a
veth pair with XDP_TX, you need to either make sure both interfaces have
GRO enabled, or you need to attach a dummy XDP program that simply does
XDP_PASS to the other interface so that the sk_buff is allocated before
going up the stack to the Docker bridge. The GRO method was unreliable
and didn't work in our case, causing massive packet delays and
unpredictable bursts that prevented ICE from working, so we use the
XDP_PASS method instead. A simple docker image is built and lives at
https://github.com/firezone/xdp-pass to handle this.
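A minimal sketch of the loop check mentioned above (plain Rust rather than the actual eBPF code, with a hypothetical error type):

```rust
use std::net::IpAddr;

#[derive(Debug)]
enum Error {
    /// Source and destination are the same host: forwarding this packet
    /// would only hairpin (or be dropped) off the nearest switch.
    PacketLoop,
}

fn check_for_loop(src: IpAddr, dst: IpAddr) -> Result<(), Error> {
    if src == dst {
        return Err(Error::PacketLoop);
    }

    Ok(())
}
```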
Related: #10138
Related: #10260
- Removes the Swift DerivedData cache. This was added to attempt to
speed up the Swift builds in CI but in reality, those are already fast
and the cache did not speed them up.
- Removes the runner.os/arch specifier from the Webview installer cache
key. The binary download is hardcoded for a specific Windows version /
arch already, so the cache key just adds unneeded complexity.
These caches are getting saved on PR runs which consumes excess GHA
cache storage.
To avoid burning Azure credits, we move the runners back down to the
free tier. Now that caching is properly set up, this should incur only a
minor increase in CI time.
In the real world, it's entirely possible that the latency between
clients, gateways, and relays is much lower than the latency to the API
nodes. This added latency will test that we can handle such cases
reliably.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Packet loss is a reality on the modern internet. Ideally, Firezone
should be able to handle some level of packet loss and still function
reliably, especially considering all of the UDP-based protocols we rely
on.
To test this, we set an extreme packet loss of 20% and perform a 10 MB
download through Firezone. Doing so actually exposed a bug:
For DNS resources, we need to set up the DNS resource NAT on the
Gateway, which happens through the p2p control protocol. The
`AssignedIp`s message that sets this up is resent at most every 2 s, but
only if there are further DNS queries. If we don't receive another DNS
query but do get traffic for the resource, we keep buffering those
packets without ever trying to re-send the `AssignedIp`s message.
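A rough sketch of the corrected retry behaviour, with hypothetical names: re-send the setup message whenever packets for the resource are still buffered and the resend interval has elapsed, instead of only piggy-backing on further DNS queries.

```rust
use std::time::{Duration, Instant};

const RESEND_INTERVAL: Duration = Duration::from_secs(2);

// Hypothetical per-resource state while the DNS resource NAT is being set up.
struct PendingSetup {
    buffered_packets: Vec<Vec<u8>>,
    last_sent: Instant,
}

impl PendingSetup {
    /// Returns `true` if the `AssignedIp`s message should be re-sent now.
    fn should_resend(&self, now: Instant) -> bool {
        // Buffered traffic alone is enough to warrant a retry; we no longer
        // wait for another DNS query to come in.
        !self.buffered_packets.is_empty()
            && now.duration_since(self.last_sent) >= RESEND_INTERVAL
    }
}
```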
Google Cloud Artifact Registry and Cloud Storage are a significant cost.
GitHub, on the other hand, is completely free for us because we are a
public repository. Hence, it makes sense to ditch GCP for GHCR.
To do this, we move all "staging" artifacts to GHCR. These will then be
used in the infra repo to push to GCP for deploys - we probably still
want pulls for our infra to hit GCP and not GitHub.
One big element of this is that we potentially lose sccache, so I'll be
checking the compile time of this PR and looking for alternatives that
don't involve such a massive cloud bill.
In Docker environments, applying iptables rules to filter
container-to-container traffic on the Docker bridge network is not
reliable, leading to direct connections being established in our relayed
tests. To fix this, we insert the rules directly from the client
container itself.
---------
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
The `expires_at` column on the `flows` table was never used outside of
the context in which the flow was created in the Client Channel. This
ephemeral state, which is created in the `Domain.Flows.authorize_flow/4`
function, is never read from the DB in any meaningful capacity, so it
can be safely removed.
The `expire_flows_for` family of functions now simply reads the needed
fields from the flows table in order to broadcast `{:expire_flow,
flow_id, client_id, resource_id}` directly to the subscribed entities.
This PR is step 1 in removing the reliance on `Flows` to manage
ephemeral access state. In a subsequent PR we will actually change the
structure of what state is kept in the channel PIDs such that reliance
on this Flows table will no longer be necessary.
Additionally, in a few places, we were referencing a Flows.Show view
that was never available in production, so this dead code has been
removed.
Lastly, the `flows` table subscription and associated hook processing
has been completely removed as it is no longer needed. In #9667 we
implemented logic to remove publications for removed table
subscriptions, so we can expect a couple of ingest warnings when we
deploy this, as the `Hooks.Flows` processor no longer exists and the WAL
data may still have lingering flows records in the queue. These can be
safely ignored.
[`actionlint`](https://github.com/rhysd/actionlint) is a static analysis
tool for GitHub workflows and actions. It detects various issues ahead
of time and runs shellcheck on all `run` blocks. It is worth noting that
this does **not** lint the contents of composite actions so we still
need to be vigilant when working with those.
To improve supply-chain security, reference all GitHub actions using the
hash of the released tag. GitHub recommends doing this for third-party
actions
(https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions#using-third-party-actions).
In order to make our CI more deterministic, I opted to do it for all our
actions. This means any change to our workflow configuration requires a
source code change and thus passing CI on our end.
Dependabot will automatically issue PRs for these actions and update the
comment with the new version next to them.
Resolves: #2497.
This ensures that we run prettier across all supported filetypes to check
for any formatting / style inconsistencies. Previously, it was only run
for files in the website/ directory, using a deprecated pre-commit
plugin.
The benefit to keeping this in our pre-commit config is that devs can
optionally run these checks locally with `pre-commit run --config
.github/pre-commit-config.yaml`.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
At present, `connlib` only supports DNS over UDP on port 53. Responses
over UDP are size-constrained by the IP MTU and thus not all DNS
responses fit into a UDP packet. RFC 9210 therefore mandates that all
DNS resolvers also support DNS over TCP to overcome this limitation
[0].
Handling UDP packets is easy; handling TCP streams is more difficult
because we effectively need to implement a valid TCP state machine.
Building on top of a lot of earlier work (linked in issue), this is
relatively easy because we can now simply import
`dns_over_tcp::{Client,Server}` which do the heavy lifting of sending
and receiving the correct packets for us.
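For context, DNS over TCP prefixes every message with a two-octet, big-endian length field (RFC 1035, section 4.2.2). A minimal framing helper, written here only to illustrate what "correct packets" means on the TCP side, looks roughly like this:

```rust
/// Frames a DNS message for transmission over TCP (RFC 1035, section 4.2.2).
fn frame_dns_message(message: &[u8]) -> Vec<u8> {
    let len = u16::try_from(message.len()).expect("DNS messages are at most 64 KiB");

    let mut framed = Vec::with_capacity(2 + message.len());
    framed.extend_from_slice(&len.to_be_bytes()); // 2-byte big-endian length prefix
    framed.extend_from_slice(message);
    framed
}
```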
The main aspects of the integration that are worth pointing out are:
- We can handle at most 10 concurrent DNS TCP connections _per defined
resolver_. The assumption here is that most applications will first
query for DNS records over UDP and only fall back to TCP if the response
is truncated. Additionally, we assume that clients will close the TCP
connections once they no longer need them.
- Errors on the TCP stream to an upstream resolver result in `SERVFAIL`
responses to the client.
- All TCP connections to upstream resolvers get reset when we roam, all
currently ongoing queries will be answered with `SERVFAIL`.
- Upon network reset (i.e. roaming), we also re-allocate new local ports
for all TCP sockets, similar to our UDP sockets.
Resolves: #6140.
[0]: https://www.ietf.org/rfc/rfc9210.html#section-3-5
Since we've added these tests, `connlib`'s test coverage has increased
significantly, to the point where we don't need all of them anymore.
In particular, pretty much everything related to relays no longer needs
to be tested using Docker.
These integration tests are sometimes flaky due to Docker not starting
or images failing to pull, so having fewer of them increases CI
reliability. Also, GitHub will only execute so many jobs in parallel, so
having fewer jobs helps there too.
Resolves: #6451.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Currently, the gateway requires a strict ordering of first receiving a
`request_connection` message, followed by multiple `allow_access`
messages. Additionally, access can be granted as part of the initial
`request_connection` message too.
This isn't an ideal design. Setting up a new connection is infallible;
all we need to do is send our ICE credentials back to the client.
However, untangling that will require a bit more effort.
Starting with #6335, following this strict order on the client is more
difficult. Whilst we can send the messages in order, it is harder to
maintain those ordering guarantees across all our systems.
To avoid this, we change the gateway to perform an upsert for its local
ACLs for a client. If an `allow_access` message somehow reaches the
gateway earlier, we simply create the `Peer` right away and only set up
the actual connection later.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
- Adds `http_test_server_image` to inputs so that it gets set properly
for CI (`debug`) and CD (`perf`)
- Updates `dev` -> `debug` in docker-compose.yml to fix pulls
- Fixes issue with seeds and relevant docs from #6205
The compose service I defined is called `otel` not `otlp`. With this fix
in place, the relay successfully connects to the OTLP exporter.
It is worth noting that the connection to the OTLP exporter itself
is not critical for relay operation. Even if it fails, it won't affect
the actual data plane. I do think it makes sense to still have a working
OTLP exporter in the compose definition, as it makes it easier to test
whether the ingestion of metrics and traces works as expected.
We considered using Elixir and Rust to write the tests.
For Elixir, `wallaby` doesn't seem to have a way to attach to an
existing `chromium` instance; it launches a new one each time, which
makes it hard to coordinate with the relay restart.
For Rust, we considered `thirtyfour`, which would be very nice since we
could test both Firefox and Chrome, but each time it connects to the
instance it launches a new session, making it hard to test the DNS cache
behavior.
We also considered `chrome_headless` for Rust. It needs a small patch to
prevent it from closing the browser after `Drop`, but it still presents
a problem since it has no easy way to retrieve whether loading a page
has succeeded. There are some workarounds, such as retrieving the title,
that we could have used, but after some testing they turned out to be
quite finicky and we don't want that for CI.
So I ended up settling for TypeScript but I'm open to other options, or
a fix for the previous ones!
There are some modifications still incoming for this PR: the test name
needs work, and the sleep in the middle of the test doesn't look good,
so I will probably add some retries. The gist is here, though; I will
keep it in draft until we expect it to be passing.
So feel free to do some initial reviews.
Note: the number of lines changed is greatly exaggerated by
`package.lock`
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
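A rough sketch of the bookkeeping, where `Connections`, `Allocation` and `RelayId` are hypothetical stand-ins for the real state: on `relays_presence`, drop the allocations (and thus candidates) of relays that disappeared and allocate on the ones that are new.

```rust
use std::collections::{HashMap, HashSet};

type RelayId = u64;

struct Allocation; // placeholder for the per-relay allocation state

struct Connections {
    allocations: HashMap<RelayId, Allocation>,
}

impl Connections {
    fn on_relays_presence(&mut self, connected: &HashSet<RelayId>) {
        // Invalidate candidates of relays that are no longer connected.
        self.allocations.retain(|id, _| connected.contains(id));

        // Make allocations on relays we haven't seen before; the resulting
        // candidates get signalled to the remote party.
        for id in connected {
            self.allocations.entry(*id).or_insert_with(|| Allocation);
        }
    }
}
```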
This still relies on #4613 until we have #4634.
Resolves: #4548.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Closes #4669
This should stop `linux-group` from failing because it tries to test an
older release that doesn't have the right CLI features.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>