firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 10:18:54 +00:00

Author	SHA1	Message	Date
Firezone Bot	76d86545a6	chore: publish apple-client 1.5.9 (#10654 )	2025-10-20 14:04:08 +00:00
Brian Manifold	27565ea5c8	refactor(portal): remove soft delete elements from portal code (#10607 ) Why: * In previous commits, the portal code had been updated to use hard deletion rather than soft deletion of data. The fields used in the soft deletion were still kept in the DB and the code to allow for zero downtime rollout and an easy rollback if necessary. To continue with that work the portal code has now been updated to remove any reference to the soft deleted fields (e.g. deleted_at, persistent_id, etc...). While the code has been updated the actual data in the DB will need to remain for now, to once again allow for a zero downtime rollout. Once this commit has been deployed to production another PR can follow to remove the columns from the necessary tables in the DB. Related: #8187	2025-10-18 17:02:26 +00:00
Firezone Bot	9b6ebb01ed	chore: publish android-client 1.5.5 (#10614 )	2025-10-18 16:54:35 +00:00
Thomas Eizinger	7e5ec7c2d7	ci: upload `.deb` from releases to APT repository (#10587 ) This PR creates the necessary CI infrastructure to copy `.deb` packages from releases to our APT repository. Re-generation of the index is separated out into a dedicated workflow to avoid concurrency issues and so we can re-generate it without making a release. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-16 19:39:35 +00:00
Thomas Eizinger	17ab1a6d04	ci: remove jitter from docker-compose (#10589 ) Jitter causes packets to get re-ordered which makes it really hard to get predictable performance results. With jitter disabled, we get more consistent performance numbers.	2025-10-16 13:34:59 +00:00
Firezone Bot	5272e0c992	chore: publish headless-client 1.5.4 (#10590 )	2025-10-16 09:15:32 +00:00
Firezone Bot	f78cccea1b	chore: publish gui-client 1.5.8 (#10591 )	2025-10-16 08:47:35 +00:00
Firezone Bot	e3bb2fb931	chore: publish gateway 1.4.17 (#10584 )	2025-10-16 05:38:12 +00:00
Thomas Eizinger	a1b2ca195c	ci(apple): explicitly select Xcode 26.0 (#10511 ) In order to build the iOS app with the Xcode version that is installed on the GitHub runners, we need to select the Xcode version by major and minor version. Currently, the iOS builds are failing because Xcode 26.1 also exists but iOS 26.1 isn't supported (or released?). See https://github.com/firezone/firezone/actions/runs/18239282351/job/51938727311.	2025-10-06 16:07:34 +00:00
Thomas Eizinger	0d61cacb08	ci: add 20% jitter in test environment (#10504 ) To simulate the real world more accurately, we add a 20% jitter to the specified latency on the router containers.	2025-10-02 05:27:05 +00:00
Thomas Eizinger	b11adfcfe4	feat(connlib): create flow on ICMP error "prohibited" (#10462 ) In Firezone, a Client requests an "access authorization" for a Resource on the fly when it sees the first packet for said Resource going through the tunnel. If we don't have a connection to the Gateway yet, this is also where we will establish a connection and create the WireGuard tunnel. In order for this to work, the access authorization state between the Client and the Gateway MUST NOT get out of sync. If the Client thinks it has access to a Resource, it will just route the traffic to the Gateway. If the access authorization on the Gateway has expired or vanished otherwise, the packets will be black-holed. Starting with #9816, the Gateway sends ICMP errors back to the application whenever it filters a packet. This can happen either because the access authorization is gone or because the traffic wasn't allowed by the specific filter rules on the Resource. With this patch, the Client will attempt to create a new flow (i.e. re-authorize traffic) for this resource whenever it sees such an ICMP error, therefore acting as a way of synchronizing the view of the world between Client and Gateway should they ever get out of sync. Testing turned out to be a bit tricky. If we let the authorization on the Gateway lapse naturally, the portal will also toggle the Resource off and on on the Client, resulting in "flushing" the current authorizations. Additionally, if the Client only had access to one Resource, then the Gateway will gracefully close the connection, also resulting in the Client creating a new flow for the next packet. To actually trigger this new behaviour we need to: - Access at least two resources via the same Gateway - Directly send `reject_access` to the Gateway for this particular resource To achieve this, we dynamically eval some code on the API node and instruct the Gateway channel to send `reject_access`. The connection stays intact because there is still another active access authorization but packets for the other resource are answered with ICMP errors. To achieve a safe roll-out, the new behaviour is feature-flagged. In order to still test it, we now also allow feature flags to be set via env variables. Resolves: #10074 --------- Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>	2025-09-30 08:23:39 +00:00
Thomas Eizinger	9865e03343	ci: fix double symmetric NAT test failure (#10410 ) As it turns out, the flaky test was caused by a bug in the eBPF kernel where we read the old channel data header from the wrong offset. This made us essentially read garbage data for the channel number, causing us to: a. Compute a bad checksum b. Send the packet on a completely wrong channel The reason this caused a flaky test is that it requires one side to pick IPv4 to talk to the relay and the other side IPv6. The happy-eyeballs approach of the `allocation` module made that non-deterministic, only exposing this bug occasionally. To ensure these kinds of things are detected earlier in the future, I am adding an additional CI step that checks all packets emitted by the eBPF kernel for checksum errors. Fixes: #10404 Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>	2025-09-25 17:53:17 +10:00
Thomas Eizinger	cf837c5087	ci: fix build context for relay container (#10426 ) The build context is taken relative to where the file is defined, meaning we first need to navigate two directories up.	2025-09-23 04:01:55 +00:00
Jamil	8b2bf97513	fix(ci): RUN_MANUAL_MIGRATIONS=true (#10377 ) This variable was renamed and not updated in our docker-compose.yml, causing intermittent errors like this one: https://github.com/firezone/firezone/actions/runs/17835644646/job/50712540454	2025-09-18 17:43:02 +00:00
Firezone Bot	8f46007674	chore: publish android-client 1.5.4 (#10374 )	2025-09-18 10:37:20 -07:00
Thomas Eizinger	22eac1ad6d	ci: add latency to routers (#10352 ) Now that we have a more realistic network setup in our compose file, we can extend our router containers to apply the latency on the network path. This means any use of the compose file has a latency by default, simplifying our CI setup. It also allows us to restart containers without having to re-apply the latency which is useful during performance testing.	2025-09-16 20:27:47 +00:00
Thomas Eizinger	737137df97	chore: remove nix flake (#10364 ) I am no longer using Nix so this is now effectively unmaintained. Let's remove it so it doesn't go stale.	2025-09-16 10:27:18 +00:00
Thomas Eizinger	e2dce710f1	refactor: tidy up `docker-compose.yml` (#10334 ) Our `docker-compose.yml` file has grown to a degree where it is almost unmanageable. Docker compose offers several tools to deal with complex compose setups, like include files and yaml anchors. We refactor our setup using these tools to organise these services and their configuration a bit better.	2025-09-15 03:37:39 +00:00
Thomas Eizinger	0b89959354	fix(relay): handle relay-relay candidate pairs in eBPF (#10286 ) Currently, the eBPF module can translate from channel data messages to UDP packets and vice versa. It can even do that across IP stacks, i.e. translate from an IPv6 UDP packet to an IPv4 channel data message. What it cannot do is handle packets to itself. This can happen if both - Client and Gateway - pick the same relay to make an allocation. When exchanging candidates, ICE will then form pairs between both relay candidates, essentially requiring the relay to loop packets back to itself. In eBPF, we cannot do that. When sending a packet back out with `XDP_TX`, it will actually go out on the wire without an additional check whether it is for our own IP. Properly handling this in eBPF (by comparing the destination IP to our public IP) adds more cases we need to handle. The current module structure where everything is one file makes this quite hard to understand, which is why I opted to create four sub-modules: - `from_ipv4_channel` - `from_ipv4_udp` - `from_ipv6_channel` - `from_ipv6_udp` For traffic arriving via a data-channel, it is possible that we also need to send it back out via a data-channel if the peer address we are sending to is the relay itself. Therefore, the `from_ipX_channel` modules have four sub-modules: - `to_ipv4_channel` - `to_ipv4_udp` - `to_ipv6_channel` - `to_ipv6_udp` For the traffic arriving on an allocation port (`from_ipX_udp`), we always map to a data-channel and therefore can never get into a routing loop, resulting in only two modules: - `to_ipv4_channel` - `to_ipv6_channel` The actual implementation of the new code paths is rather simple and mostly copied from the existing ones. For half of them, we don't need to make any adjustments to the buffer size (i.e. IPv4 channel to IPv4 channel). For the other half, we need to adjust for the difference in the IP header size. To test these changes, we add a new integration test that makes use of the new docker-compose setup added in #10301 and configures masquerading for both Client and Gateway. To make this more useful, we also remove the `direct-` prefix from all tests as the test script itself no longer makes any decisions as to whether it is operating over a direct or relayed connection. Resolves: #7518	2025-09-11 07:19:23 +00:00
Thomas Eizinger	9cd25d70d8	ci: prevent packet reordering by router containers (#10328 ) By default, RPS (Receive Packet Steering) is disabled on Linux which means the CPU handling the interrupt for an incoming packet also handles the packet. Under high load, this can cause packet reordering in a test setup where at least two routers are in the path between Client and Gateway. To ensure our test suite is deterministic, we enable RPS and set it to 1, meaning CPU 1 will always handle all packets. Local testing has shown that this fixes the warnings of "packet counter too old" on the Gateway and instead, all packets arrive entirely in order. Source: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-rps	2025-09-11 06:54:05 +00:00
Thomas Eizinger	83171d3a2d	ci: add integration test for graceful Gateway shutdown (#10077 ) Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2025-09-10 23:41:55 +00:00
Thomas Eizinger	d1d46fdfb4	ci: create a more realistic network setup (#10301 ) Currently, the setup we have in docker-compose does not reflect real-world scenarios very well because most components share the same subnet. In reality, Clients, Gateways, relays and the backend are all in separate subnets, connected via multiple routers on the Internet. The current setup makes it hard to properly test relayed connections. To fix this, we move all components into their own subnet with a dedicated router container that performs source and destination NAT as well as acts as a firewall for the client and gateway containers to not allow inbound traffic. This setup will allow us to more easily test #10286 which requires port randomization for outgoing traffic on the Client and Gateway side.	2025-09-10 23:37:16 +00:00
Firezone Bot	d8079c869f	chore: publish apple-client 1.5.8 (#10323 )	2025-09-10 17:06:40 +00:00
Thomas Eizinger	f96cc3d583	feat(relay): remove graceful shutdown (#10322 ) Initially, we added the graceful shutdown functionality to the relay to better deal with deploys and achieve as little downtime as possible. With the split of app and infrastructure that we now have, this functionality is no longer necessary as portal deploys don't touch the relay infra at all. Thus, we can remove this functionality which will actually speed up deploys of the relays as systemd no longer has to time out after sending the SIGTERM to the binary.	2025-09-10 07:00:20 +00:00
Firezone Bot	af7f4c9992	chore: publish headless-client 1.5.3 (#10320 )	2025-09-10 05:25:24 +00:00
Firezone Bot	cacef44b4b	chore: publish gateway 1.4.16 (#10321 )	2025-09-10 04:50:43 +00:00
Firezone Bot	ff8781b7b6	chore: publish gui-client 1.5.7 (#10319 )	2025-09-10 04:22:09 +00:00
Thomas Eizinger	3cffeef483	ci: reduce target bitrate for UDP perf tests to 600Mbit/s (#10312 ) To achieve a more stable CI, we need to reduce the target bitrate of the UDP perf tests. Now that we no longer have GSO enabled in the tests, the most we can achieve in CI is 600Mbit/s. Forcing more packets through the tunnel results in all sorts of warnings which end up failing CI.	2025-09-09 12:58:33 +00:00
Thomas Eizinger	b762c3acde	ci: don't restart portal at the beginning of the test (#10274 ) Restarting the portal at the beginning of the test is useless. We haven't made any connections yet so restarting it will just get us back to the same state that we are already in.	2025-09-01 13:43:50 +00:00
Jamil	0ccd4bbf24	feat(ci): enable relay eBPF offloading (#10160 ) In CI, eBPF in driver mode actually functions just fine with no changes to our existing tests, given we apply a few workarounds and bugfixes: - The interface learning mechanism had two flaws: (1) it only learned per-CPU, which meant the risk of a missing entry grew as the core count of the relay host grew, and (2) it did not filter for unicast IPs, so it picked up broadcast and link-local addresses, causing cross-relay paths to fail occasionally - The `relay-relay` candidate where the two relays are the same relay causes packet drops / loops in the Docker bridge setup, and possibly in GCP too. I'm not sure this is a valid path that solves a real connectivity issue in the wild. I can understand relay-relay paths where two relays are different hosts, and the client and gateway both talk over their TURN channel to each other (i.e. WireGuard is blocked in each of their networks), but I can't think of an advantage for a relay-relay candidate where the traffic simply hairpins (or is dropped) off the nearest switch. This is now detected with a new `PacketLoop` error that triggers whenever source_ip == dest_ip. - The relays in CI need a common next-hop to talk to for the MAC address swapping to work. A simple router service is added which functions as a basic L3 router (no NAT) that allows the MAC swapping to work. - The `veth` driver has some peculiar requirements to allow it to function with XDP_TX. If you send a packet out of one interface of a veth pair with XDP_TX, you need to either make sure both interfaces have GRO enabled, or you need to attach a dummy XDP program that simply does XDP_PASS to the other interface so that the sk_buff is allocated before going up the stack to the Docker bridge. The GRO method was unreliable and didn't work in our case, causing massive packet delays and unpredictable bursts that prevented ICE from working, so we use the XDP_PASS method instead. A simple docker image is built and lives at https://github.com/firezone/xdp-pass to handle this. Related: #10138 Related: #10260	2025-08-31 23:37:03 +00:00
Jamil	516be7417e	fix(ci): remove extraneous caching (#10258 ) - Removes the Swift DerivedData cache. This was added to attempt to speed up the Swift builds in CI but in reality, those are already fast and the cache did not speed them up. - Removes the runner.os/arch specifier from the Webview installer cache key. The binary download is hardcoded for a specific Windows version / arch already so the cache key just adds unneeded complexity. These caches are getting saved on PR runs which consumes excess GHA cache storage.	2025-08-27 05:01:02 -07:00
Jamil	8eb738e66a	chore(ci): downgrade runners to free tier (#10248 ) To avoid burning Azure credits, we move the runners back down to the free tier. Now that caching is properly set up, this should incur only a minor increase in CI time.	2025-08-26 10:48:45 -07:00
Jamil	0698e0d35f	ci: test IPv6 for CIDR resources (#10168 ) Docker for Mac finally supports IPv6 in general availability. It's time to add IPv6 to our suite of integration tests. The thinking behind this PR is to try not to slow down CI much, if at all, by testing IPv6 side-by-side with the existing IPv4 tests. More comprehensive testing is being developed in #10131 that will test things like IPv4-in-6 relaying, client / gateway IP stack mismatches, and so forth.	2025-08-18 20:59:40 +00:00
Firezone Bot	95ee111e62	chore: publish apple-client 1.5.7 (#10159 )	2025-08-07 04:38:03 +00:00
Thomas Eizinger	456fde5b60	ci: increase bitrate of direct connection UDP perf tests (#10154 ) We can easily handle 1GBit/s for the direct connections.	2025-08-06 14:02:47 +00:00
Thomas Eizinger	b5e3ee8065	ci: reduce UDP perf test bitrate (#10153 ) Forcing 500MBit/s through a relayed connection in CI makes the user-space relay fall over and drop control messages, leading to ICE timeouts of the connection.	2025-08-06 09:11:57 +00:00
Firezone Bot	ea960cce74	chore: publish android-client 1.5.3 (#10141 )	2025-08-05 16:38:23 +00:00
Jamil	56f5405849	chore(ci): increase perf test time to 30s (#10133 ) Our ICE timeout is ~15s, so it would be a good idea to ensure the perf tests span a possible ICE timeout if it occurs in the test, so that we may detect cases where high throughput may cause an ICE timeout.	2025-08-05 07:42:17 +00:00
Firezone Bot	3e529ed36c	chore: publish gateway 1.4.15 (#10134 )	2025-08-05 17:17:25 +10:00
Firezone Bot	acf52ccf1e	chore: publish apple-client 1.5.6 (#10106 )	2025-08-02 19:43:35 +00:00
Thomas Eizinger	a7ba15c8c1	ci: test packet loss behaviour using download (#10067 )	2025-08-01 01:55:02 +00:00
Thomas Eizinger	69f9a03ee8	refactor(connlib): simplify `IpPacket` struct (#9795 ) With the removal of the NAT64/46 modules, we can now simplify the internals of our `IpPacket` struct. The requirements for our `IpPacket` struct are somewhat delicate. On the one hand, we don't want to be overly restrictive in our parsing / validation code because there is a lot of broken software out there that doesn't necessarily follow RFCs. Hence, we want to be as lenient as possible in what we accept. On the other hand, we do need to verify certain aspects of the packet, like the payload lengths. At the moment, we are somewhat too lenient there which causes errors on the Gateway where we have to NAT or otherwise manipulate the packets. See #9567 or #9552 for example. To fix this, we make the parsing in the `IpPacket` constructor more restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully parse its headers and validate the payload lengths. This parsing allows us to then rely on the integrity of the packet as part of the implementation. This does create several code paths that can in theory panic but in practice, should be impossible to hit. To ensure that this does in fact not happen, we also tackle an issue that is long overdue: Fuzzing. Resolves: #6667 Resolves: #9567 Resolves: #9552	2025-07-29 04:42:57 +00:00
Jamil	1763113511	test(ci): test 20% packet loss (#9846 ) Packet loss is a reality on the modern internet. Ideally, Firezone should be able to handle some level of packet loss and still function reliably, especially considering all of the UDP-based protocols we rely on. To test this, we set an extreme packet loss of 20% and perform a 10 MB download through Firezone. Doing so actually exposed a bug: For DNS resources, we need to set up the DNS resource NAT on the Gateway which happens through the p2p control protocol. This packet is resent at most every 2s but only if there are any other DNS queries. If we don't receive another DNS query but get traffic for the resource, we keep buffering those packets without trying to re-send the `AssignedIp`s packet.	2025-07-28 22:51:04 +00:00
Firezone Bot	e6fc7e62da	chore: publish apple-client 1.5.5 (#10035 )	2025-07-28 20:14:12 +00:00
Firezone Bot	2309be11fc	chore: publish headless-client 1.5.2 (#10029 )	2025-07-28 06:17:42 +00:00
Firezone Bot	cf40f4dd96	chore: publish gateway 1.4.14 (#10030 )	2025-07-28 06:14:07 +00:00
Firezone Bot	7b8daf4074	chore: publish gui-client 1.5.6 (#10028 )	2025-07-28 06:08:01 +00:00
Firezone Bot	a11983e4b3	chore: publish gateway 1.4.13 (#9969 )	2025-07-22 18:56:40 +00:00
Thomas Eizinger	47b35d6e3c	ci: increase timeout for download roaming test (#9945 ) Now that we don't tolerate any failures in the download, this test sometimes fails because the timeout is a bit too tight.	2025-07-21 04:06:37 +00:00
Thomas Eizinger	72fbe306b6	test: remove curl retry in favor of keep-alive (#9888 ) At present, the `direct-download-roaming-network` integration test is a bit odd. It uses the `--retry` switch from `curl` to retry the download once it failed. However, what we want to show with this integration test is that a TCP connection can survive network roaming. We can show that successfully but only if we specify the `--keepalive-time` option, otherwise the download stalls. From inspecting the network logs, this is because `curl` simply waits for more data to be downloaded. After a network reset, the connection however is gone and the _client_ (in this case `curl`) needs to send at least 1 packet to re-establish the connection. By using the keep-alive option, we can send such a packet and the download completes successfully.	2025-07-16 16:17:27 +00:00

562 Commits