Commit Graph

533 Commits

Author SHA1 Message Date
Jamil
0ccd4bbf24 feat(ci): enable relay eBPF offloading (#10160)
In CI, eBPF in driver mode actually functions just fine with no changes
to our existing tests, given we apply a few workarounds and bugfixes:

- The interface learning mechanism had two flaws: (1) it only learned
per-CPU, which meant the risk for a missing entry grew as the core count
of the relay host grew, and (2) it did not filter for unicast IPs, so it
picked up broadcast and link-local addresses, causing cross-relay paths
to fail occasionally
- The `relay-relay` candidate where the two relays are the same relay
causes packet drops / loops in the Docker bridge setup, and possibly in
GCP too. I'm not sure this is a valid path that solves a real
connectivity issue in the wild. I can understand relay-relay paths where
two relays are different hosts, and the client and gateway both talk
over their TURN channel to each other (i.e. WireGuard is blocked in each
of their networks), but I can't think of an advantage for a relay-relay
candidate where the traffic simply hairpins (or is dropped) off the
nearest switch. This has been now detected with a new `PacketLoop` error
that triggers whenever source_ip == dest_ip.
- The relays in CI need a common next-hop to talk to for the MAC address
swapping to work. A simple router service is added which functions as a
basic L3 router (no NAT) that allows the MAC swapping to work.
- The `veth` driver has some peculiar requirements to allow it to
function with XDP_TX. If you send a packet out of one interface of a
veth pair with XDP_TX, you need to either make sure both interfaces have
GRO enabled, or you need to attach a dummy XDP program that simply does
XDP_PASS to the other interface so that the sk_buff is allocated before
going up the stack to the Docker bridge. The GRO method was unreliable
and didn't work in our case, causing massive packet delays and
unpredictable bursts that prevented ICE from working, so we use the
XDP_PASS method instead. A simple docker image is built and lives at
https://github.com/firezone/xdp-pass to handle this.

Related: #10138 
Related: #10260
2025-08-31 23:37:03 +00:00
Jamil
516be7417e fix(ci): remove extraneous caching (#10258)
- Removes the swift DerivedData cache. This was added to attempt to
speed up the Swift builds in CI but in reality, those are already fast
and the cache did not speed them up.
- Removes the runner.os/arch specifier from the Webview installer cache
key. The binary download is hardcoded for a specific windows version /
arch already so the cache key just adds unneeded complexity.

These caches are getting saved on PR runs which consumes excess GHA
cache storage.
2025-08-27 05:01:02 -07:00
Jamil
8eb738e66a chore(ci): downgrade runners to free tier (#10248)
To avoid burning Azure credits, we move the runners back down to the
free tier. Now that caching is properly set up, this should incur only a
minor increase in CI time.
2025-08-26 10:48:45 -07:00
Jamil
0698e0d35f ci: test IPv6 for CIDR resources (#10168)
Docker for Mac finally supports IPv6 in general availability. It's time
to add IPv6 to our suite of integration tests.

The thinking behind this PR is try and not slow down CI much, if at all,
by testing IPv6 side-by-side with the existing IPv4 tests.

More comprehensive testing is being developed in #10131 that will test
things like IPv4-in-6 relaying, client / gateway IP stack mismatches,
and so forth.
2025-08-18 20:59:40 +00:00
Firezone Bot
95ee111e62 chore: publish apple-client 1.5.7 (#10159) 2025-08-07 04:38:03 +00:00
Thomas Eizinger
456fde5b60 ci: increase bitrate of direct connection UDP perf tests (#10154)
We can easily handle 1GBit/s for the direct connections.
2025-08-06 14:02:47 +00:00
Thomas Eizinger
b5e3ee8065 ci: reduce UDP perf test bitrate (#10153)
Forcing 500MBit/s through a relayed connection in CI makes the
user-space relay fall-over and drop control messages, leading to ICE
timeouts of the connection.
2025-08-06 09:11:57 +00:00
Firezone Bot
ea960cce74 chore: publish android-client 1.5.3 (#10141) 2025-08-05 16:38:23 +00:00
Jamil
56f5405849 chore(ci): increase perf test time to 30s (#10133)
Our ICE timeout is ~15s, so it would be a good idea to ensure the perf
tests span a possible ICE timeout if it occurs in the test, so that we
may detect cases where high throughput may cause an ICE timeout.
2025-08-05 07:42:17 +00:00
Firezone Bot
3e529ed36c chore: publish gateway 1.4.15 (#10134) 2025-08-05 17:17:25 +10:00
Firezone Bot
acf52ccf1e chore: publish apple-client 1.5.6 (#10106) 2025-08-02 19:43:35 +00:00
Thomas Eizinger
a7ba15c8c1 ci: test packet loss behaviour using download (#10067) 2025-08-01 01:55:02 +00:00
Thomas Eizinger
69f9a03ee8 refactor(connlib): simplify IpPacket struct (#9795)
With the removal of the NAT64/46 modules, we can now simplify the
internals of our `IpPacket` struct. The requirements for our `IpPacket`
struct are somewhat delicate.

On the one hand, we don't want to be overly restrictive in our parsing /
validation code because there is a lot of broken software out there that
doesn't necessarily follow RFCs. Hence, we want to be as lenient as
possible in what we accept.

On the other hand, we do need to verify certain aspects of the packet,
like the payload lengths. At the moment, we are somewhat too lenient
there which causes errors on the Gateway where we have to NAT or
otherwise manipulate the packets. See #9567 or #9552 for example.

To fix this, we make the parsing in the `IpPacket` constructor more
restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully
parse its headers and validate the payload lengths.

This parsing allows us to then rely on the integrity of the packet as
part of the implementation. This does create several code paths that can
in theory panic but in practice, should be impossible to hit. To ensure
that this does in fact not happen, we also tackle an issue that is long
overdue: Fuzzing.

Resolves: #6667 
Resolves: #9567
Resolves: #9552
2025-07-29 04:42:57 +00:00
Jamil
1763113511 test(ci): test 20% packet loss (#9846)
Packet loss is a reality on the modern internet. Ideally, Firezone
should be able to handle some level of packet loss and still function
reliably, especially considering all of the UDP-based protocols we rely
on.

To test this, we set an extreme packet loss of 20% and perform a 10 MB
download through Firezone. Doing so actually exposed a bug:

For DNS resources, we need to set up the DNS resource NAT on the Gateway
which happens through the p2p control protocol. This packet is resent at
most every 2s but only if there are any other DNS queries. If we don't
receive another DNS query but get traffic for the resource, we keep
buffering those packets without trying to re-send the `AssignedIp`s
packet.
2025-07-28 22:51:04 +00:00
Firezone Bot
e6fc7e62da chore: publish apple-client 1.5.5 (#10035) 2025-07-28 20:14:12 +00:00
Firezone Bot
2309be11fc chore: publish headless-client 1.5.2 (#10029) 2025-07-28 06:17:42 +00:00
Firezone Bot
cf40f4dd96 chore: publish gateway 1.4.14 (#10030) 2025-07-28 06:14:07 +00:00
Firezone Bot
7b8daf4074 chore: publish gui-client 1.5.6 (#10028) 2025-07-28 06:08:01 +00:00
Firezone Bot
a11983e4b3 chore: publish gateway 1.4.13 (#9969) 2025-07-22 18:56:40 +00:00
Thomas Eizinger
47b35d6e3c ci: increase timeout for download roaming test (#9945)
Now that we don't tolerate any failures in the download, this test
sometimes fails because the timeout is a bit too tight.
2025-07-21 04:06:37 +00:00
Thomas Eizinger
72fbe306b6 test: remove curl retry in favor of keep-alive (#9888)
At present, the `direct-download-roaming-network` integration test is a
bit odd. It uses the `--retry` switch from `curl` to retry the download
once it failed. However, what we want to show with this integration test
is that a TCP connection can survive network roaming. We can show that
successfully but only if we specify the `--keepalive-time` option,
otherwise the download stalls.

From inspecting the network logs, this is because `curl` simply waits
for more data to be downloaded. After a network reset, the connection
however is gone and the _client_ (in this case `curl`) needs to send at
least 1 packet to re-establish the connection. By using the keep-alive
option, we can send such a packet and the download completes
successfully.
2025-07-16 16:17:27 +00:00
Thomas Eizinger
cf2470ba1e test(iperf): install iptables rule inside of container (#9880)
In Docker environments, applying iptables rules to filter
container-container traffic on the Docker bridged network is not
reliable, leading to direct connections being established in our relayed
tests. To fix this, we insert the rules directly from the client
container itself.

---------

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2025-07-16 10:29:33 +00:00
Jamil
12351e5985 ci: publish apple 1.5.4 clients (#9842) 2025-07-11 16:35:25 +00:00
Thomas Eizinger
55aef6ae11 chore: publish gui-client 1.5.5 (#9811) 2025-07-09 12:44:38 +00:00
Jamil
4a02e89b43 ci: publish headless 1.5.1 (#9791) 2025-07-05 08:18:14 +00:00
Thomas Eizinger
0617c96b69 chore(nix): follow system release channel (#9784)
In order to avoid glibc mismatches, the dev-shell started by the nix
flake should follow the system channel instead of pinning a version by
itself.
2025-07-04 14:27:53 +00:00
Jamil
4091457788 ci: publish android 1.5.2 (#9735)
**NOTE**: This is for last week's release of 1.5.2. We will still need
to do a release to cut 1.5.3.
2025-07-01 14:11:48 +00:00
Jamil
a4cf3ead0f ci: publish gateway 1.4.12 (#9736) 2025-07-01 14:04:21 +00:00
Jamil
699739deae fix(docs): use sha256sum over sha256 (#9690)
`sha256` isn't found by default on some machines.
2025-06-27 20:08:41 +00:00
Jamil
1acfcd5678 fix(gateway): bump timeoutstart to 15s (#9685)
3s on a fresh install may be too low for the binary to download.

Related:
https://firezonehq.slack.com/archives/C08FPHECLUF/p1750946191078759?thread_ts=1750940488.328739&cid=C08FPHECLUF
2025-06-26 14:19:44 +00:00
Thomas Eizinger
40f0609d90 ci: lint GitHub workflows with actionlint (#9590)
[`actionlint`](https://github.com/rhysd/actionlint) is a static analysis
tool for GitHub workflows and actions. It detects various issues ahead
of time and runs shellcheck on all `run` blocks. It is worth noting that
this does **not** lint the contents of composite actions so we still
need to be vigilant when working with those.
2025-06-24 08:05:10 +00:00
Thomas Eizinger
a91dda139f feat(connlib): only conditionally hash firezone ID (#9633)
A bit of legacy that we have inherited around our Firezone ID is that
the ID stored on the user's device is sha'd before being passed to the
portal as the "external ID". This makes it difficult to correlate IDs in
Sentry and PostHog with the data we have in the portal. For Sentry and
PostHog, we submit the raw UUID stored on the user's device.

As a first step in overcoming this, we embed an "external ID" in those
services as well IF the provided Firezone ID is a valid UUID. This will
allow us to immediately correlate those events.

As a second step, we automatically generate all new Firezone IDs for the
Windows and Linux Client as `hex(sha256(uuid))`. These won't parse as
valid UUIDs and therefore will be submitted as is to the portal.

As a third step, we update all documentation around generating Firezone
IDs to use `uuidgen | sha256` instead of just `uuidgen`. This is
effectively the equivalent of (2) but for the Headless Client and
Gateway where the Firezone ID can be configured via environment
variables.

Resolves: #9382

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-06-24 07:05:48 +00:00
Jamil
081b075f2c chore: bump gui, apple, gateway (#9586)
The new publish automation still [has some
kinks](https://github.com/firezone/firezone/actions/runs/15764891111) so
publishing this manually.
2025-06-19 12:29:46 -07:00
Jamil
f50fa95778 fix(ci): lock xcode major (#9585)
Apple won't allow apps built with Xcode betas to be reviewed.

<img width="1146" alt="Screenshot 2025-06-19 at 9 04 17 AM"
src="https://github.com/user-attachments/assets/11470f04-603b-4c5c-aad2-fba0e4eb391a"
/>
2025-06-19 09:21:58 -07:00
Jamil
1856749d1b fix(ci): download iOS SDK explicitly (#9572)
The runners don't seem to have this installed any longer by default.

Related:
https://github.com/actions/runner-images/issues/12355#issuecomment-2977352487
2025-06-18 19:22:17 +00:00
Thomas Eizinger
bc854e1f9a ci: automatically create PR after publishing release (#9556)
To make releases even more smoother, this PR creates a bit of automation
that automatically bumps the versions in the `scripts/bump-versions.sh`
script and opens a PR for it.
2025-06-18 06:17:18 +00:00
Thomas Eizinger
faeb958882 refactor: use UniFFI for Android FFI (#9415)
To make our FFI layer between Android and Rust safer, we adopt the
UniFFI tool from Mozilla. UniFFI allows us to create a dedicated crate
(here `client-ffi`) that contains Rust structs annotated with various
attributes. These macros then generate code at compile time that is
built into the shared object. Using a dedicated CLI from the UniFFI
project, we can then generate Kotlin bindings from this shared object.

The primary motivation for this effort is memory safety across the FFI
boundary. Most importantly, we want to ensure that:

- The session pointer is not used after it has been free'd
- Disconnecting the session frees the pointer
- Freeing the session does not happen as part of a callback as that
triggers a cyclic dependency on the Rust side (callbacks are executed on
a runtime and that runtime is dropped as part of dropping the session)

To achieve all of these goals, we move away from callbacks altogether.
UniFFI has great support for async functions. We leverage this support
to expose a `suspend fn` to Android that returns `Event`s. These events
map to the current callback functions. Internally, these events are read
from a channel with a capacity of 1000 events. It is therefore not very
time-critical that the app reads from this channel. `connlib` will
happily continue even if the channel is full. 1000 events should be more
than sufficient though in case the host app cannot immediately process
them. We don't send events very often after all.

This event-based design has major advantages: It allows us to make use
of `AutoCloseable` on the Kotlin side, meaning the `session` pointer is
only ever accessed as part of a `use` block and automatically closed
(and therefore free'd) at the end of the block.

To communicate with the session, we introduce a `TunnelCommand` which
represents all actions that the host app can send to `connlib`. These
are passed through a channel to the `suspend fn` which continuously
listens for events and commands.

Resolves: #9499
Related: #3959

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2025-06-17 21:48:34 +00:00
Jamil
9701cfca0f chore: publish gui 1.5.3 (#9547) 2025-06-17 10:04:04 +00:00
Jamil
5e3c240501 chore: publish gui 1.5.2 (#9516) 2025-06-12 17:16:04 +00:00
Jamil
ab01a1ef91 chore: bump gui to 1.5.1 (#9440) 2025-06-05 21:30:08 +00:00
Jamil
1e94afdb98 chore: move terraform/ to private repo (#9421)
Since we'll be adding ops playbooks and other things here, it makes
sense to separate infra from product source.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-05 19:24:06 +00:00
Jamil
51e13d453f chore: publish GUI client 1.5.0 (#9413)
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2025-06-05 09:06:28 +00:00
Jamil
b60d77cef4 chore: publish gateway 1.4.10 (#9412) 2025-06-05 08:55:13 +00:00
Jamil
6683178c8b chore: publish headless client 1.5.0 (#9414) 2025-06-05 08:07:18 +00:00
Jamil
7498d992cb chore: publish android 1.5.1 (#9405) 2025-06-05 03:24:32 +00:00
Jamil
221ffc7e58 chore: publish Apple 1.5.2 (#9385) 2025-06-03 19:49:06 +00:00
Jamil
a7af119668 chore: publish android 1.5.0 (#9378) 2025-06-03 06:58:22 +00:00
Thomas Eizinger
e05c98bfca ci: update to new cargo sort release (#9354)
The latest release now also sorts workspace dependencies, as well as
different dependency sections. Keeping these things sorted reduces the
chances of merge conflicts when multiple PRs edit these files.
2025-06-02 02:01:09 +00:00
Jamil
0c0ab13b90 ci: Bump apple version to 1.5.1 (#9343) 2025-06-01 16:43:31 +00:00
Thomas Eizinger
56ff469f03 refactor(gui-client): merge all windows into a single view (#9295)
This PR refactors the GUI clients frontend into a single window with a
sidebar. The functionality remains the same but we do make minor UI
improvements on the way. To pull the entire refactor off, we now use
`react` and `flowbite-react` for the GUI. All the communication with the
backend is moved towards one-way commands and events. That means, the
flow is always:

- Backend emits events to update frontend
- Frontend triggers actions in the backend that may or may not result in
further events

This allows us to decouple the GUI from knowing about which side-effects
change what parts of the state. Instead, it simply updates whenever it
receives an event.

- The previous "Advanced Settings" screen is now split into two parts:
Settings and Diagnostics. Later, we will add a "General settings" page
here.
- The tray menu remains identical to the current one. When the user
clicks "Settings" or "About", we open the window and navigate to the
corresponding page.
- The app and git version are now directly embedded in the frontend,
simplifying the interaction between the frontend and the backend
further.

|Before|After|
|---|---|

|![](https://github.com/user-attachments/assets/7e8039c8-d589-495e-92b4-1742f9eb01b2)|![](https://github.com/user-attachments/assets/363184b3-4fcf-45c2-82d2-c466902759ef)|

|![](https://github.com/user-attachments/assets/88163522-cafc-4ad4-90cd-be2e77073b7f)|![](https://github.com/user-attachments/assets/106ef921-38a7-4603-add9-8b0875064737)|

|![](https://github.com/user-attachments/assets/a4bef4b0-5b29-43dd-aea6-0babd3b4ec9e)|![](https://github.com/user-attachments/assets/b84f1b51-c35c-48cc-9335-c653eee597ff)

|![](https://github.com/user-attachments/assets/f0473a0a-cdba-4a15-af98-d97ef422dbc5)|![](https://github.com/user-attachments/assets/ddced01b-6f44-4241-80ea-038a4740915b)|
2025-05-31 01:57:40 +00:00