Docker for Mac finally supports IPv6 in general availability. It's time
to add IPv6 to our suite of integration tests.
The thinking behind this PR is try and not slow down CI much, if at all,
by testing IPv6 side-by-side with the existing IPv4 tests.
More comprehensive testing is being developed in #10131 that will test
things like IPv4-in-6 relaying, client / gateway IP stack mismatches,
and so forth.
Forcing 500MBit/s through a relayed connection in CI makes the
user-space relay fall-over and drop control messages, leading to ICE
timeouts of the connection.
Our ICE timeout is ~15s, so it would be a good idea to ensure the perf
tests span a possible ICE timeout if it occurs in the test, so that we
may detect cases where high throughput may cause an ICE timeout.
With the removal of the NAT64/46 modules, we can now simplify the
internals of our `IpPacket` struct. The requirements for our `IpPacket`
struct are somewhat delicate.
On the one hand, we don't want to be overly restrictive in our parsing /
validation code because there is a lot of broken software out there that
doesn't necessarily follow RFCs. Hence, we want to be as lenient as
possible in what we accept.
On the other hand, we do need to verify certain aspects of the packet,
like the payload lengths. At the moment, we are somewhat too lenient
there which causes errors on the Gateway where we have to NAT or
otherwise manipulate the packets. See #9567 or #9552 for example.
To fix this, we make the parsing in the `IpPacket` constructor more
restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully
parse its headers and validate the payload lengths.
This parsing allows us to then rely on the integrity of the packet as
part of the implementation. This does create several code paths that can
in theory panic but in practice, should be impossible to hit. To ensure
that this does in fact not happen, we also tackle an issue that is long
overdue: Fuzzing.
Resolves: #6667Resolves: #9567Resolves: #9552
Packet loss is a reality on the modern internet. Ideally, Firezone
should be able to handle some level of packet loss and still function
reliably, especially considering all of the UDP-based protocols we rely
on.
To test this, we set an extreme packet loss of 20% and perform a 10 MB
download through Firezone. Doing so actually exposed a bug:
For DNS resources, we need to set up the DNS resource NAT on the Gateway
which happens through the p2p control protocol. This packet is resent at
most every 2s but only if there are any other DNS queries. If we don't
receive another DNS query but get traffic for the resource, we keep
buffering those packets without trying to re-send the `AssignedIp`s
packet.
At present, the `direct-download-roaming-network` integration test is a
bit odd. It uses the `--retry` switch from `curl` to retry the download
once it failed. However, what we want to show with this integration test
is that a TCP connection can survive network roaming. We can show that
successfully but only if we specify the `--keepalive-time` option,
otherwise the download stalls.
From inspecting the network logs, this is because `curl` simply waits
for more data to be downloaded. After a network reset, the connection
however is gone and the _client_ (in this case `curl`) needs to send at
least 1 packet to re-establish the connection. By using the keep-alive
option, we can send such a packet and the download completes
successfully.
In Docker environments, applying iptables rules to filter
container-container traffic on the Docker bridged network is not
reliable, leading to direct connections being established in our relayed
tests. To fix this, we insert the rules directly from the client
container itself.
---------
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
[`actionlint`](https://github.com/rhysd/actionlint) is a static analysis
tool for GitHub workflows and actions. It detects various issues ahead
of time and runs shellcheck on all `run` blocks. It is worth noting that
this does **not** lint the contents of composite actions so we still
need to be vigilant when working with those.
A bit of legacy that we have inherited around our Firezone ID is that
the ID stored on the user's device is sha'd before being passed to the
portal as the "external ID". This makes it difficult to correlate IDs in
Sentry and PostHog with the data we have in the portal. For Sentry and
PostHog, we submit the raw UUID stored on the user's device.
As a first step in overcoming this, we embed an "external ID" in those
services as well IF the provided Firezone ID is a valid UUID. This will
allow us to immediately correlate those events.
As a second step, we automatically generate all new Firezone IDs for the
Windows and Linux Client as `hex(sha256(uuid))`. These won't parse as
valid UUIDs and therefore will be submitted as is to the portal.
As a third step, we update all documentation around generating Firezone
IDs to use `uuidgen | sha256` instead of just `uuidgen`. This is
effectively the equivalent of (2) but for the Headless Client and
Gateway where the Firezone ID can be configured via environment
variables.
Resolves: #9382
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
To make releases even more smoother, this PR creates a bit of automation
that automatically bumps the versions in the `scripts/bump-versions.sh`
script and opens a PR for it.
To make our FFI layer between Android and Rust safer, we adopt the
UniFFI tool from Mozilla. UniFFI allows us to create a dedicated crate
(here `client-ffi`) that contains Rust structs annotated with various
attributes. These macros then generate code at compile time that is
built into the shared object. Using a dedicated CLI from the UniFFI
project, we can then generate Kotlin bindings from this shared object.
The primary motivation for this effort is memory safety across the FFI
boundary. Most importantly, we want to ensure that:
- The session pointer is not used after it has been free'd
- Disconnecting the session frees the pointer
- Freeing the session does not happen as part of a callback as that
triggers a cyclic dependency on the Rust side (callbacks are executed on
a runtime and that runtime is dropped as part of dropping the session)
To achieve all of these goals, we move away from callbacks altogether.
UniFFI has great support for async functions. We leverage this support
to expose a `suspend fn` to Android that returns `Event`s. These events
map to the current callback functions. Internally, these events are read
from a channel with a capacity of 1000 events. It is therefore not very
time-critical that the app reads from this channel. `connlib` will
happily continue even if the channel is full. 1000 events should be more
than sufficient though in case the host app cannot immediately process
them. We don't send events very often after all.
This event-based design has major advantages: It allows us to make use
of `AutoCloseable` on the Kotlin side, meaning the `session` pointer is
only ever accessed as part of a `use` block and automatically closed
(and therefore free'd) at the end of the block.
To communicate with the session, we introduce a `TunnelCommand` which
represents all actions that the host app can send to `connlib`. These
are passed through a channel to the `suspend fn` which continuously
listens for events and commands.
Resolves: #9499
Related: #3959
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
The latest release now also sorts workspace dependencies, as well as
different dependency sections. Keeping these things sorted reduces the
chances of merge conflicts when multiple PRs edit these files.