The current Rust workspace isn't as consistent as it could be. To make
navigation a bit easier, we move a few crates around. Generally, we
follow the idea that entry-points should be at the top-level. `rust/`
now looks like this (directories only):
```
.
├── cli # Firezone CLI
├── client-ffi # Entry point for Apple & Android
├── gateway # Gateway
├── gui-client # GUI client
├── headless-client # Headless client
├── libs # Library crates
├── relay # Relay
├── target # Compile artifacts
├── tests # Crates for testing
└── tools # Local tools
```
To further enforce this structure, we also drop the `firezone-` prefix
from all crates that are not top-level binary crates.
Right now, connlib hands out a `BiMap` of sentinel IPs <> upstream
servers whenever it emits a `TunInterfaceUpdated` event. This `BiMap`
internally uses two `HashMap`s. The iteration order of `HashMap`s is
non-deterministic, so we lose the order in which the upstream / system
resolvers were originally passed to us.
To prevent that, we now emit a dedicated `DnsMapping` type that does not
expose its internal data structure but only getters for retrieving the
sentinel and upstream servers. Internally, it uses a `Vec` to store this
mapping and thus retains the original order. This is asserted as part of
our proptests by comparing the resulting `Vec`s.
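A minimal sketch of what such an order-preserving mapping could look like (the actual connlib type may differ; the getters and field names here are assumptions):
```rust
use std::net::IpAddr;

/// Order-preserving DNS mapping: each entry pairs a sentinel IP with the
/// upstream resolver it forwards to. Backed by a `Vec`, so iteration order
/// matches the order in which the resolvers were originally configured.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DnsMapping {
    inner: Vec<(IpAddr, IpAddr)>, // (sentinel, upstream)
}

impl DnsMapping {
    pub fn new(pairs: Vec<(IpAddr, IpAddr)>) -> Self {
        Self { inner: pairs }
    }

    /// The upstream resolver behind a given sentinel IP, if any.
    pub fn upstream(&self, sentinel: IpAddr) -> Option<IpAddr> {
        self.inner
            .iter()
            .find(|(s, _)| *s == sentinel)
            .map(|(_, u)| *u)
    }

    /// All sentinel IPs, in their original order.
    pub fn sentinels(&self) -> impl Iterator<Item = IpAddr> + '_ {
        self.inner.iter().map(|(s, _)| *s)
    }

    /// All upstream resolvers, in their original order.
    pub fn upstreams(&self) -> impl Iterator<Item = IpAddr> + '_ {
        self.inner.iter().map(|(_, u)| *u)
    }
}
```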
This fix is preceded by a few refactorings that encapsulate the code for
creating and updating this DNS mapping.
Resolves: #8439
Rust 1.91 has been released and brings with it a few new lints that we
need to tidy up. In addition, it also stabilizes `BTreeMap::extract_if`:
A really nifty std-lib function that allows us to conditionally take
elements from a map. We need that in a bunch of places.
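As a rough illustration (not one of our actual call-sites), `extract_if` takes a key range plus a predicate and drains the matching entries while leaving the rest in place:
```rust
use std::collections::BTreeMap;

fn main() {
    let mut peers: BTreeMap<u32, &str> = BTreeMap::from([
        (1, "connected"),
        (2, "idle"),
        (3, "idle"),
    ]);

    // Drain all idle peers in a single pass; connected peers stay in the map.
    let idle: Vec<u32> = peers
        .extract_if(.., |_, state| *state == "idle")
        .map(|(id, _)| id)
        .collect();

    assert_eq!(idle, vec![2, 3]);
    assert_eq!(peers.len(), 1);
}
```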
Bumps [secrecy](https://github.com/iqlusioninc/crates) from 0.8.0 to
0.10.3.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Upon moving the version string out of PKG_VERSION and Cargo.toml, we
lost the version-bump automation. To avoid more bugs here in the future,
we now check for the version marker across all Git-tracked files,
regardless of their extension.
Fixes #10748
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
When working on the `client-ffi` module on a Linux or Windows machine,
we currently see a lot of "unused code" warnings. We could feature-gate
the remaining functions too but that would result in not having
code-completion on those platforms at all.
To make working on this module more ergonomic, we add a dummy
constructor for the session.
As far as I can tell, the `async_runtime` config option doesn't exist in
UniFFI, hence we remove that.
Whilst going through the UniFFI docs, I also noticed that there is an
Android-specific flag that we can toggle on. Effectively, this uses the
shared
[`SystemCleaner`](https://developer.android.com/reference/android/system/SystemCleaner)
instead of a per-thread one, which is supposed to be more performant.
Finally, using immutable records seems like a good idea as mutating any
FFI-originated field is not going to be reflected in connlib's state.
Preventing that at compile-time has a good chance of reducing bugs.
In the spirit of making Firezone as robust as possible, we make the FFI
calls infallible and complete as much of the task as possible. For
example, we don't fail `setDns` entirely just because we cannot parse a
single DNS server's IP.
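As a sketch of that principle (function name is illustrative, not the actual FFI surface; logging via the `tracing` crate is assumed), we parse what we can and skip the rest:
```rust
use std::net::IpAddr;

/// Parses as many DNS servers as possible instead of failing the whole call
/// when a single entry is malformed.
fn parse_dns_servers(raw: &[String]) -> Vec<IpAddr> {
    raw.iter()
        .filter_map(|s| match s.parse::<IpAddr>() {
            Ok(ip) => Some(ip),
            Err(e) => {
                // Log and skip rather than propagating the error across the FFI boundary.
                tracing::warn!("Skipping invalid DNS server `{s}`: {e}");
                None
            }
        })
        .collect()
}
```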
Resolves: #10611
This PR eliminates JSON-based communication across the FFI boundary,
replacing it with proper
uniffi-generated types for improved type safety, performance, and
reliability. We replace JSON string parameters with native uniffi types
for:
- Resources (DNS, CIDR, Internet)
- Device information
- DNS server lists
- Network routes (CIDR representation)
We also get rid of JSON serialisation in the Swift client's IPC in
favour of PropertyList-based serialisation.
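To give a flavour of the change (field and type names here are assumptions, not the exact `client-ffi` definitions), routes and DNS servers become plain uniffi records instead of JSON strings:
```rust
uniffi::setup_scaffolding!();

/// A CIDR route as it crosses the FFI boundary.
#[derive(uniffi::Record)]
pub struct Cidr {
    pub address: String,
    pub prefix: u8,
}

/// The TUN interface configuration handed to the host app, previously a JSON string.
#[derive(uniffi::Record)]
pub struct TunConfig {
    pub ipv4_routes: Vec<Cidr>,
    pub ipv6_routes: Vec<Cidr>,
    pub dns_servers: Vec<String>,
}
```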
Fixes: https://github.com/firezone/firezone/issues/9548
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Building on top of #10507, setting the initial Internet Resource state
is a piece of cake. All we need to do is thread a boolean variable
through to all call-sites of `Session::connect`. Without the need for
the Internet Resource's ID, we can simply pass in the boolean that is
saved in the configuration of each client.
Resolves: #10255
Instead of the generic "disable any kind of resource" functionality that
connlib currently exposes, we now provide an API to only enable /
disable the Internet Resource. This is a lot simpler to deal with and
reason about than the previous system, especially when it comes to the
proptests. Those need to model connlib's behaviour correctly across its
entire API surface, which makes them unnecessarily complex if we only
ever use the `set_disabled_resources` API with a single resource.
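Sketched in isolation (type and field names are assumptions, not connlib's actual state type), the surface shrinks to a single toggle:
```rust
/// A minimal sketch of the new, narrower API.
pub struct ClientState {
    internet_resource_enabled: bool,
}

impl ClientState {
    /// Replaces the previous `set_disabled_resources` call that took a whole
    /// set of resource IDs: only the Internet Resource can be toggled now.
    pub fn set_internet_resource_enabled(&mut self, enabled: bool) {
        self.internet_resource_enabled = enabled;
    }

    pub fn is_internet_resource_enabled(&self) -> bool {
        self.internet_resource_enabled
    }
}
```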
In preparation for #4789, I want to extend the proptests to cover
traffic filters (#7126). This will make them a fair bit more
complicated, so any prior removal of complexity is appreciated.
Simplifying the implementation here is also a good starting point to fix
#10255. Not implicitly enabling the Internet Resource when it gets added
should be quite simple after this change.
Finally, resolving #8885 should also be quite easy. We just need to
store the state of the Internet Resource once per API URL instead of
globally.
Resolves: #8404
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
In order to allow the portal to more easily classify what kind of
component is connecting, we extend the `get_user_agent` header to
include a component type instead of the generic `connlib/`.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
In #10076, connlib gained the ability to gracefully close connections
between peers. The Gateway already uses this when it is being gracefully
shut down, such as during an upgrade. This allows Clients to immediately
fail over to a different Gateway instead of waiting for an ICE timeout.
When a Client signs out, we currently just drop all the state, resulting
in an ICE timeout on the Gateway ~15 seconds later. This makes it
difficult for us to analyze whether an ICE timeout in the logs
represents an actual problem where a network connection got cut or
whether the Client simply signed out.
Whilst not watertight, attempting to gracefully close our connections
when the Client signs out is better than nothing, so we implement this
here.
All Clients use the `Session` abstraction from `client-shared`, which
spawns the event-loop into a dedicated task.
- For the Linux and Windows GUI client, the tunnel service's existing
tokio runtime instance is used for this.
- For Android and Apple, we create a dedicated, single-threaded runtime
instance for connlib.
- For the headless client, we also reuse the binary's existing tokio
runtime instance.
In case of Android, Apple and the headless client, this means we need to
ensure the tokio runtime instance stays alive long enough to actually
complete the graceful shutdown task. We achieve this by draining the
`EventStream` returned from `Session`. The `EventStream` is a wrapper
around a channel connected to the event-loop. This stream only finishes
once the event-loop is entirely dropped (and therefore completed the
graceful shutdown) as it holds the sender-end of the channel.
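A sketch of that draining loop (the real `EventStream` type differs; this uses a plain tokio channel to illustrate the mechanism):
```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum Event {
    Disconnected,
}

/// Keeps polling until the sender (held by the event-loop) is dropped, i.e.
/// until the graceful shutdown has fully completed.
async fn drain_events(mut events: mpsc::Receiver<Event>) {
    while let Some(event) = events.recv().await {
        // Forward events to the host app as usual.
        println!("event: {event:?}");
    }
    // `None` means the event-loop is gone; the runtime may now shut down.
}
```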
In case of the Linux and Windows GUI client, the runtime outlives the
`Session` because it is scoped to the entire tunnel process. Therefore,
no additional measures are necessary there to ensure the graceful
shutdown task completes.
Instead of recording the queue depths on every event-loop tick, we now
record them once a second by setting a Gauge. Not only is that a simpler
instrument to work with, but it is also significantly more performant.
The current version - when metrics are enabled - takes up quite a bit of
CPU time.
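A rough sketch of the sampling loop (shown with the `metrics` crate's gauge API for illustration; the actual instrument and metric name may differ):
```rust
use std::time::Duration;

/// Samples the queue depth once a second instead of on every event-loop tick.
async fn record_queue_depth(queue_len: impl Fn() -> usize) {
    let mut interval = tokio::time::interval(Duration::from_secs(1));

    loop {
        interval.tick().await;
        metrics::gauge!("event_loop_queue_depth").set(queue_len() as f64);
    }
}
```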
Resolves: #10237
With the introduction of the pre-resolved Sentry host, all Firezone
clients now require Internet access on startup. That is a significant usability
hit that we can easily fix by simply falling back to resolving the host
on-demand.
We always end up allowing this lint when it pops up, so we might as well
allow it for the whole repo. Most of the time, the reason for too many
arguments is a borrow-checker limitation of Rust where mutable
references need to be tracked explicitly.
Right now, the Client event-loops have a channel with a capacity of 1000
items for sending new resource lists and TUN device updates to the host
app. This is unnecessary because we only ever care about the latest
version of these. Intermediate updates that the host app doesn't process
are effectively irrelevant.
We've had an issue before where a bug in the portal caused us to receive
many updates to resources which ended up crashing Client apps because
this channel filled up.
To be more resilient on this front, we refactor the Client event loop to
use a `watch` channel for this. Watch channels only retain the last
value that got sent into them.
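A small sketch of the pattern (resource type simplified to `String` here):
```rust
use tokio::sync::watch;

#[tokio::main]
async fn main() {
    // The channel always holds exactly one value: the latest resource list.
    let (tx, mut rx) = watch::channel(Vec::<String>::new());

    // The event-loop overwrites the value on every update; nothing queues up.
    tx.send(vec!["resource-a".into()]).unwrap();
    tx.send(vec!["resource-a".into(), "resource-b".into()]).unwrap();

    // The host app only ever observes the most recent list.
    rx.changed().await.unwrap();
    assert_eq!(rx.borrow_and_update().len(), 2);
}
```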
Our Sentry client needs to resolve DNS before being able to send logs or
errors to the backend. Currently, this DNS resolution happens on-demand
as we don't take any control of the underlying HTTP client.
In addition, this will use HTTP/1.1 by default which isn't as efficient
as it could be, especially with concurrent requests.
Finally, if we ever decide to proxy all Sentry traffic through our own
domain, we have to take control of the underlying client anyway.
To resolve all of the above, we create a custom `TransportFactory` where
we reuse the existing `ReqwestHttpTransport` but provide an already
configured `reqwest::Client` that always uses HTTP/2 with a
pre-configured set of DNS records for the given ingest host.
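A sketch of such a client (host name and address are placeholders; the real configuration lives in the transport factory):
```rust
use std::net::SocketAddr;

/// Builds a client that skips on-demand DNS for the ingest host and speaks
/// HTTP/2 from the start.
fn build_ingest_client() -> reqwest::Result<reqwest::Client> {
    // Placeholder address; the real code would use the pre-resolved records.
    let addr: SocketAddr = "203.0.113.10:443".parse().expect("valid socket address");

    reqwest::Client::builder()
        // Pin the ingest host to a known address instead of resolving on demand.
        .resolve("sentry.example.com", addr)
        // Always negotiate HTTP/2 rather than upgrading from HTTP/1.1.
        .http2_prior_knowledge()
        .build()
}
```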
By default, dropping a `tokio` runtime waits until all tasks have
finished. The tasks we spawn within `connlib` can have complex
dependencies with each other. To ensure that we can shut down in any
case and don't hang, we apply a timeout of 1s to the runtime.
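Concretely, this amounts to something like the following sketch, using tokio's `shutdown_timeout` instead of relying on `Drop`:
```rust
use std::time::Duration;

/// Shuts the runtime down, waiting at most one second for tasks to wind up.
fn shutdown(runtime: tokio::runtime::Runtime) {
    runtime.shutdown_timeout(Duration::from_secs(1));
}
```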
Applying a filter globally to the entire subscriber means it filters
events for all layers. This prevents the Sentry layer from uploading
DEBUG logs even when it is configured to do so.
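A sketch of the per-layer setup (the layers and levels shown are illustrative; the actual subscriber wires up more layers):
```rust
use tracing_subscriber::filter::LevelFilter;
use tracing_subscriber::layer::SubscriberExt as _;
use tracing_subscriber::util::SubscriberInitExt as _;
use tracing_subscriber::Layer as _;

fn init_logging() {
    tracing_subscriber::registry()
        // stdout stays at INFO ...
        .with(tracing_subscriber::fmt::layer().with_filter(LevelFilter::INFO))
        // ... while the Sentry layer may receive DEBUG events.
        .with(sentry_tracing::layer().with_filter(LevelFilter::DEBUG))
        .init();
}
```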
At present, our primary indicator as to whether telemetry is active is
whether we have a Sentry session. For our analytics events however, we
currently require passing in the Firezone ID and API url again. This
makes it difficult to send analytics events from areas of the code that
don't have this information available.
To still allow for that, we integrate the `analytics` module more
tightly with the Sentry session. This allows us to drop two parameters
from the `$identify` event and also means we now respect the
`NO_TELEMETRY` setting for these events except for `new_session`. This
event is sent regardless because it allows us to track how many on-prem
installations of Firezone are out there.
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.
In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:
- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and
afterwards, every minute. The feature-flag context itself is only
checked every 5 minutes though so it might take up to 5 minutes before
this takes effect.
The default value - like all feature flags - is `false`. Therefore, if
there is any issue with the PostHog service, we will fall back to the
previous behaviour where logs are simply stored locally.
Resolves: #9600
In order to more easily target customers with certain feature flags, we
include the `account_slug` in the `$identify` event to PostHog. This
will allow us to create Cohorts in PostHog and enable / disable feature
flags for all installations of Firezone for a particular customer.
To make our FFI layer between Android and Rust safer, we adopt the
UniFFI tool from Mozilla. UniFFI allows us to create a dedicated crate
(here `client-ffi`) that contains Rust structs annotated with various
attributes. These macros then generate code at compile time that is
built into the shared object. Using a dedicated CLI from the UniFFI
project, we can then generate Kotlin bindings from this shared object.
The primary motivation for this effort is memory safety across the FFI
boundary. Most importantly, we want to ensure that:
- The session pointer is not used after it has been free'd
- Disconnecting the session frees the pointer
- Freeing the session does not happen as part of a callback as that
triggers a cyclic dependency on the Rust side (callbacks are executed on
a runtime and that runtime is dropped as part of dropping the session)
To achieve all of these goals, we move away from callbacks altogether.
UniFFI has great support for async functions. We leverage this support
to expose a `suspend fun` to Android that returns `Event`s. These events
map to the current callback functions. Internally, these events are read
from a channel with a capacity of 1000 events. It is therefore not very
time-critical that the app reads from this channel. `connlib` will
happily continue even if the channel is full. 1000 events should be more
than sufficient though in case the host app cannot immediately process
them. We don't send events very often after all.
This event-based design has major advantages: It allows us to make use
of `AutoCloseable` on the Kotlin side, meaning the `session` pointer is
only ever accessed as part of a `use` block and automatically closed
(and therefore free'd) at the end of the block.
To communicate with the session, we introduce a `TunnelCommand`, which
represents all actions that the host app can send to `connlib`. These
are passed through a channel to the `suspend fun`, which continuously
listens for events and commands.
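To make the shape of this API concrete (names and variants here are illustrative, not the exact `client-ffi` definitions):
```rust
uniffi::setup_scaffolding!();

/// Events connlib reports back to the host app, replacing the old callbacks.
#[derive(uniffi::Enum)]
pub enum Event {
    TunInterfaceUpdated { ipv4: String, ipv6: String },
    Disconnected { reason: String },
}

/// Commands the host app can send to connlib.
#[derive(uniffi::Enum)]
pub enum TunnelCommand {
    SetDns { servers: Vec<String> },
    Disconnect,
}

#[derive(uniffi::Object)]
pub struct Session {
    events: tokio::sync::Mutex<tokio::sync::mpsc::Receiver<Event>>,
    commands: tokio::sync::mpsc::Sender<TunnelCommand>,
}

#[uniffi::export]
impl Session {
    /// Suspends on the Kotlin side until connlib emits the next event;
    /// returns `None` once the session has shut down.
    pub async fn next_event(&self) -> Option<Event> {
        self.events.lock().await.recv().await
    }

    /// Forwards a command to the event-loop.
    pub async fn send_command(&self, cmd: TunnelCommand) {
        let _ = self.commands.send(cmd).await;
    }
}
```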
Resolves: #9499
Related: #3959
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>