Commit Graph

150 Commits

Author SHA1 Message Date
Thomas Eizinger
8bd8098cab refactor(connlib): don't re-implement waker for TUN thread (#7944)
Within `connlib` - on UNIX platforms - we have dedicated threads that
read from and write to the TUN device. These threads are connected with
`connlib`'s main thread via bounded channels: one in each direction.
When these channels are full, `connlib`'s main thread will suspend and
not read any network packets from the sockets in order to maintain
back-pressure. Reading more packets from the socket would mean most
likely sending more packets out the TUN device.

When debugging #7763, it became apparent that _something_ must be wrong
with these threads and that somehow, we either consider them as full or
aren't emptying them and as a result, we don't read _any_ network
packets from our sockets.

To maintain back-pressure here, we currently use our own `AtomicWaker`
construct that is shared with the TUN thread(s). This is unnecessary. We
can also directly convert the `flume::Sender` into a
`flume::async::SendSink` and therefore directly access a `poll`
interface.
2025-01-29 15:48:48 +00:00
Jamil
7e231c6b10 chore: Release Android 1.4.1 (#7911) 2025-01-29 00:29:15 +00:00
Thomas Eizinger
c6492d4832 fix(rust): don't start all log files with connlib. (#7853)
At present, the file logger for all Rust code starts each logfile with
`connlib.`. This is very confusing when exporting the logs from the GUI
client because even the logs from the client itself will start with
`connlib.`. To fix this, we make the base file name of the log file
configurable.
2025-01-28 01:35:05 +00:00
Thomas Eizinger
ed5285268d refactor: merge on_update_routes and on_set_interface_config (#7699)
For a while now, `connlib` has been calling these two callbacks right
after each other because the internal event already bundles all the
information about the TUN device. With this PR, we merge the two
callback functions also in layers above `connlib` itself.

Resolves: #6182.
2025-01-08 18:26:40 +00:00
Thomas Eizinger
037a2e64b6 fix(connlib): attempt to detect runtime shutdown within TUN task (#7605)
Reading and writing to the TUN device within `connlib` happens in a
separate thread. The task running within these threads is connected to
the rest of `connlib` via channels. When the application shuts down,
these threads also need to exit. Currently, we attempt to detect this
from within the task when these channels close. It appears that there is
a race condition here because we first attempt to read from the TUN
device before reading from the channels. We treat read & write errors on
the TUN device as non-fatal so we loop around and attempt to read from
it again, causing an infinite-loop and log spam.

To fix this, we swap the order in which we evaluate the two concurrent
tasks: The first task to be polled is now the channel for outbound
packets and only if that one is empty, we attempt to read new packets
from the TUN device. This is also better from a backpressure point of
view: We should attempt to flush out our local buffers of already
processed packets before taking on "new work".

As a defense-in-depth strategy, we also attempt to detect the particular
error from the tokio runtime when it is being shut down and exit the
task.

Resolves: #7601.
Related: https://github.com/tokio-rs/tokio/issues/7056.
2025-01-05 20:41:24 +00:00
Jamil
309914a45d chore(android): release version 1.4.0 (#7649)
Bumps the Android client to the 1.4.0 release.

Tested in Android emulator.
2025-01-03 14:45:00 +00:00
Thomas Eizinger
81f71cba62 fix(telemetry): use package@version notation for releases (#7466)
In order for Sentry to parse our releases as semver, they need to be in
the form of `package@version` [0]. Without this, the feature of "Mark
this issue as resolved in the _next_ version" doesn't work properly
because Sentry compares the versions as to when it first saw them vs
parsing the semver string itself. We test versions prior to releasing
them, meaning Sentry learns about a 1.4.0 version before it is actually
released. This causes false-positive "regressions" even though they are
fixed in a later (as per semver) release.

This create some redundancy with the different DSNs that we are already
using. I think it would make sense to consider merging the two projects
we have for the GUI client for example. That is really just one project
that happens to run as two binaries.

For all other projects, I think the separation still makes sense because
we e.g. may add Sentry to the "host" applications of Android and
MacOS/iOS as well. For those, we would reuse the DSN and thus funnel the
issues into the same Sentry project.

As per Sentry's docs, releases are organisation-wide and therefore need
a package identifier to be grouped correctly.

[0]:
https://docs.sentry.io/platforms/javascript/configuration/releases/#bind-the-version
2024-12-09 05:04:45 +00:00
Thomas Eizinger
ddce9312ea fix(android): apply new log-filter on repeated connect call (#7461)
Related: #7460.
Resolves: #5634.
2024-12-06 04:45:28 +00:00
Thomas Eizinger
90cf191a7c feat(linux): multi-threaded TUN device operations (#7449)
## Context

At present, we only have a single thread that reads and writes to the
TUN device on all platforms. On Linux, it is possible to open the file
descriptor of a TUN device multiple times by setting the
`IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then
spawn multiple threads that concurrently read and write to the TUN
device. This is critical for achieving a better throughput.

## Solution

`IFF_MULTI_QUEUE` is a Linux-only thing and therefore only applies to
headless-client, GUI-client on Linux and the Gateway (it may also be
possible on Android, I haven't tried). As such, we need to first change
our internal abstractions a bit to move the creation of the TUN thread
to the `Tun` abstraction itself. For this, we change the interface of
`Tun` to the following:

- `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver` where
multiple items in a channel can be batch-received.
- `poll_send_ready`: Mimics the API of `Sink` to check whether more
items can be written.
- `send`: Mimics the API of `Sink` to actually send an item.

With these APIs in place, we can implement various (performance)
improvements for the different platforms.

- On Linux, this allows us to spawn multiple threads to read and write
from the TUN device and send all packets into the same channel. The `Io`
component of `connlib` then uses `poll_recv_many` to read batches of up
to 100 packets at once. This ties in well with #7210 because we can then
use GSO to send the encrypted packets in single syscalls to the OS.
- On Windows, we already have a dedicated recv thread because `WinTun`'s
most-convenient API uses blocking IO. As such, we can now also tie into
that by batch-receiving from this channel.
- In addition to using multiple threads, this API now also uses correct
readiness checks on Linux, Darwin and Android to uphold backpressure in
case we cannot write to the TUN device.

## Configuration

Local testing has shown that 2 threads give the best performance for a
local `iperf3` run. I suspect this is because there is only so much
traffic that a single application (i.e. `iperf3`) can generate. With
more than 2 threads, the throughput actually drops drastically because
`connlib`'s main thread is too busy with lock-contention and triggering
`Waker`s for the TUN threads (which mostly idle around if there are 4+
of them). I've made it configurable on the Gateway though so we can
experiment with this during concurrent speedtests etc.

In addition, switching `connlib` to a single-threaded tokio runtime
further increased the throughput. I suspect due to less task / context
switching.

## Results

Local testing with `iperf3` shows some very promising results. We now
achieve a throughput of 2+ Gbit/s.

```
Connecting to host 172.20.0.110, port 5201
Reverse mode, remote host 172.20.0.110 is sending
[  5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   274 MBytes  2.30 Gbits/sec
[  5]   1.00-2.00   sec   279 MBytes  2.34 Gbits/sec
[  5]   2.00-3.00   sec   216 MBytes  1.82 Gbits/sec
[  5]   3.00-4.00   sec   224 MBytes  1.88 Gbits/sec
[  5]   4.00-5.00   sec   234 MBytes  1.96 Gbits/sec
[  5]   5.00-6.00   sec   238 MBytes  2.00 Gbits/sec
[  5]   6.00-7.00   sec   229 MBytes  1.92 Gbits/sec
[  5]   7.00-8.00   sec   222 MBytes  1.86 Gbits/sec
[  5]   8.00-9.00   sec   223 MBytes  1.87 Gbits/sec
[  5]   9.00-10.00  sec   217 MBytes  1.82 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec  22247             sender
[  5]   0.00-10.00  sec  2.30 GBytes  1.98 Gbits/sec                  receiver

iperf Done.
```

This is a pretty solid improvement over what is in `main`:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  90.4 MBytes   758 Mbits/sec  1800    106 KBytes
[  5]   1.00-2.00   sec  93.4 MBytes   783 Mbits/sec  1550   51.6 KBytes
[  5]   2.00-3.00   sec  92.6 MBytes   777 Mbits/sec  1350   76.8 KBytes
[  5]   3.00-4.00   sec  92.9 MBytes   779 Mbits/sec  1800   56.4 KBytes
[  5]   4.00-5.00   sec  93.4 MBytes   783 Mbits/sec  1650   69.6 KBytes
[  5]   5.00-6.00   sec  90.6 MBytes   760 Mbits/sec  1500   73.2 KBytes
[  5]   6.00-7.00   sec  87.6 MBytes   735 Mbits/sec  1400   76.8 KBytes
[  5]   7.00-8.00   sec  92.6 MBytes   777 Mbits/sec  1600   82.7 KBytes
[  5]   8.00-9.00   sec  91.1 MBytes   764 Mbits/sec  1500   70.8 KBytes
[  5]   9.00-10.00  sec  92.0 MBytes   771 Mbits/sec  1550   85.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   917 MBytes   769 Mbits/sec  15700             sender
[  5]   0.00-10.00  sec   916 MBytes   768 Mbits/sec                  receiver

iperf Done.
```
2024-12-05 00:18:20 +00:00
Thomas Eizinger
48bd0f9804 chore: bump client versions to 1.4.0 (#7092)
In order to release the new control protocol to users, we need to bump
the versions of the clients to 1.4.0. The portal has a version gate to
only select gateways with version >= 1.4.0 for clients >= 1.4.0. Thus,
bumping these versions can only happen once testing has completed and
the gateway has actually been released as 1.4.0.

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2024-12-04 19:48:51 +00:00
Thomas Eizinger
dd6b52b236 chore(rust): share edition key via workspace table (#7451) 2024-12-03 00:28:06 +00:00
Thomas Eizinger
932f6791fb fix(phoenix-channel): lazily create backoff timer (#7414)
Our `phoenix-channel` component is responsible for maintaining a
WebSocket connection to the portal. In case that connection fails, we
want to reconnect to it using an exponential backoff, eventually giving
up after a certain amount of time.

Unfortunately, the code we have today doesn't quite do that. An
`ExponentialBackoff` has a setting for the `max_elapsed_time`.
Regardless of how many and how often we retry something, we won't ever
wait longer than this amount of time. For the Relay, this is set to
15min. For other components its indefinite (Gateway, headless-client),
or very long (30 days for Android, 1 day for Apple).

The point in time from which this duration is counted is when the
`ExponentialBackoff` is **constructed** which translates to when we
**first** connected to the portal. As a result, our backoff would
immediately fail on the first error if it has been longer than
`max_elapsed_time` since we first connected. For most components, this
codepath is not relevant because the `max_elapsed_time` is so long. For
the Relay however, that is only 15 minutes so chances are, the Relay
would immediately fail (and get rebooted) on the first connection error
with the portal.

To fix this, we now lazily create the `ExponentialBackoff` on the first
error.

This bug has some interesting consequences: When a relay reboots, it
looses all its state, i.e. allocations, channel bindings, available
nonces etc, stamp-secret. Thus, all credentials and state that got
distributed to Clients and Gateways get invalidated, causing disconnects
from the Relay. We have observed these alerts in Sentry for a while and
couldn't explain them. Most likely, this is the root cause for those
because whilst a Relay disconnects, the portal also cannot detect its
presence and pro-actively inform Clients and Gateways to no longer use
this Relay.
2024-11-29 20:19:11 +00:00
Thomas Eizinger
2c26fc9c0e ci: lint Rust dependencies using cargo deny (#7390)
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.

This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).

Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.

This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.

Resolves: #7241.
2024-11-22 00:17:28 +00:00
Thomas Eizinger
8c5a5fa690 chore(rust): correctly disable ANSI escapes globally (#7336)
I think I finally understood and correctly traced, where the use of ANSI
escape codes came from. It turns out, the `with_ansi` switch on
`tracing_subscriber::fmt::Layer` is what you want to toggle. From there,
it trickles down to the `Writer` which we can then test for in our
`Format`.

Resolves: #7284.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-11-14 05:00:53 +00:00
Thomas Eizinger
48ba2869a8 chore(rust): ban the use of .unwrap except in tests (#7319)
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.

Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.

Resolves: #7292.
2024-11-13 03:59:22 +00:00
Thomas Eizinger
ad4eea29ff chore(rust): don't panic in fallible functions (#7298)
"Just let it crash" is terrible advice for software that is shipped to
end users. Where possible, we should use proper error handling and only
fail the current function / task that is active, e.g. drop a particular
packet instead of failing all of connlib. We more or less already do
that.

Activating the clippy lint `unwrap_in_result` surfaced a few more places
where we panic despite being in a function that is fallible already.
These cases can easily be converted to not panic and return an error
instead.
2024-11-11 23:55:23 +00:00
Thomas Eizinger
488c599d5b chore(telemetry): capture Firezone ID and account in user ctx (#7310)
Sentry has a feature called the "User context" which allows us to assign
events to individual users. This in turn will give us statistics in
Sentry, how many users are affected by a certain issue.

Unfortunately, Sentry's user context cannot be built-up step-by-step but
has to be set as a whole. To achieve this, we need to slightly refactor
`Telemetry` to not be `clone`d and instead passed around by mutable
reference.

Resolves: #7248.
Related: https://github.com/getsentry/sentry-rust/issues/706.
2024-11-11 19:50:14 +00:00
Jamil
1dda915376 ci: Publish new clients (#7291)
Fixes the roaming bug.
2024-11-08 22:58:06 +00:00
Thomas Eizinger
78ebad13ab chore(rust): log more errors as tracing::Values (#7208)
Logging these as structured values gives us a better stacktrace in
Sentry (assuming the errors themselves make proper use of defining an
error-chain).
2024-11-05 14:36:47 +00:00
Thomas Eizinger
5564e578fe fix(telemetry): flush sentry.io events in dedicated task (#7205)
`sentry`'s transport layer appears to be using blocking IO for flushing
events. Performing blocking IO within a future that is running on a
worker-thread of tokio causes this operation to hang and eventually
time-out after 5 seconds. As a result, many events - especially traces -
don't get flushed to sentry when an app is being shut down.

To fix this, we make `Telemetry::stop` an `async fn` and offload the
flushing to a task on tokio's thread-pool for blocking IO.
2024-11-01 15:52:09 +00:00
Thomas Eizinger
59412223cb chore: bump Android and Apple apps to next version (#7192)
We are in the process of releasing these so we need to bump their
version to the next one.
2024-10-31 14:24:33 +00:00
Thomas Eizinger
5cf105f073 chore(android): start telemetry together with connlib session (#7151)
As a first step for integration Sentry into the Android app, we launch
the Sentry Rust agent as soon as a `connlib` session starts up. At a
later point, we can also integrate Sentry into the Android app itself
using the Java / Kotlin SDK.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-10-24 20:03:06 +00:00
Thomas Eizinger
2ca91a3b1a chore(connlib): remove old mock feature (#7142)
This is so stale, it definitely needs to go in the bin.
2024-10-23 16:30:15 +00:00
Thomas Eizinger
857bbf5d98 chore(connlib): introduce custom logging format (#7024)
This PR introduces a custom logging format for all Rust-components. It
is more or less a copy of `tracing_subscriber::fmt::format::Compact`
with the main difference that span-names don't get logged.

Spans are super useful because they allow us to record contextual
values, like the current connection ID, for a certain scope. What is IMO
less useful about them is that in the default formatter configuration,
active spans cause a right-drift of the actual log message.

The actual log message is still what most accurately describes, what
`connlib` is currently doing. Spans only add contextual information that
the reader may use for further understand what is happening. This
optional nature of the utility of spans IMO means that they should come
_after_ the actual log message.

Resolves: #7014.
2024-10-14 18:09:38 +00:00
Thomas Eizinger
2d4818e007 refactor(connlib): rotate tunnel private key on reset (#6909)
With the new control protocol specified in #6461, the client will no
longer initiate new connections. Instead, the credentials are generated
deterministically by the portal based on the gateway's and the client's
public key. For as long as they use the same public key, they also have
the same in-memory state which makes creating connections idempotent.

What we didn't consider in the new design at first is that when clients
roam, they discard all connections but keep the same private key. As a
result, the portal would generate the same ICE credentials which means
the gateway thinks it can reuse the existing connection when new flows
get authorized. The client however discarded all connections (and
rotated its ports and maybe IPs), meaning the previous candidates sent
to the gateway are no longer valid and connectivity fails.

We fix this by also rotating the private keys upon reset. Rotating the
keys itself isn't enough, we also need to propagate the new public key
all the way "over" to the phoenix channel component which lives
separately from connlib's data plane.

To achieve this, we change `PhoenixChannel` to now start in the
"disconnected" state and require an explicit `connect` call. In
addition, the `LoginUrl` constructed by various components now acts
merely as a "prototype", which may require additional data to construct
a fully valid URL. In the case of client and gateway, this is the public
key of the `Node`. This additional parameter needs to be passed to
`PhoenixChannel` in the `connect` call, thus forming a type-safe
contract that ensures we never attempt to connect without providing a
public key.

For the relay, this doesn't apply.

Lastly, this allows us to tidy up the code a bit by:

a) generating the `Node`'s private key from the existing RNG
b) removing `ConnectArgs` which only had two members left

Related: #6461.
Related: #6732.
2024-10-07 22:28:51 +00:00
Thomas Eizinger
be250f1e00 refactor(connlib): repurpose connlib-shared as connlib-model (#6919)
The `connlib-shared` crate has become a bit of a dependency magnet
without a clear purpose. It hosts utilities like `get_user_agent`,
messages for the client and gateway to communicate with the portal and
domain types like `ResourceId`.

To create a better dependency structure in our workspace, we repurpose
`connlib-shared` as a `connlib-model` crate. Its purpose is to host
domain-specific model types that multiple crates may want to use. For
that purpose, we rename the `callbacks::ResourceDescription` type to
`ResourceView`, designating that this is a _view_ onto a resource as
seen by `connlib`. The message types which currently double up as
connlib-internal model thus become an implementation detail of
`firezone-tunnel` and shouldn't be used for anything else.

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-10-03 14:47:58 +00:00
Jamil
613127d298 ci: Bump all clients and gateway (#6923)
Main fix: idle connection timing. These have already been released.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-10-03 07:12:52 -07:00
Gabi
3501d5b287 feat(clients): use hardware id for device verification (#6857)
We want to associate additional device information for the device
verification, these are all parameters that tries to uniquely identify
the hardware.

For that reason we read system information and send it as part of the
query params to the portal, that way the portal can store this when
device is verified and match against that later on.

These are the parameters according to each platform:

|Platform|Query Field|Field Meaning|
|-----|----|-----|
|MacOS|`device_serial`|Hardware's Serial|
|MacOS| `device_uuid`|Hardware's UUID|
|iOS|`identifier_for_vendor`| Identifier for vendor, resets only on
uninstall/install|
|Android|`firebase_installation_id`| Firebase installation ID, resets
only on uninstall/install|
|Windows|`device_serial`|Motherboard's Serial|
|Linux|`device_serial`|Motherboard's Serial|


Fixes #6837
2024-10-02 08:44:26 +00:00
Jamil
e7dddee78f ci: bump android apple dns match (#6833)
Bumps Android -> 1.3.4, Apple -> 1.3.5

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-09-26 09:32:41 -07:00
Jamil
332a9fe352 ci: bump all clients to include fix for #6781 (#6820)
bump all clients to include #6781 fix

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-09-25 19:27:50 +00:00
Thomas Eizinger
a9f515a453 chore(rust): use #[expect] instead of #[allow] (#6692)
The `expect` attribute is similar to `allow` in that it will silence a
particular lint. In addition to `allow` however, `expect` will fail as
soon as the lint is no longer emitted. This ensures we don't end up with
stale `allow` attributes in our codebase. Additionally, it provides a
way of adding a `reason` to document, why the lint is being suppressed.
2024-09-16 13:51:12 +00:00
Jamil
4c7daddf64 ci: Publish Apple/Android changelog entries (#6631)
These have been published.
2024-09-07 10:38:00 -07:00
Jamil
ae5613b223 ci: Update changelog for 1.3.1ish clients (#6612)
Bumps internet resource UI.
2024-09-06 00:07:52 +00:00
Jamil
c6b0b0a922 ci: Release 1.3.0 for Internet Resource (#6503)
This publishes the 1.3.0 clients and gateways so that Internet Resources
will work.

The feature is still disabled for the Stripe plans until we publish the
launch post. Select customers have the feature enabled.

Closes #2667
2024-08-30 01:21:34 -07:00
Jamil
c66f0c15c0 ci: Draft bump 1.3.0 clients (#6470)
- Internet resources
2024-08-29 23:33:02 -07:00
Jamil
c8eed59387 ci: Release 1.2.0 (#6395)
Releasing 1.2.0 to unblock portal deploy! Some of these have already
been published.
2024-08-22 00:18:27 +00:00
Thomas Eizinger
3b56664e02 test(rust): ensure deterministic proptests (#6319)
For quite a while now, we have been making extensive use of
property-based testing to ensure `connlib` works as intended. The idea
of proptests is that - given a certain seed - we deterministically
sample test inputs and assert properties on a given function.

If the test fails, `proptest` prints the seed which can then be added to
a regressions file to iterate on the test case and fix it. It is quite
obvious that non-determinism in how the test input gets generated is no
bueno and reduces the value we get out of these tests a fair bit.

The `HashMap` and `HashSet` data structures are known to be
non-deterministic in their iteration order. This causes non-determinism
during the input generation because we make use of a lot of maps and
sets to gradually build up the test input. We fix all uses of `HashMap`
and `HashSet` by replacing them with `BTreeMap` and `BTreeSet`.

To ensure this doesn't regress, we refactor `tunnel_test` to not make
use of proptest's macros and instead, we initialise and run the test
ourselves. This allows us to dump the sampled state and transitions into
a file per test run. In CI, we then run a 2nd iteration of all
regression tests and compare the sampled state and transitions with the
previous run. They must match byte-for-byte.

Finally, to discourage use of non-deterministic iteration, we ban the
use of the iteration functions on `HashMap` and `HashSet` across the
codebase. This doesn't catch iteration in a `for`-loop but it is better
than not linting against it at all.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-16 23:15:58 +00:00
Thomas Eizinger
d399e65246 build(deps): bump tokio-tungstenite to 0.23 (#5509)
With the upgrade to 0.23, `tokio-tungstenite` pulls in `rustls` 0.27
which supports multiple crypto providers. By default, this uses the
`aws-lc-crypto` provider. The previous default was `ring`.

This PR bumps the necessary versions and installs the `ring` crypto
provider at the beginning of each application, before connlib starts. We
try and do this as early as possible to make it obvious that it only
needs to happen once per process.

Resolves: #5380.
2024-08-15 06:02:17 +00:00
Thomas Eizinger
7c70850217 feat(connlib): allow glob patterns for matching domain names (#5901)
Currently, `connlib` can only handle "simple" DNS wildcards where `*`
matches any number of subdomains, including zero and `?` matches a
single subdomain.

With this PR, we expand `connlib'`s capabilities to allow for a much
more complex matching of domains that more closely resembles glob
patterns:

- `**` matches any number of subdomains. This supersedes the previous
`*` operator.
- `*` matches a single subdomain. This supersedes the previous `?`
operator.
- `?` matches a single character. This wasn't possible before.
- Additionally, any of these can be combined. Previously, only `*` or
`?` was allowed and they were only accepted at the front of the domain
name pattern.

Resolves: #5056.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-08-15 01:30:53 +00:00
Jamil
296ca4ad4d ci: Bump Clients and Gateways to fix NAT / allocation issues (#6287)
Bump all Clients and Gateways due to #6265 being fixed.

---------

Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
2024-08-13 21:58:12 +00:00
Thomas Eizinger
7642f37d56 refactor: thin out connlib-shared (#6256)
Most of `connlib-shared` exists only for historical reasons. The
`Tunnel` has since been decoupled from the `Callbacks` and most error
variants on `ConnlibError` are not actually used.

This allows us to move a few things around and trim down `ConnlibError`
to just the variants that actually cause a call to `on_disconnect`.

Moving everything related to `proptest`s to `firezone-tunnel` also
requires us to delete the specialisation for printing IDs in a shorter
format during the tests. That is a bit unfortunate but was always kind
of a hack. I'd rather make progress on getting rid of `connlib-shared`
though and perhaps re-introduce that feature once the messages are fully
moved into the tunnel.

Related: #4470.
2024-08-12 22:57:06 +00:00
Jamil
e7f8a4e4bf ci: bump apple / android versions (#6251)
These were approved and published so the versions need bumping.
2024-08-10 13:04:26 -07:00
Thomas Eizinger
bed625a312 chore(rust): make logging more ergonomic (#6237)
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that allows us to bundle all the things together, how we want to
do logging.

This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.

Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.

Log filters can be overridden, i.e. a 2nd filter that matches the exact
same scope overrides a previous one. Thus, with this design it is still
possible to activate certain logs at runtime, even if they have silenced
by default.

I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-10 05:17:03 +00:00
Jamil
a6ba9868dd ci: Revert bumps to 1.2 (#6227)
We need these at 1.1 until ready to release.
2024-08-08 18:34:39 -07:00
Jamil
096ddfe7c5 ci: bump gui/headless to 1.1.10 (#6221)
To publish the mpsc channel fix.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-08-08 16:20:20 +00:00
Thomas Eizinger
128d0eb407 feat(connlib): transparently forward non-resources DNS queries (#6181)
Currently, `connlib` depends on `hickory-resolver` to perform DNS
queries for non-resources. This is unnecessary. Instead of buffering the
original UDP DNS query, consulting hickory to resolve the name and
mapping the response back, we can simply take the UDP payload and send
it via our protected socket directly to the original upstream DNS
server.

This ensures `connlib` is as transparent as possible for DNS queries for
non-resources. Additionally, it removes a lot of error handling and
other cruft that we currently have to perform because we are using
hickory. For example, hickory will automatically retry a DNS query after
a certain timeout. However, the OS / client talking to `connlib` will
also retry after a certain timeout because it is making DNS queries over
an unreliable transport (UDP). It is thus unnecessary for us to do that
internally.

To correctly test this change, our test-suite needed some refactoring.
Specifically, DNS servers are now modelled as dedicated `Host`s that can
receive (UDP) traffic.

Lastly, we can remove our dependency on `hickory-proto` and
`hickory-resolver` everywhere and only use `domain` for parsing DNS
messages.

Resolves: #6141.
Related: #6033.
Related: #4800. (Impossible to happen with this design)
2024-08-07 08:54:49 +00:00
Gabi
a2d849087a feat(android): add setDisabledResources FFI (#6166)
Builds on top of  #6164

Part of the effor towards
https://github.com/firezone/firezone/issues/6074

Prepares connlib to call `setDisableResource` from android.

Furthermore, we add a `disablable` parameter for resources which default
to false for now, in the future the portal will set it for the internet
resource, and further in the future it may be used for other resources.

The `disablable` parameter only affect UI.
2024-08-05 22:43:27 +00:00
Jamil
51e0b61c9c chore: Bump all clients and gateway versions (#6149)
Includes major fixes https://github.com/firezone/firezone/pull/6143 and
https://github.com/firezone/firezone/pull/6117
2024-08-02 01:12:49 -07:00
Thomas Eizinger
fc4b8c7b46 refactor: rename reconnect to reset (#6057)
Connection roaming within `connlib` has changed a fair-bit since we
introduced the `reconnect` function. The new implementation is basically
a hard-reset of all state within `connlib`. Renaming this function
across all layers makes this more obvious.

Resolves: #6038.
2024-07-28 07:41:45 +00:00
Thomas Eizinger
59014a9622 refactor(connlib): encapsulate UDP and TCP sockets (#6028)
As part of debugging full-route tunneling on Windows, we discovered that
we need to always explicitly choose the interface through which we want
to send packets, otherwise Windows may cause a routing loop by routing
our packets back into the TUN device.

We already have a `SocketFactory` abstraction in `connlib` that is used
by each platform to customise the setup of each socket to prevent
routing loops.

So far, this abstraction directly returns tokio sockets which don't
allow us to intercept the actual sending of packets. For some of our
traffic, i.e. the UDP packets exchanged with relays, we don't specify a
source address. To make full-route work on Windows, we need to intercept
these packets and explicitly set the source address.

To achieve that, we introduce dedicated `TcpSocket` and `UdpSocket`
structs within `socket-factory`. With this in place, we will be able to
add Windows-conditional code to looks up and sets the source address of
outgoing UDP packets. For TCP sockets, the lookup will happen prior to
connecting to the address and used to bind to the correct interface.

Related: #2667.
Related: #5955.
2024-07-25 04:28:46 +00:00