Upon receiving packets for a resource that we are not connected to,
connlib emits a "connection intent" to the portal. In case there are
gateways online for this resource, the portal sends us a "connection
details" event.
Currently, this is handled in a `create_or_reuse_connection` function.
What the current name doesn't capture is that this message is
essentially an update to connlib's "routing table", i.e. which gateway
in which site to use for the given resource. If we move this concern to
the fore-front of the design, whether or not we will make a new
connection or reuse an existing one kind of becomes secondary.
Re-framing the way we handle this messages makes it more natural to
design it in an asynchronous way, i.e. set its return type to `()` and
schedule events to be emitted. The translation of
`Request::NewConnection` is more or less 1-to-1 with the introduction of
`ClientEvent::RequestConnection`. The translation of
`Request::ReuseConnection` turns into the also renamed
`ClientEvent::RequestAccess`. This captures better what we need to do:
When we have an existing connection, we need to request access for it,
otherwise the gateway will drop all packets we send to this resource.
The motivation for this refactoring is #6335. Buffering the initial
packets while establishing a new connection opens up a race condition
where we may send `RequestAccess` before the gateway has processed
`RequestConnection`. In order to avoid this, we need to be able to
locally buffer our `RequestAccess` messages and wait until the gateway
has confirmed our connection.
In the `tunnel_test` test suite, we send ICMP requests with arbitrary
sequence numbers and identifiers. Due to the NAT implementation of the
gateway, the sequence number and identifier chosen by the client are not
necessarily the same as the ones sent to the resource. Thus, it is
impossible to correlate the ICMP packets sent by the client with the
ones arriving at the gateway.
Currently, our test suite thus relies on the ordering of packets to
match them up and assert properties on them, like whether they target
the correct resource. As soon as we want to send multiple packets
concurrently, this order is not necessarily stable.
ICMP echo requests can contain an arbitrary payload. We utilise this
payload to embed a random u64 that acts as the unique identifier of an
ICMP request. This allows us to correlate the packets arriving at the
gateway with the ones sent by the client, making the test suite more
robust and ready for handling concurrent ICMP packets.
`tunnel_test` uses the `FluxCapacitor` component to rapidly advance time
within a test-run. This component is also used by the logger in the
tests to print _relative_ timestamps which makes it easier to compare
different test runs.
Currently, this component is initialised in the `ReferenceState`
although it isn't really part of the test-input itself. It is only
needed during the execution of a transition.
When proptest finds a failing test run, it will reuse the same
`ReferenceState` and attempt to shrink it to find the minimally-failing
input. It does this by calling `.clone`. Because `FluxCapacitor` uses
interior mutability to advance time, this means the time isn't actually
reset to `0` whilst proptest is shrinking the input.
This has / had on impact on the outcome of the test, it only makes the
logs harder and more confusing to read.
We fix this by removing the `StateMachineTest` trait implementation of
`TunnelTest` and passing an additional parameter to `init_state`. This
trait was originally needed because we were using the "no-boilerplate"
version of `proptest-state-machine`. We have recently migrated away from
that to get more fine-grained control over logging and test execution,
meaning we no longer need this trait and can simply call these functions
ourselves.
Bumps [swift-bridge-build](https://github.com/chinedufn/swift-bridge)
from 0.1.55 to 0.1.57.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/chinedufn/swift-bridge/releases">swift-bridge-build's
releases</a>.</em></p>
<blockquote>
<h2>0.1.57</h2>
<ul>
<li>
<p>Support Failable initializers. <a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/276">#276</a>
(thanks <a
href="https://github.com/niwakadev"><code>@niwakadev</code></a>)</p>
<pre lang="rust"><code>// Rust
<p>#[swift_bridge::bridge]
mod ffi {
extern "Rust" {
#[swift_bridge(Equatable)]
type FailableInitType;</p>
<pre><code> #[swift_bridge(init)]
fn new() -&gt; Option&lt;FailableInitType&gt;;
}
</code></pre>
<p>}
</code></pre></p>
<pre lang="swift"><code>// Swift
let failableInitType = FailableInitType()
if failableInitType == nil {
// ...
} else {
// ...
}
</code></pre>
</li>
<li>
<p>Support Throwing initializers <a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/287">#287</a>
(thanks <a
href="https://github.com/niwakadev"><code>@niwakadev</code></a>)</p>
<pre lang="rust"><code>// Rust
<p>#[swift_bridge::bridge]
mod ffi {
enum ResultTransparentEnum {
NamedField { data: i32 },
UnnamedFields(u8, String),
NoFields,
}
extern "Rust" {
type ThrowingInitializer;
#[swift_bridge(init)]
fn new(succeed: bool) -> Result<ThrowingInitializer,
ResultTransparentEnum>;
fn val(&self) -> i32;
}
}
</code></pre></p>
<pre lang="swift"><code>// Swift
do {
</code></pre>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1b547a5f0c"><code>1b547a5</code></a>
Support throwing initializers (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/287">#287</a>)</li>
<li><a
href="495611ba39"><code>495611b</code></a>
Support failable initializers (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/276">#276</a>)</li>
<li><a
href="d2c09e2e60"><code>d2c09e2</code></a>
0.1.56</li>
<li><a
href="d3d2da4e01"><code>d3d2da4</code></a>
Upgrade macOS CI runners (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/285">#285</a>)</li>
<li><a
href="37ac2c2627"><code>37ac2c2</code></a>
Apply input module visibility to output module (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/284">#284</a>)</li>
<li><a
href="34ce1ffb6b"><code>34ce1ff</code></a>
Suppress dead_code warning when returning <code>-> Result\<*,
TransparentType></code> (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/278">#278</a>)</li>
<li><a
href="717fcef70d"><code>717fcef</code></a>
Add parse-bridges CLI subcommand (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/274">#274</a>)</li>
<li><a
href="ef01d21001"><code>ef01d21</code></a>
0.1.55</li>
<li>See full diff in <a
href="https://github.com/chinedufn/swift-bridge/compare/0.1.55...0.1.57">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The only reason we use an `AtomicU32` here is because the closure is
only an `Fn` and not an `FnMut` closure which prevents us from using a
regular `u32`.
Right now, whenever a connection is established we update the site
status.
In order to do that, we call `on_update_resources`, when
`on_update_resources` is called this in turn calls
`set_disabled_resources`, since we apply from the application side the
"disabled" given the current resources.
`set_disabled_resources` currently, always call `on_update_routes`,
which causes connectivity issues on Android and MacOS, since the packets
aren't correctly routed when the routes are changed.
To fix this we make `set_disabled_resources` only emit the routes when
they have actually changed.
Fixes: #6387.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
In order to accurately model how `connlib` tracks resources, we need to
store the list of all resources separately from the CIDR resources. That
is because CIDR resources can overlap or target an identical CIDR range.
In that case, `connlib`s current behaviour is "last-wins".
Whenever we reconnect to the portal, we re-add our list of resources in
the order they are given to us. To model this correctly, we store the
list of resources in the tests in the order we receive them throughout
the previous session. This may not necessarily be the order in which the
portal does it but that is irrelevant. What is important is that it is
deterministic.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Previously, failing to bind to any interfaces was a hard-error. In
reality and in `connlib`'s current state, this is quite unlikely because
machines will at least have a loopback interface that we will bind to.
However, with #6382 in the pipeline, it may be more likely that we
actually end up with no functional UDP sockets. Furthermore, we are
considering to extend those connectivity checks in the future.
Thus, it is important that the case of "no available UDP sockets" is
gracefully handled.
Instead of failing with a hard-error, we now suspend `connlib's` network
stack. The connectivity to the portal is unaffected by this and we will
still also receive commands from the client application like `reset`.
When we receive a `reset`, we attempt to rebind the sockets and thus
retry connectivity.
Because we are suspending the entire eventloop, this won't send any
messages or trigger any timers whatsoever. For example, if we
hypothetically started up without network interfaces, this is now the
log output:
```
2024-08-22T01:50:42.170101Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.2.0-2-gc8eed5938-modified"
2024-08-22T01:50:42.178777Z DEBUG phoenix_channel: Connecting to portal host=api.firez.one user_agent=NixOS/24.5.0 connlib/1.2.1 (x86_64; 6.8.12)
2024-08-22T01:50:42.178978Z DEBUG firezone_headless_client::dns_control::linux: Deactivating DNS control...
2024-08-22T01:50:42.180691Z ERROR firezone_tunnel::sockets: No available UDP sockets
2024-08-22T01:50:42.197098Z INFO firezone_tunnel::device_channel: Initializing TUN device name=tun-firezone
2024-08-22T01:50:42.197165Z DEBUG firezone_tunnel::client: Unable to update DNS servesr without interface configuration
2024-08-22T01:50:42.453988Z DEBUG tungstenite::handshake::client: Client handshake done.
2024-08-22T01:50:42.454161Z INFO phoenix_channel: Connected to portal host=api.firez.one
2024-08-22T01:50:42.676825Z DEBUG firezone_tunnel::client: Updating DNS servers mapping={fd00:2021:1111:8000:100:100:111:0 <> [2606:4700:4700::1111]:53, 100.100.111.1 <> 1.1.1.1:53}
2024-08-22T01:50:42.677084Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677173Z INFO firezone_tunnel::client: Activating resource name=*.slack.com address=**.slack.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677223Z INFO firezone_tunnel::client: Activating resource name=*.slack-edge.com address=**.slack-edge.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677283Z INFO firezone_tunnel::client: Activating resource name=*.spotify.com address=**.spotify.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677345Z INFO firezone_tunnel::client: Activating resource name=*.github.com address=**.github.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677418Z INFO firezone_tunnel::client: Activating resource name=whatismyip.com address=**.whatismyip.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677489Z INFO firezone_tunnel::client: Activating resource name=ifconfig.net address=ifconfig.net sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677538Z INFO firezone_tunnel::client: Activating resource name=*.google.com address=**.google.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677632Z INFO firezone_tunnel::client: Activating resource name=*.fastmail.com address=**.fastmail.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677682Z INFO firezone_tunnel::client: Activating resource name=speed.cloudflare.com address=speed.cloudflare.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.678212Z INFO snownet::node: Added new TURN server rid=b6fc4d73-9c8e-44df-a941-da7d2134cb70 address=Dual { v4: 34.40.133.55:3478, v6: [2600:1900:40b0:1504:0:97::]:3478 }
2024-08-22T01:50:42.678322Z INFO snownet::node: Added new TURN server rid=c818b11a-d0cc-4f2a-bb88-473d8298a885 address=Dual { v4: 34.81.229.132:3478, v6: [2600:1900:4030:b0d9:0:9b::]:3478 }
2024-08-22T01:50:42.678365Z INFO connlib_client_shared::eventloop: Firezone Started!
```
After this, nothing will happen other than receiving messages via from
the portal or the client app.
Related: #6382.
Related: #6385.
Previously, `connlib` would only send the currently connected gateways
to the portal upon a new connection intent. With our introduced idle
connection timeout, this could result in the portal choosing a different
gateway upon reconnecting to the resource.
To fix this, we introduce an LRU cache with at most 100 entries.
Iteration over the LRU cache happens in MRU order, meaning a recently
connected gateway will be at the front of the list.
We assume that this list is processed in order and thus still prefer
gateways that we are still connected to.
Related: #6347.
[Refs](https://github.com/firezone/firezone/pull/6299#discussion_r1724108733)
The problem right now, after #6325 we send the internet resource up to
the clients. The clients expect a certain format for the resources and
panic if it isn't followed.
Particularly, in the case of no `address` or no `name`.
To fix this, we add a name and an address for the internet resource when
it is converted to the callback type.
Setting the `name` at that point actually makes a lot of sense since it
homogenizes the name across all platforms. But the internet resource
having an address makes no sense.
So in a next PR, when I do the last UI changes I plan to make `address`
optional for all resources on the clients and specialize the display of
the internet resource.
For now I wanted to get this in so that we don't ever panic on the
internet resource existing. (This was tested on all platforms and it
works)
This PR handles the internet resource both on the gateway and client.
In the gateway, it's handled like any other resource save for the
address which is both ipv6 and ipv4.
In the client it's handled mostly like any other resource but with some
exceptions for DNS forwarded packets, because we want to work as a CIDR
resource for the matter of forwarding packets to the gateway.
Fixes: #6313.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Currently, `snownet` allocates a 65KB buffer per connection as a
scratch-space for encrypting packets. 65KB is the theoretical limit of a
UDP packet. In practice, the largest UDP packets we send are 1336 bytes
due to the MTU of 1280 set on our TUN interface and various overheads
for WG, TURN channels and NAT46.
Thus, it is unnecessary to allocate such a large buffer per connection.
For gateways with many connections, reducing these buffers results in a
smaller memory footprint.
Additionally, any UDP packets larger than this buffer could be an
indicator of a DoS attack and we can thus drop them without processing.
A legitimate client / gateway will never send a packet larger than that.
In `connlib`, when a CIDR resource gets disabled, we remove it from the
`IpNetworkTable` that does the routing for the packets. This ensures
that when we check for the `longest_match` of a packet, disabled
resources are not considered.
In
https://github.com/firezone/firezone/actions/runs/10449400486/job/28931681264?pr=6339,
CI found a bug where the reference implementation in the tests diverged
from this behaviour because it implements this behaviour slightly
differently. To ensure we don't match against a disabled resource, we
match all resources, filter out the disabled ones and then pick the one
with the highest netmask which should be the most specific one.
For quite a while now, we have been making extensive use of
property-based testing to ensure `connlib` works as intended. The idea
of proptests is that - given a certain seed - we deterministically
sample test inputs and assert properties on a given function.
If the test fails, `proptest` prints the seed which can then be added to
a regressions file to iterate on the test case and fix it. It is quite
obvious that non-determinism in how the test input gets generated is no
bueno and reduces the value we get out of these tests a fair bit.
The `HashMap` and `HashSet` data structures are known to be
non-deterministic in their iteration order. This causes non-determinism
during the input generation because we make use of a lot of maps and
sets to gradually build up the test input. We fix all uses of `HashMap`
and `HashSet` by replacing them with `BTreeMap` and `BTreeSet`.
To ensure this doesn't regress, we refactor `tunnel_test` to not make
use of proptest's macros and instead, we initialise and run the test
ourselves. This allows us to dump the sampled state and transitions into
a file per test run. In CI, we then run a 2nd iteration of all
regression tests and compare the sampled state and transitions with the
previous run. They must match byte-for-byte.
Finally, to discourage use of non-deterministic iteration, we ban the
use of the iteration functions on `HashMap` and `HashSet` across the
codebase. This doesn't catch iteration in a `for`-loop but it is better
than not linting against it at all.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
This is a test-failure detected in
https://github.com/firezone/firezone/actions/runs/10426492110/job/28879531621.
In the relay, we have fast-path lookup maps to for incoming traffic from
peers. This improves throughput as any incoming packet only needs to
look-up a single routing entry. Unfortunately, this creates duplication
in how the data must be stored.
In #6276, we correctly identified that channels must be re-bound on the
relay when a client sends `CHANNEL_BIND` message whilst the channel is
cooling down. What we failed to identify (and what as now caught by the
tests) is that we also need to re-insert the entry into the fast-path
lookup map to actually allow data from flowing through the channel.
Last PR for #6074
This adds Enable/Disable for tauri clients.
In windows, edge seems to hold on to the sockets for a bit too long
after disabling the resources. This will be solved for the internet
resource probably by modifying the firewall, in another PR.
When refactoring the expected behaviour for DNS queries, we overlooked a
case where we expected a DNS query to get routed through the tunnel
although it was a query for a DNS resource.
Fixes: #6310.
With the upgrade to 0.23, `tokio-tungstenite` pulls in `rustls` 0.27
which supports multiple crypto providers. By default, this uses the
`aws-lc-crypto` provider. The previous default was `ring`.
This PR bumps the necessary versions and installs the `ring` crypto
provider at the beginning of each application, before connlib starts. We
try and do this as early as possible to make it obvious that it only
needs to happen once per process.
Resolves: #5380.
For most cases, TURN identifies clients by their 3-tuple. This can make
it hard to correlate logs in case the client roams or its NAT session
gets reset, both of which cause the port to change.
To make problem analysis easier, we include the RFC-recommended
`SOFTWARE` attribute in all STUN requests created by `snownet`.
Typically, this includes a textual description of who sent the request
and a version number. See [0] for details. We don't track the version of
`snownet` individually and passing the actual client-version across this
many layers is deemed too complicated for now.
What we can add though is a parameter that includes a sticky session ID.
This session ID is computed based on the `Node`'s public key, meaning it
doesn't change until the user logs-out and in again.
On the relay, we now look for a `SOFTWARE` attribute in all STUN
requests and optionally include it in all spans if it is present.
[0]: https://datatracker.ietf.org/doc/html/rfc5389#section-15.10
Currently, `connlib` can only handle "simple" DNS wildcards where `*`
matches any number of subdomains, including zero and `?` matches a
single subdomain.
With this PR, we expand `connlib'`s capabilities to allow for a much
more complex matching of domains that more closely resembles glob
patterns:
- `**` matches any number of subdomains. This supersedes the previous
`*` operator.
- `*` matches a single subdomain. This supersedes the previous `?`
operator.
- `?` matches a single character. This wasn't possible before.
- Additionally, any of these can be combined. Previously, only `*` or
`?` was allowed and they were only accepted at the front of the domain
name pattern.
Resolves: #5056.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
In #6259, we added a regression test for concurrent DNS queries. A case
that we overlooked is that when DNS servers are defined as CIDR
resources, the queries themselves will act as connection intents and
thus dropped until we have a connection.
In the tests, the connection is only established as part of `advance`.
Thus, if we get multiple concurrent DNS queries to the same server that
is defined as a CIDR resource, we need to drop all future queries.
Fixes: #6283.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
For efficiency reasons, TURN's data channels don't have any
authentication or integrity metadata. Instead, the operate using a short
2-byte channel number to identify the target peer of the data.
To avoid abuse, channel bindings are at most valid for 10 minutes before
they need to be refreshed. In case they expire, there is a 5 minute
cooldown period, before the same channel number can be bound to a
different peer and before the same peer can be bound to a different
channel.
We had a similar issue in the past (#5613) where channels got rebound
early. Whilst that was fixed and is no longer happening, a case that we
didn't consider is what happens if we want to bind a channel to a peer
that still has a channel bound but is currently cooling down (i.e. in
the 5 minute period after its expiry).
In that case, `snownet` would wrongly assume that there is no channel to
this peer and try to bind a new one. That would get rejected by the
relay with a bad request.
To fix this, we simply need to check whether we still have a channel to
this peer and if yes, return the same channel number. On the relay, we
need to ensure that we consider a channel as `bound` again when it is
being refreshed.
We ensure that this doesn't regress in two ways:
- We add a unit-test for the `ChannelBindings` struct
- We modify the `Idle` transition to idle for 6 instead of 5 minutes.
This ensures that a combination of 2 idle transitions puts the channel
bindings into the 10-15 minute time window where rebinding the peer to a
different channel fails.
Related: #6265.
Most of `connlib-shared` exists only for historical reasons. The
`Tunnel` has since been decoupled from the `Callbacks` and most error
variants on `ConnlibError` are not actually used.
This allows us to move a few things around and trim down `ConnlibError`
to just the variants that actually cause a call to `on_disconnect`.
Moving everything related to `proptest`s to `firezone-tunnel` also
requires us to delete the specialisation for printing IDs in a shorter
format during the tests. That is a bit unfortunate but was always kind
of a hack. I'd rather make progress on getting rid of `connlib-shared`
though and perhaps re-introduce that feature once the messages are fully
moved into the tunnel.
Related: #4470.
When forwarding DNS queries, we need to temporarily store information
about them to correctly identify the response and mangle them back. This
algorithm needs to support concurrent queries to the same and different
DNS servers.
A critical bug was fixed in #6233 where we had wrongly assumed that DNS
query IDs are globally unique.
To ensure this behaviour doesn't regress, this PR modifies our test
suite to send up to 5 concurrent DNS queries into connlib. Currently,
the nature of `tunnel_test` is such that each `Transition` is a single
action that gets fully completed before we execute the next one. Thus,
the concurrency of DNS queries would never get tested because connlib's
internal data structure would at most ever contain 1 DNS query.
By sampling a set of up to 5 unique combinations of DNS server and query
ID, we make sure that concurrent DNS queries work.
Currently, the logging for which resources get activated and
de-activated is spread between the `dns` and `client` module. It also
doesn't include the sites that the resource is defined in.
The name of a resource alone is not enough to unique identify it. To fix
both of these papercuts, we move the logging to the `client` module and
include the sites in the log message.
The log messages now read like this:
```
2024-08-12T02:26:01.477844Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`)
2024-08-12T02:26:01.477904Z INFO firezone_tunnel::client: Activating resource name=*.slack.com address=*.slack.com sites=Vultr Stable (Latest Release Gateways)
2024-08-12T02:26:01.477942Z INFO firezone_tunnel::client: Activating resource name=*.slack-edge.com address=*.slack-edge.com sites=Vultr Stable (Latest Release Gateways)
2024-08-12T02:26:01.477984Z INFO firezone_tunnel::client: Activating resource name=*.spotify.com address=*.spotify.com sites=AWS Dev (Gateways track `main`)
```
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that allows us to bundle all the things together, how we want to
do logging.
This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.
Log filters can be overridden, i.e. a 2nd filter that matches the exact
same scope overrides a previous one. Thus, with this design it is still
possible to activate certain logs at runtime, even if they have silenced
by default.
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Work for #6074 equivalent to #6166 for MacOS
MacOs view:
<img width="547" alt="image"
src="https://github.com/user-attachments/assets/f465183e-247b-49b5-a916-3ecc5f0a02f4">
iOS(ipad) view:

Other than implementing the resource disabling, this PR also refactor
the IPC between the network extension and the app so that it's some form
of structured IPC instead of relying on it being deserializable to
string to match the message.
One big difference with Android is that we don't introduce the concept
of a `ResourceView` for swift, the main reason for this is that on iOS
the resources are bound to the view instead of just being a parameter
for creating the view. So if we modify the `disabled` property it'd
update the UI unnecessarily, also it'd update the `Store` value for the
resource and then we need to copy that over again to the view. Making it
easier to go out of sync.
When forwarding DNS queries, we need to remember the original source
socket in order to send the response back. Previously, this mapping was
indexed by the DNS query ID. As it turns out, at least Windows doesn't
have a global DNS query ID counter and may reuse them across different
DNS servers. If that happens and two of these queries overlap, then we
match the wrong responses together.
In the best case, this produces bad DNS results on the client. In the
worst case, those queries were for DNS servers with different IP
versions in which case we triggered a panic in connlib further down the
stack where we created the IP packet for the response.
To fix this, we first and foremost remove the explicit `panic!` from the
`make::` functions in `ip-packet`. Originally, these functions were only
used in tests but we started to use them in production code too and
unfortunately forgot about this panic. By introducing a `Result`, all
call-sites are made aware that this can fail.
Second, we fix the actual indexing into the data structure for forwarded
DNS queries to also include the DNS server's socket. This ensures we
don't treat the DNS query IDs as globally unique.
Third, we replace the panicking path in
`try_handle_forwarded_dns_response` with a log statement, meaning if the
above assumption turns out wrong for some reason, we still don't panic
and simply don't handle the packet.
I noticed we sometimes have a flaky integration test with an ICE timeout
in its logs. For example:
https://github.com/firezone/firezone/actions/runs/10278933741/job/28443578376
Analyzing this one more closely turned out to be caused by a race
condition between client and gateway, when they exchange their ICE
candidates.
We send ICE candidates in batches but because they are serialized to
strings early, their ordering actually depends on the so-called
"foundation" of the ICE candidates. that one is simply a hash of several
components. As a result, the ordering of these candidates can vary
between test runs.
We should try ICE candidates in order of their reverse-priority (i.e.
best first). By introducing a helper-collection, we can enforce this
ordering before sending ICE candidates across.