The `expect` attribute is similar to `allow` in that it will silence a
particular lint. In addition to `allow` however, `expect` will fail as
soon as the lint is no longer emitted. This ensures we don't end up with
stale `allow` attributes in our codebase. Additionally, it provides a
way of adding a `reason` to document, why the lint is being suppressed.
In #6592, we started tracking our connected gateways to correctly model,
which packets get dropped as part of establishing new connections. We
forgot to clear this when connections are being reset, causing some test
failures.
As the final step in removing `pnet_packet`, we need to introduce `-Mut`
equivalent slices for UDP, TCP and ICMP packets. As a starting point,
introducing `UpdHeaderSliceMut` and `TcpHeaderSliceMut` is fairly
trivial. The ICMP variants are a bit trickier because those are
different for IPv4 and IPv6. Additionally, ICMP for IPv4 is quite
complex because it can have a variable header length. Additionally. for
both variants, the values in byte range 5-8 are semantically different
depending on the ICMP code.
This requires us to design an API that balances ergonomics and
correctness. Technically, an ICMP identifier and sequence can only be
set if the ICMP code is "echo request" or "echo reply". However, adding
an additional parsing step to guarantee this in the type system is quite
verbose.
The trade-off implemented in this PR allows to us to directly write to
the byte 5-8 using the `set_identifier` and `set_sequence` functions. To
catch errors early, this functions have debug-assertions built in that
ensure that the packet is indeed an ICMP echo packet.
Resolves: #6366.
Currently, we have two structs for representing IP packets: `IpPacket`
and `MutableIpPacket`. As the name suggests, they mostly differ in
mutability. This design was originally inspired by the `pnet_packet`
crate which we based our `IpPacket` on. With subsequent iterations, we
added more and more functionality onto our `IpPacket`, like NAT64 &
NAT46 translation. As a result of that, the `MutableIpPacket` is no
longer directly based on `pnet_packet` but instead just keeps an
internal buffer.
This duplication can be resolved by merging the two structs into a
single `IpPacket`. We do this by first replacing all usages of
`IpPacket` with `MutableIpPacket`, deleting `IpPacket` and renaming
`MutableIpPacket` to `IpPacket`. The final design now has different
`self`-receivers: Some functions take `&self`, some `&mut self` and some
consume the packet using `self`.
This results in a more ergonomic usage of `IpPacket` across the codebase
and deletes a fair bit of code. It also takes us one step closer towards
using `etherparse` for all our IP packet interaction-needs. Lastly, I am
currently exploring a performance-optimisation idea that stack-allocates
all IP packets and for that, the current split between `IpPacket` and
`MutableIpPacket` does not really work.
Related: #6366.
The lifetime of the returned packet is actually already `'static`,
meaning we don't need to call `to_owned`.
Related: #6366.
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: firezone <firezone@firezones-MacBook-Air.local>
Self-hosted users often forget to deploy relays. Without relays,
`snownet` cannot establish any connections because we never figure out
our server-reflexive address and local host address. Even if relays are
configured, if STUN / TURN is blocked, we may end up with no relays.
In that case, any newly created connection will very likely fail unless
new TURN servers are added within the 10s timeout that we have when
waiting for candidates. To make it easier to detect these situations, we
log a warning if we see that a new connection is being created without
any active relays.
One may argue that we should just disallow the connection altogether,
i.e. return a `Result`. Yet, this situation happens so rarely that
having to handle this `Result` further up the stack is quite the
ergonomic hit.
As measured by running `perf` on our tests, a big part of why they are
slow is that we are calling `handle_timeout` basically on every
iteration of the `advance` loop in the test. Similar as in production,
there is no need to do that. Instead, we only call `handle_timeout` of a
particular component (client, gateway or relay) if they indicate that
they have something they are waiting for (as defined by `poll_timeout`).
Simply doing that makes the tests fail for certain scenarios where we
handle IP packets that aren't mean for the tunnel (such as DNS queries
or STUN messages for the relay / ICE agent). To fix this, we call
`handle_timeout` whenever `encapsulate` returns `None`. This is fairly
common across sans-IO systems: When a function that usually 1-to-1
transforms a packet instead handles it internally, it must have changed
internal state. To make code-organisation easier, `handle_timeout` is
treated as the "work-horse" of a sans-IO system: It is where all the
code goes that needs to perform processing upon multiple conditions.
Making this change drops the runtime of `tunnel_test` from ~33s to ~12s
on my machine (tests compiled with opt-level 2).
Creating a new span is fairly expensive when it happens as part of a hot
function. Decapsulating packets in `snownet` is such a hot-function:

Previously, we created a new span for every packet that we decrypted
which accounted for ~3% of spent CPU time. We can optimise this and
remove some duplication by creating the span early and simply only
entering it every time we want it to be active. This results in
`boringtun`'s `decapsulate` being the most expensive function that
happens in `decapsulate`:

On a system with only a single IP stack (either V4 or V6), we will only
have a single socket. When the system gets busy, the `send` function is
extremely hot for obvious reasons. With only a single socket active, we
allocated a lot of strings and errors here that ended up not being used
at all. This accounts for about 1% of CPU time spent during a speedtest.
When CIDR resources get added or removed, we need to update the routing
table on the clients to redirect traffic for these resources to the TUN
device. Currently, this is done in a separate event from the remaining
`TunConfig` tracked in `connlib`. Having this in a separate event means
it is hard to diff, whether anything meaningful changed about the TUN
device. Additionally, changes to these routes are currently not tested
in `tunnel_test`.
Not having this code tested already caused bugs previously, such as
#6387.
To fix these things, we:
- Add the IPv4 and IPv6 routes to the `TunConfig` tracked in `connlib`
- Track the expected routes in `RefClient`
- Assert that we don't emit `TunConfigUpdated` events without any actual
changes
Fixes: #6423.
This PR introduces the `etherparse` dependency for parsing and
generating IP packets.
Using `etherparse`, we can implement the NAT46 & NAT64 implementations
for the gateway more elegantly because it allows us to parse the IP and
protocol headers into a static and much richer representation. The
conversion to the IPv4/IPv6 equivalent is then just a question of
transforming one data structure into another and writing it to the
correct place in the buffer.
We extract this functionality into dedicated `nat64` and `nat46`
modules.
Furthermore, we implement the various functions in `ip_packet::make`
using `etherparse` too. Following that, we also overhaul the NAT
translation tests that we have in `ip_packet::proptests`. Those now use
the more low-level `consume_to_ipX` APIs which makes the tests more
ergonomic to write.
In the future, we should upstream `Ipv4HeaderSliceMut` and
`Ipv6HeaderSliceMut` to `etherparse`.
Moving all of this functionality to `etherparse` will make it easier to
write tests that involve more IP packets as well as customise the
behaviour of our NAT.
Related: #5614.
Related: #6371.
Related: #6353.
With this PR we made internet resource disabled by default.
Since no other resource is disalable and internet resource behavior is
particular we remove all client code to make non internet resource
disalable.
Also, since the portal never makes the internet resource that can't be
disabled we remove the whole code path to handle that.
Additionally, some other smaller refactors across the UI wrt internet
resource
Fix#6509
---------
Signed-off-by: conectado <gabrielalejandro7@gmail.com>
Co-authored-by: conectado <conectado@conectados-MacBook-Air.local>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Currently, we buffer UDP packets whenever the socket is busy and try to
flush them out at a later point. This requires allocations and is tricky
to get right.
In order to solve both of these problems, we refactor `snownet` to
return us an `EncryptedPacket` instead of a `Transmit`. An
`EncryptedPacket` is an indirection-abstraction that can be turned into
a `Transmit` given an `EncryptBuffer`. This combination of types allows
us to hold on to the `EncryptedPacket` (which does not contain any
references itself) in the `io` component whilst we are waiting for the
socket to be ready to send again.
This means we will immediately suspend the event loop in case the socket
is no longer ready for sending and resend the datagram in the
`EncryptBuffer` once we get re-polled.
Instead of tracking pending connections to resources, we need to model
pending connections to gateways. The offending test seed has a CIDR
resource that is a DNS server and the Internet resources, both routed
via the same gateway.
When sending concurrent DNS queries to those resources, we need to track
which _gateways_ we are connecting to as a result to figure out which
queries get lost. In particular, only the _first_ resource to trigger a
connection to a gateway will be authorized. Subsequent queries will be
completely lost and require another packet to authorize the connection.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
In order to see all things that went wrong during a test run, we install
a special subscriber with the logger for `tunnel_test` that panics after
it has received at least 1 `ERROR` log.
The panics changed in this PR are still from a time when we didn't have
that. We change them so that we have better diagnostics for when they
get hit.
This publishes the 1.3.0 clients and gateways so that Internet Resources
will work.
The feature is still disabled for the Stripe plans until we publish the
launch post. Select customers have the feature enabled.
Closes#2667
Currently, `connlib` tracks the `Interface` as it is given it by the
portal. This includes the tunnel IP addresses plus the upstream servers.
Upstreams servers however only take effect when they are defined.
Without upstream DNS servers, `connlib` uses the system-defined DNS
servers. In that case, the `Interface` no longer accurately represents,
what we actually configure on the TUN device.
To fix this, we introduce a dedicated `TunConfig` struct that tracks,
what is actually set on the interface. This also allows us to track,
whether or not we need to re-emit this configuration after a change.
Related: #6423.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Currently, the gateway requires a strict ordering of first receiving a
`request_connection` message, following by multiple `allow_access`
messages. Additionally, access can be granted as part of the initial
`request_connection` message too.
This isn't an ideal design. Setting up a new connection is infallible,
all we need to do is send our ICE credentials back to the client.
However, untangling that will require a bit more effort.
Starting with #6335, following this strict order on the client is a more
difficult. Whilst we can send them in order, it is harder to maintain
those ordering guarantees across all our systems.
To avoid this, we change the gateway to perform an upsert for its local
ACLs for a client. In case that an `allow_access` call would somehow get
to the gateway earlier, we can simply already create the `Peer` and only
set up the actual connection later.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Upon receiving packets for a resource that we are not connected to,
connlib emits a "connection intent" to the portal. In case there are
gateways online for this resource, the portal sends us a "connection
details" event.
Currently, this is handled in a `create_or_reuse_connection` function.
What the current name doesn't capture is that this message is
essentially an update to connlib's "routing table", i.e. which gateway
in which site to use for the given resource. If we move this concern to
the fore-front of the design, whether or not we will make a new
connection or reuse an existing one kind of becomes secondary.
Re-framing the way we handle this messages makes it more natural to
design it in an asynchronous way, i.e. set its return type to `()` and
schedule events to be emitted. The translation of
`Request::NewConnection` is more or less 1-to-1 with the introduction of
`ClientEvent::RequestConnection`. The translation of
`Request::ReuseConnection` turns into the also renamed
`ClientEvent::RequestAccess`. This captures better what we need to do:
When we have an existing connection, we need to request access for it,
otherwise the gateway will drop all packets we send to this resource.
The motivation for this refactoring is #6335. Buffering the initial
packets while establishing a new connection opens up a race condition
where we may send `RequestAccess` before the gateway has processed
`RequestConnection`. In order to avoid this, we need to be able to
locally buffer our `RequestAccess` messages and wait until the gateway
has confirmed our connection.
In the `tunnel_test` test suite, we send ICMP requests with arbitrary
sequence numbers and identifiers. Due to the NAT implementation of the
gateway, the sequence number and identifier chosen by the client are not
necessarily the same as the ones sent to the resource. Thus, it is
impossible to correlate the ICMP packets sent by the client with the
ones arriving at the gateway.
Currently, our test suite thus relies on the ordering of packets to
match them up and assert properties on them, like whether they target
the correct resource. As soon as we want to send multiple packets
concurrently, this order is not necessarily stable.
ICMP echo requests can contain an arbitrary payload. We utilise this
payload to embed a random u64 that acts as the unique identifier of an
ICMP request. This allows us to correlate the packets arriving at the
gateway with the ones sent by the client, making the test suite more
robust and ready for handling concurrent ICMP packets.
`tunnel_test` uses the `FluxCapacitor` component to rapidly advance time
within a test-run. This component is also used by the logger in the
tests to print _relative_ timestamps which makes it easier to compare
different test runs.
Currently, this component is initialised in the `ReferenceState`
although it isn't really part of the test-input itself. It is only
needed during the execution of a transition.
When proptest finds a failing test run, it will reuse the same
`ReferenceState` and attempt to shrink it to find the minimally-failing
input. It does this by calling `.clone`. Because `FluxCapacitor` uses
interior mutability to advance time, this means the time isn't actually
reset to `0` whilst proptest is shrinking the input.
This has / had on impact on the outcome of the test, it only makes the
logs harder and more confusing to read.
We fix this by removing the `StateMachineTest` trait implementation of
`TunnelTest` and passing an additional parameter to `init_state`. This
trait was originally needed because we were using the "no-boilerplate"
version of `proptest-state-machine`. We have recently migrated away from
that to get more fine-grained control over logging and test execution,
meaning we no longer need this trait and can simply call these functions
ourselves.
Bumps [swift-bridge-build](https://github.com/chinedufn/swift-bridge)
from 0.1.55 to 0.1.57.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/chinedufn/swift-bridge/releases">swift-bridge-build's
releases</a>.</em></p>
<blockquote>
<h2>0.1.57</h2>
<ul>
<li>
<p>Support Failable initializers. <a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/276">#276</a>
(thanks <a
href="https://github.com/niwakadev"><code>@niwakadev</code></a>)</p>
<pre lang="rust"><code>// Rust
<p>#[swift_bridge::bridge]
mod ffi {
extern "Rust" {
#[swift_bridge(Equatable)]
type FailableInitType;</p>
<pre><code> #[swift_bridge(init)]
fn new() -&gt; Option&lt;FailableInitType&gt;;
}
</code></pre>
<p>}
</code></pre></p>
<pre lang="swift"><code>// Swift
let failableInitType = FailableInitType()
if failableInitType == nil {
// ...
} else {
// ...
}
</code></pre>
</li>
<li>
<p>Support Throwing initializers <a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/287">#287</a>
(thanks <a
href="https://github.com/niwakadev"><code>@niwakadev</code></a>)</p>
<pre lang="rust"><code>// Rust
<p>#[swift_bridge::bridge]
mod ffi {
enum ResultTransparentEnum {
NamedField { data: i32 },
UnnamedFields(u8, String),
NoFields,
}
extern "Rust" {
type ThrowingInitializer;
#[swift_bridge(init)]
fn new(succeed: bool) -> Result<ThrowingInitializer,
ResultTransparentEnum>;
fn val(&self) -> i32;
}
}
</code></pre></p>
<pre lang="swift"><code>// Swift
do {
</code></pre>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1b547a5f0c"><code>1b547a5</code></a>
Support throwing initializers (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/287">#287</a>)</li>
<li><a
href="495611ba39"><code>495611b</code></a>
Support failable initializers (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/276">#276</a>)</li>
<li><a
href="d2c09e2e60"><code>d2c09e2</code></a>
0.1.56</li>
<li><a
href="d3d2da4e01"><code>d3d2da4</code></a>
Upgrade macOS CI runners (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/285">#285</a>)</li>
<li><a
href="37ac2c2627"><code>37ac2c2</code></a>
Apply input module visibility to output module (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/284">#284</a>)</li>
<li><a
href="34ce1ffb6b"><code>34ce1ff</code></a>
Suppress dead_code warning when returning <code>-> Result\<*,
TransparentType></code> (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/278">#278</a>)</li>
<li><a
href="717fcef70d"><code>717fcef</code></a>
Add parse-bridges CLI subcommand (<a
href="https://redirect.github.com/chinedufn/swift-bridge/issues/274">#274</a>)</li>
<li><a
href="ef01d21001"><code>ef01d21</code></a>
0.1.55</li>
<li>See full diff in <a
href="https://github.com/chinedufn/swift-bridge/compare/0.1.55...0.1.57">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The only reason we use an `AtomicU32` here is because the closure is
only an `Fn` and not an `FnMut` closure which prevents us from using a
regular `u32`.
Right now, whenever a connection is established we update the site
status.
In order to do that, we call `on_update_resources`, when
`on_update_resources` is called this in turn calls
`set_disabled_resources`, since we apply from the application side the
"disabled" given the current resources.
`set_disabled_resources` currently, always call `on_update_routes`,
which causes connectivity issues on Android and MacOS, since the packets
aren't correctly routed when the routes are changed.
To fix this we make `set_disabled_resources` only emit the routes when
they have actually changed.
Fixes: #6387.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
In order to accurately model how `connlib` tracks resources, we need to
store the list of all resources separately from the CIDR resources. That
is because CIDR resources can overlap or target an identical CIDR range.
In that case, `connlib`s current behaviour is "last-wins".
Whenever we reconnect to the portal, we re-add our list of resources in
the order they are given to us. To model this correctly, we store the
list of resources in the tests in the order we receive them throughout
the previous session. This may not necessarily be the order in which the
portal does it but that is irrelevant. What is important is that it is
deterministic.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Previously, failing to bind to any interfaces was a hard-error. In
reality and in `connlib`'s current state, this is quite unlikely because
machines will at least have a loopback interface that we will bind to.
However, with #6382 in the pipeline, it may be more likely that we
actually end up with no functional UDP sockets. Furthermore, we are
considering to extend those connectivity checks in the future.
Thus, it is important that the case of "no available UDP sockets" is
gracefully handled.
Instead of failing with a hard-error, we now suspend `connlib's` network
stack. The connectivity to the portal is unaffected by this and we will
still also receive commands from the client application like `reset`.
When we receive a `reset`, we attempt to rebind the sockets and thus
retry connectivity.
Because we are suspending the entire eventloop, this won't send any
messages or trigger any timers whatsoever. For example, if we
hypothetically started up without network interfaces, this is now the
log output:
```
2024-08-22T01:50:42.170101Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.2.0-2-gc8eed5938-modified"
2024-08-22T01:50:42.178777Z DEBUG phoenix_channel: Connecting to portal host=api.firez.one user_agent=NixOS/24.5.0 connlib/1.2.1 (x86_64; 6.8.12)
2024-08-22T01:50:42.178978Z DEBUG firezone_headless_client::dns_control::linux: Deactivating DNS control...
2024-08-22T01:50:42.180691Z ERROR firezone_tunnel::sockets: No available UDP sockets
2024-08-22T01:50:42.197098Z INFO firezone_tunnel::device_channel: Initializing TUN device name=tun-firezone
2024-08-22T01:50:42.197165Z DEBUG firezone_tunnel::client: Unable to update DNS servesr without interface configuration
2024-08-22T01:50:42.453988Z DEBUG tungstenite::handshake::client: Client handshake done.
2024-08-22T01:50:42.454161Z INFO phoenix_channel: Connected to portal host=api.firez.one
2024-08-22T01:50:42.676825Z DEBUG firezone_tunnel::client: Updating DNS servers mapping={fd00:2021:1111:8000:100:100:111:0 <> [2606:4700:4700::1111]:53, 100.100.111.1 <> 1.1.1.1:53}
2024-08-22T01:50:42.677084Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677173Z INFO firezone_tunnel::client: Activating resource name=*.slack.com address=**.slack.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677223Z INFO firezone_tunnel::client: Activating resource name=*.slack-edge.com address=**.slack-edge.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677283Z INFO firezone_tunnel::client: Activating resource name=*.spotify.com address=**.spotify.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677345Z INFO firezone_tunnel::client: Activating resource name=*.github.com address=**.github.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677418Z INFO firezone_tunnel::client: Activating resource name=whatismyip.com address=**.whatismyip.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677489Z INFO firezone_tunnel::client: Activating resource name=ifconfig.net address=ifconfig.net sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677538Z INFO firezone_tunnel::client: Activating resource name=*.google.com address=**.google.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677632Z INFO firezone_tunnel::client: Activating resource name=*.fastmail.com address=**.fastmail.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677682Z INFO firezone_tunnel::client: Activating resource name=speed.cloudflare.com address=speed.cloudflare.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.678212Z INFO snownet::node: Added new TURN server rid=b6fc4d73-9c8e-44df-a941-da7d2134cb70 address=Dual { v4: 34.40.133.55:3478, v6: [2600:1900:40b0:1504:0:97::]:3478 }
2024-08-22T01:50:42.678322Z INFO snownet::node: Added new TURN server rid=c818b11a-d0cc-4f2a-bb88-473d8298a885 address=Dual { v4: 34.81.229.132:3478, v6: [2600:1900:4030:b0d9:0:9b::]:3478 }
2024-08-22T01:50:42.678365Z INFO connlib_client_shared::eventloop: Firezone Started!
```
After this, nothing will happen other than receiving messages via from
the portal or the client app.
Related: #6382.
Related: #6385.
Previously, `connlib` would only send the currently connected gateways
to the portal upon a new connection intent. With our introduced idle
connection timeout, this could result in the portal choosing a different
gateway upon reconnecting to the resource.
To fix this, we introduce an LRU cache with at most 100 entries.
Iteration over the LRU cache happens in MRU order, meaning a recently
connected gateway will be at the front of the list.
We assume that this list is processed in order and thus still prefer
gateways that we are still connected to.
Related: #6347.
[Refs](https://github.com/firezone/firezone/pull/6299#discussion_r1724108733)
The problem right now, after #6325 we send the internet resource up to
the clients. The clients expect a certain format for the resources and
panic if it isn't followed.
Particularly, in the case of no `address` or no `name`.
To fix this, we add a name and an address for the internet resource when
it is converted to the callback type.
Setting the `name` at that point actually makes a lot of sense since it
homogenizes the name across all platforms. But the internet resource
having an address makes no sense.
So in a next PR, when I do the last UI changes I plan to make `address`
optional for all resources on the clients and specialize the display of
the internet resource.
For now I wanted to get this in so that we don't ever panic on the
internet resource existing. (This was tested on all platforms and it
works)
This PR handles the internet resource both on the gateway and client.
In the gateway, it's handled like any other resource save for the
address which is both ipv6 and ipv4.
In the client it's handled mostly like any other resource but with some
exceptions for DNS forwarded packets, because we want to work as a CIDR
resource for the matter of forwarding packets to the gateway.
Fixes: #6313.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Currently, `snownet` allocates a 65KB buffer per connection as a
scratch-space for encrypting packets. 65KB is the theoretical limit of a
UDP packet. In practice, the largest UDP packets we send are 1336 bytes
due to the MTU of 1280 set on our TUN interface and various overheads
for WG, TURN channels and NAT46.
Thus, it is unnecessary to allocate such a large buffer per connection.
For gateways with many connections, reducing these buffers results in a
smaller memory footprint.
Additionally, any UDP packets larger than this buffer could be an
indicator of a DoS attack and we can thus drop them without processing.
A legitimate client / gateway will never send a packet larger than that.
In `connlib`, when a CIDR resource gets disabled, we remove it from the
`IpNetworkTable` that does the routing for the packets. This ensures
that when we check for the `longest_match` of a packet, disabled
resources are not considered.
In
https://github.com/firezone/firezone/actions/runs/10449400486/job/28931681264?pr=6339,
CI found a bug where the reference implementation in the tests diverged
from this behaviour because it implements this behaviour slightly
differently. To ensure we don't match against a disabled resource, we
match all resources, filter out the disabled ones and then pick the one
with the highest netmask which should be the most specific one.
For quite a while now, we have been making extensive use of
property-based testing to ensure `connlib` works as intended. The idea
of proptests is that - given a certain seed - we deterministically
sample test inputs and assert properties on a given function.
If the test fails, `proptest` prints the seed which can then be added to
a regressions file to iterate on the test case and fix it. It is quite
obvious that non-determinism in how the test input gets generated is no
bueno and reduces the value we get out of these tests a fair bit.
The `HashMap` and `HashSet` data structures are known to be
non-deterministic in their iteration order. This causes non-determinism
during the input generation because we make use of a lot of maps and
sets to gradually build up the test input. We fix all uses of `HashMap`
and `HashSet` by replacing them with `BTreeMap` and `BTreeSet`.
To ensure this doesn't regress, we refactor `tunnel_test` to not make
use of proptest's macros and instead, we initialise and run the test
ourselves. This allows us to dump the sampled state and transitions into
a file per test run. In CI, we then run a 2nd iteration of all
regression tests and compare the sampled state and transitions with the
previous run. They must match byte-for-byte.
Finally, to discourage use of non-deterministic iteration, we ban the
use of the iteration functions on `HashMap` and `HashSet` across the
codebase. This doesn't catch iteration in a `for`-loop but it is better
than not linting against it at all.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>