This seems to fix #6033.
What **seems** to be happening is that sometimes responses are delayed
and hickory caches the negative response.
We disable the cache and make multiple attempts, to be as transparent as
possible until #6141 is implemented.
Furthermore, the recursion-available flag being unset in responses can
cause issues in some clients, and enabling it shouldn't cause any
problems.
When a relay disconnects from the portal, either during deployment or
because of a network partition, the portal sends us a `relays_presence`
event. This allows us to discontinue use of that relay. Any connections
that currently use it get cut, and the next packet establishes a new
one.
In the case of relays being re-deployed, their state is gone entirely
and we will receive new relays to use. In the case of a network
partition, the relay would have retained its state but we have already
discarded ours locally. Only one allocation per client (identified by
its 3-tuple) is allowed, so making a new allocation on that relay would
fail.
In order to sync up this inconsistency, we delete our current allocation
and make a new one if we detect this case. To test this, we introduce a
new state transition to `tunnel_test` that simulates such a network
partition.
In addition, we also remove the "upsert" behaviour of relays. The
credentials of a relay can only change if it reboots. Rebooting would
trigger a `relays_presence` event and tell us to disconnect from that
relay. Thus, receiving a relay that we already know is guaranteed to use
the same credentials.
Removal of this upserting behaviour is essentially the fix for #6067.
Due to a portal bug (#6099), we may receive a relay as connected that is
in fact shutting down. If a channel needs to be refreshed on exactly
that relay - whilst we are trying to refresh its allocation as part of
upserting - this causes a busy loop of attempting to queue a message but
failing to do so, because we haven't chosen an `active_socket` for that
relay yet.
Fixes: #6067.
This almost always indicates a user-impacting connectivity error. For
customers troubleshooting their Gateways by grepping for `ERROR`, this
change will make such errors much easier to find.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
With the upcoming feature of full-route tunneling aka an "Internet
Resource", we need to expand the reference state machine in
`tunnel_test`. In particular, packets to non-resources will now be
routed to the gateway if we have previously activated the Internet
resource.
This is reasonably easy to model as we can see from the small diff.
Because `connlib` doesn't actually support the Internet resource yet,
the code snippet for where it is added to the list of all possible
resources to sample from is commented out.
When `tunnel_test` fails, it prints the initial state in verbose debug
formatting. Most of the fields in `RefClient` track state _during_ the
runtime of the test and are all empty initially. The same thing applies
to `Host`.
To make this output easier to read and scroll, we ignore some of these
fields in the debug output.
On the gateway, the only packets we are interested in receiving on the
TUN device are the ones destined for clients. To achieve this, we
specifically set routes for the reserved IP ranges on our interface.
Multicast packets such as MLDv2 get sent to all interfaces and cause
unnecessary noise in our logs. Thus, as a defense-in-depth measure, we
drop all packets outside of the IP ranges reserved for our clients.
Currently, each connection always uses all relays. That is pretty
wasteful in terms of bandwidth usage and processing power because we
only ever need a single relay for a connection. When we re-deploy
relays, we actively invalidate them, meaning the connection gets cut
instantly without waiting for an ICE timeout and the next packet will
establish a new one.
This is now also asserted with a dedicated transition in `tunnel_test`.
To correctly simulate this in `tunnel_test`, we always cut the
connection to all relays. This frees us from modelling `connlib`'s
internal strategy for picking a relay which keeps the reference state
simple.
Resolves: #6014.
In `connlib`, traffic is sent through sockets via one of three ways:
1. Direct p2p traffic between clients and gateways: For these, we always
explicitly set the source IP (and thus interface).
2. UDP traffic to the relays: For these, we let the OS pick an
appropriate source interface.
3. WebSocket traffic over TCP to the portal: For this too, we let the OS
pick the source interface.
For (2) and (3), it is possible to run into routing loops, depending on
the routes that we have configured on the TUN device.
In Linux, we can prevent routing loops by marking a socket [0] and
repeating the mark when we add routes [1]. Packets sent via a marked
socket won't be routed by a rule that contains this mark. On Android, we
can do something similar by "protecting" a socket via a syscall on the
Java side [2].
On Windows, routing works slightly differently. There, the source
interface is determined based on a computed metric [3] [4]. To prevent
routing loops on Windows, we thus need to find the "next best" interface
after our TUN interface. We can achieve this with a combination of
several syscalls:
1. List all interfaces on the machine
2. Ask Windows for the best route on each interface, except our TUN
interface.
3. Sort by Windows' routing metric and pick the lowest one (lower is
better).
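The selection in steps 1-3 boils down to a filter-and-minimise over interface metrics. A hypothetical sketch (the `Interface` type and its fields are stand-ins for illustration; the real code talks to the Windows IP helper APIs):

```rust
/// Stand-in for the data we get back from Windows for each interface.
#[derive(Debug, Clone, PartialEq)]
struct Interface {
    index: u32,
    best_route_metric: u32,
    is_tun: bool,
}

/// Pick the "next best" interface after our TUN interface.
fn best_non_tun_interface(interfaces: &[Interface]) -> Option<&Interface> {
    interfaces
        .iter()
        .filter(|i| !i.is_tun) // Step 2: skip our own TUN interface.
        .min_by_key(|i| i.best_route_metric) // Step 3: lower is better.
}
```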
Thanks to the `SocketFactory` abstraction that we previously
introduced, integrating this into `connlib` isn't too difficult:
1. For TCP sockets, we simply resolve the best route after creating the
socket and then bind it to that local interface. That way, all packets
will always go via that interface, regardless of which routes are
present on our TUN interface.
2. UDP is connection-less, so we need to decide, per packet, which
interface to use. "Pick the best interface for me" is modelled in
`connlib` via the `DatagramOut::src` field being `None`.
- To ensure those packets don't cause a routing loop, we introduce a
"source IP resolver" for our `UdpSocket`. This function gets called
every time we need to send a packet without a source IP.
- For improved performance, we cache these results. The Windows client
uses this source IP resolver to apply the strategy devised above and
find a suitable source IP.
- In case the source IP resolution fails, we don't send the packet. This
is important because otherwise, the kernel might choose our TUN
interface again and trigger a routing loop.
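A minimal sketch of this caching resolver, with the OS route lookup replaced by an injected closure (the names are illustrative, not connlib's actual API):

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Caches source-IP lookups per destination. The real resolver queries
/// the OS routing table; here it is an injected closure.
struct CachingSourceIpResolver<F> {
    resolve: F,
    cache: HashMap<IpAddr, IpAddr>,
}

impl<F> CachingSourceIpResolver<F>
where
    F: FnMut(IpAddr) -> Option<IpAddr>,
{
    fn new(resolve: F) -> Self {
        Self { resolve, cache: HashMap::new() }
    }

    /// Returns the source IP to use for `dst`. `None` means the packet
    /// must be dropped: sending without an explicit source could make
    /// the kernel pick the TUN interface and trigger a routing loop.
    fn src_for(&mut self, dst: IpAddr) -> Option<IpAddr> {
        if let Some(src) = self.cache.get(&dst) {
            return Some(*src);
        }
        let src = (self.resolve)(dst)?;
        self.cache.insert(dst, src);
        Some(src)
    }

    /// Called when the UDP sockets are re-created on roaming: the best
    /// interface for a given destination has likely changed.
    fn clear(&mut self) {
        self.cache.clear();
    }
}
```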
The last remark to make here is that this also works for connection
roaming. The TCP socket gets thrown away when we reconnect to the
portal. Thus, the new socket will pick the new best interface as it is
re-created. The UDP sockets also get thrown away as part of roaming.
That clears the above cache which is what we want: Upon roaming, the
best interface for a given destination IP will likely have changed.
[0]:
59014a9622/rust/headless-client/src/linux.rs (L19-L29)
[1]:
59014a9622/rust/bin-shared/src/tun_device_manager/linux.rs (L204-L224)
[2]:
59014a9622/rust/connlib/clients/android/src/lib.rs (L535-L549)
[3]:
https://learn.microsoft.com/en-us/previous-versions/technet-magazine/cc137807(v=msdn.10)?redirectedfrom=MSDN
[4]:
https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metric

Fixes: #5955.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
`connlib`'s event loop performs work in a very particular order:
1. Local buffers like IP, UDP and DNS packets are emptied.
2. Time-sensitive tasks, if any, are performed.
3. New UDP packets are processed.
4. New IP packets (from the TUN device) are processed.
This priority ensures we don't accept more work (i.e. new packets) until
we have finished processing existing work. As a result, we can keep
local buffers small and processing latencies low.
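As a rough sketch (not connlib's actual `Io::poll`), the priority order above can be expressed as a simple cascade:

```rust
/// Illustrative model of the polling priority: existing work is always
/// drained before new packets are accepted.
#[derive(Debug, PartialEq)]
enum Work {
    FlushBuffers,
    HandleTimeout,
    ReadUdp,
    ReadTun,
    Idle,
}

fn next_work(buffers_empty: bool, timer_due: bool, udp_ready: bool, tun_ready: bool) -> Work {
    if !buffers_empty {
        return Work::FlushBuffers; // 1. Empty local buffers first.
    }
    if timer_due {
        return Work::HandleTimeout; // 2. Perform time-sensitive tasks.
    }
    if udp_ready {
        return Work::ReadUdp; // 3. Process new UDP packets.
    }
    if tun_ready {
        return Work::ReadTun; // 4. Process new IP packets.
    }
    Work::Idle
}
```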
I am not completely confident about the cause of #6067, but if the
busy-loop originates from a bad timer, then the above priority means we
never get to the part where we read new UDP or IP packets, and
components such as `PhoenixChannel` - which operate outside of
`connlib`'s event loop - don't get any CPU time.
A naive fix for this problem is to just de-prioritise the polling of the
timer within `Io::poll`. I say naive because without additional changes,
this could delay the processing of time-sensitive tasks on a very busy
client / gateway where packets are constantly arriving and thus we
never[^1] reach the part where the timer gets polled.
To fix this, we make two distinct changes:
1. We pro-actively break from `connlib`'s event loop every 5000
iterations. This ensures that even on a very busy system, other
components like the `PhoenixChannel` get a chance to do _some_ work once
in a while.
2. In case we force-yield from the event loop, we call `handle_timeout`
and immediately schedule a new wake-up. This ensures time does advance
in regular intervals as well and we don't get wrongly suspended by the
runtime.
These changes don't prevent any timer-loops by themselves. With a
timer-loop, we still busy-loop for 5000 iterations and thus
unnecessarily burn through some CPU cycles. The important bit however is
that we stay operational and can accept packets and portal messages. Any
of them might change the state such that the timer value changes, thus
allowing `connlib` to self-heal from this loop.
Fixes: #6067.
[^1]: This is an assumption based on the possible control flow. In
practice, I believe that reading from the sockets or the TUN device is a
much slower operation than processing the packets. Thus, we should
eventually hit the timer path too.
We don't want the timer to fire multiple times at the same `Instant`
unless it has been specifically set to that `Instant` again. Thus, clear
the timer after it fired.
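A minimal sketch of this one-shot behaviour (names are illustrative):

```rust
use std::time::Instant;

/// A timer that clears itself after firing, so it cannot fire twice for
/// the same `Instant` unless it is explicitly re-armed.
struct OneShotTimer {
    deadline: Option<Instant>,
}

impl OneShotTimer {
    fn unset() -> Self {
        Self { deadline: None }
    }

    fn arm(&mut self, at: Instant) {
        self.deadline = Some(at);
    }

    /// Returns `true` at most once per arming: if `now` has reached the
    /// deadline, the timer fires and is cleared.
    fn poll(&mut self, now: Instant) -> bool {
        match self.deadline {
            Some(deadline) if now >= deadline => {
                self.deadline = None; // Clear the timer after it fired.
                true
            }
            _ => false,
        }
    }
}
```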
I don't think this fixed #6067 but it can't hurt.
Connection roaming within `connlib` has changed a fair bit since we
introduced the `reconnect` function. The new implementation is basically
a hard-reset of all state within `connlib`. Renaming this function
across all layers makes this more obvious.
Resolves: #6038.
As part of debugging full-route tunneling on Windows, we discovered that
we need to always explicitly choose the interface through which we want
to send packets, otherwise Windows may cause a routing loop by routing
our packets back into the TUN device.
We already have a `SocketFactory` abstraction in `connlib` that is used
by each platform to customise the setup of each socket to prevent
routing loops.
So far, this abstraction directly returns tokio sockets which don't
allow us to intercept the actual sending of packets. For some of our
traffic, i.e. the UDP packets exchanged with relays, we don't specify a
source address. To make full-route work on Windows, we need to intercept
these packets and explicitly set the source address.
To achieve that, we introduce dedicated `TcpSocket` and `UdpSocket`
structs within `socket-factory`. With this in place, we will be able to
add Windows-conditional code that looks up and sets the source address
of outgoing UDP packets. For TCP sockets, the lookup will happen prior
to connecting to the address and will be used to bind to the correct
interface.
Related: #2667.
Related: #5955.
In #5948, we start testing network latency within `tunnel_test` to make
sure _some_ time-related things are triggered. Building on top of that,
we now add an `Idle` transition that does nothing for 5 minutes. After 5
minutes of idling, we auto-close a connection.
Using this new state transition, we can replace another test within
`snownet`, further reducing that (duplicated) test suite. In addition,
this gives us some more coverage of code by testing whether allocations
and channel bindings can be refreshed accordingly.
Currently, `tunnel_test` executes all actions within the same `Instant`,
i.e. time is never advanced by itself. The difficulty with advancing
time compared to other actions like sending packets is that all
time-related actions "overlap". In other words, all timers within
connlib advance at the same time. This makes it difficult to model the
expected behaviour after a certain amount of time has passed as we'd
effectively need to model all timers and their relation to particular
actions (like resending of connection intents or STUN requests).
Instead of only advancing time by itself, we can model some aspect of it
by introducing latency on network messages. This allows us to define a
range of "acceptable" network latency within which everything is
expected to work.
Whilst this doesn't cover all failure cases, it gives us a solid
foundation of parameters within which we should not expect any
operational problems.
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.8.0 to 1.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>1.10.0</h2>
<h2>Deprecations</h2>
<p>This release deprecates and renames the following functions:</p>
<ul>
<li><code>Builder::from_rfc4122_timestamp</code> ->
<code>Builder::from_gregorian_timestamp</code></li>
<li><code>Builder::from_sorted_rfc4122_timestamp</code> ->
<code>Builder::from_sorted_gregorian_timestamp</code></li>
<li><code>Timestamp::from_rfc4122</code> ->
<code>Timestamp::from_gregorian</code></li>
<li><code>Timestamp::to_rfc4122</code> ->
<code>Timestamp::to_gregorian</code></li>
</ul>
<h2>What's Changed</h2>
<ul>
<li>Use const identifier in uuid macro by <a
href="https://github.com/Vrajs16"><code>@Vrajs16</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/764">uuid-rs/uuid#764</a></li>
<li>Rename most methods referring to RFC4122 by <a
href="https://github.com/Mikopet"><code>@Mikopet</code></a> / <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/765">uuid-rs/uuid#765</a></li>
<li>prepare for 1.10.0 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/766">uuid-rs/uuid#766</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/Vrajs16"><code>@Vrajs16</code></a> made
their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/764">uuid-rs/uuid#764</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.9.1...1.10.0">https://github.com/uuid-rs/uuid/compare/1.9.1...1.10.0</a></p>
<h2>1.9.1</h2>
<h2>What's Changed</h2>
<ul>
<li>Add an example of generating bulk v7 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/761">uuid-rs/uuid#761</a></li>
<li>Avoid taking the shared lock when getting usable bits in
Uuid::now_v7 by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/762">uuid-rs/uuid#762</a></li>
<li>Prepare for 1.9.1 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/763">uuid-rs/uuid#763</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.9.0...1.9.1">https://github.com/uuid-rs/uuid/compare/1.9.0...1.9.1</a></p>
<h2>1.9.0</h2>
<h2><code>Uuid::now_v7()</code> is guaranteed to be monotonic</h2>
<p>Before this release, <code>Uuid::now_v7()</code> would only use the
millisecond-precision timestamp for ordering. It now also uses a global
42-bit counter that's re-initialized each millisecond so that the
following will always pass:</p>
<pre lang="rust"><code>let a = Uuid::now_v7();
let b = Uuid::now_v7();
assert!(a < b);
</code></pre>
<h2>What's Changed</h2>
<ul>
<li>Add a get_node_id method for v1 and v6 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/748">uuid-rs/uuid#748</a></li>
<li>Update atomic and zerocopy to latest by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/750">uuid-rs/uuid#750</a></li>
<li>Add repository field to uuid-macro-internal crate by <a
href="https://github.com/paolobarbolini"><code>@paolobarbolini</code></a>
in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/752">uuid-rs/uuid#752</a></li>
<li>update docs to updated RFC (from 4122 to 9562) by <a
href="https://github.com/Mikopet"><code>@Mikopet</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/753">uuid-rs/uuid#753</a></li>
<li>Support counters in v7 UUIDs by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/755">uuid-rs/uuid#755</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/paolobarbolini"><code>@paolobarbolini</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/752">uuid-rs/uuid#752</a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4b4c590ae3"><code>4b4c590</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/766">#766</a> from
uuid-rs/cargo/1.10.0</li>
<li><a
href="68eff32640"><code>68eff32</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/765">#765</a> from
uuid-rs/chore/time-fn-deprecations</li>
<li><a
href="3d5384da4b"><code>3d5384d</code></a>
update docs and deprecation messages for timestamp fns</li>
<li><a
href="de50f2091f"><code>de50f20</code></a>
renaming rfc4122 functions</li>
<li><a
href="4a8841792a"><code>4a88417</code></a>
prepare for 1.10.0 release</li>
<li><a
href="66b4fcef14"><code>66b4fce</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/764">#764</a> from
Vrajs16/main</li>
<li><a
href="8896e26c42"><code>8896e26</code></a>
Use expr instead of ident</li>
<li><a
href="09973d6aff"><code>09973d6</code></a>
Added changes</li>
<li><a
href="6edf3e8cd5"><code>6edf3e8</code></a>
Use const identifer in uuid macro</li>
<li><a
href="36e6f573aa"><code>36e6f57</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/763">#763</a> from
uuid-rs/cargo/1.9.1</li>
<li>Additional commits viewable in <a
href="https://github.com/uuid-rs/uuid/compare/1.8.0...1.10.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The different implementations of `Tun` are the last platform-specific
code within `firezone-tunnel`. By introducing a dedicated crate and a
`Tun` trait, we can move this code into (platform-specific) leaf crates:
- `connlib-client-android`
- `connlib-client-apple`
- `firezone-bin-shared`
Related: #4473.
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
We almost never `Debug`-print our IDs, except in the proptests where the
test runner prints them. To allow for better use of full-text search, we
apply the same formatting that we have for the `Display` output to the
`Debug` output as well.
In #5917, we introduced a sampled boolean that controls whether direct
traffic from clients to gateways is dropped. To correctly simulate such
a network scenario, we also need to drop traffic from gateways back to
clients.
For `tunnel_test`, it is very important that each execution of a set of
state transitions is completely deterministic, otherwise the shrinking
behaviour does not work.
Iterating over `HashMap` and `HashSet` is non-deterministic. To fix
this, we convert several maps and sets to `BTreeMap`s and `BTreeSet`s.
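A small sketch of why this matters for determinism: `BTreeMap` iterates in sorted key order regardless of insertion order, so every run observes the same order (a `HashMap` gives no such guarantee, and its order even changes between runs due to randomised hashing):

```rust
use std::collections::BTreeMap;

/// Collects entries into a `BTreeMap` and returns the keys in iteration
/// order, which is always sorted, independent of insertion order.
fn btree_key_order(entries: &[(u32, &'static str)]) -> Vec<u32> {
    let map: BTreeMap<u32, &str> = entries.iter().copied().collect();
    map.keys().copied().collect()
}
```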
Extracted out of #5797.
This is a problem that becomes evident as
https://github.com/firezone/firezone/issues/2667 is implemented:
Whenever connlib sees a DNS packet where the sentinel DNS server is a
resource, the packet is forwarded to the resource instead of the request
being resolved locally. This doesn't work well with the system's DNS
servers: those are often handed out by DHCP and point to a local
resolver which can't be reached from a gateway. With full-route
tunneling, such requests would simply be dropped, preventing all
Internet connections outside of Firezone.
Most of the time, when an administrator actually wants to forward all
DNS requests, they will explicitly add an upstream DNS server. That
makes sense: relying on whatever the local DHCP configures isn't a good
idea if you want to tunnel DNS requests.
This makes this behavior explicit; the docs and UI should be updated
accordingly.
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
---------
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
Currently, the relay path in `tunnel_test` is only hit accidentally
because we don't run the gateways in dual-stack mode and thus, some
testcases have a client and gateways that can't talk to each other (and
thus fall back to the relay).
This requires us to filter out certain resources because we can't route
to an IPv6 CIDR resource from an IPv4-only gateway. This causes quite a
lot of rejections, which creates problems when one attempts to increase
the number of test cases (e.g. to 10_000).
To fix this, we always run the gateways in dual-stack mode and introduce
a dedicated flag that sometimes drops all direct traffic between the
client and the gateways.
To determine whether we send proxy IPs we depend on the `allowed_ips`,
since that's where we track what resources we have sent to a given
gateway.
However, the way we were checking whether a given resource destination
had been sent was via `longest_match`, and with overlapping DNS routes
this no longer works, since it will match for Internet resources even if
the proxy IP wasn't sent.
So we now check that it's a DNS resource, and if it is, we match exactly
against the allowed-IP table.
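A toy illustration of the difference, using a made-up prefix type and example addresses rather than connlib's actual routing table: with an Internet resource (`0.0.0.0/0`) present, `longest_match` matches *every* destination, whereas the exact lookup only matches a proxy IP that was actually inserted.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Prefix {
    network: Ipv4Addr,
    len: u8,
}

impl Prefix {
    fn contains(&self, ip: Ipv4Addr) -> bool {
        let mask = if self.len == 0 {
            0
        } else {
            u32::MAX << (32 - self.len)
        };
        (u32::from(ip) & mask) == (u32::from(self.network) & mask)
    }
}

/// Picks the most specific covering prefix, like a routing table does.
fn longest_match(table: &[Prefix], ip: Ipv4Addr) -> Option<Prefix> {
    table
        .iter()
        .filter(|p| p.contains(ip))
        .max_by_key(|p| p.len)
        .copied()
}

/// Only a /32 entry for the IP itself counts as "sent".
fn exact_match(table: &[Prefix], ip: Ipv4Addr) -> bool {
    table.iter().any(|p| p.len == 32 && p.network == ip)
}
```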
Alternatively, we could keep track of `sent_ips` per gateway. That is
somewhat redundant state that we'd need to keep in sync, but it has the
benefit of being more explicit, so I'm open to doing that in a follow-up
PR. For now, I'd like to merge this to get ready for Internet resources.
Currently, `tunnel_test` is broken as a result of #5871. In particular,
adding a resource requires that the resource is assigned to a gateway,
which can only happen after the resource has been added. As a result, no
resources are ever added in the test.
With this patch, we align the test even closer with how Firezone works
in production: We generate all resources ahead of time and selectively
activate / deactivate them on the client. Unfortunately, this requires
quite a few changes but overall, is a net-positive change.
Replaces: #5914.
The connection to the portal could be interrupted at any point, most
notably when it is being re-deployed. Doing so results in a new `init`
message being pushed to all clients and gateways. This must not
interrupt the data plane.
To ensure this, we add a new `ReconnectPortal` transition to
`tunnel_test` where we simulate receiving a new `init` message with the
same values as we already have locally, i.e. same set of relays and
resources.
This resolves an existing TODO where the logic of performing
non-destructive updates to resources in `set_resources` wasn't tested.
The two primary users of `add_resources` and `remove_resources` are
the client's eventloop and the `tunnel_test`. Both of them only ever
pass a single resource at a time.
It is thus simpler to remove the inner loop from within `ClientState`
and simply process a single resource at a time.
In preparation for #2667, we add an `internet` variant to our list of
possible resource types. This is backwards-compatible with existing
clients and ensures that, once the portal starts sending Internet
resources to clients, they won't fail to deserialise these messages.
The portal will have a version check to not send this to older clients
anyway but the sooner we can land this, the better. It simplifies the
initial development as we start preparing for the next client release.
Adding new fields to a JSON message is always backwards-compatible so we
can extend this later with whatever we need.
Currently, `tunnel_test` aborts a `Transition` as soon as one assertion
fails. This often makes it hard to debug a problem, as it can be useful
to see which assertions pass and which fail in order to figure out what
went wrong.
To resolve this, we replace all `assert` macros with either `info!` or
`error!` trace events. All "failed assertions" must be logged as
`error!`.
Before running these assertions, we temporarily install a custom tracing
layer that keeps track of how many `error!` events are emitted. If we
emit at least one `error!` event, the layer panics upon `Drop`, which
happens at the end of the `check_invariants` function.
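A simplified sketch of the panic-on-`Drop` idea, counting errors by hand instead of via a `tracing` layer so the example stays self-contained:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Guard that panics on `Drop` if any error was recorded during its
/// lifetime. The real code increments the counter from a custom
/// `tracing` layer whenever an `error!` event is emitted.
struct AssertNoErrors {
    errors: AtomicUsize,
}

impl AssertNoErrors {
    fn new() -> Self {
        Self { errors: AtomicUsize::new(0) }
    }

    fn record_error(&self) {
        self.errors.fetch_add(1, Ordering::Relaxed);
    }
}

impl Drop for AssertNoErrors {
    fn drop(&mut self) {
        let errors = self.errors.load(Ordering::Relaxed);
        // Avoid a double panic if we are already unwinding.
        if errors > 0 && !std::thread::panicking() {
            panic!("{errors} failed assertion(s)");
        }
    }
}
```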
This represents a step towards #3837. Eventually, we'd like the
abstractions of `Session` and `Eventloop` to go away entirely. For that,
we need to thin them out.
The introduction of `ConnectArgs` was already a hint that we are passing
a lot of data across layers that we shouldn't. To avoid that, we can
simply initialise `PhoenixChannel` earlier and thus each callsite can
specify the desired configuration directly.
I've left `ConnectArgs` intact to keep the diff small.
With full-route tunneling, this always happens: if we don't prioritize
DNS resources, any packet for DNS IPs will get routed to the full-route
gateway, which might not have the correct resource.
TODO: this still needs unit tests
TODO: Waiting on #5891
Currently, the relationship between gateways, sites and resources is
modeled in an ad-hoc fashion within `tunnel_test`. The correct
relationship is:
- The portal knows about all sites.
- A resource can only be added for an existing site.
- One or more gateways belong to a single site.
To express this relationship in `tunnel_test`, we first sample between 1
and 3 sites. Then we sample between 1 and 3 gateways and assign them a
site each. When adding new resources, we sample a site that the resource
belongs to. Upon a connection intent, we sample a gateway from all
gateways that belong to the site that the resource is defined in.
In addition, this patch-set removes multi-site resources from the
`tunnel_test`. As far as connlib's routing logic is concerned, we route
packets to a resource on a selected gateway. How the portal selected the
site of the gateway doesn't matter to connlib and thus doesn't need to
be covered in these tests.
Our Rust CI runs various jobs in different configurations of packages
and / or features. Currently, only the clippy job denies warnings which
makes it possible that some code still generates warnings under
particular configurations.
To ensure we always fail on warnings, we set a global env var to deny
warnings for all Rust CI jobs.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Extracted from #5840
Some cleanup of IP generation, and improved performance of picking a
host within an IP range by doing some math instead of iterating through
the range.
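The math trick can be sketched as follows (illustrative; it ignores network and broadcast addresses):

```rust
use std::net::Ipv4Addr;

/// The `n`-th host of an IPv4 network is just the network address plus
/// an offset, computed in O(1) instead of iterating through the range.
fn nth_host(network: Ipv4Addr, n: u32) -> Ipv4Addr {
    Ipv4Addr::from(u32::from(network).wrapping_add(n))
}
```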
In the new version, `quinn-udp` no longer supports sending multiple
`Transmit`s at once via `sendmmsg`. We made use of that to send all
buffered packets in one go.
In reality, these buffered packets can only be control messages like
STUN requests to relays or something like that. For the hot path of
routing packets, we only ever read a single IP packet from the TUN
device and attempt to send it out right away. At most, we may buffer one
packet at a time here in case the socket is busy.
Getting these wake-ups right is quite tricky. I think we should
prioritise #3837 soon. Once that is integrated, we can use `async/await`
for the high-level integration between `Io` and the state which allows
us to simply suspend until we can send the message, avoiding the need
for a dedicated buffer.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes #5026. Closes #5879.
On the resource-constrained Windows Server 2022 test VM, the median
sign-in time dropped from 5.0 seconds to 2.2 seconds.
# Changes
- Measure end-to-end connection time in the GUI process
- Use `ipconfig` instead of Powershell to flush DNS faster
- Activate DNS control by manipulating the Windows Registry directly
instead of calling Powershell
- Remove deactivate step when changing DNS servers (seals a DNS leak
when roaming networks)
- Remove completely redundant `Set-DnsClientServerAddress` step from
activating DNS control
- Remove `Remove-NetRoute` powershell cmdlet that seems to do nothing
# Benchmark 7
- Optimized release builds
- x86-64 constrained VM (1 CPU thread, 2 GB RAM)
Main with measurement added, `c1c99197e` from #5864
- 6.0 s
- 5.5 s
- 4.1 s
- 5.0 s
- 4.1 s
- (Median = 5.0 s)
Main with speedups added, `2128329f9` from #5375, this PR
- 3.7 s
- 2.2 s
- 1.9 s
- 2.3 s
- 2.0 s
- (Median = 2.2 s)
```[tasklist]
### Next steps
- [x] Benchmark on the resource-constrained VM
- [x] Move raw benchmark data to a comment and summarize in the description
- [x] Clean up tasks that don't need to be in the commit
- [x] Merge
```
# Hypothetical further optimizations
- Ditch the `netsh` subprocess in `set_ips`
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Following the removal of the return type from the callback functions in
#5839, we can now move the use of the `Callbacks` one layer up the stack
and decouple them entirely from the `Tunnel`.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
Currently, only connlib's UDP sockets for sending and receiving STUN &
WireGuard traffic are protected from routing loops. This was done via
the `Sockets::with_protect` function. Connlib has additional sockets
though:
- A TCP socket to the portal.
- UDP & TCP sockets for DNS resolution via hickory.
Both of these can incur routing loops on certain platforms which becomes
evident as we try to implement #2667.
To fix this, we generalise the idea of "protecting" a socket via a
`SocketFactory` abstraction. By allowing the different platforms to
provide a specialised `SocketFactory`, anything Linux-based can give
special treatment to the socket before handing it to connlib.
As an additional benefit, this allows us to remove the `Sockets`
abstraction from connlib's API again because we can now initialise it
internally via the provided `SocketFactory` for UDP sockets.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>