With the latest version of `proptest-state-machine`, we no longer need
to use their traits because `Sequential::new` is now exposed. This makes
things less magical overall because there is less indirection.
This incorporates the feedback from #6939 after a discussion with
@conectado. We agreed that the protocol should be more event-based,
where each message has its own event type. Events MAY appear in pairs or
other cardinality combinations, meaning semantically they could be seen
as requests and responses. In general though, due to the unreliable
nature of IP, it is better to view them as events. Events are typically
designed to be idempotent, which is important for making this protocol work.
Using events also means it is not as easy to fall into the "trap" of
modelling requests / responses on the control protocol level.
To correctly handle overlapping CIDR resources, we need to recompute
which ones are active. The `RoamClient` transition was missing this
recomputation, even though the system under test does perform it.
Adding the necessary function call fixes the two regression seeds
detected in CI.
Resolves: #6924.
Do we want to track 401s in sentry? If we see a lot of them, something
is likely wrong but I guess there is some level of 401s that users will
just run into.
Is there a way of marking these as "might not be a really bad error"?
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
At present, `connlib` utilises the portal as a signalling layer for any
kind of control message that needs to be exchanged between clients and
gateways. For anything related to connectivity, this is crucial: Before
we have a direct connection to the gateway, we don't really have a
choice other than using the portal as a "relay" to e.g. exchange address
candidates for ICE.
However, once a direct connection has been established, exchanging
information directly with the gateway is faster and removes the portal
as a potential point of failure for the data plane.
For DNS resources, `connlib` intercepts all DNS requests on the client
and assigns its own IPs within the CG-NAT range to all domains that are
configured as resources. Thus, all packets targeting DNS resources will
have one of these IPs set as their destination. The gateway needs to
learn about all the IPs that have been assigned to a certain domain by
the client and perform NAT. We call this concept "DNS resource NAT".
Currently, the domain + the assigned IPs are sent together with the
`allow_access` or `request_connection` message via the portal. The new
control protocol defined in #6732 purposely excludes this information
and only authorises traffic to the entire resource which could also be a
wildcard-DNS resource.
To exchange the assigned IPs for a certain domain with the gateway, we
introduce our own p2p control protocol built on top of IP. All control
protocol messages are sent through the tunnel and thus encrypted at all
times. They are differentiated from regular application traffic as
follows:
- IP src is set to the unspecified IPv6 address (`::`)
- IP dst is set to the unspecified IPv6 address (`::`)
- IP protocol is set to reserved (`0xFF`)
The combination of all three should never appear as regular traffic.
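For illustration, here is a minimal sketch (not connlib's actual code) of classifying a parsed IPv6 header against these three markers; the header struct is a stand-in for whatever parser is used:

```rust
use std::net::Ipv6Addr;

/// Stand-in for a parsed IPv6 header; only the fields relevant to the
/// control-protocol markers are shown.
struct Ipv6HeaderFields {
    source: Ipv6Addr,
    destination: Ipv6Addr,
    next_header: u8, // IP protocol number
}

const CONTROL_PROTOCOL: u8 = 0xFF; // "reserved" protocol number

fn is_p2p_control_packet(header: &Ipv6HeaderFields) -> bool {
    header.source == Ipv6Addr::UNSPECIFIED
        && header.destination == Ipv6Addr::UNSPECIFIED
        && header.next_header == CONTROL_PROTOCOL
}
```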
To ensure forwards-compatibility, the control protocol utilises a fixed
8-byte header where the first byte denotes the message kind. In this
current design, there is no concept of a request or response in the
wire-format. Each message is unidirectional and the fact that the two
messages we define in here appear in tandem is purely by convention. We
use the IPv6 payload length to determine the total length of the packet.
The payloads of the messages defined here are JSON-encoded; in general,
message types are free to choose whichever encoding they'd like.
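As an illustration, a minimal sketch of encoding and decoding such a message. The description above only specifies the first header byte; the sketch assumes the remaining 7 header bytes are reserved and zeroed:

```rust
/// Sketch only: first byte is the message kind, the other 7 header bytes are
/// assumed to be reserved, the rest of the IPv6 payload is the body.
fn encode_control_message(kind: u8, json_payload: &[u8]) -> Vec<u8> {
    let mut message = Vec::with_capacity(8 + json_payload.len());
    message.push(kind); // first byte: message kind
    message.extend_from_slice(&[0u8; 7]); // assumed reserved header bytes
    message.extend_from_slice(json_payload); // e.g. JSON-encoded body
    message
}

fn decode_control_message(packet_payload: &[u8]) -> Option<(u8, &[u8])> {
    // The IPv6 payload length already tells us the total message length, so
    // the body is simply everything after the fixed 8-byte header.
    if packet_payload.len() < 8 {
        return None;
    }
    Some((packet_payload[0], &packet_payload[8..]))
}
```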
This protocol is sent through the WireGuard tunnel, meaning we are
effectively limited by our device MTU of 1280 bytes; otherwise we'd have
to implement fragmentation. For the messages that set up the DNS
resource NAT, we are below this limit:
- UUIDs are 16 bytes
- Domain names are at most 255 bytes
- IPv6 addresses are 16 bytes * 4
- IPv4 addresses are 4 bytes * 4
Including the JSON serialisation overhead, this results in a total
maximum payload size of 402 bytes, which is well below our MTU.
Finally, another thing to consider here is that IP is unreliable,
meaning each use of this protocol needs to make sure that:
- It is resilient against message re-ordering
- It is resilient against packet loss
The details of how this is ensured for setting up the DNS resource NAT
are left to #6732.
One of the key differences of the new control protocol designed in #6461
is that creating new connections is idempotent. We achieve this by
having the portal generate the ICE credentials and the preshared-key for
the WireGuard tunnel. As long as the ICE credentials don't change, we
don't need to make a new connection.
For `snownet`, this means we are deprecating the previous APIs for
making connections. The client-side APIs will have to stay around until
we merge the client-part of the new control protocol. The server-side
APIs will have to stay around until we remove backwards-compatibility
from the gateway.
As part of #6732, we will be using the tunnel for the p2p control
protocol to setup the DNS resource NAT on the gateway. These messages
will be immediately queued after creating the connection, before ICE is
finished. In order to avoid additional retries within `firezone_tunnel`,
we directly attempt to encapsulate these packets.
`boringtun` internally has a buffer for these because prior to having a
WireGuard session, we don't actually have the necessary keys to encrypt
a packet. Thus, the packet passed here won't actually end up in the
referenced buffer of the `Connecting` state. Instead, if we haven't
already initiated a WireGuard handshake, attempting to encapsulate a
packet will trigger a WireGuard handshake initiation and _that_ is the
packet we need to buffer.
Dropping this packet would require us to wait until `boringtun`
retransmits it as part of the handshake timeout, causing an unnecessary
delay in the connection setup.
We call the `add_ips_with_resource` function with a list of `IpAddr`s or
`IpNetwork`s. To make this more ergonomic for the caller, we can accept
an iterator that converts the items on the fly.
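A minimal sketch of the signature idea, using placeholder types (`IpNetwork` and `Resource` here are stand-ins, not the real connlib types):

```rust
use std::net::IpAddr;

// Placeholders for illustration only.
struct IpNetwork;
struct Resource;

impl From<IpAddr> for IpNetwork {
    fn from(_addr: IpAddr) -> Self {
        // A single address becomes a /32 (IPv4) or /128 (IPv6) network.
        IpNetwork
    }
}

// Accepting `impl IntoIterator<Item: Into<IpNetwork>>` lets callers pass
// either `IpAddr`s or `IpNetwork`s without collecting into a `Vec` first.
fn add_ips_with_resource<I>(resource: &Resource, ips: I)
where
    I: IntoIterator,
    I::Item: Into<IpNetwork>,
{
    for ip in ips {
        let network: IpNetwork = ip.into(); // converted on the fly
        let _ = (resource, &network);
    }
}
```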
To reduce the TTFB, we immediately force a WireGuard handshake as soon
as ICE completes. Currently, both the client and the gateway do this
which is somewhat counter-productive as there can only be one active
session.
With this patch, only the client will force a new WireGuard handshake as
soon as ICE completes.
With the new control protocol specified in #6461, the client will no
longer initiate new connections. Instead, the credentials are generated
deterministically by the portal based on the gateway's and the client's
public key. For as long as they use the same public key, they also have
the same in-memory state which makes creating connections idempotent.
What we didn't consider in the new design at first is that when clients
roam, they discard all connections but keep the same private key. As a
result, the portal would generate the same ICE credentials which means
the gateway thinks it can reuse the existing connection when new flows
get authorized. The client however discarded all connections (and
rotated its ports and maybe IPs), meaning the previous candidates sent
to the gateway are no longer valid and connectivity fails.
We fix this by also rotating the private keys upon reset. Rotating the
keys by itself isn't enough; we also need to propagate the new public key
all the way "over" to the phoenix channel component which lives
separately from connlib's data plane.
To achieve this, we change `PhoenixChannel` to now start in the
"disconnected" state and require an explicit `connect` call. In
addition, the `LoginUrl` constructed by various components now acts
merely as a "prototype", which may require additional data to construct
a fully valid URL. In the case of client and gateway, this is the public
key of the `Node`. This additional parameter needs to be passed to
`PhoenixChannel` in the `connect` call, thus forming a type-safe
contract that ensures we never attempt to connect without providing a
public key.
For the relay, this doesn't apply.
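To make the "prototype" idea concrete, here is a sketch with made-up types and fields; the real `LoginUrl` and `PhoenixChannel` APIs differ in detail:

```rust
// Illustrative types only.
struct PublicKey(String);

/// Everything except the public key is known when the prototype is created.
struct LoginUrlPrototype {
    base: String,
}

impl LoginUrlPrototype {
    /// Only combining the prototype with a public key yields a complete URL.
    fn to_url(&self, public_key: &PublicKey) -> String {
        format!("{}&public_key={}", self.base, public_key.0)
    }
}

struct PhoenixChannel {
    prototype: LoginUrlPrototype,
    connected_to: Option<String>, // `None` while disconnected
}

impl PhoenixChannel {
    fn new(prototype: LoginUrlPrototype) -> Self {
        // Starts in the "disconnected" state.
        Self { prototype, connected_to: None }
    }

    /// Connecting requires a public key, so we can never attempt to connect
    /// without providing one.
    fn connect(&mut self, public_key: &PublicKey) {
        let url = self.prototype.to_url(public_key);
        // ... open the websocket to `url` here ...
        self.connected_to = Some(url);
    }
}
```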
Lastly, this allows us to tidy up the code a bit by:
a) generating the `Node`'s private key from the existing RNG
b) removing `ConnectArgs` which only had two members left
Related: #6461.
Related: #6732.
Following up from #6919, this PR introduces a dedicated, internal model
for resources, reflecting how the client uses them. This separation
serves several purposes:
1. It allows us to introduce an `Unknown` resource type, ensuring
forwards-compatibility with future resource types.
2. It allows us to remove trait implementations like `PartialEq` or
`PartialOrd` from the message types. With #6732, the messages will
include types like `SecretKey`s which cannot be compared.
3. A decoupling of serialisation and domain models is good practice in
general and has long been overdue.
The `connlib-shared` crate has become a bit of a dependency magnet
without a clear purpose. It hosts utilities like `get_user_agent`,
messages for the client and gateway to communicate with the portal and
domain types like `ResourceId`.
To create a better dependency structure in our workspace, we repurpose
`connlib-shared` as a `connlib-model` crate. Its purpose is to host
domain-specific model types that multiple crates may want to use. For
that purpose, we rename the `callbacks::ResourceDescription` type to
`ResourceView`, designating that this is a _view_ onto a resource as
seen by `connlib`. The message types which currently double up as
connlib-internal model thus become an implementation detail of
`firezone-tunnel` and shouldn't be used for anything else.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
When debugging DNS-related issues, it is useful to see all DNS queries
that go into `connlib` and the responses that we generate. Analogous to
the `wire::net` and `wire::dev` TRACE logs, we introduce `wire::dns`
which logs incoming queries and the responses on TRACE. The output looks
like this:
```
2024-10-02T00:16:47.522847Z TRACE wire::dns::qry: A caldav.fastmail.com qid=55845
2024-10-02T00:16:47.522926Z TRACE wire::dns::qry: AAAA caldav.fastmail.com qid=56277
2024-10-02T00:16:47.531347Z TRACE wire::dns::res: AAAA caldav.fastmail.com => [] qid=56277 rcode=NOERROR
2024-10-02T00:16:47.538984Z TRACE wire::dns::res: A caldav.fastmail.com => [103.168.172.46 | 103.168.172.61] qid=55845 rcode=NOERROR
2024-10-02T00:16:47.580237Z TRACE wire::dns::qry: HTTPS cloudflare-dns.com qid=21518
2024-10-02T00:16:47.580338Z TRACE wire::dns::qry: A example.org qid=35459
2024-10-02T00:16:47.580364Z TRACE wire::dns::qry: AAAA example.org qid=60073
2024-10-02T00:16:47.580699Z TRACE wire::dns::qry: AAAA ipv4only.arpa qid=17280
2024-10-02T00:16:47.580782Z TRACE wire::dns::qry: A ipv4only.arpa qid=47215
2024-10-02T00:16:47.581134Z TRACE wire::dns::qry: A detectportal.firefox.com qid=34970
2024-10-02T00:16:47.581261Z TRACE wire::dns::qry: AAAA detectportal.firefox.com qid=39505
2024-10-02T00:16:47.609502Z TRACE wire::dns::res: AAAA example.org => [2606:2800:21f:cb07:6820:80da:af6b:8b2c] qid=60073 rcode=NOERROR
2024-10-02T00:16:47.609640Z TRACE wire::dns::res: AAAA ipv4only.arpa => [] qid=17280 rcode=NOERROR
2024-10-02T00:16:47.610407Z TRACE wire::dns::res: A ipv4only.arpa => [192.0.0.170 | 192.0.0.171] qid=47215 rcode=NOERROR
2024-10-02T00:16:47.617952Z TRACE wire::dns::res: HTTPS cloudflare-dns.com => [1 alpn=h3,h2 ipv4hint=104.16.248.249,104.16.249.249 ipv6hint=2606:4700::6810:f8f9,2606:4700::6810:f9f9] qid=21518 rcode=NOERROR
2024-10-02T00:16:47.631124Z TRACE wire::dns::res: A example.org => [93.184.215.14] qid=35459 rcode=NOERROR
2024-10-02T00:16:47.640286Z TRACE wire::dns::res: AAAA detectportal.firefox.com => [detectportal.prod.mozaws.net. | prod.detectportal.prod.cloudops.mozgcp.net. | 2600:1901:0:38d7::] qid=39505 rcode=NOERROR
2024-10-02T00:16:47.641332Z TRACE wire::dns::res: A detectportal.firefox.com => [detectportal.prod.mozaws.net. | prod.detectportal.prod.cloudops.mozgcp.net. | 34.107.221.82] qid=34970 rcode=NOERROR
2024-10-02T00:16:48.737608Z TRACE wire::dns::qry: AAAA myfiles.fastmail.com qid=52965
2024-10-02T00:16:48.737710Z TRACE wire::dns::qry: A myfiles.fastmail.com qid=5114
2024-10-02T00:16:48.745282Z TRACE wire::dns::res: AAAA myfiles.fastmail.com => [] qid=52965 rcode=NOERROR
2024-10-02T00:16:49.027932Z TRACE wire::dns::res: A myfiles.fastmail.com => [103.168.172.46 | 103.168.172.61] qid=5114 rcode=NOERROR
2024-10-02T00:16:49.190054Z TRACE wire::dns::qry: HTTPS github.com qid=64696
2024-10-02T00:16:49.190171Z TRACE wire::dns::qry: A github.com qid=11912
2024-10-02T00:16:49.190502Z TRACE wire::dns::res: A github.com => [100.96.0.1 | 100.96.0.2 | 100.96.0.3 | 100.96.0.4] qid=11912 rcode=NOERROR
2024-10-02T00:16:49.190619Z TRACE wire::dns::qry: A github.com qid=63366
2024-10-02T00:16:49.190730Z TRACE wire::dns::res: A github.com => [100.96.0.1 | 100.96.0.2 | 100.96.0.3 | 100.96.0.4] qid=63366 rcode=NOERROR
```
As with the other filters, seeing both queries and responses can be
achieved with `RUST_LOG=wire::dns=trace`. If you are only interested in
the responses, you can activate a more specific log filter using
`RUST_LOG=wire::dns::res=trace`. All responses also print the original
query that they are answering.
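For context, a minimal sketch of how log lines with these targets could be emitted using the `tracing` crate; the field names are illustrative, not connlib's actual logging code:

```rust
fn log_query_and_response() {
    // Query and response use separate targets so they can be filtered
    // independently via `RUST_LOG`.
    tracing::trace!(target: "wire::dns::qry", qid = 55845, "A caldav.fastmail.com");
    tracing::trace!(
        target: "wire::dns::res",
        qid = 55845,
        rcode = "NOERROR",
        "A caldav.fastmail.com => [103.168.172.46 | 103.168.172.61]"
    );
}
```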
Resolves: #6862.
Our relays are essential for connectivity because they also perform STUN
for us through which we learn our server-reflexive address. Thus, we
must at all times have at least one relay that we can reach in order to
establish a connection.
The portal tracks the connectivity to the relays for us and in case any
of them go down, sends us a `relays_presence` message, meaning we can
stop using that relay and migrate any relayed connections to a new one.
This works well as long as we are connected to the portal while the
relay is rebooting or going down. If we are not currently connected to
the portal and a relay we are using reboots, we don't learn about it. If
we are actively using it, at least, the connection will fail, further
attempted communication with the relay will time out, and we will stop
using it.
In case we aren't currently using the relay, this gets a bit trickier.
If we aren't using the relay but it rebooted while we were partitioned
from the portal, logging in again might return the same relay to us in
the `init` message, but this time with different credentials.
The first bug that we are fixing in this PR is that we previously
ignored those credentials because we already knew about the relay,
thinking that we can still use our existing credentials. The fix here is
to also compare the credentials and ditch the local state if they
differ.
The second bug, identified while fixing the first, is that we need to
proactively probe whether all other relays we know about are actually
still responsive. For that, we issue a `REFRESH` message to each of
them. If it times out or otherwise fails, we remove that relay from our
list of `Allocation`s too.
To fix the second bug, several changes were necessary:
1. We lower the log-level of `Disconnecting from relay` from ERROR to
WARN. Any ERROR emitted during a test-run fails our test-suite which is
what partially motivated this. The test suite builds on the assumption
that ERRORs are fatal and thus should never happen during our tests.
This change surfaces that disconnecting from a relay can indeed happen
during normal operation, which justifies lowering this to WARN. Users
should at the minimum monitor on WARN to be alerted about problems.
2. We reduce the total backoff duration for requests to relays from 60s
to 10s. The current 60s results in a total of 8 retries. UDP is
unreliable, but it isn't THAT unreliable to justify retrying everything
for 60s. We also use a 10s timeout for ICE, meaning the two are now
aligned to better match each other. We had to change the max backoff
duration because we only idle-spin for at most 10s in the tests, and
thus the current 60s was too long to detect that a relay actually
disappeared.
3. We had to shuffle around some function calls to make sure all
intermediary event buffers are emptied at the right point in time to
make the test deterministic.
Fixes: #6648.
We want to associate additional device information with device
verification; these are all parameters that try to uniquely identify
the hardware.
For that reason, we read system information and send it as part of the
query params to the portal. That way, the portal can store it when the
device is verified and match against it later on.
These are the parameters for each platform:
|Platform|Query Field|Field Meaning|
|-----|----|-----|
|macOS|`device_serial`|Hardware's serial|
|macOS|`device_uuid`|Hardware's UUID|
|iOS|`identifier_for_vendor`|Identifier for vendor, resets only on uninstall/install|
|Android|`firebase_installation_id`|Firebase installation ID, resets only on uninstall/install|
|Windows|`device_serial`|Motherboard's serial|
|Linux|`device_serial`|Motherboard's serial|
Fixes #6837
When relays reboot or get redeployed, the portal sends us new relays to
use and/or relays we should discontinue using. To be more efficient with
battery and network usage, `connlib` only ever samples a single relay
out of all existing ones for a particular connection.
In case of a network topology where we need to use relays, there are two
situations we can end up in:
- The client connects to the gateway's relay, i.e. to the port the
gateway allocated on the relay.
- The gateway connects to the client's relay, i.e to the port the client
allocated on the relay.
When we detect that a relay is down, the party that allocated the port
will notice this immediately (once #6666 is merged). The other party
needs to wait until it receives the invalidated candidates from its
peer. Invalidating that candidate will also invalidate the currently
nominated socket and fail the connection. In theory, at least: that only
works if there are no other candidates available to try.
This is where this patch becomes important. Say we have the following
setup:
- Client samples relay A.
- Gateway samples relay B.
- The nominated candidate pair is "client server-reflexive <=> relay B",
i.e. the client talks to the allocated port on the gateway.
Next:
1. Client and portal get network-partitioned.
2. Relay B disappears.
3. Relay C appears.
4. Relay A reboots.
5. Client reconnects.
At this point, the client is told by the portal to use relays A & C.
Note that relay A rebooted and thus the allocation previously present on
the client is no longer valid. With #6666, we will detect this by
comparing credentials & IPs. The gateway is being told about the same
relays and as part of that, tests that relay B is still there. It learns
that it isn't, invalidates the candidates which fails the connection to
the client (but only locally!).
Meanwhile, as part of the regular `init` procedure, the client made a
new allocation with relays A & C. Because it had previously selected
relay A for the connection with the gateway, the new candidates are
added to the agent, forming new pairs. The gateway has already given up
on this connection however so it won't ever answer these STUN requests.
Concurrently, the gateway's invalidated candidates arrive at the client.
They however don't fail the connection because the client is probing the
newly added candidates. This creates a state mismatch between the client
and gateway that is only resolved after the candidates start timing out,
adding an additional delay during which the connection isn't working.
With this PR, we prevent this from happening by only ever adding new
candidates while we are still in the nomination process for a socket. In
theory, there exists a race condition in which we nominate a relay
candidate first and then miss out on a server-reflexive candidate that
is no longer added. In practice, this won't happen because:
- Our host candidates are always available first.
- We learn server-reflexive candidates already as part of the initial
BINDING, before creating the allocation.
- We learn server-reflexive candidates from all relays, not just the one
that has been assigned.
Related: #6666.
When constructing a span, any currently set span will automatically be
set as the parent. In the case of the `connection` span, this was the
`accept_answer` or `new_connection` span from the client / gateway.
Those are not meant to be re-activated every time we enter the
`connection` span.
By setting an explicit parent, we avoid that.
Unfortunately, this means that this span will never have a parent, even
if other spans are active whilst we enter this one. We enter this one in
the hot-path, which is why it is being constructed ahead of time.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Within `snownet` - `connlib`'s connectivity library - we use ICE to set
up a UDP "connection" between a client and a gateway. UDP is an
unreliable transport, meaning the only way we can detect that the
connection is broken is for both parties to constantly send messages and
acknowledgements back and forth. ICE uses STUN binding requests for
this.
In the default configuration of `str0m`, a STUN binding is sent every
3s, and we tolerate at most 9 missing responses before we consider the
connection broken. As these responses go missing, `str0m` halves this
interval, which results in a total ICE timeout of around 17 seconds. We
already tweak these values by reducing the number of requests to 8 and
setting the interval to 1.5s. This results in a total ICE timeout of
~10s which effectively means that there is at most a 10s lag between the
connection breaking and us considering it broken at which point new
packets arriving at the TUN interface can trigger the setup of a new
connection with the gateway.
Lowering these timeouts improves the user experience in case of a broken
connection because the user doesn't have to wait as long before they can
access their resources again. The downside of lowering these timeouts is
that we generate a lot of background noise. Especially on mobile
devices, this is bad because it prevents the CPU from going to sleep and
thus simply being signed into Firezone will drain your battery, even if
you don't use it.
Note that this doesn't apply at all if the client application on top
detects a network change. In that case, we hard-reset all connections
and instantly create new ones.
We attempted to fix this in #5576 by closing idle connections after 5
minutes. This however created new problems such as #6778.
The original problem here is that we send too many STUN messages as soon
as a connection is established. Simply increasing the timeout is not an
option because it would make the user experience really bad in case the
connection actually drops for reasons that the client app can't detect.
In this patch, we attempt to solve this in a different way: Detecting a
broken connection is only critical if the user is actively using the
tunnel (i.e. sending traffic). If there is no traffic, it doesn't matter
if we need longer to detect a broken connection. The user won't notice
because their phone is probably in their pocket or something.
With this patch, we now implement the following behaviour:
- A connection is considered idle after 10s of no application traffic.
- On idle connections, we send a STUN request every 60s.
- On idle connections, we wait for at most 4 missing responses before
considering the connection broken.
- Every connection will perform a client-initiated WireGuard keep-alive
every 25s, unless there is application traffic.
These values have been chosen while considering the following sources:
1. [RFC4787,
REQ-5](https://www.rfc-editor.org/rfc/rfc4787.html#section-12) requires
NATs to keep UDP NAT mappings alive for at least 2 minutes.
2. [`conntrack`](https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.rst)
adopts this requirement via the `nf_conntrack_udp_timeout_stream`
configuration.
3. 25s is the default keep-alive of the WireGuard kernel module.
In theory the WireGuard keep-alive itself should be good enough to keep
all NAT bindings alive. In practice, missed keep-alives are not exposed
by boringtun (the WireGuard implementation we rely on) and thus we need
the additional STUN keep-alives to detect broken connections. We set
those somewhat conservatively to 60s.
As soon as the user triggers new application traffic, these values are
reverted back to their defaults, meaning even if the connection died
just before the user is starting to use it again, we will know within
the usual 10s because we are triggering new STUN requests more often.
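As an illustration, here is a minimal sketch of that parameter switch. The constants mirror the values described in this PR; the `Connection` type and function names are made up:

```rust
use std::time::{Duration, Instant};

const IDLE_AFTER: Duration = Duration::from_secs(10); // no app traffic for 10s
const ACTIVE_STUN_INTERVAL: Duration = Duration::from_millis(1500);
const IDLE_STUN_INTERVAL: Duration = Duration::from_secs(60);
const ACTIVE_MAX_MISSED: usize = 8; // ~10s until the connection counts as broken
const IDLE_MAX_MISSED: usize = 4;

struct Connection {
    last_application_traffic: Instant,
}

impl Connection {
    fn is_idle(&self, now: Instant) -> bool {
        now.duration_since(self.last_application_traffic) >= IDLE_AFTER
    }

    /// How often we send STUN binding requests and how many missed responses
    /// we tolerate before declaring the connection broken.
    fn keepalive_params(&self, now: Instant) -> (Duration, usize) {
        if self.is_idle(now) {
            (IDLE_STUN_INTERVAL, IDLE_MAX_MISSED)
        } else {
            (ACTIVE_STUN_INTERVAL, ACTIVE_MAX_MISSED)
        }
    }
}
```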
Note that existing gateways still implement the "close idle connections
after 5 minutes" behaviour. Customers will need to upgrade to a new
gateway version to fully benefit from these new always-on, low-power
connections.
Resolves: #6778.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Refs #6138
Sentry is always enabled for now. In the near future we'll make it
opt-out per device and opt-in per org (see #6138 for details)
- Replaces the `crash_handling` module
- Catches panics in GUI process, tunnel daemon, and Headless Client
- Added a couple "breadcrumbs" to play with that feature
- User ID is not set yet
- Environment is set to the API URL, e.g. `wss://api.firezone.dev`
- Reports panics from the connlib async task
- Release should be automatically pulled from the Cargo version which we
automatically set in the version Makefile
Example screenshot of sentry.io with a caught panic:
<img width="861" alt="image"
src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">
In the past, we struggled a lot with the reproducibility of
`tunnel_test` failures because our input state and transition strategies
were not deterministic. In the end, we found out that it was due to the
iteration order of `HashMap`s.
To make sure this doesn't regress, we added a check to CI at the time
that compares the debug output of all regression seeds against a second
run and ensures they are the same. That is overall a bit wonky.
We can do better by simply sampling a value from the strategy twice,
each time from a test runner with the same seed. If the strategy is deterministic,
those need to be the same. We still rely on the debug output being
identical because:
a. Deriving `PartialEq` on everything is somewhat cumbersome
b. We actually care about the iteration order which a fancy `PartialEq`
implementation might ignore
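A sketch of such a check (not the actual test code), using proptest's deterministic test runner and comparing `Debug` output:

```rust
use proptest::strategy::{Strategy, ValueTree};
use proptest::test_runner::TestRunner;

/// Sample the same strategy from two runners that share a seed; if the
/// strategy is deterministic, both samples must be identical.
fn assert_deterministic<S: Strategy>(strategy: S) {
    let mut runner_a = TestRunner::deterministic();
    let mut runner_b = TestRunner::deterministic();

    let value_a = strategy.new_tree(&mut runner_a).unwrap().current();
    let value_b = strategy.new_tree(&mut runner_b).unwrap().current();

    // Comparing `Debug` output instead of requiring `PartialEq` also catches
    // differences in iteration order (e.g. of `HashMap`s).
    assert_eq!(format!("{value_a:?}"), format!("{value_b:?}"));
}
```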
At the moment, the mapping of proxy IPs to the resolved IPs of a DNS
resource happens at the same time as the "authorisation" that the client
is allowed to talk to that resource. This is somewhat convoluted
because:
- Mapping proxy IPs to resolved IPs only needs to happen for DNS
resources, yet it is called for all resources (and internally skipped).
- Wildcard DNS resources only need to be authorised once, after which
the client is allowed to communicate with any domain matching the
wildcard address.
- The code that models resources within `ClientOnGateway` doesn't
differentiate between resource types at all.
With #6461, the authorisation of a resource will be completely decoupled
from the domain resolution for a particular domain of a DNS resource. To
make that easier to implement, we re-model the internals of
`ClientOnGateway` to differentiate the various resource types. Instead
of holding a single vec of addresses, the IPs are now indexed by the
respective domain. For CIDR resources, we only hold a single address
anyway and for the Internet Resource, the IP networks are static.
Under this new model, allowing a resource that has already been allowed
essentially constitutes an update, and the filters get re-calculated.
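A sketch of the re-modelled per-resource state, using stand-in types (`Domain` and `IpNetwork` are placeholders; the actual types inside `ClientOnGateway` differ):

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::IpAddr;

type Domain = String; // placeholder
type IpNetwork = String; // placeholder

enum ResourceOnGateway {
    Dns {
        // Resolved/proxy IPs are indexed by the concrete domain rather than
        // kept in a single flat list.
        ips_by_domain: BTreeMap<Domain, BTreeSet<IpAddr>>,
    },
    Cidr {
        // CIDR resources only ever have a single network.
        network: IpNetwork,
    },
    // The IP networks for the Internet Resource are static.
    Internet,
}
```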
Firefox uses the so-called canary domain `use-application-dns.net` to
detect whether it should use DoH for its DNS queries. If the query is
answered with a server error or without records, Firefox disables DoH as
long as only its "Default protection" is enabled. If a user forces DoH,
this hint from the network is ignored.
See
https://support.mozilla.org/en-US/kb/canary-domain-use-application-dnsnet
for details.
I tested this on macOS and Firefox does indeed instantly disable DoH. A
default installation of Chrome doesn't use DoH for me.
Related: #6375.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Currently, `connlib` is entirely single-threaded. This allows us to
reuse a single buffer for processing IP packets and makes reasoning
about the packet-processing code very simple. Being single-threaded also
means we can only make use of a single CPU core and all operations have
to be sequential.
Analyzing `connlib` using `perf` shows that we spend 26% of our CPU time
writing packets to the TUN interface [0]. Because we are
single-threaded, `connlib` cannot do anything else during this time. If
we could offload the writing of these packets to a different thread,
`connlib` could already process the next packet while the current one is
writing.
Packets that we send to the TUN interface arrive as encrypted WireGuard
packets over UDP and get decrypted into a (currently) shared buffer.
Moving the writing to a different thread implies that we need more of
these buffers for the next packet(s) to be decrypted into.
To avoid IP fragmentation, we set the maximum IP MTU to 1280 bytes on
the TUN interface. That actually isn't very big and easily fits into a
stack frame. The default stack size for threads is 2 MB [1].
Instead of creating more buffers and cycling through them, we can also
simply stack-allocate our IP packets. This incurs some overhead from
copying packets, but it is only ~3.5% [2] (measured without a separate
thread). With stack-allocated packets, almost all lifetime annotations
go away, which in itself is already a welcome ergonomics boost.
Stack-allocated packets also mean we can simply spawn a new thread for
the packet processing. This thread is connected with two channels to
connlib's main thread. The capacity of 1000 packets will at most consume
an additional 3.5 MB of memory, which is fine even on our
most-constrained devices such as iOS.
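A minimal sketch of the idea with a hypothetical `Packet` type and a bounded channel; the real connlib wiring differs:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stack-allocated packet: the 1280-byte device MTU fits
// comfortably into a fixed-size array, so no heap allocation or lifetime
// juggling is needed.
struct Packet {
    buf: [u8; 1280],
    len: usize,
}

fn spawn_tun_writer() -> mpsc::SyncSender<Packet> {
    // Bounded channel: at 1000 packets of ~1.3 kB each, the queue is capped
    // at a few megabytes of memory.
    let (tx, rx) = mpsc::sync_channel::<Packet>(1000);

    thread::spawn(move || {
        while let Ok(packet) = rx.recv() {
            // Write `packet.buf[..packet.len]` to the TUN fd here.
            let _ = &packet.buf[..packet.len];
        }
    });

    tx
}
```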
[0]: https://share.firefox.dev/3z78CzD
[1]: https://doc.rust-lang.org/std/thread/#stack-size
[2]: https://share.firefox.dev/3Bf4zla
Resolves: #6653.
Resolves: #5541.
Currently, the order in which connlib matches against the patterns of
DNS resources is not specified. We simply iterate over all patterns and
take the first one that matches. Due to the iteration order of
`HashMap`s, this also isn't deterministic.
With this patch, we introduce a defined order in which we attempt to
match a particular domain against the defined DNS resources:
- Resources without wildcards are always prioritised over wildcard
domains
- Single-char wildcards (`?`) take priority over label wildcards (`*`)
- Label wildcards (`*`) take priority over catch-all wildcards (`**`)
By matching against the DNS resources in a defined order, we ensure that
DNS resources that overlap always resolve to the most specific resource.
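A minimal sketch of this ordering with illustrative names (not connlib's actual matching code):

```rust
/// Derived `Ord` follows declaration order: exact patterns sort before `?`
/// wildcards, which sort before `*`, which sort before `**`.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
enum PatternPriority {
    Exact,
    SingleCharWildcard, // `?`
    LabelWildcard,      // `*`
    CatchAllWildcard,   // `**`
}

fn priority(pattern: &str) -> PatternPriority {
    if pattern.contains("**") {
        PatternPriority::CatchAllWildcard
    } else if pattern.contains('*') {
        PatternPriority::LabelWildcard
    } else if pattern.contains('?') {
        PatternPriority::SingleCharWildcard
    } else {
        PatternPriority::Exact
    }
}

fn sort_most_specific_first(patterns: &mut Vec<String>) {
    // Matching then tries candidates in this order, so overlapping resources
    // resolve to the most specific one.
    patterns.sort_by_key(|p| priority(p));
}
```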
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
This log is currently printed after we receive the `init` message from
the client. It is a left-over from early days of connlib where receiving
`init` itself already triggered all kinds of actions.
These days, we are mostly just updating state. In addition, `init` can
be received multiple times during a client's session which is somewhat
confusing when you see multiple "Firezone started" logs.
Fix #6781. Fix #6375.
The problem was that browsers on iOS (and possibly other OSes) query
for A, AAAA and HTTPS records, and we correctly intercept A and AAAA.
Correctly intercepting HTTPS queries is trickier since we need the
server's ALPN. Before this PR, we were just forwarding those queries and
the responses back, but the problem with that is that it'd return the
real IP for the service instead of our proxy IP.
So, as a quick fix, we simply blackhole the query so the browser never
uses that response.
In the future, an improvement over this would be to intercept the
response instead of the query and mangle the IPs there.
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Previously, the type signatures around parsing DNS queries were kind of
weird and hard to follow. Instead of nesting `Option` and `Result`, we
now use the `ControlFlow` type which is more expressive in what we
expect the caller to do with the return value.
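A sketch of the shape (the real types and signatures in connlib differ): `Continue` means "not DNS, keep routing the packet", `Break` means "this was DNS, here is what to do with it".

```rust
use std::ops::ControlFlow;

// Illustrative only.
enum DnsAction {
    AnswerLocally,
    ForwardToUpstream,
}

fn handle_maybe_dns(packet: &[u8]) -> ControlFlow<DnsAction, ()> {
    if !looks_like_dns(packet) {
        // Not a DNS query: the caller continues with normal routing.
        return ControlFlow::Continue(());
    }

    // A DNS query: the caller stops routing and acts on the result.
    ControlFlow::Break(DnsAction::ForwardToUpstream)
}

fn looks_like_dns(packet: &[u8]) -> bool {
    // Placeholder check for illustration: real parsing inspects the UDP
    // destination port and the DNS header.
    packet.len() > 12
}
```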
We reserve an IP range _within_ the CG-NAT range for the sentinel DNS
servers. It is unnecessary to explicitly add that one as a route because
it is already covered by the routing entry of the entire CG-NAT range.
When encountering a PTR query, `connlib` checks if the query is for a
Firezone-managed resource and resolves it to the correct IP. If it isn't
for a DNS resource, we should forward the query to the upstream
resolver.
This isn't what is currently happening though. Instead of forwarding the
query, we bail early from `StubResolver::handle` and thus attempt to
route the packet through the tunnel. This however fails because the DNS
query was targeted at `connlib`'s stub resolver address which never
corresponds to a resource IP.
When TRACE logs were activated, this resulted in several entries such as
> Unknown resource dst=100.100.111.1
To ensure this doesn't regress, we now generate PTR and MX record
queries in `tunnel_test`. We don't assert the response of those but we
do assert that we always get a response. The inclusion of MX records
asserts that unknown query types get correctly forwarded.
Resolves: #6749.
There are currently three problems on main: one in the tests and two
actual bugs.
## Test problem
The test kept routes in a `BTreeSet`: when a new route was added it was
`insert`ed, and when a route was removed it was `remove`d using the
address of the route.
The problem is that if an overlapping route was added twice in a row and
then only one of those resources was removed, the test would believe the
route no longer exists.
## Test solution
Keep the routes in a `BTreeMap` which maps the ID to the IP. We then
calculate the routes based on that, combined with the default routes.
That way, we just remove the ID and the routes are kept in the correct,
expected state.
## Real bug
Fixing this revealed a similar bug in connlib, since we keep things in a
similar structure: `active_cidr_resources`, which uses an
`IpNetworkTable`.
To fix this, I re-calculate the whole table each time we add/remove a
resource.
Note that this doesn't properly fix overlapping routes; it is just
enough to fix the test. To fix them properly, we need #4789.
Furthermore, fixing these issues revealed an additional problem:
whenever we add an overlapping CIDR resource, the old resource might be
overridden, causing the connection to be lost. This happened in a
non-deterministic (it's deterministic really, but not explicit) way,
causing the tests to fail.
To fix this, we always sort resources by ID (an arbitrary order chosen
to keep consistency with the proptests) and then don't replace the
routing for resources that already had a connection.
Sadly, to model this in the test, I had to copy almost exactly how we
calculate resources in connlib.
Fixes #6721
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Merging #6708 had an unintended side-effect: we are seeing a lot of
WARN logs from phoenix-channel because we can no longer parse the
responses from gateways. We didn't do anything with these responses, but
gateways are sending them for backwards-compatibility reasons.
To not confuse ourselves while debugging, we revert the client-side bit
of #6708 to remove these warnings.
Currently, checking whether a packet is a DNS query has multiple silent
exit paths. This makes DNS problems difficult to debug because the
packets will be treated as if they have to get routed through the
tunnel.
This is also something we should fix but that isn't done in this PR: If
we know that a packet is for connlib's DNS stub resolver, we should
never route it through the tunnel. Currently, this isn't possible to
express with the type signature of our DNS module and requires more
refactoring.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
DNS queries for non-resources get forwarded by connlib byte-for-byte;
only the IP & UDP headers get rewritten. The responses for some DNS
queries don't fit into UDP packets. To signal this, DNS servers set the
TC (truncated) bit on the response.
If set, most clients will discard the response and requery over TCP. DNS
over TCP is not yet supported in connlib, meaning any query response
that has the TC bit set will most likely lead to DNS issues downstream.
To make these cases easier to debug, we log a warning whenever we
encounter a DNS response from an upstream server that has the TC bit
set.
Prior to version 1.1.0, clients did not have an embedded DNS resolver
and relied on the gateway for DNS resolution. In that design, the
gateway responded with the IPs that the domain resolved to.
Our next iteration of the control protocol (#6461) will decouple the
details of how DNS works from the flow-authorization. As a result, we
will need to be able to establish a flow for a DNS resource without
knowing which concrete domain the client is going to access.
Without a concrete domain, we cannot send anything back to these old
clients, meaning we unfortunately have to break compatibility with <
1.1.0 clients as part of implementing the new control protocol.
This only happens once for each domain, so logging this on DEBUG should
be fine. It is disabled by default and shouldn't be too spammy during
development.
The `expect` attribute is similar to `allow` in that it silences a
particular lint. Unlike `allow`, however, `expect` will fail as soon as
the lint is no longer emitted. This ensures we don't end up with stale
`allow` attributes in our codebase. Additionally, it provides a way of
adding a `reason` that documents why the lint is being suppressed.
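A minimal illustration of the difference:

```rust
// `expect` fails the build once the lint it suppresses stops firing, while
// `allow` would silently go stale.
#[expect(dead_code, reason = "kept for an upcoming feature")]
fn unused_helper() {}

fn main() {
    // If `unused_helper` were ever called, the `dead_code` lint would no
    // longer fire and the `expect` attribute itself would become an error.
}
```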
In #6592, we started tracking our connected gateways to correctly model
which packets get dropped as part of establishing new connections. We
forgot to clear this when connections are being reset, causing some test
failures.
As the final step in removing `pnet_packet`, we need to introduce
`-Mut` equivalent slices for UDP, TCP and ICMP packets. As a starting
point, introducing `UdpHeaderSliceMut` and `TcpHeaderSliceMut` is fairly
trivial. The ICMP variants are a bit trickier because they are different
for IPv4 and IPv6. Additionally, ICMP for IPv4 is quite complex because
it can have a variable header length. Moreover, for both variants, the
values in byte range 5-8 are semantically different depending on the
ICMP code.
This requires us to design an API that balances ergonomics and
correctness. Technically, an ICMP identifier and sequence can only be
set if the ICMP code is "echo request" or "echo reply". However, adding
an additional parsing step to guarantee this in the type system is quite
verbose.
The trade-off implemented in this PR allows us to directly write to
bytes 5-8 using the `set_identifier` and `set_sequence` functions. To
catch errors early, these functions have debug assertions built in that
ensure the packet is indeed an ICMP echo packet.
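A sketch of that trade-off with a made-up mutable ICMPv4 slice type (the real types in the codebase differ):

```rust
struct IcmpV4SliceMut<'a> {
    buf: &'a mut [u8],
}

impl<'a> IcmpV4SliceMut<'a> {
    fn is_echo(&self) -> bool {
        // ICMPv4 type 8 is "echo request", type 0 is "echo reply".
        matches!(self.buf[0], 0 | 8)
    }

    /// Writes bytes 5-6 (offsets 4..6) directly, guarded by a debug assertion.
    fn set_identifier(&mut self, id: u16) {
        debug_assert!(self.is_echo(), "identifier only exists on echo packets");
        self.buf[4..6].copy_from_slice(&id.to_be_bytes());
    }

    /// Writes bytes 7-8 (offsets 6..8) directly, guarded by a debug assertion.
    fn set_sequence(&mut self, seq: u16) {
        debug_assert!(self.is_echo(), "sequence only exists on echo packets");
        self.buf[6..8].copy_from_slice(&seq.to_be_bytes());
    }
}
```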
Resolves: #6366.
Currently, we have two structs for representing IP packets: `IpPacket`
and `MutableIpPacket`. As the name suggests, they mostly differ in
mutability. This design was originally inspired by the `pnet_packet`
crate which we based our `IpPacket` on. With subsequent iterations, we
added more and more functionality onto our `IpPacket`, like NAT64 &
NAT46 translation. As a result of that, the `MutableIpPacket` is no
longer directly based on `pnet_packet` but instead just keeps an
internal buffer.
This duplication can be resolved by merging the two structs into a
single `IpPacket`. We do this by first replacing all usages of
`IpPacket` with `MutableIpPacket`, then deleting `IpPacket` and renaming
`MutableIpPacket` to `IpPacket`. The final design now has different
`self`-receivers: Some functions take `&self`, some `&mut self` and some
consume the packet using `self`.
This results in a more ergonomic usage of `IpPacket` across the codebase
and deletes a fair bit of code. It also takes us one step closer towards
using `etherparse` for all our IP packet interaction-needs. Lastly, I am
currently exploring a performance-optimisation idea that stack-allocates
all IP packets and for that, the current split between `IpPacket` and
`MutableIpPacket` does not really work.
Related: #6366.
The lifetime of the returned packet is actually already `'static`,
meaning we don't need to call `to_owned`.
Related: #6366.
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: firezone <firezone@firezones-MacBook-Air.local>