Currently, the Gateway reads all nameservers from `/etc/resolv.conf` on
startup and evaluates the fastest one to use for SRV and TXT DNS queries
that are forwarded by the Client. If the machine has just booted and does
not have Internet connectivity yet, this evaluation fails, which leaves
the Gateway in a state where it cannot fulfill those queries.
To ensure we always use the fastest nameserver and to self-heal from
such situations, we add a 60s timer that refreshes this state.
Note that this will **not** re-read the nameservers from
`/etc/resolv.conf`; it still uses the same IPs read on startup.
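A rough sketch of such a refresh loop (the state type and the
`refresh_fastest` helper are illustrative, not the actual Gateway code):

```rust
use std::time::Duration;

/// Hypothetical state holding the nameservers read on startup and the
/// currently preferred (fastest) one.
struct DnsForwarderState {
    nameservers: Vec<std::net::IpAddr>,
    fastest: Option<std::net::IpAddr>,
}

impl DnsForwarderState {
    /// Probe all known nameservers and remember the fastest responder.
    /// Left empty here; the real code would issue test queries and
    /// measure round-trip times.
    async fn refresh_fastest(&mut self) {
        // ...
    }
}

async fn run_refresh_loop(mut state: DnsForwarderState) {
    // Re-evaluate every 60 seconds so a Gateway that started without
    // connectivity eventually self-heals.
    let mut interval = tokio::time::interval(Duration::from_secs(60));

    loop {
        interval.tick().await;
        state.refresh_fastest().await;
    }
}
```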
When the Gateway is handed an IP packet for a DNS resource that it
cannot route, it sends back an ICMP unreachable error. According to RFC
792 [0] (for ICMPv4) and RFC 4443 [1] (for ICMPv6), parts of the
original packet should be included in the ICMP error payload to allow
the sending party to correlate what could not be sent.
For ICMPv4, the RFC says:
```
Internet Header + 64 bits of Data Datagram
The internet header plus the first 64 bits of the original
datagram's data. This data is used by the host to match the
message to the appropriate process. If a higher level protocol
uses port numbers, they are assumed to be in the first 64 data
bits of the original datagram's data.
```
For ICMPv6, the RFC says:
```
As much of invoking packet as possible without the ICMPv6 packet exceeding the minimum IPv6 MTU
```
[0]: https://datatracker.ietf.org/doc/html/rfc792
[1]: https://datatracker.ietf.org/doc/html/rfc4443#section-3.1
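The truncation rule can be sketched as follows; this is illustrative
only and not the actual `ip-packet` API:

```rust
/// Maximum ICMP error payload per the two RFCs quoted above:
/// - ICMPv4: the original IP header plus the first 8 bytes of its payload.
/// - ICMPv6: as much of the invoking packet as fits into the minimum IPv6
///   MTU (1280 bytes) together with the IPv6 and ICMPv6 headers.
fn icmp_error_payload(original: &[u8], is_ipv6: bool, ip_header_len: usize) -> &[u8] {
    let max_len = if is_ipv6 {
        // 1280 (minimum IPv6 MTU) - 40 (IPv6 header) - 8 (ICMPv6 header)
        1280 - 40 - 8
    } else {
        // IPv4 header + 64 bits (8 bytes) of the original datagram's data
        ip_header_len + 8
    };

    &original[..original.len().min(max_len)]
}
```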
Presently, the network change detection on Windows is very naive and
simply emits a change event every time _anything_ changes. We can
optimise this and therefore improve the start-up time of Firezone by:
- Filtering out duplicate events
- Filtering out network change events for our own network adapter
This reduces the number of network change events to 1 during startup. As
far as I can tell from the code comments in this area, we explicitly
send this one to ensure we don't run into a race condition whilst we are
starting up.
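A minimal sketch of such a filter (the event type and its fields are
simplified assumptions, not the actual Windows API bindings):

```rust
/// Hypothetical, simplified view of a network change notification.
#[derive(Clone, PartialEq, Eq)]
struct NetworkChange {
    adapter_name: String,
    // ... further fields omitted
}

/// Filters the raw notification stream: drops exact duplicates and all
/// events that originate from our own TUN adapter.
struct ChangeFilter {
    last_event: Option<NetworkChange>,
    own_adapter: String,
}

impl ChangeFilter {
    fn accept(&mut self, event: NetworkChange) -> Option<NetworkChange> {
        // Ignore changes caused by our own adapter coming up or down.
        if event.adapter_name == self.own_adapter {
            return None;
        }

        // Ignore events that are identical to the previous one.
        if self.last_event.as_ref() == Some(&event) {
            return None;
        }

        self.last_event = Some(event.clone());
        Some(event)
    }
}
```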
Resolves: #8905
ECN information is helpful to allow the congestion controllers to more
easily fine-tune their send and receive windows. When a Firezone Client
receives an IP packet whose ECN bits signal an ECN-capable transport, we
mirror these bits onto the UDP datagram that carries the encrypted IP
packet.
When receiving a datagram with ECN bits set, the Gateway will then apply
these bits to the decrypted IP packet and pass it along towards its
destination.
This implementation is unfortunately a bit too naive. Not all devices on
the Internet support ECN and therefore we may receive a datagram that
has its ECN bits cleared even though the ECN bits on the inner IP packet
still signal an ECN-capable transport. In this case, we should _not_ override
the ECN bits and instead pass the IP packet along as is. Network devices
along the path between Gateway and Resource may still use these ECN bits
to signal congestion.
We fix this by making the `with_ecn` function on `IpPacket` private. It
is not meant to be used outside of the module. We supersede it with a
`with_ecn_from_transport` function that implements the above logic.
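Roughly, the guarded variant behaves like this sketch (the `Ecn` enum
and struct layout are simplified; only the function names match the ones
mentioned above):

```rust
/// Simplified ECN codepoints from the IP header's traffic-class / TOS
/// field (illustrative; not the crate's actual type).
#[derive(Clone, Copy, PartialEq, Eq)]
enum Ecn {
    NotEct, // 00: ECN not supported / cleared along the path
    Ect1,   // 01
    Ect0,   // 10
    Ce,     // 11: congestion experienced
}

struct IpPacket {
    ecn: Ecn,
    // ... payload omitted
}

impl IpPacket {
    /// Private helper that unconditionally overrides the ECN bits.
    fn with_ecn(mut self, ecn: Ecn) -> Self {
        self.ecn = ecn;
        self
    }

    /// Applies the ECN bits observed on the transport (the UDP datagram)
    /// to the decrypted packet, but only if the transport actually
    /// carried ECN information. If the bits were cleared along the path,
    /// the inner packet is passed on unchanged.
    fn with_ecn_from_transport(self, transport_ecn: Ecn) -> Self {
        if transport_ecn == Ecn::NotEct {
            return self;
        }

        self.with_ecn(transport_ecn)
    }
}
```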
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Whenever the Gateway is instructed to (re)create the NAT for a DNS
resource, it performs a DNS query and then overwrites the existing
entries in the NAT table. Depending on how the DNS records are defined,
this may lead to a very bad user experience where connections are cut
regularly.
In particular, if a service utilises round-robin DNS where a DNS query
only ever returns a single entry yet that entry may change as soon as
the TTL expires, all connections for this particular DNS resource for a
Client get cut.
To fix this, we now first check for active NAT sessions for a given
proxy IP and only replace an entry if there is no open NAT session for it. The
NAT sessions have a TTL of 1 minute, meaning there needs to be at least
1 outgoing packet from the Client every minute to keep it open.
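Conceptually, the check looks like this sketch (types and field names
are illustrative, not the actual NAT table implementation):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const SESSION_TTL: Duration = Duration::from_secs(60);

struct NatEntry {
    real_ip: IpAddr,
    last_outgoing_packet: Instant,
}

struct NatTable {
    entries: HashMap<IpAddr, NatEntry>, // keyed by proxy IP
}

impl NatTable {
    /// Called when the NAT for a DNS resource is (re)created after a
    /// fresh DNS query on the Gateway. Entries that still have an active
    /// session (i.e. saw an outgoing packet within the TTL) are left
    /// untouched so existing connections are not cut.
    fn update(&mut self, proxy_ip: IpAddr, resolved_ip: IpAddr, now: Instant) {
        if let Some(entry) = self.entries.get(&proxy_ip) {
            if now.duration_since(entry.last_outgoing_packet) < SESSION_TTL {
                return; // active session: keep the existing mapping
            }
        }

        self.entries.insert(
            proxy_ip,
            NatEntry {
                real_ip: resolved_ip,
                last_outgoing_packet: now,
            },
        );
    }
}
```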
In order to detect changes to DNS records of DNS resources, `connlib`
will recreate the DNS resource NAT whenever it receives a query for a
DNS resource. The way we implemented this was by clearing the local
state of the DNS resource NAT, which triggered us to perform the
handshake with the Gateway again upon the next packet for this resource.
The Gateway would then perform the DNS query and respond once it was
finished.
In order to not drop any packets, `connlib` has a buffer where it keeps
the packets that are arriving in the meantime. This works reasonably
well when the connection is first set up because we are only buffering a
TCP SYN or an equivalent handshake packet. Yet, when the connection is
in full use and the application just so happens to make another DNS
query, we halt the entire flow of packets until the NAT is confirmed
again. To prevent high memory use, this buffer is constrained to
32 packets, which is nowhere near enough when a connection is actively
transferring data (like a file upload).
In most cases, the DNS query on the Gateway will yield the exact same
results because the records haven't changed. Thus, there is no reason
for us to halt the flow of packets while we are _recreating_ the DNS
resource NAT; we simply keep forwarding them. That way, the handshake
happens in parallel to the actual packet flow and does not interrupt
anything in the happy-path case.
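As a rough illustration, the per-resource NAT on the client can be
pictured as a small state machine; the states and names below are
illustrative, not `connlib`'s actual types:

```rust
/// Hypothetical per-resource view of the client-side DNS resource NAT.
/// Before this change, a refresh put us back into `Pending`, halting and
/// buffering the whole flow; now it only moves us into `Refreshing`.
enum DnsResourceNat {
    /// First connection setup: buffer packets (at most 32) until the
    /// Gateway confirms the NAT.
    Pending { buffered: Vec<Vec<u8>> },
    /// NAT is confirmed; packets flow freely.
    Confirmed,
    /// A refresh handshake is in flight, but the existing mapping stays
    /// in place, so packets keep flowing in parallel.
    Refreshing,
}
```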
Currently, when `connlib`'s log file gets deleted, we write logs into
nirvana until the corresponding process gets restarted. This is painful
for users because they need to restart the IPC service or Network
Extension. Instead, we can simply check if the log file exists prior to
writing to it and re-create it if it doesn't.
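A minimal sketch of the idea, assuming a simple append-style writer
(this is not `connlib`'s actual tracing appender API):

```rust
use std::fs::{File, OpenOptions};
use std::io::Write;
use std::path::PathBuf;

/// Before writing, check whether the log file still exists on disk and
/// re-open (and thereby re-create) it if it doesn't.
struct ReopeningLogWriter {
    path: PathBuf,
    file: File,
}

impl ReopeningLogWriter {
    fn write_line(&mut self, line: &str) -> std::io::Result<()> {
        if !self.path.exists() {
            // The file was deleted out from under us; re-create it so we
            // stop writing logs "into nirvana".
            self.file = OpenOptions::new()
                .create(true)
                .append(true)
                .open(&self.path)?;
        }

        writeln!(self.file, "{line}")
    }
}
```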
Resolves: #6850
Related: #7569
Having multiple threads for reading and writing the TUN device can cause
packet re-orderings on the client. All other clients only use a single
TUN thread, so aligning this value means a more consistent behaviour of
Firezone across all platforms.
Sufficiently large receive buffers are important to sustain
high-throughput as latency increases. If the receive buffer in the
kernel is too small, packets need to be dropped on arrival.
Firefox uses 1MB in its QUIC stack [0]. `quic-go` recommends setting send
and receive buffers to 7.5 MB [1]. Power users of Firezone are likely
receiving a lot more traffic than the average Firefox user (especially
with Internet Resource activated) so setting it to 10 MB seems
reasonable. Sending packets is likely not as critical because we have
back-pressure through our system such that we will stop reading IP
packets when we cannot write to our UDP socket. The UDP socket sits in a
separate thread, and the threads are connected with dedicated queues
that act as another buffer. However, as the data below
shows, some systems have really small send buffers which are currently
likely a speed bottleneck because we need to suspend writing so
frequently.
Assuming a 50ms latency, the bandwidth-delay product tells us that we
can (in theory) saturate a 1.6 Gbps link with a 10MB receive buffer
(assuming the OS also has large enough buffer sizes in its TCP or QUIC
stack):
```
10 MB = 80 Mb
80 Mb / 0.05 s = 1600 Mbps
```
Experiments and research [2] show the following:
|OS|Receive buffer (default)|Receive buffer (this PR)|Send buffer (default)|Send buffer (this PR)|
|---|---|---|---|---|
|Windows|65KB|10MB|65KB|1MB|
|MacOS|786KB|8MB|9KB|1MB|
|Linux|212KB|212KB|212KB|212KB|
With the exception of Linux, the OSes appear to be quite generous with
how big they allow receive buffers to be. On Linux, these limits can be
changed by setting the `net.core.rmem_max` and `net.core.wmem_max`
parameters using `sysctl`.
Most of our users are on Windows and MacOS, meaning they immediately
benefit from this without having to change any system settings. Larger
client-side UDP receive buffers are critical for any "download" scenario,
which is likely the majority of Firezone's use cases.
On Windows, increasing this receive buffer almost doubles the throughput
in an iperf3 download test.
[0]: https://github.com/mozilla/neqo/pull/2470
[1]: https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes
[2]: https://unix.stackexchange.com/a/424381
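For illustration, requesting such buffer sizes could look like the
following sketch using the `socket2` crate; the exact sizes and error
handling in Firezone may differ, and the OS may silently cap the
requested values (e.g. to `net.core.rmem_max` on Linux):

```rust
use socket2::{Domain, Protocol, Socket, Type};

fn make_udp_socket() -> std::io::Result<Socket> {
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;

    // 10 MB receive buffer: enough to (in theory) saturate ~1.6 Gbps at
    // 50 ms latency, per the bandwidth-delay product above.
    socket.set_recv_buffer_size(10 * 1024 * 1024)?;
    // 1 MB send buffer: sending is less critical thanks to back-pressure.
    socket.set_send_buffer_size(1024 * 1024)?;

    Ok(socket)
}
```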
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
When calculating preferences for candidates, `str0m` currently always
prefers IPv6 over IPv4. This is as per the ICE spec. However, this can
lead to sub-optimal situations when a connection ends up using a TURN
server.
TURN allows a client to allocate an IPv4 and an IPv6 address in the same
allocation. This makes it possible for e.g. an IPv4-only client to
connect to an IPv6-only peer as long as the TURN server runs in
dual-stack AND the client requests an IPv6 address in addition to an
IPv4 address with the `ADDITIONAL-ADDRESS-FAMILY` attribute.
Assume that a client sits behind symmetric NAT and therefore needs to
rely on a TURN server to communicate with its peers. The TURN server as
well as all the peers operate in dual-stack mode.
The current priority calculation will yield a communication path that
uses IPv4 to talk to the TURN server (as that is the only one available)
but due to the preference ordering of IPv6 over IPv4, will use an IPv6
path to the peer, despite the peer also supporting IPv4.
This isn't a problem per se but makes our life unnecessarily difficult.
Our TURN servers use eBPF to efficiently deal with TURN's channel-data
messages. This however is at present only implemented for the IPv4 <>
IPv4 and IPv6 <> IPv6 path. Implementing the other paths is possible but
complicates the eBPF code because we need to also translate IP headers
between versions and not just update the source and destination IPs.
We have since patched `str0m` to extend the `Candidate::relayed`
constructor to also take a `base` address which is - similar to the
other candidate types - the address the client is sending from in order
to use this candidate. In the context of relayed candidates, this is the
address the client is using to talk to the TURN server. We can use this
information in the candidate's priority calculation to prefer candidates
that allow traffic to remain within one IP version, i.e. if the client
talks to the TURN server over IPv4, the candidate with an allocated IPv4
address will have a higher priority than the one with the IPv6 address
because we are applying a "punishment" factor as part of the
local-preference component in the priority formula.
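Sketched with the standard ICE priority formula (RFC 8445, section
5.1.2.1), the idea looks roughly like this; the concrete preference
values are illustrative and not the ones used in the actual `str0m`
patch:

```rust
use std::net::IpAddr;

/// Standard ICE priority formula (RFC 8445, section 5.1.2.1).
fn ice_priority(type_pref: u32, local_pref: u32, component_id: u32) -> u32 {
    (type_pref << 24) + (local_pref << 8) + (256 - component_id)
}

/// Sketch of the "punishment" applied to relayed candidates whose
/// allocated address uses a different IP version than the base address
/// (the address we use to talk to the TURN server).
fn relayed_local_preference(allocated: IpAddr, base: IpAddr) -> u32 {
    let same_version = allocated.is_ipv4() == base.is_ipv4();

    if same_version {
        65535 // traffic stays within one IP version: in-kernel eBPF path applies
    } else {
        32767 // cross-version relaying: still usable, but less preferred
    }
}
```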
Staying within the same IP version whilst relaying traffic allows our
TURN servers to use their in-kernel eBPF path, which results in a better
UX due to lower latency and higher throughput.
The final candidate ordering is ultimately decided by the controlling
ICE agent which in our case is the Firezone Client. Thus, we don't
necessarily need to update Gateways in order to test or benefit from
this; building a Client with this patch included is enough.
Related: https://github.com/algesten/str0m/pull/640
Related: https://github.com/algesten/str0m/pull/644
If another VPN has been activated on the system while Firezone is
active, Apple OSes will deactivate our configuration, and never
reactivate it.
We knew this already, and always activated the configuration when
starting during the sign in flow, but failed to also do this when
autoStarting on launch.
This PR ensures that during autoStart, we re-enable the
configuration as well.
Fixes #8813
Microsoft Intune's DMG provisioner currently fails unexpectedly when
trying to provision our published DMG file with the error:
> The DMG file couldn't be mounted for installation. Check the DMG file
if the error persists. (0x87D30139)
I ran the following verification commands locally, which all passed:
```
hdiutil verify -verbose <dmg>
hdiutil imageinfo -verbose <dmg>
hdiutil hfsanalyze -verbose <dmg>
hdiutil checksum -type SHA256 -verbose <dmg>
hdiutil info -verbose
hdiutil pmap -verbose <dmg>
```
So the most likely explanation is that Intune doesn't like the
`/Applications` shortcut in the DMG. This is a UX feature to make it
easy to drag the application to the /Applications folder upon opening the
DMG.
So we're publishing a PKG in addition to the DMG, which should be a
more reliable artifact for MDMs to use.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
In #8480, we changed the location that `firezone-gateway` gets
downloaded to but forgot to update the knowledgebase with the new path.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Correctly implementing asynchronous IO is notoriously hard. In order to
not drop packets in the process, one has to ensure a given socket is
ready to accept packets, buffer them if that is not the case, suspend
everything else until the socket is ready, and then continue.
Until now, we did this because it was the only option to run the UDP
sockets on the same thread as the actual packet processing. That in turn
was motivated by wanting to pass around references of the received
packets for processing. Rust's borrow-checker does not allow passing such
references between threads, which forced us to have the sockets on the
same thread as the packet processing.
Like we already did in other places in `connlib`, this can be solved
through the use of buffer pools. Using a buffer pool, we can use heap
allocations to store the received packets without having to make a new
allocation every time we read new packets. Instead, we can have a
dedicated thread that is connected to `connlib`'s packet processing
thread via two channels (one for inbound and one for outbound packets).
These channels are bounded, which ensures backpressure is maintained in
case one of the two threads lags behind. These bounds also mean that we
have at most N buffers from the buffer pool in-flight (where N is the
capacity of the channel).
Within those dedicated threads, we can then use `async/await` notation
to suspend the entire task when a socket isn't ready for sending.
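A condensed sketch of the receive direction (the buffer pool, channel
bound, and helper names are illustrative; the outbound channel is
omitted):

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Illustrative stand-in for a buffer handed out by the buffer pool. The
/// real code reuses `connlib`'s existing buffer-pool abstraction instead
/// of allocating a fresh `Vec` per packet.
struct PooledBuffer(Vec<u8>);

/// Runs on the dedicated socket thread. Because the socket lives on its
/// own thread, we can use plain `async/await` and simply suspend whenever
/// it isn't ready, instead of hand-rolling readiness checks and buffering.
async fn udp_recv_task(
    socket: std::net::UdpSocket,
    inbound: SyncSender<PooledBuffer>, // bounded: provides backpressure
) -> std::io::Result<()> {
    socket.set_nonblocking(true)?;
    let socket = tokio::net::UdpSocket::from_std(socket)?;

    loop {
        let mut buffer = PooledBuffer(vec![0u8; 65536]); // would come from the pool

        let (len, _from) = socket.recv_from(&mut buffer.0).await?;
        buffer.0.truncate(len);

        // With a bounded channel, this call blocks once the
        // packet-processing thread lags behind, capping the number of
        // in-flight buffers at the channel's capacity.
        if inbound.send(buffer).is_err() {
            return Ok(()); // processing thread has shut down
        }
    }
}

/// Spawns the dedicated socket thread and returns the inbound half of the
/// channel pair for the packet-processing thread to consume.
fn spawn_socket_thread(socket: std::net::UdpSocket) -> Receiver<PooledBuffer> {
    let (tx, rx) = sync_channel(128); // hypothetical bound

    std::thread::spawn(move || {
        tokio::runtime::Builder::new_current_thread()
            .enable_io()
            .build()
            .expect("failed to build tokio runtime")
            .block_on(udp_recv_task(socket, tx))
    });

    rx
}
```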
Resolves: #8000
We are currently naively chunking our buffer into `segment_size *
max_gso_segments()`. `max_gso_segments` is by default 64. Assuming we
processed several IP packets, this would quickly balloon to a size that
the kernel cannot handle. For example, during an `iperf3` run, we
receive _a lot_ of packets at maximum MTU size (1280). With the overhead
that we are adding to the packet, this results in a UDP payload size of
1320.
```
1320 x 64 = 84480
```
That is way too large for the kernel to handle and it will fail the
`sendmsg` call with `EMSGSIZE`. Unfortunately, this error wasn't
surfaced because `quinn_udp` handles it internally, as it can also
happen as a result of MTU probes.
We've already patched `quinn_udp` in the past to move the handling of
more quinn-specific errors to the infallible `send` function. The same
is being done for this error in
https://github.com/quinn-rs/quinn/pull/2199.
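One way to avoid such oversized batches is to clamp the number of
segments per `sendmsg` call, roughly as in this sketch (the exact limit
handling in the actual change may differ):

```rust
/// Maximum UDP payload we can hand to the kernel in a single `sendmsg`
/// call (a 64 KiB IP datagram minus IP and UDP headers). Exact accounting
/// differs per address family; this is the conservative IPv4 value.
const MAX_UDP_PAYLOAD: usize = 65507;

/// Clamp `segment_size * segments` so it never exceeds what the kernel
/// accepts, instead of always using `max_gso_segments()` (64 by default).
fn gso_chunk_len(segment_size: usize, max_gso_segments: usize) -> usize {
    let max_segments = (MAX_UDP_PAYLOAD / segment_size).max(1);

    segment_size * max_gso_segments.min(max_segments)
}
```

For the `iperf3` example above (segment size 1320), this caps a batch at
49 segments, i.e. 64,680 bytes, instead of 84,480.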
Resolves: #8699
At present, the Gateway implements a NAT64 conversion that can convert
IPv4 packets to IPv6 and vice versa. Doing this efficiently creates a
fair amount of complexity within our `ip-packet` crate. In addition,
routing ICMP errors back through our NAT is also complicated by this
because we may have to translate the packet embedded in the ICMP error
as well.
The NAT64 module was originally conceived as a result of the new stub
resolver-based DNS architecture. When the Client resolves IPs for a
domain, it doesn't know whether the domain will actually resolve to IPv4
AND IPv6 addresses so it simply assigns 4 of each to every domain. Thus,
when receiving an IPv6 packet for such a DNS resource, the Gateway may
only have IPv4 addresses available and can therefore not route the
packet (unless it translates it).
This problem is not novel. In fact, an IP being unroutable or a
particular route disappearing happens all the time on the Internet. ICMP
was conceived to handle this problem and it is doing a pretty good job
at it. We can make use of that and simply return an ICMP unreachable
error back to the client whenever it picks an IP that we cannot map to
one that we resolved.
In this PR, we leave all of the NAT64 code intact and only add a
feature-flag that - when active - sends the aforementioned ICMP error. When
offline (and thus also in our tests), the feature-flag evaluates to
false. It is however set to `true` in the backend, meaning on staging
and later in production, we will send these ICMP errors.
Once this is rolled out and indeed proving to be working as intended, we
can simplify our codebase and rip out the NAT64 module. At that point,
we will also have to adapt the test-suite.
This is a regression introduced in c9f085c102. The `status` at this
point is still `nil` because we have not yet fully subscribed to VPN
status change updates from the system.
That actually shouldn't prevent us from trying to start the tunnel
anyway. If the `token` is missing from the Keychain, the tunnel process
will no-op. So we now always try to start a session on launch.
Fixes #8456
Before, we would receive an `NSError` object and the type-matching
wouldn't take effect at all, causing the default alert to show every
time. This solves that by introducing a `UserFriendlyError` protocol
which is more robust against the two main `Error` and `NSError`
variants.
Whenever a Resource's name, address_description, or assigned sites
change, it is not currently reflected in clients; the change only shows
up if the address changes as well.
This PR updates that behavior so that if any display fields are changed,
the `on_update_resources` callback is called, which properly updates the
resource list views in clients.
Fixes #8284