If another VPN has been activated on the system while Firezone is
active, Apple OSes will deactivate our configuration, and never
reactivate it.
We knew this already, and always activated the configuration when
starting during the sign in flow, but failed to also do this when
autoStarting on launch.
This PR updates ensures that during autoStart, we re-enable the
configuration as well.
Fixes#8813
Microsoft Intune's DMG provisioner currently fails unexpectedly when
trying to provision our published DMG file with the error:
> The DMG file couldn't be mounted for installation. Check the DMG file
if the error persists. (0x87D30139)
I ran the following verification commands locally, which all passed:
```
hdiutil verify -verbose <dmg>
hdiutil imageinfo -verbose <dmg>
hdiutil hfsanalyze -verbose <dmg>
hdiutil checksum -type SHA256 -verbose <dmg>
hdiutil info -verbose
hdiutil pmap -verbose <dmg>
```
So the issue appears to be most likely that Intune doens't like the
`/Applications` shortcut in the DMG. This is a UX feature to make it
easy to drag the application the /Applications folder upon opening the
DMG.
So we're publishing an PKG in addition to the DMG, which should be a
more reliable artifact for MDMs to use.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
In #8480, we changed the location that `firezone-gateway` gets
downloaded to but forgot to update the knowledgebase with the new path.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Correctly implementing asynchronous IO is notoriously hard. In order to
not drop packets in the process, one has to ensure a given socket is
ready to accept packets, buffer them if it is not case, suspend
everything else until the socket is ready and then continue.
Until now, we did this because it was the only option to run the UDP
sockets on the same thread as the actual packet processing. That in turn
was motivated by wanting to pass around references of the received
packets for processing. Rust's borrow-checker does not allow to pass
references between threads which forced us to have the sockets on the
same thread as the packet processing.
Like we already did in other places in `connlib`, this can be solved
through the use of buffer pools. Using a buffer pool, we can use heap
allocations to store the received packets without having to make a new
allocation every time we read new packets. Instead, we can have a
dedicated thread that is connected to `connlib`'s packet processing
thread via two channels (one for inbound and one for outbound packets).
These channels are bounded, which ensures backpressure is maintained in
case one of the two threads lags behind. These bounds also mean that we
have at most N buffers from the buffer pool in-flight (where N is the
capacity of the channel).
Within those dedicated threads, we can then use `async/await` notation
to suspend the entire task when a socket isn't ready for sending.
Resolves: #8000
We are currently naively chunking our buffer into `segment_size *
max_gso_segments()`. `max_gso_segments` is by default 64. Assuming we
processed several IP packets, this would quickly balloon to a size that
the kernel cannot handle. For example, during an `iperf3` run, we
receive _a lot_ of packets at maximum MTU size (1280). With the overhead
that we are adding to the packet, this results in a UDP payload size of
1320.
```
1320 x 64 = 84480
```
That is way too large for the kernel to handle and it will fail the
`sendmsg` call with `EMSGSIZE`. Unfortunately, this error wasn't
surfaced because `quinn_udp` handles it internally because it can also
happen as a result of MTU probes.
We've already patched `quinn_udp` in the past to move the handling of
more quinn-specific errors to the infallible `send` function. The same
is being done for this error in
https://github.com/quinn-rs/quinn/pull/2199.
Resolves: #8699
At present, the Gateway implements a NAT64 conversion that can convert
IPv4 packets to IPv6 and vice versa. Doing this efficiently creates a
fair amount of complexity within our `ip-packet` crate. In addition,
routing ICMP errors back through our NAT is also complicated by this
because we may have to translate the packet embedded in the ICMP error
as well.
The NAT64 module was originally conceived as a result of the new stub
resolver-based DNS architecture. When the Client resolves IPs for a
domain, it doesn't know whether the domain will actually resolve to IPv4
AND IPv6 addresses so it simply assigns 4 of each to every domain. Thus,
when receiving an IPv6 packet for such a DNS resource, the Gateway may
only have IPv4 addresses available and can therefore not route the
packet (unless it translates it).
This problem is not novel. In fact, an IP being unroutable or a
particular route disappearing happens all the time on the Internet. ICMP
was conceived to handle this problem and it is doing a pretty good job
at it. We can make use of that and simply return an ICMP unreachable
error back to the client whenever it picks an IP that we cannot map to
one that we resolved.
In this PR, we leave all of the NAT64 code intact and only add a
feature-flag that - when active - sends aforementioned ICMP error. While
offline (and thus also for our tests), the feature-flag evaluates to
false. It is however set to `true` in the backend, meaning on staging
and later in production, we will send these ICMP errors.
Once this is rolled out and indeed proving to be working as intended, we
can simplify our codebase and rip out the NAT64 module. At that point,
we will also have to adapt the test-suite.
This is a regression introduced in c9f085c102. The `status` at this
point is still `nil` because we have not yet fully subscribed to VPN
status change updates from the system.
That actually shouldn't prevent us from trying to start the tunnel
anyway. If the `token` is missing from the Keychain, the tunnel process
will no-op. So we simply try to start a session on launch always.
Fixes#8456
Before, we would receive an `NSError` object and the type-matching
wouldn't take effect at all, causing the default alert to show every
time. This solves that by introducing a `UserFriendlyError` protocol
which is more robust against the two main `Error` and `NSError`
variants.
Whenever a Resource's name, address_description, or assigned sites
change, it is not currently reflected in clients. For that to happen the
address is changed.
This PR updates that behavior so that if any display fields are changed,
the `on_update_resources` callback is called which properly updates the
resource list views in clients.
Fixes#8284
This effectively reverts #8223 due to how this interacts with the
generated packages on Linux. The _package_ itself should still be called
`firezone-client-gui` because that is what we are installing. Perhaps we
will one day add a headless-client package so the naming chosen here
should allow for that.
To customize the desktop entry, we instead make use of the
`desktopTemplate` configuration of the Tauri bundler where we can
provide a custom `.desktop` file where we can specify a particular
application name.
As part of this, we are also updating the docs on the website to mention
the new name `Firezone Client`.
In #8159, we introduced a regression that could lead to a deadlock when
shutting down the TUN device. Whilst we did close the channel prior to
awaiting the thread to exit, we failed to notice that _another_ instance
of the sender could be alive as part of an internally stored "sending
permit" with the `PollSender` in case another packet is queued for
sending. We need to explicitly call `abort_send` to free that.
Judging from the comment and a prior bug, this shutdown logic has been
buggy before. To further avoid this deadlock, we introduce two changes:
- The worker threads only receive a `Weak` reference to the
`wintun::Session`
- We move all device-related state into a dedicated `TunState` struct
that we can drop prior to joining the threads
The combination of these features means that all strong references to
channels and the session are definitely dropped without having to wait
for anything. To provide a clean and synchronous shutdown, we wait for
at most 5s on the worker-threads. If they don't exit until then, we log
a warning and exit anyway.
This should greatly reduce the risk of future bugs here because the
session (and thus the WinTUN device) gets shutdown in any case and so at
worst, we have a few zombie threads around.
Resolves: #8265
`@Published` properties that views subscribe to for UI updates need to
be updated from the main thread only. This PR annotates the relevant
variable and function from the original author's implementation with
`@MainActor` so that Swift will properly warn us when modifying these in
the future.
A regression was introduced in #8218 that removed the `menuBar` as an
environment object for `AppView`.
Unfortunately this compiles just fine, as EnvironmentObjects are loaded
at runtime, causing the "Open Menu" button to crash since it's looking
for a non-existent EnvironmentObject.
This configures the GUI client to log to journald in addition to files
as well. For better or worse, this logs all events such that structured
information is preserved, e.g. all additional fields next to the message
are also saved as fields in the journal. By default, when viewing the
logs via `journalctl`, those fields are not displayed. This makes the
default output of `journalctl` for the FIrezone GUI not as useful as it
could be. Fixing that is left to a later stage.
Related: #8173
With the addition of the Firezone Control Protocol, we are now issuing a
lot more DNS queries on the Gateway. Specifically, every DNS query for a
DNS resource name always triggers a DNS query on the Gateway. This
ensures that changes to DNS entries for resources are picked up without
having to build any sort of "stale detection" in the Gateway itself. As
a result though, a Gateway has to issue a lot of DNS queries to upstream
resolvers which in 99% or more cases will return the same result.
To reduce the load on these upstream, we cache successful results of DNS
queries for 5 minutes.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
On Linux, logs sent to stdout from a systemd-service are automatically
captured by `journald`. This is where most admins expect logs to be and
frankly, doing any kind of debugging of Firezone is much easier if you
can do `journalctl -efu firezone-client-ipc.service` in a terminal and
check what the IPC service is doing.
On Windows, stdout from a service is (unfortunately) ignored.
To achieve this and also allow dynamically changing the log-filter, I
had to introduce a (long-overdue) abstraction over tracing's "reload"
layer that allows us to combine multiple reload-handles into one.
Unfortunately, neither the `reload::Layer` nor the `reload::Handle`
implement `Clone`, which makes this unnecessarily difficult.
Related: #8173
Now that we have error reporting via Sentry in Swift-land as well, we
can handle errors in the FFI layer more gracefully and return them to
Swift.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
The `wintun` crate will already shutdown the session for us when the
last instance of `Session` gets dropped. Shutting down the session prior
to that already results in an attempt to close an adapter that is no
longer present, causing WinTUN to log (unactionable) errors.