Currently, it would theoretically be possible for an admin to connect
non-internet Resources to the Internet site. This PR fixes that by
enforcing that only the `internet` Resource type can belong to the `Internet`
gateway group.
Related: #6834
`@Published` properties that views subscribe to for UI updates need to
be updated from the main thread only. This PR annotates the relevant
variable and function from the original author's implementation with
`@MainActor` so that Swift will properly warn us when modifying these in
the future.
A regression was introduced in #8218 that removed the `menuBar` as an
environment object for `AppView`.
Unfortunately, this compiles just fine because EnvironmentObjects are resolved
at runtime, so the "Open Menu" button crashes when it looks up an
EnvironmentObject that doesn't exist.
Sentry uncovered a bug in the resources index liveview: some code copy-pasted
from the policies index view wasn't updated to work in the resources live
view, causing the view to crash if an admin was viewing the table while the
resources were changed on another page.
While debugging that, I realized the best UX for these tables is usually to
show a `Reload` button rather than updating the data live while the admin is
viewing it, as live updates can cause missed clicks and other annoyances.
This PR adds an optional `stale` component attribute that, if true, renders a
`Reload` button in the live table which, when clicked, reloads the table.
Not all index views are updated with this - some views, such as the clients
table, already have logic to apply updates intelligently without breaking the
view.
Ideally, we would live-update data that doesn't reflow the layout (such as
`online/offline` presence) and show the `Reload` button for changes that do
cause a reflow (create/delete).
However, that work is saved for a future PR, as this one fixes the immediate
bug and the rest is not the highest priority.
<img width="1195" alt="Screenshot 2025-02-16 at 8 44 43 AM"
src="https://github.com/user-attachments/assets/114efffa-85ea-490d-9cea-78c607081ce3"
/>
<img width="401" alt="Screenshot 2025-02-16 at 9 59 53 AM"
src="https://github.com/user-attachments/assets/8a570213-d4ec-4b6c-a489-dcd9ad1c351c"
/>
It's possible for a client or admin to try and load the redirect URL
directly, or a misconfigured IdP may redirect back to us with missing
params. We should redirect with an error flash instead of 500'ing.
This configures the GUI client to log to journald in addition to files. For
better or worse, this logs all events such that structured information is
preserved, e.g. all additional fields next to the message are also saved as
fields in the journal. By default, when viewing the logs via `journalctl`,
those fields are not displayed. This makes the default output of `journalctl`
for the Firezone GUI not as useful as it could be. Fixing that is left to a
later stage.
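For reference, a minimal sketch of what this wiring could look like with the
`tracing-journald` crate (illustrative only, not necessarily the exact code in
this PR); structured fields on each event are forwarded to the journal as
journal fields:

```rust
// Illustrative sketch: log to journald in addition to the existing output.
// Assumes the `tracing`, `tracing-subscriber` and `tracing-journald` crates.
use tracing_subscriber::{layer::SubscriberExt as _, util::SubscriberInitExt as _};

fn init_logging() -> Result<(), Box<dyn std::error::Error>> {
    let journald = tracing_journald::layer()?; // event fields become journal fields

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer()) // human-readable output, as before
        .with(journald)
        .init();

    Ok(())
}
```

The extra fields can still be inspected with e.g. `journalctl -o verbose`.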
Related: #8173
The original MenuBar developer established a few anti-patterns that were
somewhat followed by subsequent developers. As of now, the entire file is too
large and woefully cluttered.
This PR takes a big step towards #7771 by first organizing the existing code
into a more comprehensible file:
- `private` is removed. It is not needed in Swift unless you actually need to
make something private; the default `internal` access level is appropriate for
most cases.
- state change handlers are consistently named `handleX`
- functions are reorganized, and `MARK` comments are used to group similar
functions together
Our application state is incredibly simple, consisting of only a handful of
properties.
Throughout our codebase, we use a single global state store called `Store`,
which we then inject as the single piece of state into each view's model.
This is unnecessary boilerplate and leads to lots of duplicated logic.
Instead, we refactor away (nearly) all of the application's view models and
use an `@EnvironmentObject` to inject the store into each view.
This convention drastically simplifies state-tracking logic and boilerplate in
the views.
Using `rustup` to manage the Rust toolchain is easier - even on NixOS -
because some tools rely on being able to use the `rustup` shims, such as
`+nightly`, to run a nightly toolchain.
With the addition of the Firezone Control Protocol, we are now issuing a
lot more DNS queries on the Gateway. Specifically, every DNS query for a
DNS resource name always triggers a DNS query on the Gateway. This
ensures that changes to DNS entries for resources are picked up without
having to build any sort of "stale detection" in the Gateway itself. As
a result, though, a Gateway has to issue a lot of DNS queries to upstream
resolvers, which in 99% or more of cases return the same result.
To reduce the load on these upstream resolvers, we cache successful results of
DNS queries for 5 minutes.
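As a rough sketch of the idea (the type and field names here are hypothetical,
not the Gateway's actual implementation), the cache only needs to remember
when each successful answer was received:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(5 * 60);

/// Hypothetical cache of successful upstream answers, keyed by query name and type.
#[derive(Default)]
struct DnsCache {
    entries: HashMap<(String, u16), (Instant, Vec<IpAddr>)>,
}

impl DnsCache {
    fn get(&self, name: &str, qtype: u16, now: Instant) -> Option<&[IpAddr]> {
        let (cached_at, ips) = self.entries.get(&(name.to_owned(), qtype))?;
        if now.duration_since(*cached_at) < TTL {
            Some(ips.as_slice())
        } else {
            None // expired; the caller queries upstream again and re-inserts
        }
    }

    fn insert(&mut self, name: String, qtype: u16, ips: Vec<IpAddr>, now: Instant) {
        self.entries.insert((name, qtype), (now, ips));
    }
}
```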
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
On Linux, logs sent to stdout from a systemd-service are automatically
captured by `journald`. This is where most admins expect logs to be and
frankly, doing any kind of debugging of Firezone is much easier if you
can do `journalctl -efu firezone-client-ipc.service` in a terminal and
check what the IPC service is doing.
On Windows, stdout from a service is (unfortunately) ignored.
To achieve this and also allow dynamically changing the log-filter, I
had to introduce a (long-overdue) abstraction over tracing's "reload"
layer that allows us to combine multiple reload-handles into one.
Unfortunately, neither the `reload::Layer` nor the `reload::Handle`
implement `Clone`, which makes this unnecessarily difficult.
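One way to sketch such an abstraction (hypothetical names; the actual code in
this PR may differ) is to erase the concrete handle types behind boxed
closures and fan a single reload call out to all of them:

```rust
use tracing_subscriber::{reload, EnvFilter};

/// Hypothetical wrapper that reloads several filter layers at once.
#[derive(Default)]
struct CombinedReloadHandle {
    handles: Vec<Box<dyn Fn(EnvFilter) -> Result<(), reload::Error> + Send + Sync>>,
}

impl CombinedReloadHandle {
    /// Register one reloadable filter, e.g. `combined.push(move |f| handle.reload(f))`
    /// for a handle obtained from `reload::Layer::new`.
    fn push(
        &mut self,
        reload_fn: impl Fn(EnvFilter) -> Result<(), reload::Error> + Send + Sync + 'static,
    ) {
        self.handles.push(Box::new(reload_fn));
    }

    /// Apply a new directive string (e.g. "debug") to every registered layer.
    fn reload(&self, directives: &str) -> Result<(), reload::Error> {
        for handle in &self.handles {
            handle(EnvFilter::new(directives))?;
        }
        Ok(())
    }
}
```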
Related: #8173
This PR fixes two issues:
1. Since we weren't updating any actual fields in the telemetry reporter log
record, the record was never actually updated, so optimistic locking was not
taking effect. To fix this, we use `Repo.update(force: true)`.
2. If a buffer is full, we write immediately, but we provide an empty
`%Log{}`, which causes a repetitive `the current value of last_flushed_at is
nil and will not be used as a filter for optimistic locking.` warning.
The Gateway keeps some state for each client connection. Part of this
state are filters which can be controlled via the Firezone portal. Even
if no filters are set in the portal, the Gateway uses this data
structure to ensure only packets to allowed resources are forwarded. If
a resource is not allowed, its IP won't exist in the `IpNetworkTable` of
filters and thus won't be allowed.
When a Client disconnects, the Gateway cleans up this data structure, and thus
all filters etc. are gone. As soon as the Client reconnects, default filters
(which don't allow anything) are installed under the same IP (the portal
always assigns the same IP to Clients).
These filters are only applied on _outbound_ traffic (i.e. from the Client
towards Resources). As a result, packets arriving from Resources to a Client
will still be routed back, causing "Source not allowed" errors on the Client
(which lost all of its state when it restarted).
To fix this, we apply the Gateway's filters also on the reverse path of
packets from Resources to Clients.
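Conceptually, the change amounts to consulting the same per-client table in
both directions. A minimal sketch of the idea (hypothetical types; a plain set
stands in for the real `IpNetworkTable`):

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical per-client state; the real Gateway uses an `IpNetworkTable`.
struct ClientState {
    allowed_resources: HashSet<IpAddr>,
}

impl ClientState {
    /// Existing check: Client -> Resource, keyed by the packet's destination.
    fn allow_outbound(&self, dst: IpAddr) -> bool {
        self.allowed_resources.contains(&dst)
    }

    /// New check on the reverse path: Resource -> Client, keyed by the packet's
    /// source. A Client whose filters were reset to the defaults no longer
    /// receives packets it would not have been allowed to send in the first place.
    fn allow_inbound(&self, src: IpAddr) -> bool {
        self.allowed_resources.contains(&src)
    }
}
```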
Resolves: #5568
Resolves: #7521
Resolves: #6091
At present our Rust implementation of the Phoenix Channel client tries
to detect missing heartbeat responses from the portal. This is
unnecessary and causes brittleness in production.
The WebSocket connection runs over TCP, meaning any kind of actual
network problem / partition will be detected by TCP itself and cause an
IO error further up the stack. In order to keep NAT bindings alive, we
only need to send _some_ traffic every so often, meaning sending a
heartbeat is good enough. We don't need to actually handle the response
in any particular way.
Lastly, by just using an interval, I realised that we can very easily
implement an optimisation from the Phoenix spec: Only send heartbeats if
you haven't sent anything else.
In theory, WebSocket ping/pong frames could be used for this keep-alive
mechanism. Unfortunately, as I understand the Phoenix spec, it requires
its own heartbeat to be sent, otherwise it will disconnect the
WebSocket.
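A sketch of the interval-based approach (hypothetical struct and interval
value, not the actual `phoenix-channel` code):

```rust
use std::time::{Duration, Instant};

const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(30); // illustrative value

struct Heartbeat {
    last_sent: Instant,
}

impl Heartbeat {
    /// Call whenever *any* message is written to the socket (including heartbeats).
    fn record_sent(&mut self) {
        self.last_sent = Instant::now();
    }

    /// Call on every interval tick: a heartbeat only goes out if nothing else
    /// has been sent recently, since other traffic already keeps NAT bindings alive.
    fn should_send(&self, now: Instant) -> bool {
        now.duration_since(self.last_sent) >= HEARTBEAT_INTERVAL
    }
}
```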
This is pretty confusing when reading logs. For inbound packets, we
assume that if we don't have a NAT session, they belong to the Internet
Resource or a CIDR resource, meaning this log shows up for all packets
for those resources and even for packets that don't belong to any
resource at all.
On Linux desktops, we install a dedicated `.desktop` file that is
responsible for handling our deep-links for sign-in. This desktop entry
is not meant to be launched manually and therefore should be hidden from
the application menus.
Loading images async isn't fixing the App Hanging reports we continue to
receive, so the cause is something else. Rather than trying to load them
asynchronously, we revert that change.
Instead, we eager-load all images needed by the MenuBar at init rather than
lazy-loading them, since lazy-loading could in rare cases cause apparent UI
hangs. If we can't load them, we log an error but continue to operate, as the
icons are not strictly needed for Firezone operation.
Reverts firezone/firezone#8090
For some reason, this was being initialized twice, when it doesn't need
to be.
The whole reason Favorites is initialized in the FirezoneApp module is
so we can have one instance of it passed down to children.
The WebSocket connection to the portal from within the Clients, Gateways
and Relays may be temporarily interrupted by IO errors. In such cases we
simply reconnect to it. This isn't as much of a problem for Clients and
Gateways. For Relays, however, a disconnect can be disruptive for
customers because the portal will send `relays_presence` events to all
Clients and Gateways. Any relayed connection will therefore be
interrupted. See #8177.
Relays run on our own infrastructure and we want to be notified if their
connection flaps.
In order to differentiate between these scenarios, we remove the logging
from within `phoenix-channel` and report these connection hiccups one
layer up. This allows Clients and Gateways to log them on DEBUG whereas
the Relay can log them on WARN.
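Sketched out (hypothetical names, not the exact API), the library surfaces a
reconnect as an event and each caller picks its own severity:

```rust
/// Hypothetical event emitted by `phoenix-channel` instead of logging internally.
enum Event {
    /// The connection to the portal was interrupted and is being re-established.
    Hiccup { error: std::io::Error },
    /// A message arrived from the portal.
    Message(String),
}

// Clients and Gateways: transient reconnects are expected, so log them quietly.
fn handle_event_in_client(event: Event) {
    match event {
        Event::Hiccup { error } => tracing::debug!("Reconnecting to portal: {}", error),
        Event::Message(_) => { /* normal message handling elided */ }
    }
}

// Relays run on our own infrastructure; a flapping connection should be loud.
fn handle_event_in_relay(event: Event) {
    match event {
        Event::Hiccup { error } => tracing::warn!("Reconnecting to portal: {}", error),
        Event::Message(_) => { /* normal message handling elided */ }
    }
}
```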
Related: #8177
Related: #7004
Now that we have error reporting via Sentry in Swift-land as well, we
can handle errors in the FFI layer more gracefully and return them to
Swift.
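As an illustration of the pattern (a hypothetical function, not Firezone's
actual FFI surface), an error can be handed to Swift as a C string instead of
aborting:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical FFI entry point: returns NULL on success, or an error message
/// that the Swift side turns into a proper `Error` (and frees via a matching
/// free function, not shown here).
#[no_mangle]
pub extern "C" fn firezone_connect(token: *const c_char) -> *mut c_char {
    match try_connect(token) {
        Ok(()) => std::ptr::null_mut(),
        Err(e) => CString::new(e.to_string()).unwrap_or_default().into_raw(),
    }
}

fn try_connect(_token: *const c_char) -> Result<(), std::io::Error> {
    // ... actual connection logic elided
    Ok(())
}
```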
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Some customers have already picked the `Internet` name, which is making
our migrations fail.
This scopes the unique name index by `managed_by` so that our attempts
to create them succeed.
Turns out that if we change the module structure at all, Terraform will
delete all the old resources contained within it before creating new
ones, because modules don't accept a `lifecycle` block. The old
resources are deemed no longer "needed" and so `create_before_destroy`
doesn't save us.
This updates the environments ref to one that reverts back to the plural
`module.relays[0]` structure.
The way around this will be to duplicate the existing relays module,
`terraform apply` to bring the new resources up, and then remove the old
module. That is saved for a later PR.
This has been tested to achieve near-zero downtime on staging.
When making any modification that taints any Relay infrastructure, some
Relay components are destroyed before they're created, and some are
created before they're destroyed.
This results in failures that can lead to downtime, even if we bump
subnet numbering to trigger a rollover of the `naming_suffix`. See
https://app.terraform.io/app/firezone/workspaces/staging/runs
To fix this, we ensure `create_before_destroy` is applied to all Relay
module resources, and we ensure that the `naming_suffix` is properly
used in all resources that require unique names or IDs within the
project.
Thus, we need to remember to bump the subnet numbering whenever changing any
Relay infrastructure so that (1) the subnet numbering doesn't collide, and (2)
the `naming_suffix` change is triggered, which prevents other resource names
from colliding.
Unfortunately, there doesn't seem to be a better alternative here. The only
other option I could identify so far is to derive the subnet numbering
dynamically on each deploy, incrementing it every time, but that would taint
all Relay resources on every deploy, which is wasteful and prone to random
timeouts or failures.