Currently, it would theoretically be possible for an admin to connect
non-internet Resources to the Internet site. This PR fixes that by
enforcing that only the `internet` Resource type can belong to the `Internet`
gateway group.
Related: #6834
`@Published` properties that views subscribe to for UI updates need to
be updated from the main thread only. This PR annotates the relevant
variable and function from the original author's implementation with
`@MainActor` so that Swift will properly warn us when modifying these in
the future.
A regression was introduced in #8218 that removed the `menuBar` as an
environment object for `AppView`.
Unfortunately, this compiles just fine because EnvironmentObjects are resolved
at runtime, so the "Open Menu" button crashes when it looks up an
EnvironmentObject that doesn't exist.
Sentry uncovered a bug in the resources index liveview: some code copy-pasted
from the policies index view wasn't updated to work in the resources live
view, causing the view to crash if an admin was viewing the table while the
resources were changed on another page.
While debugging that, I realized the best UX for these tables is usually to
show a `Reload` button rather than updating the data live while the admin is
viewing it, as live updates can cause missed clicks and other annoyances.
This PR adds an optional `stale` component attribute that, if true, renders a
`Reload` button in the live table which, when clicked, reloads the table.
Not all index views are updated with this - some views, such as the clients
table, already have logic to apply updates intelligently without breaking the
view.
Ideally, we would live-update data that doesn't reflow the layout (such as
`online/offline` presence) and show the `Reload` button for changes that do
cause a reflow (create/delete).
However, that work is saved for a future PR, as this one fixes the immediate
bug and the rest is not the highest priority.
<img width="1195" alt="Screenshot 2025-02-16 at 8 44 43 AM"
src="https://github.com/user-attachments/assets/114efffa-85ea-490d-9cea-78c607081ce3"
/>
<img width="401" alt="Screenshot 2025-02-16 at 9 59 53 AM"
src="https://github.com/user-attachments/assets/8a570213-d4ec-4b6c-a489-dcd9ad1c351c"
/>
It's possible for a client or admin to try and load the redirect URL
directly, or a misconfigured IdP may redirect back to us with missing
params. We should redirect with an error flash instead of 500'ing.
This configures the GUI client to log to journald in addition to files. For
better or worse, this logs all events such that structured information is
preserved, e.g. all additional fields next to the message are also saved as
fields in the journal. By default, when viewing the logs via `journalctl`,
those fields are not displayed. This makes the default output of `journalctl`
for the Firezone GUI not as useful as it could be. Fixing that is left to a
later stage.
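For reference, a minimal sketch of what this wiring could look like with the
`tracing-journald` crate (illustrative only, not necessarily the exact code in
this PR); structured fields on each event are forwarded to the journal as
journal fields:

```rust
// Illustrative sketch: log to journald in addition to the existing output.
// Assumes the `tracing`, `tracing-subscriber` and `tracing-journald` crates.
use tracing_subscriber::{layer::SubscriberExt as _, util::SubscriberInitExt as _};

fn init_logging() -> Result<(), Box<dyn std::error::Error>> {
    let journald = tracing_journald::layer()?; // event fields become journal fields

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer()) // human-readable output, as before
        .with(journald)
        .init();

    Ok(())
}
```

The extra fields can still be inspected with e.g. `journalctl -o verbose`.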
Related: #8173
The original MenuBar developer established a few anti-patterns that were
somewhat followed by subsequent developers. As of now, the entire file is too
large and woefully cluttered.
This PR takes a big step towards #7771 by first organizing the existing code
into a more comprehensible file:
- `private` is removed. It is not needed in Swift unless you actually need to
make something private; the default `internal` access level is appropriate for
most cases.
- state change handlers are consistently named `handleX`
- functions are reorganized, and `MARK` comments are used to group similar
functions together
Our application state is incredibly simple, consisting of only a handful of
properties.
Throughout our codebase, we use a single global state store called `Store`,
which we then inject as the single piece of state into each view's model.
This is unnecessary boilerplate and leads to lots of duplicated logic.
Instead, we refactor away (nearly) all of the application's view models and
use an `@EnvironmentObject` to inject the store into each view.
This convention drastically simplifies state-tracking logic and boilerplate in
the views.
Using `rustup` to manage the Rust toolchain is easier - even on NixOS -
because some tools rely on being able to use the `rustup` shims, such as
`+nightly`, to run a nightly toolchain.
With the addition of the Firezone Control Protocol, we are now issuing a
lot more DNS queries on the Gateway. Specifically, every DNS query for a
DNS resource name always triggers a DNS query on the Gateway. This
ensures that changes to DNS entries for resources are picked up without
having to build any sort of "stale detection" in the Gateway itself. As
a result, though, a Gateway has to issue a lot of DNS queries to upstream
resolvers, which in 99% or more of cases return the same result.
To reduce the load on these upstream resolvers, we cache successful results of
DNS queries for 5 minutes.
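As a rough sketch of the idea (the type and field names here are hypothetical,
not the Gateway's actual implementation), the cache only needs to remember
when each successful answer was received:

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(5 * 60);

/// Hypothetical cache of successful upstream answers, keyed by query name and type.
#[derive(Default)]
struct DnsCache {
    entries: HashMap<(String, u16), (Instant, Vec<IpAddr>)>,
}

impl DnsCache {
    fn get(&self, name: &str, qtype: u16, now: Instant) -> Option<&[IpAddr]> {
        let (cached_at, ips) = self.entries.get(&(name.to_owned(), qtype))?;
        if now.duration_since(*cached_at) < TTL {
            Some(ips.as_slice())
        } else {
            None // expired; the caller queries upstream again and re-inserts
        }
    }

    fn insert(&mut self, name: String, qtype: u16, ips: Vec<IpAddr>, now: Instant) {
        self.entries.insert((name, qtype), (now, ips));
    }
}
```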
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
On Linux, logs sent to stdout from a systemd-service are automatically
captured by `journald`. This is where most admins expect logs to be and
frankly, doing any kind of debugging of Firezone is much easier if you
can do `journalctl -efu firezone-client-ipc.service` in a terminal and
check what the IPC service is doing.
On Windows, stdout from a service is (unfortunately) ignored.
To achieve this and also allow dynamically changing the log-filter, I
had to introduce a (long-overdue) abstraction over tracing's "reload"
layer that allows us to combine multiple reload-handles into one.
Unfortunately, neither the `reload::Layer` nor the `reload::Handle`
implement `Clone`, which makes this unnecessarily difficult.
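One way to sketch such an abstraction (hypothetical names; the actual code in
this PR may differ) is to erase the concrete handle types behind boxed
closures and fan a single reload call out to all of them:

```rust
use tracing_subscriber::{reload, EnvFilter};

/// Hypothetical wrapper that reloads several filter layers at once.
#[derive(Default)]
struct CombinedReloadHandle {
    handles: Vec<Box<dyn Fn(EnvFilter) -> Result<(), reload::Error> + Send + Sync>>,
}

impl CombinedReloadHandle {
    /// Register one reloadable filter, e.g. `combined.push(move |f| handle.reload(f))`
    /// for a handle obtained from `reload::Layer::new`.
    fn push(
        &mut self,
        reload_fn: impl Fn(EnvFilter) -> Result<(), reload::Error> + Send + Sync + 'static,
    ) {
        self.handles.push(Box::new(reload_fn));
    }

    /// Apply a new directive string (e.g. "debug") to every registered layer.
    fn reload(&self, directives: &str) -> Result<(), reload::Error> {
        for handle in &self.handles {
            handle(EnvFilter::new(directives))?;
        }
        Ok(())
    }
}
```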
Related: #8173
This PR fixes two issues:
1. Since we weren't updating any actual fields in the telemetry reporter log
record, the record was never actually updated, so optimistic locking was not
taking effect. To fix this, we use `Repo.update(force: true)`.
2. If a buffer is full, we write immediately, but we provide an empty
`%Log{}`, which causes a repetitive `the current value of last_flushed_at is
nil and will not be used as a filter for optimistic locking.` warning.
The Gateway keeps some state for each client connection. Part of this
state are filters which can be controlled via the Firezone portal. Even
if no filters are set in the portal, the Gateway uses this data
structure to ensure only packets to allowed resources are forwarded. If
a resource is not allowed, its IP won't exist in the `IpNetworkTable` of
filters and thus won't be allowed.
When a Client disconnects, the Gateway cleans up this data structure, and thus
all filters etc. are gone. As soon as the Client reconnects, default filters
(which don't allow anything) are installed under the same IP (the portal
always assigns the same IP to Clients).
These filters are only applied on _outbound_ traffic (i.e. from the Client
towards Resources). As a result, packets arriving from Resources to a Client
will still be routed back, causing "Source not allowed" errors on the Client
(which lost all of its state when it restarted).
To fix this, we apply the Gateway's filters also on the reverse path of
packets from Resources to Clients.
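Conceptually, the change amounts to consulting the same per-client table in
both directions. A minimal sketch of the idea (hypothetical types; a plain set
stands in for the real `IpNetworkTable`):

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical per-client state; the real Gateway uses an `IpNetworkTable`.
struct ClientState {
    allowed_resources: HashSet<IpAddr>,
}

impl ClientState {
    /// Existing check: Client -> Resource, keyed by the packet's destination.
    fn allow_outbound(&self, dst: IpAddr) -> bool {
        self.allowed_resources.contains(&dst)
    }

    /// New check on the reverse path: Resource -> Client, keyed by the packet's
    /// source. A Client whose filters were reset to the defaults no longer
    /// receives packets it would not have been allowed to send in the first place.
    fn allow_inbound(&self, src: IpAddr) -> bool {
        self.allowed_resources.contains(&src)
    }
}
```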
Resolves: #5568
Resolves: #7521
Resolves: #6091
At present our Rust implementation of the Phoenix Channel client tries
to detect missing heartbeat responses from the portal. This is
unnecessary and causes brittleness in production.
The WebSocket connection runs over TCP, meaning any kind of actual
network problem / partition will be detected by TCP itself and cause an
IO error further up the stack. In order to keep NAT bindings alive, we
only need to send _some_ traffic every so often, meaning sending a
heartbeat is good enough. We don't need to actually handle the response
in any particular way.
Lastly, by just using an interval, I realised that we can very easily
implement an optimisation from the Phoenix spec: Only send heartbeats if
you haven't sent anything else.
In theory, WebSocket ping/pong frames could be used for this keep-alive
mechanism. Unfortunately, as I understand the Phoenix spec, it requires
its own heartbeat to be sent, otherwise it will disconnect the
WebSocket.
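A sketch of the interval-based approach (hypothetical struct and interval
value, not the actual `phoenix-channel` code):

```rust
use std::time::{Duration, Instant};

const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(30); // illustrative value

struct Heartbeat {
    last_sent: Instant,
}

impl Heartbeat {
    /// Call whenever *any* message is written to the socket (including heartbeats).
    fn record_sent(&mut self) {
        self.last_sent = Instant::now();
    }

    /// Call on every interval tick: a heartbeat only goes out if nothing else
    /// has been sent recently, since other traffic already keeps NAT bindings alive.
    fn should_send(&self, now: Instant) -> bool {
        now.duration_since(self.last_sent) >= HEARTBEAT_INTERVAL
    }
}
```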
This is pretty confusing when reading logs. For inbound packets, we
assume that if we don't have a NAT session, they belong to the Internet
Resource or a CIDR resource, meaning this log shows up for all packets
for those resources and even for packets that don't belong to any
resource at all.
On Linux desktops, we install a dedicated `.desktop` file that is
responsible for handling our deep-links for sign-in. This desktop entry
is not meant to be launched manually and therefore should be hidden from
the application menus.
Loading images async isn't fixing the App Hanging reports we continue to
receive, so the cause is something else. Rather than trying to load them
asynchronously, we revert that change.
Instead, we eager-load all images needed by the MenuBar at init rather than
lazy-loading them, since lazy-loading could in rare cases cause apparent UI
hangs. If we can't load them, we log an error but continue to operate, as the
icons are not strictly needed for Firezone operation.
Reverts firezone/firezone#8090
For some reason, this was being initialized twice, when it doesn't need
to be.
The whole reason Favorites is initialized in the FirezoneApp module is
so we can have one instance of it passed down to children.
The WebSocket connection to the portal from within the Clients, Gateways
and Relays may be temporarily interrupted by IO errors. In such cases we
simply reconnect to it. This isn't as much of a problem for Clients and
Gateways. For Relays, however, a disconnect can be disruptive for
customers because the portal will send `relays_presence` events to all
Clients and Gateways. Any relayed connection will therefore be
interrupted. See #8177.
Relays run on our own infrastructure and we want to be notified if their
connection flaps.
In order to differentiate between these scenarios, we remove the logging
from within `phoenix-channel` and report these connection hiccups one
layer up. This allows Clients and Gateways to log them on DEBUG whereas
the Relay can log them on WARN.
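Sketched out (hypothetical names, not the exact API), the library surfaces a
reconnect as an event and each caller picks its own severity:

```rust
/// Hypothetical event emitted by `phoenix-channel` instead of logging internally.
enum Event {
    /// The connection to the portal was interrupted and is being re-established.
    Hiccup { error: std::io::Error },
    /// A message arrived from the portal.
    Message(String),
}

// Clients and Gateways: transient reconnects are expected, so log them quietly.
fn handle_event_in_client(event: Event) {
    match event {
        Event::Hiccup { error } => tracing::debug!("Reconnecting to portal: {}", error),
        Event::Message(_) => { /* normal message handling elided */ }
    }
}

// Relays run on our own infrastructure; a flapping connection should be loud.
fn handle_event_in_relay(event: Event) {
    match event {
        Event::Hiccup { error } => tracing::warn!("Reconnecting to portal: {}", error),
        Event::Message(_) => { /* normal message handling elided */ }
    }
}
```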
Related: #8177
Related: #7004
Now that we have error reporting via Sentry in Swift-land as well, we
can handle errors in the FFI layer more gracefully and return them to
Swift.
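As an illustration of the pattern (a hypothetical function, not Firezone's
actual FFI surface), an error can be handed to Swift as a C string instead of
aborting:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical FFI entry point: returns NULL on success, or an error message
/// that the Swift side turns into a proper `Error` (and frees via a matching
/// free function, not shown here).
#[no_mangle]
pub extern "C" fn firezone_connect(token: *const c_char) -> *mut c_char {
    match try_connect(token) {
        Ok(()) => std::ptr::null_mut(),
        Err(e) => CString::new(e.to_string()).unwrap_or_default().into_raw(),
    }
}

fn try_connect(_token: *const c_char) -> Result<(), std::io::Error> {
    // ... actual connection logic elided
    Ok(())
}
```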
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Some customers have already picked the `Internet` name, which is making
our migrations fail.
This scopes the unique name index by `managed_by` so that our attempts
to create them succeed.
Turns out that if we change the module structure at all, Terraform will
delete all the old resources contained within it before creating new
ones, because modules don't accept a `lifecycle` block. The old
resources are deemed no longer "needed" and so `create_before_destroy`
doesn't save us.
This updates the environments ref to one that reverts back to the plural
`module.relays[0]` structure.
The way around this will be to duplicate the existing relays module,
`terraform apply` to bring the new resources up, and then remove the old
module. That is saved for a later PR.
This has been tested to achieve near-zero downtime on staging.
When making any modification that taints any Relay infrastructure, some
Relay components are destroyed before they're created, and some are
created before they're destroyed.
This results in failures that can lead to downtime, even if we bump
subnet numbering to trigger a rollover of the `naming_suffix`. See
https://app.terraform.io/app/firezone/workspaces/staging/runs
To fix this, we ensure `create_before_destroy` is applied to all Relay
module resources, and we ensure that the `naming_suffix` is properly
used in all resources that require unique names or IDs within the
project.
Thus, we need to remember to bump the subnet numbering whenever changing any
Relay infrastructure so that (1) the subnet numbering doesn't collide, and (2)
the `naming_suffix` change is triggered, which prevents other resource names
from colliding.
Unfortunately, there doesn't seem to be a better alternative here. The only
other option I could identify so far is to derive the subnet numbering
dynamically on each deploy, incrementing it every time, but that would taint
all Relay resources on every deploy, which is wasteful and prone to random
timeouts or failures.