Instead of a one-minute TTL for all connections, we now vary the TTL
based on the protocol being used: 2 hours for TCP, and 2 minutes for UDP
and ICMP.
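A minimal sketch of the per-protocol TTL selection; the `Protocol` enum is a simplified stand-in for however the connection table actually classifies a flow:

```rust
use std::time::Duration;

// Simplified stand-in for the flow classification.
enum Protocol {
    Tcp,
    Udp,
    Icmp,
}

// Pick the idle TTL for a connection based on its protocol.
fn connection_ttl(protocol: Protocol) -> Duration {
    match protocol {
        Protocol::Tcp => Duration::from_secs(2 * 60 * 60), // 2 hours
        Protocol::Udp | Protocol::Icmp => Duration::from_secs(2 * 60), // 2 minutes
    }
}
```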
Resolves: #9645
Originally, we introduced these telemetry spans to gather data from
logs / warnings that we considered too spammy. We've since merged a
burst-protection that submits the same event at most once every 5
minutes.
The data from the telemetry spans themselves has not been used at all.
I suspect that the new Windows runners are "too fast" and we hit a race
condition in the use of the keyring on Windows, which causes failing CI
jobs. The attempted fix is to sleep for 1 second before every assert in
the test.
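A rough sketch of the pattern, with a hypothetical stand-in for the keyring call under test:

```rust
use std::{thread, time::Duration};

// Hypothetical stand-in for the keyring operation being asserted on.
fn read_secret() -> Option<String> {
    Some("secret".to_owned())
}

#[test]
fn keyring_roundtrip() {
    // Give the Windows keyring a moment to settle; the new, faster
    // runners expose a race here otherwise.
    thread::sleep(Duration::from_secs(1));
    assert!(read_secret().is_some());
}
```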
When a client is updated, we may need to re-initialize it if "breaking"
fields are updated. If non-breaking fields are changed, such as the
name, we don't need to re-initialize the client.
This PR also adds a helper `struct_from_params/2` which creates a
schema struct from WAL data in order to type-cast any needed data for
convenience. This avoids a DB hit - we _already have the
data from the DB_ - we just need to format and send it.
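As a rough illustration of the re-init decision (in Rust for brevity; the actual check lives in the Elixir portal, and the breaking field shown is hypothetical):

```rust
// Which fields changed in this client update.
struct ClientChanges {
    name: bool,       // non-breaking: no re-init required
    public_key: bool, // hypothetical example of a "breaking" field
}

// Only breaking field changes force a re-initialization.
fn needs_reinit(changes: &ClientChanges) -> bool {
    changes.public_key
}
```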
Related: #9501
Sentry has a new "Logs" feature where we can stream logs directly to
Sentry. Doing this for all Clients and Gateways would be way too much
data to collect though.
In order to aid debugging from customer installations, we add a
PostHog-managed feature flag that - if set to `true` - enables the
streaming of logs to Sentry. This feature flag is evaluated every time
the telemetry context is initialised:
- For all FFI usages of connlib, this happens every time a new session
is created.
- For the Windows/Linux Tunnel service, this also happens every time we
create a new session.
- For the Headless Client and Gateway, it happens on startup and every
minute afterwards. The feature-flag context itself is only checked
every 5 minutes, though, so it might take up to 5 minutes before this
takes effect.
The default value - like for all feature flags - is `false`. Therefore,
if there is any issue with the PostHog service, we fall back to the
previous behaviour where logs are simply stored locally.
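A minimal sketch of the gating at telemetry-context initialisation; `feature_flags::stream_logs` and `enable_log_streaming` are hypothetical names for illustration:

```rust
mod feature_flags {
    // Hypothetical: evaluates the PostHog-managed flag. Like all feature
    // flags, it defaults to `false`, e.g. when PostHog is unreachable.
    pub fn stream_logs() -> bool {
        false
    }
}

// Hypothetical: installs the layer that forwards logs to Sentry.
fn enable_log_streaming() {}

fn init_telemetry_context() {
    if feature_flags::stream_logs() {
        enable_log_streaming();
    }
    // Otherwise, logs are simply stored locally, as before.
}
```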
Resolves: #9600
Created table publications (and their associated replication slots) are
sticky: they outlive the lifetime of the process that created them.
We don't want to remove them on shutdown, because this would pause WAL
writing to disk.
However, when starting the _new_ application, it's possible
`table_subscriptions` has changed (such as if we decide we no longer
want events for a certain table). We weren't updating the created
publication(s) with these added/removed tables, so this PR updates the
replication connection setup state machine to pass through a few
conditionals that update the publication with the diff of old vs new
tables.
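The diffing step itself boils down to two set differences; a sketch in Rust for illustration (the actual implementation is part of the Elixir replication setup and applies the result via `ALTER PUBLICATION ... ADD/DROP TABLE`):

```rust
use std::collections::HashSet;

// Compute which tables must be added to and dropped from the existing
// publication so it matches the new `table_subscriptions`.
fn publication_diff<'a>(
    existing: &'a HashSet<String>,
    wanted: &'a HashSet<String>,
) -> (Vec<&'a String>, Vec<&'a String>) {
    let to_add = wanted.difference(existing).collect();
    let to_drop = existing.difference(wanted).collect();
    (to_add, to_drop)
}
```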
Building on the WAL consumer that's been in development over the past
several weeks, we introduce a new `change_logs` table that stores very
lightly up-fitted data decoded from the WAL (sketched as a struct after
this list):
- `account_id` (indexed): a foreign key reference to an account.
- `inserted_at` (indexed): the timestamp of insert, for truncating rows
later.
- `table`: the table where the op took place.
- `op`: the operation performed (insert/update/delete).
- `old_data`: a nullable map of the old row data (update/delete).
- `data`: a nullable map of the new row data (insert/update).
- `vsn`: an integer version field we can bump to signify schema changes
in the data in case we need to apply operations to only new or only old
data.
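Roughly, the row shape, expressed as a Rust struct for illustration (the actual table is defined in an Ecto migration; the map and key types are approximations):

```rust
use std::collections::HashMap;
use std::time::SystemTime;

// Sketch of a `change_logs` row.
struct ChangeLog {
    account_id: String,      // indexed FK to accounts (a UUID in practice)
    inserted_at: SystemTime, // indexed; used to truncate old rows
    table: String,           // table where the op took place
    op: Op,                  // operation performed
    old_data: Option<HashMap<String, String>>, // update/delete
    data: Option<HashMap<String, String>>,     // insert/update
    vsn: i64,                // bumped to signify schema changes
}

enum Op {
    Insert,
    Update,
    Delete,
}
```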
Judging from our prod metrics, we currently average about 1,000 write
operations per minute, which generates about 1-2 dozen change logs per
second. Doing the math, 30 days at our current volume will yield about
50M rows per month, which should be OK for some time, since this is an
append-only table that is rarely (if ever) read from.
The one aspect of this we may need to handle sooner rather than later is
batch-inserting these. That raises an issue though - currently, in this
PR, we process each WAL event serially, ending with the final
acknowledgement `:ok` which signals to Postgres our status in
processing the WAL.
If we do anything async here, this processing "cursor" becomes
inaccurate, so we may need to think about what to track and what data we
care about.
Related: #7124
It's confusing that we clear this field upon sync failure. Instead, we
let it track the time of the last sync.
Will be cleaned up in #6294 so just applying a minimal fix now.
Fixes: #7715
Adds the `account_slug` to the gateway's `init` message. When the
account slug is changed, the gateway's socket is disconnected using the
same mechanism as gateway deletion, which causes the gateway to
reconnect immediately and receive a new `init`.
Related: #9545
Why:
* After updating the Auth Provider changesets to trim all whitespace
from user-editable string fields, we realized we needed to do the same
for all forms/entities within Firezone. This commit updates all entities
to trim whitespace on string fields.
Fixes: #9579
WireGuard implements a rate-limit mechanism for when the number of
handshake initiations exceeds a certain limit. This is important because
handshakes involve asymmetric cryptography and are computationally
expensive. To prevent DoS attacks where other peers repeatedly ask for
new handshakes, the rate limiter implements a cookie mechanism where -
when under load - the remote peer needs to include a given cookie in new
handshakes. This cookie is tied to the peer's IP address to prevent it
from being reused by other peers.
Up until now, we have not been passing the sender's IP address to
`boringtun` and therefore, the only option when the rate limit was hit
was to error with `UnderLoad`.
By passing the source IP of the packet, `boringtun` can engage in the
cookie-reply mechanism and therefore avoid the `UnderLoad` error.
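A sketch of the call site, assuming boringtun's current `Tunn::decapsulate` signature (which takes the datagram's source address as an `Option<IpAddr>`):

```rust
use std::net::SocketAddr;

use boringtun::noise::{Tunn, TunnResult};

fn handle_datagram(tunn: &mut Tunn, from: SocketAddr, datagram: &[u8], buf: &mut [u8]) {
    // Passing `Some(from.ip())` lets boringtun reply with a cookie when
    // rate-limited instead of failing with `UnderLoad`.
    match tunn.decapsulate(Some(from.ip()), datagram, buf) {
        TunnResult::WriteToNetwork(_reply_or_packet) => {
            // Send this back out; under load it is the cookie reply the
            // peer must echo in its next handshake initiation.
        }
        TunnResult::Err(_e) => {
            // With `None` as the source address, a rate-limited
            // handshake could only ever end up here as `UnderLoad`.
        }
        _ => { /* decrypted packets, keep-alives, etc. */ }
    }
}
```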
Resolves: #9643
When we receive the `account_slug` from the portal, the Gateway now
sends a `$identify` event to PostHog. This will allow us to target
Gateways with feature flags based on the account they are connected to.
In order to more easily target customers with certain feature flags, we
include the `account_slug` in the `$identify` event to PostHog. This
will allow us to create Cohorts in PostHog and enable/disable feature
flags for all installations of Firezone for a particular customer.
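Sketched with `serde_json`, the payload shape we would expect per PostHog's capture API (the exact fields our telemetry code sends may differ):

```rust
use serde_json::{json, Value};

// Build a `$identify` event that sets the account slug on the person,
// so PostHog Cohorts can match on it.
fn identify_event(firezone_id: &str, account_slug: &str) -> Value {
    json!({
        "event": "$identify",
        "distinct_id": firezone_id,
        "properties": {
            "$set": { "account_slug": account_slug }
        }
    })
}
```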
Presently, `connlib` always lets the OS pick a random port for our
UDP socket. This works well in many cases but has the downside that
network admins who would like to aid the process of establishing direct
connections cannot open a specific port, because it is always random.
It doesn't cost us anything to try to bind to a particular port (here
52625) and fall back to a random one if something is already listening
there.
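A minimal sketch of the bind-with-fallback using `std::net` (connlib's actual socket setup is more involved):

```rust
use std::net::UdpSocket;

// Prefer the well-known port; fall back to an OS-assigned one if it is
// already taken.
fn bind_udp() -> std::io::Result<UdpSocket> {
    match UdpSocket::bind("0.0.0.0:52625") {
        Ok(socket) => Ok(socket),
        Err(_) => UdpSocket::bind("0.0.0.0:0"), // port 0: let the OS pick
    }
}
```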
The port 52625 was chosen because:
- It is within the ephemeral port range and will therefore never be
registered to anything else.
- It is a palindrome and therefore easy to remember.
- When typing FIRE on a phone keypad, you get the number 3473, and 52625
is the port at offset 3473 from the start of the ephemeral port range
(49152 + 3473 = 52625).
In order for this port to be useful in establishing direct connections,
we generate optimistic candidates based on existing remote candidates by
combining the IP of all server-reflexive candidates with the port of all
host candidates.
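A sketch of that combination step, with simplified types (the real code operates on str0m candidates):

```rust
use std::net::{IpAddr, SocketAddr};

// Pair every server-reflexive IP with every host-candidate port to form
// optimistic candidates.
fn optimistic_candidates(srflx_ips: &[IpAddr], host_ports: &[u16]) -> Vec<SocketAddr> {
    srflx_ips
        .iter()
        .flat_map(|ip| host_ports.iter().map(move |port| SocketAddr::new(*ip, *port)))
        .collect()
}
```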
This patch deliberately does not publicly announce this feature in the
docs or the changelog so we can first gather experience with it in our
own test environment.
Resolves: #9559
[`actionlint`](https://github.com/rhysd/actionlint) is a static analysis
tool for GitHub workflows and actions. It detects various issues ahead
of time and runs shellcheck on all `run` blocks. It is worth noting that
this does **not** lint the contents of composite actions so we still
need to be vigilant when working with those.
A bit of legacy that we have inherited around our Firezone ID is that
the ID stored on the user's device is hashed (SHA-256) before being
passed to the portal as the "external ID". This makes it difficult to
correlate IDs in Sentry and PostHog with the data we have in the portal,
because for Sentry and PostHog, we submit the raw UUID stored on the
user's device.
As a first step in overcoming this, we embed an "external ID" in those
services as well, if the provided Firezone ID is a valid UUID. This will
allow us to immediately correlate those events.
As a second step, we automatically generate all new Firezone IDs for the
Windows and Linux Clients as `hex(sha256(uuid))`. These won't parse as
valid UUIDs and will therefore be submitted as-is to the portal.
As a third step, we update all documentation around generating Firezone
IDs to use `uuidgen | sha256` instead of just `uuidgen`. This is
effectively the equivalent of (2) but for the Headless Client and
Gateway where the Firezone ID can be configured via environment
variables.
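A sketch of the generation scheme, assuming the `uuid`, `sha2`, and `hex` crates (whether the UUID's string form or its raw bytes get hashed is an assumption here):

```rust
use sha2::{Digest, Sha256};
use uuid::Uuid;

// New-style Firezone ID: `hex(sha256(uuid))`. The result no longer
// parses as a UUID, so it is submitted to the portal as-is.
fn new_firezone_id() -> String {
    let uuid = Uuid::new_v4();
    let digest = Sha256::digest(uuid.to_string().as_bytes());
    hex::encode(digest)
}
```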
Resolves: #9382
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
The latest `main` of str0m undoes a breaking change in the constructor
of `Candidate::relayed` by flipping the parameters back. This will make
it easier to upgrade to the latest release once it is out.
This has been disabled for several releases now and is not causing any
problems in production. We can therefore safely remove it.
It is about time we do this because our tests are actually still testing
the variant without the feature flag and thus deviate from what we do
in production, so we have to convert the tests as well. Doing so
uncovered a minor problem in our ICMP error parsing code: we attempted
to parse the payload of an ICMP error as a fully valid layer-4 header
(e.g. a TCP or UDP header). However, per the RFC, a node only needs to
embed the first 8 bytes of the original packet in an ICMPv4 error. That
is not enough to parse a valid TCP header, as those are at least 20
bytes.
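The fix boils down to only reading what is guaranteed to be there; a sketch, assuming `payload` starts at the embedded transport header:

```rust
// The first 8 bytes of both TCP and UDP headers contain the source and
// destination ports, which is all an ICMPv4 error is guaranteed to embed.
fn ports_from_icmp_payload(payload: &[u8]) -> Option<(u16, u16)> {
    let header: &[u8; 8] = payload.get(..8)?.try_into().ok()?;
    let src_port = u16::from_be_bytes([header[0], header[1]]);
    let dst_port = u16::from_be_bytes([header[2], header[3]]);
    Some((src_port, dst_port))
}
```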
I don't expect this to be a huge problem in production right now though.
We only use this code to parse ICMP errors arriving on the Gateway, and
I _think_ most devices actually include more than 8 bytes. This only
surfaced because we are strict about embedding exactly 8 bytes when we
generate an ICMP error.
Additionally, we change our ICMP errors to be sent from the resource IP
rather than the Gateway's TUN device. Given that we perform NAT on these
IPs anyway, I think this can still be argued to be RFC-conformant. The
_proxy_ IP which we are trying to contact can be reached but the packet
cannot be routed further. Therefore the destination is unreachable, yet
the source of this error is the proxy IP itself. I think this is
actually more correct than sending the packets from the Gateway's TUN
device because the TUN device itself is not a routing hop per se: its IP
won't ever show up in the routing path.
This wasn't the issue - the issue was that @firezone-bot needed access
to the firezone/winget-pkgs repo.
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
This PR replaces the use of Apple Archive with an API that allows us to
zip the log file contents. This API doesn't handle symlinks well, so we
move the symlink out of the way before making the zip and move it back
after the process completes. Any errors in this process are ignored, as
the symlink itself is not a critical component of Firezone.
The zip compression is marginally less efficient than Apple Archive:
instead of compressing ~2 GB of logs to 11.8 MB, we now get an archive
of 12.4 MB. Considering how much easier zip files are to handle, this
seems like a fine trade-off.
<img width="774" alt="Screenshot 2025-06-16 at 00 04 52"
src="https://github.com/user-attachments/assets/8fb6bade-5308-40b9-a446-2a2c364cb621"
/>
Resolves: #7475
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
This PR adds an optional field `account_slug` to the Gateway's `init`
message. If populated, we will use this field to set the account slug in
the telemetry context. This will allow us to know which customer a
particular Sentry issue is related to.
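Roughly, the message shape (other fields elided; the exact layout is an assumption for illustration):

```rust
// The Gateway's `init` message, with the new optional field.
struct Init {
    // ... existing fields ...
    account_slug: Option<String>, // copied into the telemetry context if present
}
```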
The latest Visual Studio version shipped a bug in the MSVC linker where
it cannot handle symbols above a certain size. Switching to the linker
shipped with Rust (`rust-lld`) fixes this issue.
Related: https://github.com/rust-lang/rust/issues/141626
Unfortunately, #9608 did not handle the case where we receive more than
200 compressed metrics in a single call. To fix this, we ensure we
`flush` the metrics buffer inside the `reduce` so that the accumulated
metrics buffer never grows beyond 200 points.
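Conceptually, the fix is to flush mid-fold instead of only at the end; a sketch of the idea with placeholder types (the actual implementation may differ):

```rust
// Placeholder type for a single metric point.
struct Metric;

const MAX_BATCH: usize = 200;

// Fold over incoming metrics, flushing whenever the buffer hits the cap
// so it never grows beyond 200 points.
fn process(metrics: impl Iterator<Item = Metric>, submit: impl Fn(&[Metric])) {
    let buffer = metrics.fold(Vec::new(), |mut buf, metric| {
        buf.push(metric);
        if buf.len() >= MAX_BATCH {
            submit(&buf); // flush inside the reduce
            buf.clear();
        }
        buf
    });
    if !buffer.is_empty() {
        submit(&buffer); // final partial batch
    }
}
```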
The log string was updated to roll the issue over in Sentry, and the old
issue was set to "delete and discard" to prevent issue spam.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
`onViewCreated()` is called when the view initializes, and `onResume()`
is called right after, as well as any time the view is shown again.
To prevent showing the VPN permission activity twice, we remove the
`checkTunnelState()` call from `onViewCreated()`, allowing only
`onResume()` to call it.
A boolean flag is added to track whether this is the "first" launch of
the app, in order to determine whether to `connectOnStart`.
Fixes: #9584
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>