Commit Graph

867 Commits

Author SHA1 Message Date
Brian Manifold
28e35b6500 refactor(portal): remove all soft delete columns (#10685)
Why:

* Now that soft delete fields have been removed being referenced in the
codebase and soft deleted rows have been removed from the DB we can now
remove any remaining traces of soft delete functionality.

Resolves #8187
2025-10-22 21:13:59 +00:00
Jamil
f2f8665c6a fix(portal): renew session on sign in (#10616)
When signing in, it's a good idea to clear any previous session cookie
and regenerate it, preventing the chance that any unchecked data in a
possible-fixated session cookie is used.
2025-10-21 15:21:07 -07:00
Jamil
2729a7731b chore(portal): remove dead Web.ControllerDocumentation (#10619) 2025-10-21 03:07:36 +00:00
Jamil
8cd469a9a9 refactor(portal): recreate remaining indexes without deleted_at (#10615)
Followup to #10607
2025-10-18 11:09:44 -07:00
Brian Manifold
27565ea5c8 refactor(portal): remove soft delete elements from portal code (#10607)
Why:

* In previous commits, the portal code had been updated to use hard
deletion rather than soft deletion of data. The fields used in the soft
deletion were still kept in the DB and the code to allow for zero
downtime rollout and an easy rollback if necessary. To continue with
that work the portal code has now been updated to remove any reference
to the soft deleted fields (e.g. deleted_at, persistent_id, etc...).
While the code has been updated the actual data in the DB will need to
remain for now, to once again allow for a zero downtime rollout. Once
this commit has been deployed to production another PR can follow to
remove the columns from the necessary tables in the DB.


Related: #8187
2025-10-18 17:02:26 +00:00
Mariusz Klochowicz
a27676a903 fix(portal): support dark mode in outbound emails (#10493)
Ensure that users with dark mode enabled system-wide get nice experience
whilst reading the emails.

Add a `mix test_emails` task to send all the emails and quickly inspect
them locally.

Before:

<img width="767" height="924" alt="image"
src="https://github.com/user-attachments/assets/aaac75bd-67ad-4fd8-82e8-6726ffea6bae"
/>


After (viewed via `mix test_emails`):

<img width="1063" height="928" alt="image"
src="https://github.com/user-attachments/assets/57d3a4d9-5b8f-4a45-8546-7615e15422d8"
/>

---------

Signed-off-by: Mariusz Klochowicz <mariusz@klochowicz.com>
Co-authored-by: Brian Manifold <bmanifold@users.noreply.github.com>
2025-10-16 19:35:36 +00:00
Jamil
4d43d2cb77 fix(portal): trigger email for Okta 4xx errors (#10578)
When Okta returned a 4xx status code from the API, we had updated error
handler to grab the errors from body or headers and return these.

However, the caller was expecting an explicit empty string for 401 and
403 errors in order to trigger the email send behavior.

Since that wasn't being matched, we were logging the error internally
only, and continuing to retry the sync indefinitely without sending the
user an email.


Fixes #8744 
Fixes #9825
2025-10-16 15:10:04 +00:00
Jamil
e81b4dbdac build(deps): bump floki from 0.37.1 to 0.38.0 in /elixir (#10585)
Bumps floki and updates calls to `Floki.find` and `Floki.attribute` to
use the new API.

Supersedes #9758
2025-10-16 15:09:50 +00:00
Jamil
cfc410626c chore(portal): remove unused nimble_csv dep (#10548)
This was added I believe to export certain live tables as CSV and won't
be used soon.
2025-10-13 22:50:21 +00:00
Jamil
d329880ec8 fix(portal): don't use Web functions from Domain (#10546)
Fixes an issue introduced in #10510 where Web functions (like
VerifiedRoutes) cannot be called from Domain because they are not
available in the release.

This happens to work in dev mode because everything is available under
the same dev context.
2025-10-13 20:24:46 +00:00
Jamil
b61fd20de8 chore(portal): remove Jason in favor of JSON (#10550)
Since Elixir 1.18, json encoding and decoding support is included in the
standard library. This is built on OTP's native json support which is
often faster than other implementations.

It mostly has the same API as the popular Jason library, differing
mainly in the format of the error responses returned when decoding
fails.

To minimize dependence on external libraries, we remove the Jason lib in
favor of this external dependency.

Fixes #8011
2025-10-13 17:39:53 +00:00
Jamil
1635c81a69 chore(portal): remove dead telemetry/timer.ex (#10549) 2025-10-13 16:36:19 +00:00
Jamil
3a06962497 chore(portal): remove unused file_size dep (#10547)
This doesn't appear to be used anywhere and eliminates one compile
warning due to the seemingly unmaintained
[sizeable](https://github.com/arvidkahl/sizeable) dep.
2025-10-13 16:24:03 +00:00
Jamil
bb089846d7 chore(portal): bump phoenix to 1.8 (#10510)
Bumps Phoenix to 1.8 and Phoenix LiveView to 1.1. As part of the bump a
number of issues had to be addressed. Comments inline provide more
context.

Supersedes #10475 
Supersedes #10448
2025-10-10 15:08:50 +00:00
dependabot[bot]
e2e592301a build(deps): bump @fontsource-variable/source-sans-3 from 5.2.8 to 5.2.9 in /elixir/apps/web/assets (#10514)
Bumps
[@fontsource-variable/source-sans-3](https://github.com/fontsource/font-files/tree/HEAD/fonts/variable/source-sans-3)
from 5.2.8 to 5.2.9.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/fontsource/font-files/commits/HEAD/fonts/variable/source-sans-3">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=@fontsource-variable/source-sans-3&package-manager=npm_and_yarn&previous-version=5.2.8&new-version=5.2.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-08 16:26:56 +00:00
dependabot[bot]
e1c6bb84c3 build(deps): bump open_api_spex from 3.21.5 to 3.22.0 in /elixir (#10482)
Bumps [open_api_spex](https://github.com/open-api-spex/open_api_spex)
from 3.21.5 to 3.22.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/open-api-spex/open_api_spex/releases">open_api_spex's
releases</a>.</em></p>
<blockquote>
<h2>v3.22.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Support multiple apps in Plug.SwaggerUI by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/676">open-api-spex/open_api_spex#676</a></li>
<li>Validate keys given to operation/2 macro by <a
href="https://github.com/xxdavid"><code>@​xxdavid</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/675">open-api-spex/open_api_spex#675</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/xxdavid"><code>@​xxdavid</code></a> made
their first contribution in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/675">open-api-spex/open_api_spex#675</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/open-api-spex/open_api_spex/compare/v3.21.5...v3.22.0">https://github.com/open-api-spex/open_api_spex/compare/v3.21.5...v3.22.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/open-api-spex/open_api_spex/blob/master/CHANGELOG.md">open_api_spex's
changelog</a>.</em></p>
<blockquote>
<h2>v3.22.0 - 2025-08-05</h2>
<ul>
<li>Support multiple apps in Plug.SwaggerUI by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/676">open-api-spex/open_api_spex#676</a></li>
<li>Validate keys given to operation/2 macro by <a
href="https://github.com/xxdavid"><code>@​xxdavid</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/675">open-api-spex/open_api_spex#675</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="be4f59a3d0"><code>be4f59a</code></a>
Release version 3.22.0</li>
<li><a
href="7de53cb763"><code>7de53cb</code></a>
Validate keys given to operation/2 macro (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/675">#675</a>)</li>
<li><a
href="9967e2f9d8"><code>9967e2f</code></a>
Support multiple apps in Plug.SwaggerUI (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/676">#676</a>)</li>
<li>See full diff in <a
href="https://github.com/open-api-spex/open_api_spex/compare/v3.21.5...v3.22.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=open_api_spex&package-manager=hex&previous-version=3.21.5&new-version=3.22.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-01 03:41:15 +00:00
Jamil
f07d2932dc feat(portal): show outdated clients (#10456)
Clients more than one minor away from a particular won't be able to
connect, so it would be useful to help admins recognize when this might
be the case and encourage them to upgrade.

We accomplish that with two small UX improvements in this PR:

- In the outdated gateways email, we show them a count of clients that
will no longer be able to connect to the new gateway version, linking
them to a sorted view of the clients table
- In the clients table, we add a new sortable `version` column to allow
admins to see which clients are outdated

Fixes #7727 
Fixes #10385

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2025-09-30 13:23:30 +00:00
Thomas Eizinger
b11adfcfe4 feat(connlib): create flow on ICMP error "prohibited" (#10462)
In Firezone, a Client requests an "access authorization" for a Resource
on the fly when it sees the first packet for said Resource going through
the tunnel. If we don't have a connection to the Gateway yet, this is
also where we will establish a connection and create the WireGuard
tunnel.

In order for this to work, the access authorization state between the
Client and the Gateway MUST NOT get out of sync. If the Client thinks it
has access to a Resource, it will just route the traffic to the Gateway.
If the access authorization on the Gateway has expired or vanished
otherwise, the packets will be black-holed.

Starting with #9816, the Gateway sends ICMP errors back to the
application whenever it filters a packet. This can happen either because
the access authorization is gone or because the traffic wasn't allowed
by the specific filter rules on the Resource.

With this patch, the Client will attempt to create a new flow (i.e.
re-authorize) traffic for this resource whenever it sees such an ICMP
error, therefore acting as a way of synchronizing the view of the world
between Client and Gateway should they ever run out of sync.

Testing turned out to be a bit tricky. If we let the authorization on
the Gateway lapse naturally, we portal will also toggle the Resource off
and on on the Client, resulting in "flushing" the current
authorizations. Additionally, it the Client had only access to one
Resource, then the Gateway will gracefully close the connection, also
resulting in the Client creating a new flow for the next packet.

To actually trigger this new behaviour we need to:

- Access at least two resources via the same Gateway
- Directly send `reject_access` to the Gateway for this particular
resource

To achieve this, we dynamically eval some code on the API node and
instruct the Gateway channel to send `reject_access`. The connection
stays intact because there is still another active access authorization
but packets for the other resource are answered with ICMP errors.

To achieve a safe roll-out, the new behaviour is feature-flagged. In
order to still test it, we now also allow feature flags to be set via
env variables.

Resolves: #10074

---------

Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>
2025-09-30 08:23:39 +00:00
Jamil
2e0517ed7b feat(api): GET /account API (#10302)
By customer request, it would be helpful to expose an endpoint to
retrieve current account / billing details like seats used and other
usage-based metrics.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-28 11:59:39 -07:00
Thomas Eizinger
685acdac3a feat: add more specific component type to user-agent header (#10457)
In order to allow the portal to more easily classify, what kind of
component is connecting, we extend the `get_user_agent` header to
include a component type instead of the generic `connlib/`.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-09-26 00:18:36 +00:00
Jamil
81ddf22aa0 fix(portal): use href for non-live routes (#10407)
When redirecting to paths that don't have LiveViews attached to them,
LiveView complains and emits a warning. To reduce alarm noise this PR
attempts to fix the issue.
2025-09-22 15:35:13 +00:00
Jamil
7ab5fee43a chore(portal): add remaining simple indexes (#10403)
- recreates the flows actor_group_membership index that didn't get
created due to name collision with an existing index
- adds missing resource_id, actor_group_id indexes on policies
- removes redundant `resource_id` index on resource_connections since
there's a composite index that matches already

Related: #10396
2025-09-20 10:06:39 -07:00
Brian Manifold
c3e1bc8a5b chore(portal): add non-composite indexes (#10396)
Why:

* Now that hard-delete has been rolled out, we need to make sure that
all cascade deletes are efficient. Some of the foreign key references
didn't have indexes but needed them.

Fixes #10393
2025-09-19 22:13:35 -07:00
Jamil
15283f1af5 feat(portal): batch_upsert and delete_unsynced functions (#10369)
In order to support the new, upcoming directory sync implementations, we
need the ability to batch upsert auth_identities, actors, actor_groups,
and actor_group_memberships. We also need the ability to delete entities
that were not upserted at the tail end of a sync job iteration in order
to remove entities that are no longer in the directory.

To support this, we add these functions and related tests here.

Related: #6294

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-19 20:48:26 +00:00
Jamil
5f4d2c14ea fix(ci): use correct index module name (#10383)
This gets redefined twice which could lead to indexes failing to run
properly.
2025-09-19 05:17:49 +00:00
Jamil
bfac486df5 refactor(portal): use list comprehensions in cache (#10376)
Elixir's [list
comprehensions](https://hexdocs.pm/elixir/comprehensions.html) are more
concise and [often
faster](https://stackoverflow.com/questions/55038704/elixir-enum-map-vs-for-comprehension)
(~2x) than using multiple Enum.filter and Enum.map calls.

Since I was in these modules debugging possible a race condition for
#10375, I decided to go ahead and update some of these hot functions to
use the more modern approach.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-18 18:37:37 +00:00
Jamil
c043359c21 fix(portal): don't count internet site in limits (#10336)
Starter plans don't have access to the internet site so it's not fair to
count it against their limits.

Related:
https://app.hubspot.com/contacts/23723443/record/0-5/29628021256
2025-09-12 14:11:02 +00:00
Brian Manifold
e2e370fd76 fix(portal): fix client show page sign-in method (#10327) 2025-09-11 04:33:56 +00:00
Brian Manifold
56a3ce9041 fix(portal): move hard delete migrations (#10316)
Move some of the hard-delete migrations from manual to inline to allow
us to deploy the `HEAD` of main
2025-09-09 23:41:45 +00:00
Brian Manifold
826a304071 feat(portal): enable outdated gateway email (#10281)
Enables 'outdated gateway' notifications for all accounts.

Closes #8361
2025-09-04 03:56:01 +00:00
Brian Manifold
5797511ebd fix(portal): update directory sync job w/ new hard delete (#10267)
Why:

* During the refactor to move to hard delete data in the portal there
were a couple places of inconsistency in the directory sync job where
deletion was concerned.
2025-08-31 01:19:59 +00:00
Brian Manifold
6bd19ee9b0 refactor(portal): hard delete data (#9694) 2025-08-29 22:13:44 +00:00
Brian Manifold
3673ce01a0 refactor(portal): add migrations needed for hard delete (#10257)
In preparation for transitioning from the portal from soft-delete to
hard-delete some updates to the foreign key constraints were required.
Most of these updates are adding `ON DELETE CASCADE` constraints and
relevant indexes. These new constraints should have no effect on the
current portal code as soft-deletes are still being used.

One other update in this PR is changing the FK constraint names on the
clients table. The names were `devices_*` due to the table originally
being called `devices`. This PR updates them to `clients_*`.

Related: #8187
2025-08-26 19:10:01 +00:00
Jamil
6d2ea0b224 fix(portal): adapt resource on resource_updated (#10247)
When filters are updated for a Resource, we need to first adapt the
resource before rendering it down to the Gateway. Otherwise, the gateway
may see a Resource that does not match its expected schema.
2025-08-23 17:53:20 +00:00
Jamil
cafe6554ff refactor(portal): reduce cache memory usage (#10058)
Napkin math shows that we can save substantial memory (~3x or more) on
the API nodes as connected clients/gateways grow if we just store the
fields we need in order to keep the client and gateway state maintained
in the channel pids.

To facilitate this, we create new `Cacheable` structs that represent
their `Domain` cousins, which use byte arrays for `id`s and strip out
unused fields.

Additionally, all business logic involved with maintaining these caches
is now contained within two modules: `Domain.Cache.Client` and
`Domain.Cache.Gateway`, and type specs have been added to aid in static
analysis and code documentation.

Comprehensive testing is now added not only for the cache modules, but
for their associated channel modules as well to ensure we handle
different kinds of edge cases gracefully.

The `Events` nomenclature was renamed to `Changes` to better name what
we are doing: Change-Data-Capture.

Lastly, the following related changes are included in this PR since they
were "in the way" so to speak of getting this done:

- We save the last received LSN in each channel and drop the `change`
with a warning if we receive it twice in a row, or we receive it out of
order
- The client/gateway version compatibility calculations have been moved
to `Domain.Resources` and `Domain.Gateways` and have been simplified to
make them easier to understand and maintain going forward.


Related: #10174 
Fixes: #9392 
Fixes: #9965
Fixes: #9501 
Fixes: #10227

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-22 21:52:29 +00:00
Brian Manifold
551ceafb13 fix(portal): REST api updates (#10191)
* Minor updates to the REST API to more gracefully handle incorrect
input data from requests.
* Minor updates to the OpenAPI spec.
2025-08-20 21:08:07 +00:00
Jamil
0698e0d35f ci: test IPv6 for CIDR resources (#10168)
Docker for Mac finally supports IPv6 in general availability. It's time
to add IPv6 to our suite of integration tests.

The thinking behind this PR is try and not slow down CI much, if at all,
by testing IPv6 side-by-side with the existing IPv4 tests.

More comprehensive testing is being developed in #10131 that will test
things like IPv4-in-6 relaying, client / gateway IP stack mismatches,
and so forth.
2025-08-18 20:59:40 +00:00
Jamil
e5b2af1d4e chore(portal): add ChangeLogs.truncate/2 and tests (#10155)
In preparation to delete old change_logs based on account and insertion
time, we introduce a simple `truncate` function that removes old change
logs past a cutoff date.

Related: https://github.com/firezone/firezone/issues/10146

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2025-08-06 19:19:30 +00:00
Jamil
25e15bbd14 chore(portal): drop id in favor of lsn pkey (#10152)
On the `change_logs` table, we want to minimize write overhead as much
as possible. One major way to do this is the minimize the number of
indexes maintained.

Because `lsn` is guaranteed to be unique, we can use it as the primary,
saving us an index (and column).

**NOTE**: This migration will need to acquire a lock on the table, so
it's added as a manual migration to execute out of band. Since we don't
read ChangeLogs anywhere, it should be fine for the app servers to come
up without this migration applied.
2025-08-06 15:04:34 +00:00
Jamil
2c788a31aa test(portal): Flows.delete_expired_flows/0 (#10150)
Adds a missing test for the `Flows.delete_expired_flows/0` function.
2025-08-06 14:28:36 +00:00
Brian Manifold
80e1c3255f refactor(portal): refactor billing event handler (#10064)
Why:

* There were intermittent issues with accounts updates from Stripe
events. Specifically, when an account would update it's subscription
from Starter to Team. The reason was due to the fact that Stripe does
not guarantee order of delivery for it's webhook events. At times we
were seeing and responding to an event that was a few seconds old after
processing a newer event. This would have the effect of quickly
transitioning an account from Team back to Starter. This commit
refactors our event handler and adds a `processed_stripe_events` DB
table to make sure we don't process duplicate events as well as prevent
processing an event that was created prior to the last event we've
processed for a given account.

* Along with refactoring the billing event handling, the Stripe mock
module has also been refactored to better reflect real Stripe objects.

Related: #8668
2025-08-05 16:56:52 +00:00
Jamil
cacb44f7bb test(portal): fix flaky acceptance auth test (#10140)
Occasionally, this fails because the element is found, but not visible
due to a race condition. To fix this, we assert that the element should
be visible before clicking on it.

Fixes
https://github.com/firezone/firezone/actions/runs/16751908154/job/47424125321
2025-08-05 14:53:18 +00:00
Jamil
54d91e2004 fix(portal): don't send reject_access for remaining flows (#10071)
This fixes a simple logic bug where we were mistakenly reacting to a
flow deletion event where flows still existed in the cache by sending
`reject_access`. This fixes that bug, and adds more comprehensive
logging to help diagnose issues like this more quickly in the future.

This PR also fixes the following issues found during the investigation:

- We were redundantly reacting to Token deletion in the channel pids.
This is unnecessary: we send a global socket disconnect from the Token
hook module instead.
- We had a bug that would crash the WAL consumer if a "global" token
(i.e. relay) was deleted or expired - these have no `account_id`.
- We now always use `min(max(all_conforming_polices_expiration),
token.expires_at)` when setting expiration on a new flow to minimize the
possibility for access churn.
- We now check to ensure the token and gateway are still undeleted when
re-authorizing a given flow. This prevents us from failing to send
`reject_access` when a token or gateway is deleted corresponding to a
flow, but the other entities would have granted access.


Related: https://firezone.statuspage.io/incidents/xrsm13tml3dh
Related: #10068 
Related: #9501
2025-08-01 00:03:00 +00:00
Jamil
ef3ee3aba8 fix(portal): relax gateway group perms (#10034)
This is hit by the client channel when a gateway group needs to be
hydrated, which should only require "connect gateways" permissions.
2025-07-28 19:58:11 +00:00
Jamil
44a9691df5 refactor(portal): don't store account assoc on client (#10009)
The full `account` struct is only used to render the client's interface,
and doesn't need to be stored in the `client` struct when the `subject`
struct already tracks it.
2025-07-28 16:24:58 +00:00
Jamil
3ff31e3a33 fix(portal): maintain identity preload on client (#10008)
When updating a client, we need to maintain the preloaded `identity`
association to use for the IdP policy condition.
2025-07-26 00:42:19 +00:00
Jamil
f1a5af356d fix(portal): groom resource list and flows periodically (#10005)
Time-based policy conditions are tricky. When they authorize a flow, we
correctly tell the Gateway to remove access when the time window
expires.

However, we do nothing on the client to reset the connectivity state.
This means that whenever the window of time of access was re-entered,
the client would essentially never be able to connect to it again until
the resource was toggled.

To fix this, we add a 1-minute check in the client channel that
re-checks allowed resources, and updates the client state with the
difference. This means that policies that have time-based conditions are
only accurate to the minute, but this is how they're presented anyhow.


For good measure, we also add a periodic job that runs every minute to
delete expired Flows. This will propagate to the Gateway where, if the
access for a particular client-resource is determined to be actually
gone, will receive `reject_access`.

Zooming out a bit, this PR furthers the theme that:

- Client channels react to underlying resource / policy / membership
changes directly, while
- Gateway channels react primarily to flows being deleted, or the
downstream effects of a prior client authorization
2025-07-25 21:04:41 +00:00
Jamil
2959cca8ce fix(portal): use consistent wireguard psk (#10004)
Whenever a client requests a connection to gateway, we need to generate
a preshared key that will be used for the underlying WireGuard tunnel.

When the connection setup broke or otherwise was lost, _after_ the
gateway the received the authorize_flow call, but _before_ the client
could receive the response (and initiate a tunnel), we would have to
wait until an ICE timeout occurred in order to reset state on the
gateway.

This is because the psk was not used to determine if this was a _new_
flow authorization. So the old authorization would be matched, and the
client would never be able to connect, since its tunnel was using the
new psk, and the gateway the old.

To fix this, we generate a secure random 32-byte `psk_base` on each
client and gateway. When a client wishes to connect to a gateway, we
compute the WireGuard preshared key as an HMAC over these two inputs.

This fixes the issue by ensuring that subsequent flow authorization
requests from a particular client to a particular gateway will yield the
same psk.

Related: #9999 
Related: https://github.com/firezone/infra/issues/99
2025-07-25 19:28:47 +00:00
Jamil
ccc736e63e fix(portal): reauthorize new flow when last flow deleted (#9974)
The `flows` table tracks authorizations we've made for a resource and
persists them, so that we can determine which authorizations are still
valid across deploys or hiccups in the control plane connections.

Before, when the "in-use" authorization for a resource was deleted, we
would have flapped the resource in the client, and sent `reject_access`
to the gateway. However, that would cause issues in the following edge
case:

- Client is currently connected to Resource A through Policy B
- Client websocket goes down
- Policy B is created for Resource A (for another actor group), and
Policy A is deleted by admin
- Client reconnects
- Client sees that its resource list is the same
- Gateway has since received `reject_access` because no new flows were
created for this client-resource combination

To prevent this from happening, we now try to "reauthorize" the flow
whenever the last cached flow is removed for a particular
client-resource pair. This avoids needing to toggle the resource on the
client since we won't have sent `reject_access` to the gateway.
2025-07-25 01:53:10 +00:00
Jamil
f41a6f9e0b fix(portal): don't use process.alive on remote pid (#9964)
This can be removed, since we handle the ArgumentError in the link
operation.
2025-07-22 09:42:51 -07:00