Commit Graph

998 Commits

Author SHA1 Message Date
Jamil
81ddf22aa0 fix(portal): use href for non-live routes (#10407)
When redirecting to paths that don't have LiveViews attached to them,
LiveView complains and emits a warning. To reduce alarm noise this PR
attempts to fix the issue.
2025-09-22 15:35:13 +00:00
Jamil
7ab5fee43a chore(portal): add remaining simple indexes (#10403)
- recreates the flows actor_group_membership index that didn't get
created due to name collision with an existing index
- adds missing resource_id, actor_group_id indexes on policies
- removes redundant `resource_id` index on resource_connections since
there's a composite index that matches already

Related: #10396
2025-09-20 10:06:39 -07:00
Brian Manifold
c3e1bc8a5b chore(portal): add non-composite indexes (#10396)
Why:

* Now that hard-delete has been rolled out, we need to make sure that
all cascade deletes are efficient. Some of the foreign key references
didn't have indexes but needed them.

Fixes #10393
2025-09-19 22:13:35 -07:00
Jamil
15283f1af5 feat(portal): batch_upsert and delete_unsynced functions (#10369)
In order to support the new, upcoming directory sync implementations, we
need the ability to batch upsert auth_identities, actors, actor_groups,
and actor_group_memberships. We also need the ability to delete entities
that were not upserted at the tail end of a sync job iteration in order
to remove entities that are no longer in the directory.

To support this, we add these functions and related tests here.
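
The upsert-then-prune pattern described above can be sketched roughly as follows. This is an illustrative Python sketch, not the portal's Elixir code: the in-memory `store`, the `last_synced_at` marker, and both function bodies are assumptions, since the commit message does not say how unsynced entities are identified.

```python
def batch_upsert(store: dict, entities: list, sync_id: int) -> None:
    """Insert or update every entity and stamp it with the current sync run."""
    for entity in entities:
        # Hypothetical marker column; the real schema is not shown in the commit.
        store[entity["id"]] = {**entity, "last_synced_at": sync_id}


def delete_unsynced(store: dict, sync_id: int) -> list:
    """Remove entities that were not touched by the current sync run."""
    stale_ids = [eid for eid, row in store.items() if row["last_synced_at"] != sync_id]
    for eid in stale_ids:
        del store[eid]
    return stale_ids


# Two entities left over from sync run 1; run 2 only sees "a" and a new "c".
store = {
    "a": {"id": "a", "last_synced_at": 1},
    "b": {"id": "b", "last_synced_at": 1},
}
batch_upsert(store, [{"id": "a"}, {"id": "c"}], sync_id=2)
removed = delete_unsynced(store, sync_id=2)
```

Running the prune step only at the tail end of a sync iteration means an entity is deleted only after the directory had a full chance to report it.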

Related: #6294

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-19 20:48:26 +00:00
Jamil
5f4d2c14ea fix(ci): use correct index module name (#10383)
This gets redefined twice which could lead to indexes failing to run
properly.
2025-09-19 05:17:49 +00:00
Jamil
bfac486df5 refactor(portal): use list comprehensions in cache (#10376)
Elixir's [list
comprehensions](https://hexdocs.pm/elixir/comprehensions.html) are more
concise and [often
faster](https://stackoverflow.com/questions/55038704/elixir-enum-map-vs-for-comprehension)
(~2x) than using multiple Enum.filter and Enum.map calls.

Since I was in these modules debugging a possible race condition for
#10375, I decided to go ahead and update some of these hot functions to
use the more modern approach.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-18 18:37:37 +00:00
Jamil
c043359c21 fix(portal): don't count internet site in limits (#10336)
Starter plans don't have access to the internet site so it's not fair to
count it against their limits.

Related:
https://app.hubspot.com/contacts/23723443/record/0-5/29628021256
2025-09-12 14:11:02 +00:00
Brian Manifold
e2e370fd76 fix(portal): fix client show page sign-in method (#10327) 2025-09-11 04:33:56 +00:00
Brian Manifold
56a3ce9041 fix(portal): move hard delete migrations (#10316)
Move some of the hard-delete migrations from manual to inline to allow
us to deploy the `HEAD` of main
2025-09-09 23:41:45 +00:00
Brian Manifold
826a304071 feat(portal): enable outdated gateway email (#10281)
Enables 'outdated gateway' notifications for all accounts.

Closes #8361
2025-09-04 03:56:01 +00:00
Brian Manifold
5797511ebd fix(portal): update directory sync job w/ new hard delete (#10267)
Why:

* During the refactor to hard-delete data in the portal, there were a
couple of inconsistencies in the directory sync job where deletion was
concerned.
2025-08-31 01:19:59 +00:00
Brian Manifold
6bd19ee9b0 refactor(portal): hard delete data (#9694) 2025-08-29 22:13:44 +00:00
Brian Manifold
3673ce01a0 refactor(portal): add migrations needed for hard delete (#10257)
In preparation for transitioning the portal from soft-delete to
hard-delete, some updates to the foreign key constraints were required.
Most of these updates are adding `ON DELETE CASCADE` constraints and
relevant indexes. These new constraints should have no effect on the
current portal code as soft-deletes are still being used.

One other update in this PR is changing the FK constraint names on the
clients table. The names were `devices_*` due to the table originally
being called `devices`. This PR updates them to `clients_*`.

Related: #8187
2025-08-26 19:10:01 +00:00
Jamil
6c485be45e fix(portal): fix socket based postgres connections (#10189) (#10245)
This adds the option to use socket-based Postgres connections for the
replication connections. It is essentially a copy of the existing config
for the base Postgres connection.

---------

Co-authored-by: PatrickDaG <patrickdag@failmail.dev>
2025-08-25 17:23:03 +00:00
Jamil
6d2ea0b224 fix(portal): adapt resource on resource_updated (#10247)
When filters are updated for a Resource, we need to first adapt the
resource before rendering it down to the Gateway. Otherwise, the gateway
may see a Resource that does not match its expected schema.
2025-08-23 17:53:20 +00:00
Jamil
cafe6554ff refactor(portal): reduce cache memory usage (#10058)
Napkin math shows that we can save substantial memory (~3x or more) on
the API nodes as connected clients/gateways grow if we just store the
fields we need in order to keep the client and gateway state maintained
in the channel pids.

To facilitate this, we create new `Cacheable` structs that represent
their `Domain` cousins, which use byte arrays for `id`s and strip out
unused fields.

Additionally, all business logic involved with maintaining these caches
is now contained within two modules: `Domain.Cache.Client` and
`Domain.Cache.Gateway`, and type specs have been added to aid in static
analysis and code documentation.

Comprehensive testing is now added not only for the cache modules, but
for their associated channel modules as well to ensure we handle
different kinds of edge cases gracefully.

The `Events` nomenclature was renamed to `Changes` to better name what
we are doing: Change-Data-Capture.

Lastly, the following related changes are included in this PR since they
were "in the way" so to speak of getting this done:

- We save the last received LSN in each channel and drop the `change`
with a warning if we receive it twice in a row, or we receive it out of
order
- The client/gateway version compatibility calculations have been moved
to `Domain.Resources` and `Domain.Gateways` and have been simplified to
make them easier to understand and maintain going forward.
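
The LSN check in the first bullet above might look roughly like this. It is a Python sketch for illustration only (the portal is written in Elixir); the `ChannelState` class, the integer LSNs, and the initial value of `0` are assumptions.

```python
import logging

logger = logging.getLogger("channel")


class ChannelState:
    """Tracks the last log sequence number (LSN) seen by one channel process."""

    def __init__(self) -> None:
        self.last_lsn = 0
        self.applied = []

    def handle_change(self, lsn: int) -> bool:
        """Apply a change only if its LSN is newer than the last one seen."""
        if lsn <= self.last_lsn:
            # Equal means a repeated delivery; lower means out of order. Drop both.
            logger.warning("dropping change: lsn=%d last_lsn=%d", lsn, self.last_lsn)
            return False
        self.last_lsn = lsn
        self.applied.append(lsn)
        return True


state = ChannelState()
results = [state.handle_change(lsn) for lsn in (10, 10, 12, 11, 13)]
```

Here the second `10` and the late `11` are dropped with a warning, while `10`, `12`, and `13` are applied in order.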


Related: #10174 
Fixes: #9392 
Fixes: #9965
Fixes: #9501 
Fixes: #10227

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-22 21:52:29 +00:00
Brian Manifold
551ceafb13 fix(portal): REST api updates (#10191)
* Minor updates to the REST API to more gracefully handle incorrect
input data from requests.
* Minor updates to the OpenAPI spec.
2025-08-20 21:08:07 +00:00
Jamil
0698e0d35f ci: test IPv6 for CIDR resources (#10168)
Docker for Mac finally supports IPv6 in general availability. It's time
to add IPv6 to our suite of integration tests.

The thinking behind this PR is to try not to slow down CI much, if at all,
by testing IPv6 side-by-side with the existing IPv4 tests.

More comprehensive testing is being developed in #10131 that will test
things like IPv4-in-6 relaying, client / gateway IP stack mismatches,
and so forth.
2025-08-18 20:59:40 +00:00
Jamil
e5b2af1d4e chore(portal): add ChangeLogs.truncate/2 and tests (#10155)
In preparation for deleting old change_logs based on account and insertion
time, we introduce a simple `truncate` function that removes change
logs older than a cutoff date.

Related: https://github.com/firezone/firezone/issues/10146

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2025-08-06 19:19:30 +00:00
Jamil
25e15bbd14 chore(portal): drop id in favor of lsn pkey (#10152)
On the `change_logs` table, we want to minimize write overhead as much
as possible. One major way to do this is to minimize the number of
indexes maintained.

Because `lsn` is guaranteed to be unique, we can use it as the primary
key, saving us an index (and column).

**NOTE**: This migration will need to acquire a lock on the table, so
it's added as a manual migration to execute out of band. Since we don't
read ChangeLogs anywhere, it should be fine for the app servers to come
up without this migration applied.
2025-08-06 15:04:34 +00:00
Jamil
2c788a31aa test(portal): Flows.delete_expired_flows/0 (#10150)
Adds a missing test for the `Flows.delete_expired_flows/0` function.
2025-08-06 14:28:36 +00:00
Brian Manifold
80e1c3255f refactor(portal): refactor billing event handler (#10064)
Why:

* There were intermittent issues with account updates from Stripe
events, specifically when an account would update its subscription
from Starter to Team. The cause was that Stripe does
not guarantee order of delivery for its webhook events. At times we
were seeing and responding to an event that was a few seconds old after
processing a newer event. This would have the effect of quickly
transitioning an account from Team back to Starter. This commit
refactors our event handler and adds a `processed_stripe_events` DB
table to make sure we don't process duplicate events and to prevent
processing an event that was created prior to the last event we've
processed for a given account.

* Along with refactoring the billing event handling, the Stripe mock
module has also been refactored to better reflect real Stripe objects.
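
The duplicate and ordering guard described in the first bullet can be sketched as below. This is an illustrative Python sketch, not the portal's Elixir handler: the `ProcessedEvents` class is a hypothetical in-memory stand-in for the `processed_stripe_events` table, and the event IDs and `created` timestamps are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedEvents:
    """In-memory stand-in for the `processed_stripe_events` table."""

    seen_ids: set = field(default_factory=set)
    latest_created: dict = field(default_factory=dict)

    def should_process(self, event_id: str, account_id: str, created: int) -> bool:
        if event_id in self.seen_ids:
            return False  # duplicate delivery of an event we already handled
        if created < self.latest_created.get(account_id, 0):
            return False  # created before the last event processed for this account
        self.seen_ids.add(event_id)
        self.latest_created[account_id] = created
        return True


store = ProcessedEvents()
outcomes = [
    store.should_process("evt_2", "acct_1", created=200),  # newer event arrives first
    store.should_process("evt_1", "acct_1", created=100),  # stale event is skipped
    store.should_process("evt_2", "acct_1", created=200),  # duplicate is skipped
]
```

With this guard, a late Starter event can no longer undo a Team upgrade that was already applied.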

Related: #8668
2025-08-05 16:56:52 +00:00
Jamil
cacb44f7bb test(portal): fix flaky acceptance auth test (#10140)
Occasionally, this fails because the element is found, but not visible
due to a race condition. To fix this, we assert that the element should
be visible before clicking on it.

Fixes
https://github.com/firezone/firezone/actions/runs/16751908154/job/47424125321
2025-08-05 14:53:18 +00:00
dependabot[bot]
4e40afb959 build(deps): bump hackney from 1.24.1 to 1.25.0 in /elixir (#10083)
Bumps [hackney](https://github.com/benoitc/hackney) from 1.24.1 to
1.25.0.
Release notes (sourced from [hackney's releases](https://github.com/benoitc/hackney/releases)):

**1.25.0 - 2025-07-24**

**IMPORTANT CHANGE**

- change: `insecure_basic_auth` now defaults to `true` instead of `false`

  This restores backward compatibility with pre-1.24.0 behavior where
  basic auth was allowed over HTTP connections. If you need strict
  HTTPS-only basic auth:

  - Set globally: `application:set_env(hackney, insecure_basic_auth, false)`
  - Or per-request: `{insecure_basic_auth, false}` in options

Hex.pm: https://hex.pm/packages/hackney/1.25.0
Doc: https://hexdocs.pm/hackney/readme.html
Commits:

- `8c00789` Merge pull request [#778](https://redirect.github.com/benoitc/hackney/issues/778) from benoitc/insecure-basic-auth-default-true
- `a1d4108` change insecure_basic_auth default to true
- `e2bbdf7` bump unicode compat lib
- `3b901a6` update readme
- See full diff in [compare view](https://github.com/benoitc/hackney/compare/1.24.1...1.25.0)


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=hackney&package-manager=hex&previous-version=1.24.1&new-version=1.25.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-01 17:41:13 +00:00
dependabot[bot]
0738427f08 build(deps): bump logger_json from 7.0.3 to 7.0.4 in /elixir (#10084)
Bumps [logger_json](https://github.com/Nebo15/logger_json) from 7.0.3 to
7.0.4.
Release notes (sourced from [logger_json's releases](https://github.com/Nebo15/logger_json/releases)):

**7.0.4**

What's Changed

- Add indentation to the code snippet for docs by [`@RudolfMan`](https://github.com/RudolfMan) in [Nebo15/logger_json#160](https://redirect.github.com/Nebo15/logger_json/pull/160)
- Google Cloud: Handle non-binary values in `format_affected_user/1` by [`@raulpe7eira`](https://github.com/raulpe7eira) in [Nebo15/logger_json#161](https://redirect.github.com/Nebo15/logger_json/pull/161)

**Full Changelog**: https://github.com/Nebo15/logger_json/compare/7.0.3...7.0.4

Commits:

- `1524672` Bump version
- `88a149b` fix(google-cloud): `format_affected_user/1` with non-binary values ([#161](https://redirect.github.com/Nebo15/logger_json/issues/161))
- `6e77680` Add indentation to the code snippet for docs ([#160](https://redirect.github.com/Nebo15/logger_json/issues/160))
- See full diff in [compare view](https://github.com/Nebo15/logger_json/compare/7.0.3...7.0.4)


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=logger_json&package-manager=hex&previous-version=7.0.3&new-version=7.0.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-01 15:24:41 +00:00
dependabot[bot]
c02986299e build(deps): bump postgrex from 0.20.0 to 0.21.0 in /elixir (#10085)
Bumps [postgrex](https://github.com/elixir-ecto/postgrex) from 0.20.0 to
0.21.0.
Changelog (sourced from [postgrex's changelog](https://github.com/elixir-ecto/postgrex/blob/master/CHANGELOG.md)):

**v0.21.0 (2025-07-31)**

This release requires Erlang/OTP 25+

- Enhancements
  - Add query timeout option on ReplicationConnection
- Bug fixes
  - PGHOST option does not override explicitly given endpoint configuration
  - Add ltxtquery support

Commits:

- See full diff in [compare view](https://github.com/elixir-ecto/postgrex/commits)


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=postgrex&package-manager=hex&previous-version=0.20.0&new-version=0.21.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-01 15:24:15 +00:00
Jamil
54d91e2004 fix(portal): don't send reject_access for remaining flows (#10071)
This fixes a simple logic bug: when a flow was deleted while other flows
for the same access still existed in the cache, we mistakenly sent
`reject_access`. It also adds more comprehensive logging to help
diagnose issues like this more quickly in the future.

This PR also fixes the following issues found during the investigation:

- We were redundantly reacting to Token deletion in the channel pids.
This is unnecessary: we send a global socket disconnect from the Token
hook module instead.
- We had a bug that would crash the WAL consumer if a "global" token
(i.e. relay) was deleted or expired - these have no `account_id`.
- We now always use `min(max(all_conforming_polices_expiration),
token.expires_at)` when setting expiration on a new flow to minimize the
possibility for access churn.
- We now check to ensure the token and gateway are still undeleted when
re-authorizing a given flow. This prevents us from failing to send
`reject_access` when the token or gateway corresponding to a flow is
deleted but the other entities would still have granted access.
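
The expiration rule quoted in the list above can be illustrated with a short sketch. It is written in Python for illustration (the portal is Elixir); the function name is hypothetical, and the sketch assumes there is at least one conforming policy and that every input carries a concrete expiration time.

```python
from datetime import datetime, timezone


def flow_expires_at(policy_expirations: list, token_expires_at: datetime) -> datetime:
    """Latest conforming-policy expiration, capped by the token's expiration."""
    return min(max(policy_expirations), token_expires_at)


token_expiry = datetime(2025, 8, 1, 12, 0, tzinfo=timezone.utc)
policies = [
    datetime(2025, 8, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2025, 8, 1, 18, 0, tzinfo=timezone.utc),
]

# The longest-lived policy would allow access until 18:00, but the token
# expires at 12:00, so the flow expires at 12:00.
expires_at = flow_expires_at(policies, token_expiry)
```

Taking the maximum across policies keeps the flow alive as long as any conforming policy allows it, which is what reduces access churn; the outer minimum ensures a flow never outlives its token.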


Related: https://firezone.statuspage.io/incidents/xrsm13tml3dh
Related: #10068 
Related: #9501
2025-08-01 00:03:00 +00:00
Jamil
f169746389 chore(portal): use local website url for versions in dev (#10057)
When starting a local client with a local portal, this URL is hit and
times out, causing noise in the local gateway log.

In order to develop against this API in local dev, it might be better to
use the local website URL as well.
2025-07-30 21:16:24 +00:00
Jamil
ef3ee3aba8 fix(portal): relax gateway group perms (#10034)
This is hit by the client channel when a gateway group needs to be
hydrated, which should only require "connect gateways" permissions.
2025-07-28 19:58:11 +00:00
Jamil
44a9691df5 refactor(portal): don't store account assoc on client (#10009)
The full `account` struct is only used to render the client's interface,
and doesn't need to be stored in the `client` struct when the `subject`
struct already tracks it.
2025-07-28 16:24:58 +00:00
Jamil
4a448e5517 fix(portal): separate dev and runtime Oban configs (#10027)
Oban includes its own configuration validation, which seems to prevent
`runtime.exs` from overriding any compile-time options. This prevents us
from using ENV vars to configure it, such as restricting job execution
to `domain` nodes by setting `queues: []`. To fix that, we make sure to
set Oban configuration in env-specific files `config/dev.exs` and
`config/test.exs`, and at runtime for prod with `config/runtime.exs`.

Fixes #10016
2025-07-28 15:13:52 +00:00
Jamil
3ff31e3a33 fix(portal): maintain identity preload on client (#10008)
When updating a client, we need to maintain the preloaded `identity`
association to use for the IdP policy condition.
2025-07-26 00:42:19 +00:00
Jamil
f1a5af356d fix(portal): groom resource list and flows periodically (#10005)
Time-based policy conditions are tricky. When they authorize a flow, we
correctly tell the Gateway to remove access when the time window
expires.

However, we do nothing on the client to reset the connectivity state.
This means that whenever the access time window was re-entered,
the client would essentially never be able to connect to the resource
again until the resource was toggled.

To fix this, we add a 1-minute check in the client channel that
re-checks allowed resources, and updates the client state with the
difference. This means that policies that have time-based conditions are
only accurate to the minute, but this is how they're presented anyhow.
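
The periodic re-check amounts to a set difference between what the client currently knows and what it is allowed right now. Below is a minimal Python sketch for illustration (not the portal's Elixir code); the function name and resource IDs are hypothetical.

```python
def diff_resources(known: set, allowed: set) -> tuple:
    """Return (added, removed) resource ids relative to what the client knows."""
    return allowed - known, known - allowed


known = {"res_a", "res_b"}
# Suppose res_a's time window has closed and res_c's has opened.
allowed_now = {"res_b", "res_c"}

added, removed = diff_resources(known, allowed_now)
```

Only the difference needs to be pushed to the client on each tick, so resources whose access did not change are left untouched.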


For good measure, we also add a periodic job that runs every minute to
delete expired Flows. This propagates to the Gateway, which receives
`reject_access` if the access for a particular client-resource pair is
determined to be actually gone.

Zooming out a bit, this PR furthers the theme that:

- Client channels react to underlying resource / policy / membership
changes directly, while
- Gateway channels react primarily to flows being deleted, or the
downstream effects of a prior client authorization
2025-07-25 21:04:41 +00:00
Jamil
2959cca8ce fix(portal): use consistent wireguard psk (#10004)
Whenever a client requests a connection to a gateway, we need to generate
a preshared key that will be used for the underlying WireGuard tunnel.

When the connection setup broke or was otherwise lost _after_ the
gateway received the `authorize_flow` call, but _before_ the client
could receive the response (and initiate a tunnel), we would have to
wait until an ICE timeout occurred in order to reset state on the
gateway.

This is because the psk was not used to determine if this was a _new_
flow authorization. So the old authorization would be matched, and the
client would never be able to connect, since its tunnel was using the
new psk, and the gateway the old.

To fix this, we generate a secure random 32-byte `psk_base` on each
client and gateway. When a client wishes to connect to a gateway, we
compute the WireGuard preshared key as an HMAC over these two inputs.

This fixes the issue by ensuring that subsequent flow authorization
requests from a particular client to a particular gateway will yield the
same psk.
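
A minimal sketch of this derivation follows, in Python rather than the portal's Elixir. The commit does not state which hash function, key order, or message layout is used, so HMAC-SHA256 keyed by the client's `psk_base` over the gateway's `psk_base` is an assumption, as are the function names.

```python
import hashlib
import hmac
import secrets


def new_psk_base() -> bytes:
    """Generate a secure random 32-byte base secret (one per client or gateway)."""
    return secrets.token_bytes(32)


def derive_psk(client_psk_base: bytes, gateway_psk_base: bytes) -> bytes:
    """Derive a 32-byte preshared key for one client/gateway pair.

    Assumption: HMAC-SHA256 keyed by the client's base over the gateway's base.
    """
    return hmac.new(client_psk_base, gateway_psk_base, hashlib.sha256).digest()


client_base = new_psk_base()
gateway_base = new_psk_base()

# Repeated authorization requests for the same pair yield the same key.
first = derive_psk(client_base, gateway_base)
second = derive_psk(client_base, gateway_base)
```

Because the key depends only on the two stored bases, a retried authorization after a dropped response reproduces the key the gateway already holds instead of minting a conflicting one.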

Related: #9999 
Related: https://github.com/firezone/infra/issues/99
2025-07-25 19:28:47 +00:00
Jamil
ccc736e63e fix(portal): reauthorize new flow when last flow deleted (#9974)
The `flows` table tracks authorizations we've made for a resource and
persists them, so that we can determine which authorizations are still
valid across deploys or hiccups in the control plane connections.

Before, when the "in-use" authorization for a resource was deleted, we
would have flapped the resource in the client, and sent `reject_access`
to the gateway. However, that would cause issues in the following edge
case:

- Client is currently connected to Resource A through Policy A
- Client websocket goes down
- Policy B is created for Resource A (for another actor group), and
Policy A is deleted by admin
- Client reconnects
- Client sees that its resource list is the same
- Gateway has since received `reject_access` because no new flows were
created for this client-resource combination

To prevent this from happening, we now try to "reauthorize" the flow
whenever the last cached flow is removed for a particular
client-resource pair. This avoids needing to toggle the resource on the
client since we won't have sent `reject_access` to the gateway.
2025-07-25 01:53:10 +00:00
Jamil
f41a6f9e0b fix(portal): don't use process.alive on remote pid (#9964)
This can be removed, since we handle the ArgumentError in the link
operation.
2025-07-22 09:42:51 -07:00
Jamil
2c3692582b fix(portal): more robust replication pid discovery (#9960)
When debugging why we're receiving "Failed to start replication
connection" errors on deploy, it was discovered that there's a bug in
the Process discovery mechanism that new nodes use to attempt to link to
the existing replication connection. When restarting an existing
`domain` container that's not doing replication, we see this:

```
{"message":"Elixir.Domain.Events.ReplicationConnection: Publication tables are up to date","time":"2025-07-22T07:18:45.948Z","domain":["elixir"],"application":"domain","severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.Events.ReplicationConnection.handle_publication_tables_diff/2","line":2,"file":"lib/domain/events/replication_connection.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.764.0>"}}
{"message":"notifier only receiving messages from its own node, functionality may be degraded","time":"2025-07-22T07:18:45.942Z","domain":["elixir"],"application":"oban","source":"oban","severity":"DEBUG","event":"notifier:switch","connectivity_status":"solitary","logging.googleapis.com/sourceLocation":{"function":"Elixir.Oban.Telemetry.log/2","line":624,"file":"lib/oban/telemetry.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.756.0>"}}
{"message":"Elixir.Domain.ChangeLogs.ReplicationConnection: Publication tables are up to date","time":"2025-07-22T07:18:45.952Z","domain":["elixir"],"application":"domain","severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.ChangeLogs.ReplicationConnection.handle_publication_tables_diff/2","line":2,"file":"lib/domain/change_logs/replication_connection.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.763.0>"}}
{"message":"Elixir.Domain.ChangeLogs.ReplicationConnection: Starting replication slot change_logs_slot","time":"2025-07-22T07:18:45.966Z","state":"[REDACTED]","domain":["elixir"],"application":"domain","severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.ChangeLogs.ReplicationConnection.handle_result/2","line":2,"file":"lib/domain/change_logs/replication_connection.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.763.0>"}}
{"message":"Elixir.Domain.Events.ReplicationConnection: Starting replication slot events_slot","time":"2025-07-22T07:18:45.966Z","state":"[REDACTED]","domain":["elixir"],"application":"domain","severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.Events.ReplicationConnection.handle_result/2","line":2,"file":"lib/domain/events/replication_connection.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.764.0>"}}
{"message":"Elixir.Domain.ChangeLogs.ReplicationConnection: Replication connection disconnected","time":"2025-07-22T07:18:45.977Z","domain":["elixir"],"application":"domain","counter":0,"severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.ChangeLogs.ReplicationConnection.handle_disconnect/1","line":2,"file":"lib/domain/change_logs/replication_connection.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.763.0>"}}
{"message":"Elixir.Domain.Events.ReplicationConnection: Replication connection disconnected","time":"2025-07-22T07:18:45.977Z","domain":["elixir"],"application":"domain","counter":0,"severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.Events.ReplicationConnection.handle_disconnect/1","line":2,"file":"lib/domain/events/replication_connection.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.764.0>"}}
{"message":"Failed to start replication connection Elixir.Domain.Events.ReplicationConnection","reason":"%Postgrex.Error{message: nil, postgres: %{code: :object_in_use, line: \"607\", message: \"replication slot \\\"events_slot\\\" is active for PID 135123\", file: \"slot.c\", unknown: \"ERROR\", severity: \"ERROR\", pg_code: \"55006\", routine: \"ReplicationSlotAcquire\"}, connection_id: 136400, query: nil}","time":"2025-07-22T07:18:45.978Z","domain":["elixir"],"application":"domain","max_retries":10,"severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.Replication.Manager.handle_info/2","line":41,"file":"lib/domain/replication/manager.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.761.0>"},"retries":0}
{"message":"Failed to start replication connection Elixir.Domain.ChangeLogs.ReplicationConnection","reason":"%Postgrex.Error{message: nil, postgres: %{code: :object_in_use, line: \"607\", message: \"replication slot \\\"change_logs_slot\\\" is active for PID 135124\", file: \"slot.c\", unknown: \"ERROR\", severity: \"ERROR\", pg_code: \"55006\", routine: \"ReplicationSlotAcquire\"}, connection_id: 136401, query: nil}","time":"2025-07-22T07:18:45.978Z","domain":["elixir"],"application":"domain","max_retries":10,"severity":"INFO","logging.googleapis.com/sourceLocation":{"function":"Elixir.Domain.Replication.Manager.handle_info/2","line":41,"file":"lib/domain/replication/manager.ex"},"logging.googleapis.com/operation":{"producer":"#PID<0.760.0>"},"retries":0}
```

Before, we relied on `start_link` telling us that there was an existing
pid running in the cluster. However, from the output above, it appears
that may not always be reliable.

Instead, we first check explicitly where the running process is and, if
alive, we try linking to it. If not, we try starting the connection
ourselves.

Once linked to the process, we react to it being torn down as well,
causing a first-one-wins scenario where all nodes will attempt to start
replication, minimizing downtime during deploys.

Now that https://github.com/firezone/infra/pull/94 is in place, I did
verify we are properly handling SIGTERM in the BEAM, so the deployment
would now go like this:

1. GCP brings up the new nodes, they all find the existing pid and link
to it
2. GCP sends SIGTERM to the old nodes
3. The _actual_ pid receives SIGTERM and exits
4. This exit propagates to all other nodes due to the link
5. Some node will "win", and the others will end up linking to it
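
The check-then-link approach described above can be sketched roughly as follows. This is a minimal illustration and not the actual `Domain.Replication.Manager` code: the module name, the `:replication_connection_sketch` global name, the retry interval, and the `start_fun` argument are all hypothetical, and the real connection is replaced by any process that registers itself under the global name.

```elixir
defmodule ReplicationLinkSketch do
  use GenServer

  # Hypothetical cluster-wide name the connection registers under.
  @global_name :replication_connection_sketch
  # Hypothetical retry delay in milliseconds.
  @retry_ms 100

  # `start_fun` is a zero-arity function that starts (and links) the
  # connection, returning {:ok, pid} or {:error, reason}.
  def start_link(start_fun) when is_function(start_fun, 0) do
    GenServer.start_link(__MODULE__, start_fun)
  end

  def connection_pid(manager), do: GenServer.call(manager, :connection_pid)

  @impl true
  def init(start_fun) do
    # Trap exits so a dying connection arrives as a message instead of
    # taking this manager down with it.
    Process.flag(:trap_exit, true)
    send(self(), :connect)
    {:ok, %{start_fun: start_fun, pid: nil}}
  end

  @impl true
  def handle_call(:connection_pid, _from, state), do: {:reply, state.pid, state}

  @impl true
  def handle_info(:connect, state) do
    case :global.whereis_name(@global_name) do
      pid when is_pid(pid) ->
        # Someone else already runs the connection: link to it.
        Process.link(pid)
        {:noreply, %{state | pid: pid}}

      :undefined ->
        # Nobody runs it: try to start it ourselves.
        case state.start_fun.() do
          {:ok, pid} ->
            {:noreply, %{state | pid: pid}}

          {:error, _reason} ->
            # Lost the race or the start failed; look again shortly.
            Process.send_after(self(), :connect, @retry_ms)
            {:noreply, state}
        end
    end
  end

  # The linked connection went away: every manager races to replace it.
  def handle_info({:EXIT, pid, _reason}, %{pid: pid} = state) do
    send(self(), :connect)
    {:noreply, %{state | pid: nil}}
  end

  def handle_info({:EXIT, _other, _reason}, state), do: {:noreply, state}
end
```

Because every manager traps exits and re-runs the same lookup, whichever one registers the name first wins and the rest link to it on their next attempt.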

Fixes #9911
2025-07-22 15:13:45 +00:00
Jamil
2392bddacb fix(elixir): handle nil external url config in dev mode (#9958)
This is nil in local dev. Fixing this allows us to run the local dev in
`:prod` mode to hit more codepaths.
2025-07-22 01:05:06 +00:00
Jamil
e4ba5a6929 fix(portal): inherit pid 1 in cmd (#9957)
Apparently using the shell form of this causes it not to inherit PID 1
from tini.
2025-07-21 22:38:25 +00:00
dependabot[bot]
0a0ee3c940 build(deps): bump sentry from 10.10.0 to 11.0.2 in /elixir (#9933)
Bumps [sentry](https://github.com/getsentry/sentry-elixir) from 10.10.0
to 11.0.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/getsentry/sentry-elixir/releases">sentry's
releases</a>.</em></p>
<blockquote>
<h2>11.0.2</h2>
<h3>Bug fixes</h3>
<ul>
<li>Deeply nested spans are handled now when building up traces in
<code>SpanProcessor</code> (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/924">#924</a>)</li>
</ul>
<h4>Various improvements</h4>
<ul>
<li>Span's attributes no longer include <code>db.url:
&quot;ecto:&quot;</code> entries as they are now filtered out (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/925">#925</a>)</li>
</ul>
<h2>11.0.1</h2>
<h4>Various improvements</h4>
<ul>
<li><code>Sentry.OpenTelemetry.Sampler</code> now works with an empty
config (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/915">#915</a>)</li>
</ul>
<h2>11.0.0</h2>
<p>This release comes with a beta support for Traces using OpenTelemetry
- please test it out and report any issues you find.</p>
<h3>New features</h3>
<ul>
<li>
<p>Beta support for Traces using OpenTelemetry (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/902">#902</a>)</p>
<p>To enable Tracing in your Phoenix application, you need to add the
following to your <code>mix.exs</code>:</p>
<pre lang="elixir"><code>def deps do
  [
    # ...
    {:sentry, &quot;~&gt; 11.0.0&quot;},
    {:opentelemetry, &quot;~&gt; 1.5&quot;},
    {:opentelemetry_api, &quot;~&gt; 1.4&quot;},
    {:opentelemetry_exporter, &quot;~&gt; 1.0&quot;},
    {:opentelemetry_semantic_conventions, &quot;~&gt; 1.27&quot;},
    {:opentelemetry_phoenix, &quot;~&gt; 2.0&quot;},
    {:opentelemetry_ecto, &quot;~&gt; 1.2&quot;},
    # ...
  ]
end
</code></pre>
<p>And then configure Tracing in Sentry and OpenTelemetry in your
<code>config.exs</code>:</p>
<pre lang="elixir"><code>config :sentry,
  # ...
  traces_sample_rate: 1.0 # any value between 0 and 1.0 enables tracing

config :opentelemetry, span_processor: {Sentry.OpenTelemetry.SpanProcessor, []}
config :opentelemetry, sampler: {Sentry.OpenTelemetry.Sampler, [drop: []]}
</code></pre>
</li>
<li>
<p>Add installer (based on Igniter) (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/876">#876</a>)</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/getsentry/sentry-elixir/blob/master/CHANGELOG.md">sentry's
changelog</a>.</em></p>
<blockquote>
<h2>11.0.2</h2>
<h3>Bug fixes</h3>
<ul>
<li>Deeply nested spans are handled now when building up traces in
<code>SpanProcessor</code> (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/924">#924</a>)</li>
</ul>
<h4>Various improvements</h4>
<ul>
<li>Span's attributes no longer include <code>db.url:
&quot;ecto:&quot;</code> entries as they are now filtered out (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/925">#925</a>)</li>
</ul>
<h2>11.0.1</h2>
<h4>Various improvements</h4>
<ul>
<li><code>Sentry.OpenTelemetry.Sampler</code> now works with an empty
config (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/915">#915</a>)</li>
</ul>
<h2>11.0.0</h2>
<p>This release comes with a beta support for Traces using OpenTelemetry
- please test it out and report any issues you find.</p>
<h3>New features</h3>
<ul>
<li>
<p>Beta support for Traces using OpenTelemetry (<a
href="https://redirect.github.com/getsentry/sentry-elixir/pull/902">#902</a>)</p>
<p>To enable Tracing in your Phoenix application, you need to add the
following to your <code>mix.exs</code>:</p>
<pre lang="elixir"><code>def deps do
  [
    # ...
    {:sentry, &quot;~&gt; 11.0.0&quot;},
    {:opentelemetry, &quot;~&gt; 1.5&quot;},
    {:opentelemetry_api, &quot;~&gt; 1.4&quot;},
    {:opentelemetry_exporter, &quot;~&gt; 1.0&quot;},
    {:opentelemetry_semantic_conventions, &quot;~&gt; 1.27&quot;},
    {:opentelemetry_phoenix, &quot;~&gt; 2.0&quot;},
    {:opentelemetry_ecto, &quot;~&gt; 1.2&quot;},
    # ...
  ]
end
</code></pre>
<p>And then configure Tracing in Sentry and OpenTelemetry in your
<code>config.exs</code>:</p>
<pre lang="elixir"><code>config :sentry,
  # ...
  traces_sample_rate: 1.0 # any value between 0 and 1.0 enables tracing

config :opentelemetry, span_processor: {Sentry.OpenTelemetry.SpanProcessor, []}
config :opentelemetry, sampler: {Sentry.OpenTelemetry.Sampler, []}
</code></pre>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b142174df9"><code>b142174</code></a>
release: 11.0.2</li>
<li><a
href="f43055b8ca"><code>f43055b</code></a>
Update CHANGELOG for 11.0.2 (<a
href="https://redirect.github.com/getsentry/sentry-elixir/issues/926">#926</a>)</li>
<li><a
href="ee512d3bf6"><code>ee512d3</code></a>
Filter out empty db.url from span's attributes (<a
href="https://redirect.github.com/getsentry/sentry-elixir/issues/925">#925</a>)</li>
<li><a
href="6809aaa68c"><code>6809aaa</code></a>
Fix handling of spans at 2+ levels (<a
href="https://redirect.github.com/getsentry/sentry-elixir/issues/924">#924</a>)</li>
<li><a
href="b7e16798d3"><code>b7e1679</code></a>
Improve event callback docs (<a
href="https://redirect.github.com/getsentry/sentry-elixir/issues/922">#922</a>)</li>
<li><a
href="97d0382418"><code>97d0382</code></a>
Merge branch 'release/11.0.1'</li>
<li><a
href="738fc763cd"><code>738fc76</code></a>
release: 11.0.1</li>
<li><a
href="ab58c0ef6b"><code>ab58c0e</code></a>
Update CHANGELOG (<a
href="https://redirect.github.com/getsentry/sentry-elixir/issues/917">#917</a>)</li>
<li><a
href="028ce18841"><code>028ce18</code></a>
handle nil drop list (<a
href="https://redirect.github.com/getsentry/sentry-elixir/issues/915">#915</a>)</li>
<li><a
href="5850c73a96"><code>5850c73</code></a>
Merge branch 'release/11.0.0'</li>
<li>Additional commits viewable in <a
href="https://github.com/getsentry/sentry-elixir/compare/10.10.0...11.0.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sentry&package-manager=hex&previous-version=10.10.0&new-version=11.0.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-21 21:01:25 +00:00
Jamil
1b1bd6401a fix(portal): gracefully account deletions in changelog (#9955)
When an account is perma-deleted, we need to handle that with another
function clause matching the WAL message coming into the change logs
replication connection module.
2025-07-21 20:47:41 +00:00
Jamil
488cb96469 fix(portal): don't prematurely reject access (#9952)
Before:

- When a flow was deleted, we flapped the resource on the client, and
sent `reject_access` naively for the flow's `{client_id, resource_id}`
pair on the gateway. This resulted in lots of unneeded resource flappage
on the client whenever bulk flow deletions happened.

After:

- When a flow is deleted, we check whether it is an active flow for the
client. If so, we flap the resource at that point to trigger generation
of a new flow. If access was truly affected in a way that results in the
loss of a resource, we push `resource_deleted` for the update that
triggered the flow deletion (for example, the resource or policy
removal). On the gateway, we only send `reject_access` if it was the
last flow granting access for a particular `client/resource` tuple.
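
The gateway-side rule ("only reject when the last flow goes away") can be sketched as a pure function. This is an illustration only: the `FlowCacheSketch` module, the shape of the `flows` map, and the `:noop` / `:reject_access` return atoms are assumptions made for the example, not the portal's actual data structures.

```elixir
defmodule FlowCacheSketch do
  # `flows` maps a {client_id, resource_id} tuple to the MapSet of flow ids
  # currently granting that client access to that resource (assumed shape).
  def delete_flow(flows, client_id, resource_id, flow_id) do
    key = {client_id, resource_id}
    remaining = flows |> Map.get(key, MapSet.new()) |> MapSet.delete(flow_id)

    cond do
      # Nothing tracked for this pair: there is no access to revoke.
      not Map.has_key?(flows, key) ->
        {:noop, flows}

      # That was the last flow for the pair: access must be rejected.
      MapSet.size(remaining) == 0 ->
        {:reject_access, Map.delete(flows, key)}

      # Other flows still grant access: keep it, just forget this flow.
      true ->
        {:noop, Map.put(flows, key, remaining)}
    end
  end
end
```

With two flows cached for the same pair, deleting the first returns `:noop` and only deleting the second returns `:reject_access`, which is what keeps bulk flow deletions from producing a rejection per flow.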


Why:

- While the access state is still correct in the previous
implementation, we risk pushing far too many resource flaps to the
client in an overly eager attempt to remove access that the client may
not even have.

cc @thomaseizinger 

Related:
https://firezonehq.slack.com/archives/C08FPHECLUF/p1753101115735179
2025-07-21 13:12:05 -07:00
dependabot[bot]
272074e8d4 build(deps): bump hammer from 7.0.1 to 7.1.0 in /elixir (#9935)
Bumps [hammer](https://github.com/ExHammer/hammer) from 7.0.1 to 7.1.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/ExHammer/hammer/blob/master/CHANGELOG.md">hammer's
changelog</a>.</em></p>
<blockquote>
<h2>7.1.0 - 2025-07-18</h2>
<ul>
<li>Fix key type inconsistency in backend implementations - all backends
now accept <code>term()</code> keys instead of <code>String.t()</code>
(<a
href="https://redirect.github.com/ExHammer/hammer/issues/143">#143</a>)</li>
<li>Add comprehensive test coverage for various key types (atoms,
tuples, integers, lists, maps)</li>
<li>Fix race conditions in atomic backend tests (FixWindow, LeakyBucket,
TokenBucket)</li>
<li>Replace timing-dependent tests with polling-based
<code>eventually</code> helper for better CI reliability</li>
<li>Add documentation warning about Redis backend string key
requirement</li>
<li>Fix typo in <code>inc/3</code> optional callback documentation (<a
href="https://redirect.github.com/ExHammer/hammer/issues/142">#142</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a57bdecdc1"><code>a57bdec</code></a>
improve changelog last commit (<a
href="https://redirect.github.com/ExHammer/hammer/issues/145">#145</a>)</li>
<li><a
href="bb061c5334"><code>bb061c5</code></a>
Bump version to 7.1.0 (<a
href="https://redirect.github.com/ExHammer/hammer/issues/144">#144</a>)</li>
<li><a
href="7d7967f898"><code>7d7967f</code></a>
Fix key type inconsistency in backend implementations (<a
href="https://redirect.github.com/ExHammer/hammer/issues/143">#143</a>)</li>
<li><a
href="94d39525e8"><code>94d3952</code></a>
Fixes typo for inc/3 optional callback <code>@doc</code> (<a
href="https://redirect.github.com/ExHammer/hammer/issues/142">#142</a>)</li>
<li><a
href="79ca221876"><code>79ca221</code></a>
Bump benchee from 1.3.1 to 1.4.0 (<a
href="https://redirect.github.com/ExHammer/hammer/issues/135">#135</a>)</li>
<li><a
href="a09bbd0d42"><code>a09bbd0</code></a>
Bump ex_doc from 0.37.3 to 0.38.2 (<a
href="https://redirect.github.com/ExHammer/hammer/issues/141">#141</a>)</li>
<li><a
href="d06a17b6be"><code>d06a17b</code></a>
Bump credo from 1.7.11 to 1.7.12 (<a
href="https://redirect.github.com/ExHammer/hammer/issues/134">#134</a>)</li>
<li><a
href="26df742620"><code>26df742</code></a>
Update bug_report.md (<a
href="https://redirect.github.com/ExHammer/hammer/issues/133">#133</a>)</li>
<li><a
href="b8765fe216"><code>b8765fe</code></a>
Bump ex_doc from 0.37.2 to 0.37.3 (<a
href="https://redirect.github.com/ExHammer/hammer/issues/131">#131</a>)</li>
<li>See full diff in <a
href="https://github.com/ExHammer/hammer/compare/7.0.1...7.1.0">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-21 13:24:40 +00:00
dependabot[bot]
619cfa0a37 build(deps): bump telemetry_poller from 1.2.0 to 1.3.0 in /elixir (#9917)
Bumps
[telemetry_poller](https://github.com/beam-telemetry/telemetry_poller)
from 1.2.0 to 1.3.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/beam-telemetry/telemetry_poller/blob/main/CHANGELOG.md">telemetry_poller's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/beam-telemetry/telemetry_poller/tree/v1.3.0">1.3.0</a></h2>
<h3>Added</h3>
<ul>
<li>Add <code>atom_limit</code>, <code>process_limit</code>, and
<code>port_limit</code> measurements to the <code>[vm,
system_counts]</code> event. (<a
href="https://redirect.github.com/beam-telemetry/telemetry_poller/issues/79">#79</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="6d5c98f580"><code>6d5c98f</code></a>
Release v1.3.0</li>
<li><a
href="411675d8ed"><code>411675d</code></a>
Add vm.system_counts measurements with atom, port, process limits (<a
href="https://redirect.github.com/beam-telemetry/telemetry_poller/issues/79">#79</a>)</li>
<li><a
href="fefb3e9053"><code>fefb3e9</code></a>
Fix incorrect GitHub CI badge URL (<a
href="https://redirect.github.com/beam-telemetry/telemetry_poller/issues/78">#78</a>)</li>
<li><a
href="f5a3a389a7"><code>f5a3a38</code></a>
Mention persistent_term in the README (<a
href="https://redirect.github.com/beam-telemetry/telemetry_poller/issues/77">#77</a>)</li>
<li><a
href="8e8148f774"><code>8e8148f</code></a>
Fix docs</li>
<li>See full diff in <a
href="https://github.com/beam-telemetry/telemetry_poller/compare/v1.2.0...v1.3.0">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-21 13:24:15 +00:00
Jamil
b5af132ae8 feat(portal): allow queue_target and queue_interval via ENV (#9943)
These parameters should be tuned to how long we expect "normal" queries
to take against the SQL instance. For smaller instances, "normal"
queries may take longer than 500ms, so we need to be able to configure
these via our Terraform configuration.

If not specified, the same defaults are used as before.
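
As a rough sketch of what ENV-driven tuning could look like: `queue_target` and `queue_interval` are real DBConnection pool options (both in milliseconds), but the environment variable names, the helper module, and the default values below are assumptions for illustration and are not taken from the portal's configuration.

```elixir
defmodule RepoEnvSketch do
  # Builds the pool options from a map of environment variables.
  # Variable names and fallback values here are hypothetical.
  def pool_opts(env \\ System.get_env()) do
    [
      queue_target: int(env, "DATABASE_QUEUE_TARGET", 500),
      queue_interval: int(env, "DATABASE_QUEUE_INTERVAL", 1_000)
    ]
  end

  # Falls back to the default when the variable is unset.
  defp int(env, key, default) do
    case Map.get(env, key) do
      nil -> default
      value -> String.to_integer(value)
    end
  end
end
```

The resulting keyword list would typically be merged into the repo's options in a runtime config file, so a smaller SQL instance can be given a larger `queue_target` without a code change.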

Related: https://github.com/firezone/infra/pull/82
2025-07-20 12:28:04 -07:00
dependabot[bot]
5711807a3c build(deps): bump open_api_spex from 3.21.2 to 3.21.5 in /elixir (#9927)
Bumps [open_api_spex](https://github.com/open-api-spex/open_api_spex)
from 3.21.2 to 3.21.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/open-api-spex/open_api_spex/releases">open_api_spex's
releases</a>.</em></p>
<blockquote>
<h2>v3.21.5</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix assert_operation_response/2 references by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/673">open-api-spex/open_api_spex#673</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/open-api-spex/open_api_spex/compare/v3.21.4...v3.21.5">https://github.com/open-api-spex/open_api_spex/compare/v3.21.4...v3.21.5</a></p>
<h2>v3.21.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix OTP-28 support by <a
href="https://github.com/bopm"><code>@​bopm</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/672">open-api-spex/open_api_spex#672</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/bopm"><code>@​bopm</code></a> made their
first contribution in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/672">open-api-spex/open_api_spex#672</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/open-api-spex/open_api_spex/compare/v3.21.3...v3.21.4">https://github.com/open-api-spex/open_api_spex/compare/v3.21.3...v3.21.4</a></p>
<h2>v3.21.3</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix cast x-validate when decoded schema by <a
href="https://github.com/GPrimola"><code>@​GPrimola</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/647">open-api-spex/open_api_spex#647</a></li>
<li>Bump CI dependencies by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/655">open-api-spex/open_api_spex#655</a></li>
<li>Add examples property to Schema by <a
href="https://github.com/madjar"><code>@​madjar</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/654">open-api-spex/open_api_spex#654</a></li>
<li>Document schema resolver duplicate titles behaviour by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/656">open-api-spex/open_api_spex#656</a></li>
<li>Add spec.yaml tasks to example applications by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/657">open-api-spex/open_api_spex#657</a></li>
<li>Fix 1.18 compilation warnings by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/665">open-api-spex/open_api_spex#665</a></li>
<li>Check for ex_doc warnings in CI and bump devtest deps by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/666">open-api-spex/open_api_spex#666</a></li>
<li>Test array query params in example phoenix app by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/667">open-api-spex/open_api_spex#667</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/GPrimola"><code>@​GPrimola</code></a>
made their first contribution in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/647">open-api-spex/open_api_spex#647</a></li>
<li><a href="https://github.com/madjar"><code>@​madjar</code></a> made
their first contribution in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/654">open-api-spex/open_api_spex#654</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/open-api-spex/open_api_spex/compare/v3.21.2...v3.21.3">https://github.com/open-api-spex/open_api_spex/compare/v3.21.2...v3.21.3</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/open-api-spex/open_api_spex/blob/master/CHANGELOG.md">open_api_spex's
changelog</a>.</em></p>
<blockquote>
<h2>v3.21.5 - 2025-07-08</h2>
<ul>
<li>Fix assert_operation_response/2 references by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/673">open-api-spex/open_api_spex#673</a></li>
</ul>
<h2>v3.21.4 - 2025-07-01</h2>
<ul>
<li>Fix OTP-28 support by <a
href="https://github.com/bopm"><code>@​bopm</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/672">open-api-spex/open_api_spex#672</a></li>
</ul>
<h2>v3.21.3 - 2025-06-25</h2>
<ul>
<li>Fix cast x-validate when decoded schema by <a
href="https://github.com/GPrimola"><code>@​GPrimola</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/647">open-api-spex/open_api_spex#647</a></li>
<li>Add examples property to Schema by <a
href="https://github.com/madjar"><code>@​madjar</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/654">open-api-spex/open_api_spex#654</a></li>
<li>Document schema resolver duplicate titles behaviour by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/656">open-api-spex/open_api_spex#656</a></li>
<li>Fix 1.18 compilation warnings by <a
href="https://github.com/zorbash"><code>@​zorbash</code></a> in <a
href="https://redirect.github.com/open-api-spex/open_api_spex/pull/665">open-api-spex/open_api_spex#665</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="bc1606b9e5"><code>bc1606b</code></a>
Release version 3.21.5</li>
<li><a
href="c71c312d0d"><code>c71c312</code></a>
Fix assert_operation_response/2 references (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/673">#673</a>)</li>
<li><a
href="61d5951dc4"><code>61d5951</code></a>
Release version 3.21.4</li>
<li><a
href="572130d1cf"><code>572130d</code></a>
Fix OTP-28 support (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/672">#672</a>)</li>
<li><a
href="2069321f6a"><code>2069321</code></a>
Release 3.21.3</li>
<li><a
href="410f3aaa64"><code>410f3aa</code></a>
Test array query params in example phoenix app (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/667">#667</a>)</li>
<li><a
href="22f46f527c"><code>22f46f5</code></a>
Check for ex_doc warnings in CI and bump devtest deps (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/666">#666</a>)</li>
<li><a
href="fa34dd00a0"><code>fa34dd0</code></a>
Fix 1.18 compilation warnings (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/665">#665</a>)</li>
<li><a
href="02d8c1558a"><code>02d8c15</code></a>
Add spec.yaml tasks to example applications (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/657">#657</a>)</li>
<li><a
href="439fadc5bd"><code>439fadc</code></a>
Document schema resolver duplicate titles behaviour (<a
href="https://redirect.github.com/open-api-spex/open_api_spex/issues/656">#656</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/open-api-spex/open_api_spex/compare/v3.21.2...v3.21.5">compare
view</a></li>
</ul>
</details>
<br />


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-07-20 15:46:32 +00:00
Jamil
f379e85e9b refactor(portal): cache access state in channel pids (#9773)
When changes occur in the Firezone DB that trigger side effects, we need
some mechanism to broadcast and handle these.

Before, the system we used was:

- Each process subscribes to a myriad of topics related to data it wants
to receive. In some cases it would subscribe to new topics based on
received events from existing topics (i.e., flows in the gateway
channel), and sometimes in a loop. It would then need to be sure to
_unsubscribe_ from these topics
- Handle the side effect in the `after_commit` hook of the Ecto function
call after it completes
- Broadcast only a simple (thin) event message with a DB id
- In the receiver, use the id(s) to re-evaluate, or lookup one or many
records associated with the change
- After the lookup completes, `push` the relevant message(s) to the
LiveView, `client` pid, or `gateway` pid in their respective channel
processes

This system had a number of drawbacks ranging from scalability issues to
undesirable access bugs:

1. The `after_commit` callback, on each App node, is not globally
ordered. Since we broadcast a thin event schema and read from the DB to
hydrate each event, this meant we had a `read after write` problem in
our event architecture, leading to the potential for lost updates. Case
in point: if a policy is updated from `resource_id-1` to
`resource_id-2`, and then back to `resource_id-1`, it's possible that,
given the right amount of delay, the gateway channel will receive two
`reject_access` events for `resource_id-1`, as opposed to one for
`resource_id-1` and one for `resource_id-2`, leading to the potential
for unauthorized access.
1. It was very difficult to ensure that the correct topics were being
subscribed to and unsubscribed from, and the correct number of times,
leading to maintenance issues for other engineers.
1. We had a nasty N+1 query problem whenever memberships were added or
removed, which resulted in essentially all access related to that
membership (so all Policies touching its actor group) being
re-evaluated and broadcasted. This meant that any bulk addition or
deletion of memberships would generate so many queries that they'd
time out or consume the entire connection pool.
1. We had no durability for side-effect processing. In some places, we
were iterating over many returned records to send broadcasts.
Broadcasting is not a zero-time operation; each call takes a small
amount of CPU time to copy the message into the receiver's mailbox. If
we deployed while this was happening, the state update would be lost
forever. If this was a `reject_access` for a Gateway, the Gateway would
never remove access for that particular flow.
1. On each flow authorization, we needed to hit `us-east1` not only to
"authorize" the flow, but to log it as well. This incurs latency
especially for users in other parts of the world, which happens on
_each_ connection setup to a new resource.
1. Since we read and re-authorize access due to the thin events
broadcasted from side effects, we risk hitting thundering herd problems
(see the N+1 query problem above) where a single DB change could result
in all receivers hitting the DB at once to "hydrate" their
processing.
1. If an administrator modifies the DB directly, or, if we need to run a
DB migration that involves side effects, they'll be lost, because the
side effect triggers happened in `after_commit` hooks that are only
available when querying the DB through Ecto. Manually deleting (or
resurrecting) a policy, for example, would not have updated any
connected clients or gateways with the new state.
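
The lost-update hazard in the first drawback can be illustrated with a toy model. This is a minimal sketch in Python, not the portal's Elixir code: `ThinEvent`, `FatEvent`, and `replay` are hypothetical names, and the "delay" is modeled by handling all events only after every write has landed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinEvent:
    policy_id: str  # only the DB id; the receiver must re-read the row


@dataclass(frozen=True)
class FatEvent:
    policy_id: str
    resource_id: str  # row state captured at write time


def replay(writes):
    """Apply writes, then handle the resulting events after a delay.

    Returns what a thin-event receiver and a fat-event receiver observe.
    """
    db = {}
    thin_events, fat_events = [], []
    for policy_id, resource_id in writes:
        db[policy_id] = resource_id
        thin_events.append(ThinEvent(policy_id))
        fat_events.append(FatEvent(policy_id, resource_id))

    # The thin receiver hydrates from the DB at handling time, so every
    # event sees the final row. The fat receiver uses the carried state.
    seen_thin = [db[e.policy_id] for e in thin_events]
    seen_fat = [e.resource_id for e in fat_events]
    return seen_thin, seen_fat
```

With the writes `resource_id-2` then `resource_id-1` for the same policy, the thin receiver in this model observes `resource_id-1` twice and never sees the intermediate state, while the fat receiver observes both states in order.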


To fix all of the above, we move to the system introduced in this PR:

- All changes are now serialized (for free) by Postgres and broadcasted
as a single event stream
- The number of topics has been reduced to just one, the `account_id` of
an account. All receivers subscribe to this one topic for the lifetime
of their pid and then only filter the events they want to act upon,
ignoring all other messages
- The events themselves have been turned into "fat" structs based on the
schemas they represent. By making them properly typed, we can apply things
like the existing Policy authorizer functions to them as if we had just
fetched them from the DB.
- All flow creation now happens in memory and doesn't need to incur
a DB hit in `us-east1` to proceed.
- Since clients and gateways now track state in a push-based manner from
the DB, this means very few actual DB queries are needed to maintain
state in the channel procs, and it also means we can be smarter about
when to send `resource_deleted` and `resource_created_or_updated`
appropriately, since we can always diff between what the client _had_
access to, and what they _now_ have access to.
- All DB operations, whether they happen from the application code, a
`psql` prompt, or even via Google SQL Studio in the GCP console, will
trigger the _same_ side effects.
- We now use a replication consumer based off Postgres logical decoding
of the write-ahead log using a _durable slot_. This means that Postgres
will retain _all events_ until they are acknowledged, giving us the
ability to ensure at-least-once processing semantics for our system.
Today, the ACK is simply, "did we broadcast this event successfully".
But in the future, we can assert that replies are received before we
acknowledge the event as processed back to Postgres.
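
The diffing idea from the list above can be sketched as follows. This is a hedged illustration in Python rather than the channel code itself: the `Resource` fields and the `diff_access` helper are assumptions made for the example, and only the message names `resource_deleted` and `resource_created_or_updated` come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    # Hypothetical fields; the real schema has more.
    id: str
    name: str
    address: str


def diff_access(had, now):
    """Compare two {resource_id: Resource} maps and list messages to push.

    `had` is what the client could access before the change, `now` is
    what it can access afterwards.
    """
    messages = []
    # Anything that disappeared from the accessible set is deleted.
    for resource_id in sorted(had.keys() - now.keys()):
        messages.append(("resource_deleted", resource_id))
    # Anything new, or present with different contents, is (re)sent.
    for resource_id in sorted(now):
        if had.get(resource_id) != now[resource_id]:
            messages.append(("resource_created_or_updated", now[resource_id]))
    return messages
```

Because both maps live in the channel process in this sketch, computing the messages needs no DB query; unchanged resources produce no message at all.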



The tests in this PR have been updated to pass given the refactor.
However, since we are tracking more state now in the channel procs, it
would be a good idea to add more tests for those edge cases. That is
saved for a later PR because (1) this one is already huge, and (2) we
need to get this out to staging to smoke test everything anyhow.

Fixes: #9908 
Fixes: #9909 
Fixes: #9910
Fixes: #9900 
Related: #9501
2025-07-18 22:47:18 +00:00
Jamil
789a3012d6 fix(portal): only process jsonb strings (#9883)
As a followup to #9882, we need to ensure that `jsonb` column values
that are not strings are not decoded as JSON. An example of when this
happens is when Postgres sends an `:unchanged_toast` to indicate the
data hasn't changed.
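
A minimal sketch of the guard, written in Python for illustration only: `decode_jsonb` and the `UNCHANGED_TOAST` sentinel are hypothetical stand-ins, since the real marker is an Elixir atom handled in the replication consumer.

```python
import json

# Stand-in for the `:unchanged_toast` marker; the real value is an Elixir atom.
UNCHANGED_TOAST = object()


def decode_jsonb(value):
    """Decode a jsonb column value only when it arrived as a string."""
    if isinstance(value, str):
        return json.loads(value)
    # Markers such as UNCHANGED_TOAST, or None, pass through untouched.
    return value
```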
2025-07-15 18:06:13 -07:00
Jamil
cce21a8dea fix(portal): handle jsonb for embedded schemas (#9882)
In #9664, we introduced the `Domain.struct_from_params/2` function which
converts a set of params containing string keys into a provided struct
representing a schema module. This is used to broadcast actual structs
pertaining to WAL data as opposed to simple string encodings of the
data.

The problem is that the function was a bit too naive and failed to properly
cast embedded schemas, resulting in all embedded schemas on the root
struct being `nil` or `[]`.

To fix this, we need to do two things:

1. We now decode JSON/JSONB fields from binaries (strings) into actual
lists and maps in the replication consumer module for downstream
processors to use
2. We update our `struct_from_params/2` function to properly cast
embedded schemas from these lists and maps using Ecto.Changeset's
`apply_changes` function, which uses the same logic to instantiate the
schemas as if we were saving a form or API request.

Lastly, tests are added to ensure this works under various scenarios,
including nested embedded schemas which we use in some places.
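
As a rough analog of the two steps above, the sketch below casts string-keyed params into nested typed structs. It uses Python dataclasses instead of Ecto schemas, so it does not show `Ecto.Changeset` itself; the `Filter` and `Resource` classes and this `struct_from_params` are hypothetical and only mirror the idea of recursing into embedded schemas.

```python
import json
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints


@dataclass
class Filter:  # hypothetical embedded schema
    protocol: str = ""
    ports: list[str] = field(default_factory=list)


@dataclass
class Resource:  # hypothetical root schema
    id: str = ""
    name: str = ""
    filters: list[Filter] = field(default_factory=list)


def _cast(tp: Any, value: Any) -> Any:
    """Cast one value, recursing into embedded structs and lists of them."""
    if value is None:
        return None
    if is_dataclass(tp):
        return struct_from_params(tp, value)
    if get_origin(tp) is list:
        args = get_args(tp)
        item_tp = args[0] if args else Any
        return [_cast(item_tp, item) for item in value]
    return value


def struct_from_params(cls, params: dict):
    """Build `cls` from a string-keyed dict, casting nested embeds too."""
    hints = get_type_hints(cls)
    kwargs = {
        f.name: _cast(hints[f.name], params[f.name])
        for f in fields(cls)
        if f.name in params
    }
    return cls(**kwargs)


# Step 1: the JSON column arrives as a string and is decoded first.
row = {"id": "r1", "name": "db", "filters": '[{"protocol": "tcp", "ports": ["5432"]}]'}
row["filters"] = json.loads(row["filters"])
# Step 2: the decoded lists and maps are cast into typed structs.
resource = struct_from_params(Resource, row)
```

A naive version that only copies top-level keys would leave `filters` as a list of plain dicts, which is roughly the shape of the bug described here.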

Fixes #9835

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-15 23:50:27 +00:00
Thomas Eizinger
cb497a7435 fix(portal): use correct password generation algorithm (#9874)
In #9870, the password generation algorithm was broken. The correct
order of the elements in the hash is: expiry, stamp_secret, salt. The
relay expects this order when it re-generates the password to validate
the message.
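
To make the ordering concrete, here is a minimal sketch in Python. Only the element order (expiry, stamp_secret, salt) comes from this commit; the SHA-256 digest, the `:` separator, the unpadded base64 encoding, and the `relay_password` name are assumptions for illustration and may differ from what the portal and relay actually use.

```python
import base64
import hashlib


def relay_password(expiry: int, stamp_secret: str, salt: str) -> str:
    """Derive a password from expiry, stamp_secret, salt, in that order."""
    # Assumed format: fields joined with ":" (not confirmed by the source).
    material = f"{expiry}:{stamp_secret}:{salt}"
    digest = hashlib.sha256(material.encode("utf-8")).digest()
    # Assumed encoding: base64 without padding.
    return base64.b64encode(digest).decode("ascii").rstrip("=")
```

Whatever the exact hash and encoding, both sides must concatenate the fields in the same order, otherwise the relay's regenerated password will not match the one the portal issued.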

Due to a different bug in our CI system, we weren't actually checking
for warnings / errors in our perf-test suite:
https://github.com/firezone/firezone/actions/runs/16285038111/job/45982241021#step:9:66
2025-07-15 13:39:31 +00:00