Commit Graph

704 Commits

Author SHA1 Message Date
Jamil
f07aa8aa3a fix(portal): Add ReplicationConnectionManager (#9158)
When deploying, new Elixir nodes are spun up before the old ones are
brought down. This ensures that the state in the cluster is merged into
the new nodes before the old ones go away.

This also means, however, that the existing WAL consumer is still up
when our new one tries to come online.

Normally, this isn't an issue, because we find the old pid and return it
with `{:ok, existing_pid}`. When that VM goes away, the Supervisor(s) of
the new application notice and restart it.

However, if the cluster state diverges or is inconsistent during this
period, we may fail to find the existing pid, and try to start a new
ReplicationConnection. If the old pid is still active, this will fail
because there's a mutex on the connection. The original implementation
was designed to handle this case using the Supervisor with a
`:transient` restart policy.

What the author failed to understand is that the restart policy applies
only to _restarts_ and not initial starts, so if all of the new
application servers fail to find the old pid which is still connected,
and they all fail to come up, we won't consume the WAL.

This is fixed with a `ReplicationConnectionManager` that always starts
cleanly and then simply tries to start a `ReplicationConnection` every
30s, giving up after 5 minutes if it can't start one or find an existing
one. Giving up crashes the manager, which causes the Supervisor to
restart it and notify us.
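A minimal sketch of the manager's shape, assuming hypothetical module
and helper names (the real implementation differs):

```
defmodule Domain.Events.ReplicationConnectionManager do
  use GenServer

  @retry_interval :timer.seconds(30)
  @give_up_after :timer.minutes(5)

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # init/1 never fails, so this process always comes up fine.
    send(self(), :try_start)
    {:ok, %{opts: opts, started_at: System.monotonic_time(:millisecond)}}
  end

  @impl true
  def handle_info(:try_start, state) do
    # start_link/1 either starts a new connection or returns the existing pid.
    case Domain.Events.ReplicationConnection.start_link(state.opts) do
      {:ok, _pid} ->
        {:noreply, state}

      {:error, _reason} ->
        elapsed = System.monotonic_time(:millisecond) - state.started_at

        if elapsed >= @give_up_after do
          # Crash so the Supervisor restarts us and we get notified.
          {:stop, :replication_connection_unavailable, state}
        else
          Process.send_after(self(), :try_start, @retry_interval)
          {:noreply, state}
        end
    end
  end
end
```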
2025-05-16 19:54:27 +00:00
Jamil
65c58ee254 feat(portal): Zero-click client authentication (#9144)
Adds a new field to `settings/identity_providers` that allows an Admin
to designate any non-email/otp provider as the `default` for client
authentication. Clients will then navigate directly to the provider's
`/redirect` endpoint when authenticating, which in many cases will
automatically sign them in.

No existing providers are updated in this PR.



https://github.com/user-attachments/assets/7b962a25-76fd-491f-a194-60ed993821fc
2025-05-16 19:26:08 +00:00
Brian Manifold
dd5a53f686 fix(portal): Fix sign_up to properly populate email (#9105)
Why:

* During the account sign up flow, the email of the first admin was not
being populated in the `email` column on the auth_identities table. This
was due to atoms being passed in the attrs instead of strings to the
`create_identity` function. A migration was also created to backfill the
missing emails in the `auth_identities` table.
2025-05-13 19:49:25 +00:00
Brian Manifold
a8bea13591 fix(portal): Remove redundant index on actor_group_memberships (#9063)
Why:

* It was pointed out that, given the way PostgreSQL handles compound
indexes, there is no need for an individual index on the first column of
the compound index. This commit removes the redundant index on
`actor_id` for the `actor_group_memberships` table.
2025-05-09 05:32:45 +00:00
Brian Manifold
20d8246ce8 fix(portal): Add indexes to actor_group_memberships (#9058)
Why:

* As we move towards hard deleting data, one issue we've run into is
with cascading deletes on the `actor_group_memberships` table. To solve
this, indexes have been created on the `actor_id` and `group_id` columns
of `actor_group_memberships`, as sketched below.
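A sketch of what such a migration looks like, assuming the table and
column names described above (migration module name hypothetical):

```
defmodule Domain.Repo.Migrations.AddActorGroupMembershipIndexes do
  use Ecto.Migration

  def change do
    # FK columns hit by ON DELETE CASCADE need their own indexes, otherwise
    # each parent delete sequentially scans the memberships table.
    create index(:actor_group_memberships, [:actor_id])
    create index(:actor_group_memberships, [:group_id])
  end
end
```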
2025-05-09 01:47:24 +00:00
Jamil
8a7f248dda fix(portal): ignore expected replication connection failures (#9003)
These are expected during deploys, so don't log them as errors. If the
Supervisor fails to start us after exhausting all attempts, it will log
an error.
2025-05-02 00:45:02 +00:00
Jamil
299fbcd096 fix(portal): Properly check background jobs (#8986)
The `background_jobs_enabled` config is an ENV var that needs to be set
for a specific configuration key. It's not set on the top-level
`:domain` config by default.

Instead, it's used to enable / disable specific modules to start by the
application's Supervisor.

The `Domain.Events.ReplicationConnection` module is updated in this PR
to follow this convention.
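A sketch of the convention, under the assumption that the flag simply
gates which children the application's Supervisor starts (the actual
config key nesting, predicate, and child list differ):

```
# In Domain.Application.start/2 — illustrative child list only
base_children = [Domain.Repo]

children =
  if Domain.Config.background_jobs_enabled?() do
    # hypothetical predicate wrapping the ENV-driven config key
    base_children ++ [Domain.Events.ReplicationConnection]
  else
    base_children
  end

Supervisor.start_link(children, strategy: :one_for_one, name: Domain.Supervisor)
```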
2025-05-01 16:32:43 +00:00
Jamil
8e054f5c74 fix(portal): Restrict WAL streaming to domain nodes only (#8956)
The `web` and `api` applications use `domain` as a dependency in their
`mix.exs`. This means by default their Supervisor will start the
Domain's supervision tree as well.

The author did not realize this at the time of implementation, and so we
now leverage the convention in place for restricting tasks to `domain`
nodes, the `background_jobs_enabled` application configuration
parameter.

We also add an info log when the replication slot is being started so we
can verify the node it's starting on.
2025-05-01 13:28:40 +00:00
Jamil
c0a670d947 fix(portal): Restart ReplicationConnection using Supervisor (#8953)
When deploying, the cluster state diverges temporarily, which allows
more than one `ReplicationConnection` process to start on the new nodes.

(One of) the old nodes still has an active slot, and we get an "object
in use" error `(Postgrex.Error) ERROR 55006 (object_in_use) replication
slot "events_slot" is active for PID 603037`.

Rather than use ReplicationConnection's restart behavior (which logs
tons of errors with Logger.error), we can use the Supervisor here
instead, and continue to try and start the ReplicationConnection until
successful.

Note that if the process name is registered (globally) and running,
ReplicationConnection.start_link/1 simply returns `{:ok, pid}` instead
of erroring out with `:already_running`, so eventually one of the nodes
will succeed and the remaining ones will return the globally-registered
pid.
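A sketch of the supervised shape, with an assumed global name and retry
limits (see #9158 above for why `:transient` later proved insufficient):

```
children = [
  %{
    id: Domain.Events.ReplicationConnection,
    start:
      {Domain.Events.ReplicationConnection, :start_link,
       [[name: {:global, Domain.Events.ReplicationConnection}]]},
    # Restart on abnormal exit; the Supervisor keeps retrying until one
    # node wins the slot and the rest find the globally registered pid.
    restart: :transient
  }
]

Supervisor.start_link(children, strategy: :one_for_one, max_restarts: 10, max_seconds: 60)
```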
2025-05-01 03:48:35 +00:00
Jamil
fdd1105b10 fix(portal): alter db user role with replication (#8952)
We need the `replication` attribute set on the db user. This is
trivially done in a migration, and with the `CURRENT_USER` specifier, we
don't need to fetch the Application configuration.
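A sketch of such a migration (module name hypothetical):

```
defmodule Domain.Repo.Migrations.GrantReplicationToDbUser do
  use Ecto.Migration

  def up do
    # CURRENT_USER means we don't need to look up the configured username.
    execute("ALTER ROLE CURRENT_USER WITH REPLICATION")
  end

  def down do
    execute("ALTER ROLE CURRENT_USER WITH NOREPLICATION")
  end
end
```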
2025-04-30 13:02:34 -07:00
Jamil
a98a9867af fix(portal): Redact entire connection_opts param (#8946)
The LoggerJSON Redactor only redacts top-level keys, so we need to
redact the entire `connection_opts` param to redact its contained
password.

We also don't need to pass around `connection_opts` across the entire
ReplicationConnection process state, only for the initial connection, so
we refactor that out of the `state`.
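A sketch of the configuration change this implies, with the key list
abridged and the exact config location assumed:

```
config :logger, :default_handler,
  formatter:
    LoggerJSON.Formatters.GoogleCloud.new(
      redactors: [
        # "connection_opts" is listed as a top-level key, since the
        # Redactor can't reach the password nested inside it.
        {LoggerJSON.Redactors.RedactKeys, ["password", "connection_opts"]}
      ]
    )
```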
2025-04-30 16:33:21 +00:00
Jamil
968db2ae39 feat(portal): Receive WAL events (#8909)
Firezone's control plane is a realtime, distributed system that relies
on a broadcast/subscribe system to function. In many cases, these events
are broadcasted whenever relevant data in the DB changes, such as an
actor losing access to a policy, a membership being deleted, and so
forth.

Today, this is handled in the application layer, typically happening at
the place where the relevant DB call is made (i.e. in an
`after_commit`). While this approach has worked thus far, it has several
issues:

1. We have no guarantee that the DB change will issue a broadcast. If
the application is deployed or the process crashes after the DB changes
are made but before the broadcast happens, we will have potentially
failed to update any connected clients or gateways with the changes.
2. We have no guarantee that the order of DB updates will be maintained
in order for broadcasts. In other words, app server A could win its DB
operation against app server B, but then proceed to lose being the first
to broadcast.
3. If the cluster is in a bad state where broadcasts may return an error
(i.e. https://github.com/firezone/firezone/issues/8660), we will never
retry the broadcast.

To fix the above issues, we introduce a WAL logical decoder that
processes the event stream one message at a time and performs any needed
work.
Serializability is guaranteed since we only process the WAL in a single,
cluster-global process, `ReplicationConnection`. Durability is also
guaranteed since we only ACK WAL segments after we've successfully
ingested the event.

This means we will only advance the position of our WAL stream after
successfully broadcasting the event.

This PR only introduces the WAL stream processing system but does not
introduce any changes to our current broadcasting behavior - that's
saved for another PR.
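A condensed sketch of the ack-after-ingest flow using
Postgrex.ReplicationConnection's message framing; slot creation and
message decoding are omitted, and `ingest_and_broadcast/1` is a
hypothetical stand-in for the real event handling:

```
defmodule WalConsumerSketch do
  use Postgrex.ReplicationConnection

  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_connect(state) do
    # Begin streaming from the (already created) replication slot.
    {:stream, "START_REPLICATION SLOT events_slot LOGICAL 0/0", [], state}
  end

  # Primary keepalive ("k"): reply with our position only if requested.
  @impl true
  def handle_data(<<?k, wal_end::64, _clock::64, reply>>, state) do
    messages = if reply == 1, do: [ack_message(wal_end + 1)], else: []
    {:noreply, messages, state}
  end

  # XLogData ("w"): broadcast the event first, ACK the LSN only on success,
  # so a crash before ingestion means the segment is redelivered.
  def handle_data(<<?w, _start::64, wal_end::64, _clock::64, event::binary>>, state) do
    :ok = ingest_and_broadcast(event)
    {:noreply, [ack_message(wal_end + 1)], state}
  end

  # Standby status update ("r"): advances our confirmed WAL position.
  defp ack_message(lsn), do: <<?r, lsn::64, lsn::64, lsn::64, pg_now()::64, 0>>

  # The replication protocol clock is microseconds since 2000-01-01.
  defp pg_now, do: System.os_time(:microsecond) - 946_684_800_000_000

  defp ingest_and_broadcast(_event), do: :ok
end
```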
2025-04-29 23:53:06 -07:00
Brian Manifold
3f3f007920 fix(portal): Update copy to clipboard button (#8907)
Why:

* The copy to clipboard button was not working at all on the API new
token page because the FlowbiteJS library expects the elements to be
present in the DOM on first render, which was not true of the API token
code block. In addition, the existing code blocks' copy to clipboard
buttons gave no visual indication that the copy had completed, and the
buttons themselves were somewhat hard to see. This commit makes the
buttons more visible, adds a phx-hook to ensure the FlowbiteJS init
functions run on every code block even if it's inserted after the
initial page load, and adds callback functions that toggle the button
text and icon to show the text has been copied.
2025-04-26 00:43:43 +00:00
Jamil
0a2a393d4c fix(portal): Prevent additional email identities per actor (#8888)
This is a UI-only change for now to serve as a stop-gap while we work to
overhaul the identity domain model.

Related: #6294
2025-04-22 21:13:37 +00:00
Jamil
8293e6c440 fix(portal): Don't peek groups for api_client actors (#8890)
API clients don't belong to any actor_groups and attempting to deep link
into the `groups` section when viewing an actor raises a 500 error.

This PR fixes that by removing the deep link into `actor_groups` from
the actors index view.
2025-04-22 13:59:06 +00:00
Jamil
0f300f2484 fix(portal): Prevent dupe sync adapters (#8887)
Prevents more than one sync-enabled adapter per account in order to
prepare for eventually adding a unique constraint on
`provider_identifier` for identities and groups per account.

Related: #6294

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Brian Manifold <bmanifold@users.noreply.github.com>
2025-04-22 13:58:24 +00:00
Jamil
d10c77c17d chore(portal): Drop unused table configurations (#8881)
This was left behind in a large refactor as part of #3642 and was never
cleaned up.

I verified on prod this table in fact has no meaningful data in it and
has not changed since that PR was merged.
2025-04-21 22:04:46 +00:00
Brian Manifold
74ccf8e0b2 fix(portal): Update elixir OIDC library (#8802)
Why:

* Updating the Elixir OIDC library to pick up a fix made in the library
regarding EdDSA keys
  https://github.com/firezone/openid_connect/pull/8
2025-04-17 22:06:40 +00:00
Brian Manifold
4c9848453d refactor(portal): Add more logging around sign in errors (#8789)
Why:

* To allow for more accurate and efficient troubleshooting in
production.
2025-04-15 14:25:06 +00:00
Jamil
2bbc0abc3a feat(portal): Add Oban (#8786)
Our current bespoke job system, while it's worked out well so far, has
the following shortcomings:

- No retry logic
- No robust way to guarantee job isolation / uniqueness without
resorting to row-level locking
- No support for cron-based scheduling

This PR adds the boilerplate required to get started with
[Oban](https://hexdocs.pm/oban/Oban.html), the job management system for
Elixir.
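A minimal sketch of the boilerplate, with assumed repo, queue, and
worker names:

```
# config/config.exs
config :domain, Oban,
  repo: Domain.Repo,
  queues: [default: 10],
  plugins: [Oban.Plugins.Cron]

# Added to the application's supervision tree:
#   {Oban, Application.fetch_env!(:domain, Oban)}

defmodule Domain.Jobs.ExampleWorker do
  # Retries, uniqueness, and cron scheduling come for free, addressing
  # the shortcomings listed above.
  use Oban.Worker, queue: :default, max_attempts: 5, unique: [period: 60]

  @impl Oban.Worker
  def perform(%Oban.Job{args: args}) do
    IO.inspect(args, label: "processing job")
    :ok
  end
end
```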
2025-04-15 03:56:49 +00:00
Jamil
6cd7616b5c refactor(portal): Expect members key to be missing when empty (#8781)
This will prevent warning spam we're currently seeing in Sentry.
2025-04-14 20:12:43 +00:00
Jamil
2f0d2462c9 fix(portal): Increase directory sync timeout to 8 hours (#8771)
Large Okta directories can take a very long time (> 1 hour) to sync.
This currently times out, preventing any entities from making it into
the database.

There are many things to address in our sync operation, but this should
hopefully resolve the immediate issue with the customer.


https://firezone-inc.sentry.io/issues/6537862651/?project=4508756715569152&query=is%3Aunresolved%20issue.priority%3A%5Bhigh%2C%20medium%5D%20Enum.to_list&referrer=issue-stream&stream_index=0
2025-04-13 17:27:15 +00:00
Jamil
649c03e290 chore(portal): Bump LoggerJSON to 7.0.0, fixing config (#8759)
There was a slight API change in the way LoggerJSON's configuration is
generated, so I took the time to do a little fixing and cleanup here.

Specifically, we should be using the `new/1` callback to create the
Logger config which fixes the below exception due to missing config
keys:

```
FORMATTER CRASH: {report,[{formatter_crashed,'Elixir.LoggerJSON.Formatters.GoogleCloud'},{config,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},{log_event,#{meta => #{line => 15,pid => <0.308.0>,time => 1744145139650804,file => "lib/logger.ex",gl => <0.281.0>,domain => [elixir],application => libcluster,mfa => {'Elixir.Cluster.Logger',info,2}},msg => {string,<<"[libcluster:default] connected to :\"web@web.cluster.local\"">>},level => info}},{reason,{error,{badmatch,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},[{'Elixir.LoggerJSON.Formatters.GoogleCloud',format,2,[{file,"lib/logger_json/formatters/google_cloud.ex"},{line,148}]}]}}]}
```
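A sketch of the `new/1`-style configuration, with options abridged from
the crash report above:

```
config :logger, :default_handler,
  formatter:
    LoggerJSON.Formatters.GoogleCloud.new(
      metadata: {:all_except, [:socket, :conn]},
      redactors: [
        {LoggerJSON.Redactors.RedactKeys, ["password", "secret", "token"]}
      ]
    )
```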

Supersedes #8714

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-11 19:00:06 -07:00
Brian Manifold
bed6a60056 fix(portal): Fetch latest Okta access_token before API call (#8745)
Why:

* The Okta IdP sync job needs to make sure it is always using the
latest access token available. If not, the job may take long enough to
complete that the access token it started with expires. This commit
updates the Okta API client to always check and make sure it is using
the latest access token for each request to the Okta API.
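A sketch of the pattern, with hypothetical helper names (not the actual
client code):

```
defmodule OktaClientSketch do
  # Re-read the freshest token before every request, so a sync running
  # longer than the token lifetime keeps working.
  def list_users(provider) do
    token = latest_access_token!(provider)
    http_get("/api/v1/users", token)
  end

  # Hypothetical: reload the provider record and return its current token.
  defp latest_access_token!(provider), do: provider.access_token

  # Stand-in for the real HTTP call.
  defp http_get(path, token), do: {:ok, {path, token}}
end
```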
2025-04-11 21:25:07 +00:00
Jamil
d2fd57a3b6 fix(portal): Attach Sentry in each umbrella app (#8749)
- Attaches the Sentry Logging hook in each of [api, web, domain]
- Removes errant Sentry logging configuration in config/config.exs
- Fixes the exception logger to default to logging exceptions, use
`skip_sentry: true` to skip
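A sketch of attaching the hook in one umbrella app's
`Application.start/2`, using Sentry's `:logger` handler (handler
options assumed, child list elided):

```
defmodule Web.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Attach Sentry as an Erlang :logger handler so error-level logs and
    # crash reports are reported from this app.
    :logger.add_handler(:sentry_handler, Sentry.LoggerHandler, %{
      config: %{level: :error, capture_log_messages: true}
    })

    Supervisor.start_link([], strategy: :one_for_one, name: Web.Supervisor)
  end
end
```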

Tested successfully in dev. Hopefully the cluster behaves the same way.

Fixes #8639
2025-04-11 04:17:12 +00:00
dependabot[bot]
3458d7f151 build(deps): bump tailwind from 0.2.4 to 0.3.1 in /elixir (#8707)
Bumps [tailwind](https://github.com/phoenixframework/tailwind) from
0.2.4 to 0.3.1.
Changelog, sourced from tailwind's CHANGELOG.md
(https://github.com/phoenixframework/tailwind/blob/main/CHANGELOG.md):

v0.3.1 (2025-02-28)

- Support correct target for Linux MUSL with Tailwind v3.

v0.3.0 (2025-02-26)

- Support Tailwind v4+. This release assumes Tailwind v4 for new
projects.

Note: v0.3.0 dropped target code for handling Linux MUSL with Tailwind
v3. Use v0.3.1+ instead.

Commits: https://github.com/phoenixframework/tailwind/compare/v0.2.4...v0.3.1

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-11 03:32:52 +00:00
Jamil
b9532bc243 revert: "Enable automatic tax calculation by default" (#8743)
This needs #8670 in order to function.

Reverts firezone/firezone#8552
2025-04-11 02:59:17 +00:00
Jamil
05dafabbad fix(portal): Fix human display of geo location (#8665)
These seem to be swapped. The generally accepted format is `city, country`.
2025-04-09 01:28:35 +00:00
Jamil
8ca43300cd chore(portal): Fix typo: counties -> countries (#8666) 2025-04-05 08:11:05 +00:00
dependabot[bot]
a66423c25c build(deps): bump @fontsource/source-sans-3 from 5.1.1 to 5.2.6 in /elixir/apps/web/assets (#8599)
Bumps
[@fontsource/source-sans-3](https://github.com/fontsource/font-files/tree/HEAD/fonts/google/source-sans-3)
from 5.1.1 to 5.2.6.
Commits: https://github.com/fontsource/font-files/commits/HEAD/fonts/google/source-sans-3

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-04 02:50:46 +00:00
Jamil
6e336fc3bc fix(portal): Update flows fkey constraints to cascade deletes (#8645)
The `flows` table currently has `ON DELETE SET NULL` behavior for many
of its foreign key constraints. The problem is that if we try to delete
any of the associated entities, setting a null here causes the DB
operation to fail with:

```
ERROR:  null value in column "policy_id" of relation "flows" violates not-null constraint
```

I can understand why it was originally architected like this, to
preserve connection log data, but we'll be using another approach for
that, one that doesn't require maintaining relational data in
perpetuity.
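A sketch of the constraint change for one of the foreign keys, assuming
`binary_id` keys (the real migration covers several constraints):

```
defmodule Domain.Repo.Migrations.CascadeFlowsDeletes do
  use Ecto.Migration

  def change do
    alter table(:flows) do
      # Replace ON DELETE SET NULL with ON DELETE CASCADE so deleting a
      # policy no longer violates the NOT NULL constraint on policy_id.
      modify :policy_id,
             references(:policies, type: :binary_id, on_delete: :delete_all),
             from: references(:policies, type: :binary_id, on_delete: :nilify_all)
    end
  end
end
```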

Related: #949
2025-04-03 16:29:19 -07:00
Jamil
fb9f132a49 fix(portal): Interpret missing members as empty list (#8640)
The Google API will often omit the `members` key from a `200` response
to their members API. The documentation here isn't clear whether this
key is always expected, but since the sync had been working fine up
until #8608, we can only surmise that the missing key in fact means the
group has no members.

This PR updates the Google API client so that a `default_if_missing` can
be passed in which is returned if the API response is missing the JSON
key to fetch.

For the users, groups, and organization units fetches, we consider a
missing key to be an error and we return `{:error, :invalid_response}`
since this most likely indicates an API problem.

For the members endpoint, we consider the missing key to be the empty
set.

Additionally, this fixes a bug introduced in #8608 whereby we returned
`{:error, :retry_later}` for these newly-accounted-for API responses,
which would have caused a "sync failed" email to be sent to the admins
on the instance.

Instead, we want to return `{:error, :invalid_response}` which will stop
the sync from progressing, and log it internally.
2025-04-03 11:27:39 -07:00
Jamil
2f7598c648 fix(portal): Delete soft-deleted synced actor_groups (#8638)
The previous migration only accounted for soft-deleted rows that have an
active counterpart.

This fails the new unique index if multiple soft-deleted rows exist for
the same `account_id, provider_id, provider_identifier` combination.

Instead, to appease the new index, we need to delete all soft-deleted
rows where these fields are set.

Related: #8615
2025-04-03 07:21:06 -07:00
Jamil
713ff1e7de chore(portal): Log problematic identity api responses (#8623)
After merging #8608, we discovered that we receive unexpected API
responses on the regular. This adds improved logging to uncover what
exactly these unexpected API responses are.
2025-04-02 14:59:16 -07:00
Jamil
f275bf70d9 fix(portal): Resurrect deleted identities and groups (#8615)
When syncing identities from an identity provider, we have logic in place that
resurrects any soft-deleted identities in order to maintain their
session history, group memberships and any other relevant data. Users
can be temporarily suspended from their identity provider and then
resumed.

Groups, on the other hand, based on cursory research, can never be
temporarily suspended at the identity provider. However, this doesn't
mean that we can't see a group disappear and reappear at a later point
in time. This can happen due to a temporary sync issue, or in the
upcoming Group Filters PR: #8381.

This PR adds more robust testing to ensure we can in fact resurrect
identities as expected.

It also updates the group sync logic to similarly resurrect soft-deleted
groups if they are seen again in a subsequent sync.

To achieve this, we need to update the `UNIQUE CONSTRAINT` used in the
upsert clause during the sync. Before, it was possible for two (or more)
groups to exist with the same provider_identifier and provider_id, if
`deleted_at IS NOT NULL`. Now, we need to ensure that only one group
with the same `account_id, provider_id, provider_identifier` can exist,
since we want to resurrect and not recreate these.

To do this, we use a migration that does the following:

1. Ensures any potentially problematic data is permanently deleted
2. Drops the existing unique constraint
3. Recreates it, omitting `WHERE DELETED_AT IS NULL` from the partial
index.
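A sketch of those three steps, with assumed table and index names:

```
defmodule Domain.Repo.Migrations.TightenGroupUniqueIndex do
  use Ecto.Migration

  def up do
    # 1. Permanently delete soft-deleted rows that would collide
    execute("""
    DELETE FROM actor_groups
    WHERE deleted_at IS NOT NULL
      AND provider_id IS NOT NULL
    """)

    # 2. Drop the partial unique index (assumed name)
    drop_if_exists index(:actor_groups, [:account_id, :provider_id, :provider_identifier])

    # 3. Recreate it without the WHERE deleted_at IS NULL predicate
    create unique_index(:actor_groups, [:account_id, :provider_id, :provider_identifier])
  end

  def down do
    # Irreversible: permanently deleted rows can't be restored.
    :ok
  end
end
```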

Based on exploring the production DB data, this should not cause any
issues, but it would be a good idea to double-check before rolling this
out to prod.


Lastly, the final missing piece to the resurrection story is Policies.
This is saved for a future PR since we need to first define the
difference between a policy that was soft-deleted via a sync job, and a
policy that was "perma" deleted by a user.

Related: #8187
2025-04-02 21:12:44 +00:00
Jamil
88c4e723a6 fix(portal): Gracefully handle dir sync error responses (#8608)
When calling the various directory sync endpoints, we had error cases
that matched a few of the possible error scenarios in an appropriate way
by returning either `{:error, :retry_later}` or the `{:error, ...}`
tuples.

However, as we've recently learned in [this
thread](https://firezonehq.slack.com/archives/C069H865MHP/p1743521884037159),
it's possible for identity provider APIs to return all kinds of bogus
data here, and we need a more defensive approach.

The specific issue this PR addresses is the case where we receive a
`2xx` response, but without the expected JSON key in the response body.
That will result in the `list*` functions returning an empty list, which
the calling code paths then use to soft-delete all existing record types
in the DB.

This is wrong. If the JSON response is missing a key we're expecting, we
instead log a warning and return `{:error, :retry_later}`. It's
currently unknown when exactly this happens or why, but the improved
monitoring here will give us a much better picture.
2025-04-02 19:04:43 +00:00
Jamil
8805d906aa chore(portal): Leave notes around sync frequency (#8605)
When reading through these modules, it's helpful to know that the
actual sync data update doesn't occur more often than every 10 minutes
due to a database check.
2025-04-01 18:25:33 +00:00
Jamil
936f5ddb01 chore(billing): Enable automatic tax calculation by default (#8552)
When a customer signs up for Starter or Team, we don't enable tax
calculation by default. This means customers can upgrade to Team, start
paying invoices, and we won't collect taxes.

This creates a management issue and possible tax liability since I need
to manually reconcile these.

Instead, since we have Stripe Tax configured on our account, we can
enable automatic tax calculation when the subscription is created. Any
products (Starter/Team/Enterprise) in the subscription will therefore
automatically collect tax appropriately.

In most cases in the US, the tax rate is 0. In EU transactions, for B2B
sales, the tax rate for us is also 0 (reverse charge basis). If we sell
a Team subscription to an individual, however, we need to collect VAT.

There doesn't seem to be a way to block consumer EU transactions in
Stripe, so we'll likely need to register for VAT in the EU if we cross
the reporting threshold.
2025-03-31 13:23:39 +00:00
Jamil
2dbfae9ba9 fix(portal): Use old policy for broadcasting events when updated (#8550)
A regression was introduced in d0f0de0f8d
whereupon we started using the updated policy record for broadcasting
the `delete_policy` and `expire_flows` events. This caused a security
issue because if the actor group changed from `Everyone` to `thomas`,
for example, we'd only expire flows and broadcast policy removal (i.e.
resource removal) events for `thomas`, and `Everyone` would still have
access granted by the old policy.

To fix this, we broadcast the destructive events to the old policy, so
that its `actor_group_id` and `resource_id` are used, and not the new
policy's.

Fixes #8549
2025-03-30 03:26:11 +00:00
Jamil
95d3f765f4 feat(portal): Show Internet Resource in resources/index (#8495)
After removing some of the functionality for viewing the Internet
Resource, a customer was confused about where to find it again.

This places an `Internet` section in the Resources index page (similar
to Sites page) with a short help text and an action button to view the
Internet Resource.

This also adds a convenient helper that allows us to route to
`/#{account}/resources/internet` for a nicer-looking URL that users can
bookmark if needed.

<img width="1423" alt="Screenshot 2025-03-19 at 11 52 31 PM"
src="https://github.com/user-attachments/assets/f2da1c31-92b2-429e-832f-73ddd0524155"
/>


Fixes #8479
2025-03-26 21:30:11 +00:00
Brian Manifold
3313e7377e feat(portal): Add account delete button (#8487)
Why:

* This commit will allow account admins to send a request through the
Firezone portal to schedule a deletion of their account, rather than
having the account admins email their request manually. Doing this
through the portal allows us to verify that the request actually came
from an admin of the account.
2025-03-19 18:23:32 +00:00
Jamil
595fb7efd9 refactor(portal): Rename resource_cidrs -> device_cidrs (#8482)
I was debugging some of this just now and realized our naming / comments
are incorrect here, so thought I'd open a PR to tidy things up for the
next person reading this.

Resource CIDRs actually occupy the `100.96.0.0/11` range (and IPv6
equivalent), but the portal doesn't generate these.
2025-03-19 01:54:08 +00:00
Brian Manifold
e14e5c4008 refactor(portal): Use appropriate access token for Google IdP (#8478)
Why:

* Previously, when running a directory sync with the Google Workspace
IdP adapter, if a service account had been configured but there was a
problem getting an access token for the service account, the sync job
would fall back to using a personal access token. We no longer want to
rely on any personal access token once a service account has been
configured. This commit will make sure that if a service account is
configured there is no way to fall back to any personal access token.


Fixes #8409
2025-03-18 16:46:08 +00:00
Jamil
366215b1d6 fix(gateway): Prefer setting FIREZONE_ID over /var/lib/firezone (#8475)
When deploying a Gateway from the admin portal UI, we show various
environment variables required for setup. Until now, we've relied on the
`/var/lib/firezone` persistence method for identifying the Gateway.

However, this can cause issues on some systems that don't have writable
access to `/var/lib/firezone`, or old versions of systemd that don't
support sandboxed access to this directory.

This PR updates each deployment method to use `FIREZONE_ID` instead
everywhere. Additionally, since the Docker upgrade script needs to
reinvoke the new container using the same arguments (more or less) as
the install, we need to extract the old `/var/lib/firezone/gateway_id`
file out of the existing container if it exists, and try to insert it
into the upgraded container.

Tested both scripts, including upgrades for the Docker script.

Fixes: #8471
2025-03-18 04:08:21 +00:00
Jamil
d143d4dc89 feat(portal): Add changelog link to outdated gateway email (#8458)
It would be useful to have a link to the changelog in our outdated
gateway email.

See https://firezonehq.slack.com/archives/C069H865MHP/p1742088424077639

<img width="638" alt="Screenshot 2025-03-16 at 9 39 22 PM"
src="https://github.com/user-attachments/assets/f67b9b3e-9796-45a9-ae90-26eeabc40740"
/>
2025-03-18 02:43:06 +00:00
Jamil
4ce2f160e3 fix(portal): Allow .local for search_domains (#8472)
This apparently is explicitly used by customers. See
https://firezonehq.slack.com/archives/C08FPHECLUF/p1742221580587719?thread_ts=1741639183.188459&cid=C08FPHECLUF
2025-03-17 20:18:51 +00:00
Jamil
43d084f97f refactor(portal): Enforce internet resource site exclusion (#8448)
Finishes up the Internet Resource migration by enforcing:

- No internet resources in non-internet sites
- No regular resources in internet sites
- Removing the prompt to migrate

~~I've already migrated the existing internet resources in customer's
accounts. No one that was using the internet resource hadn't already
migrated.~~

Edit: I started to head down that path, then decided doing this here in
a data migration was going to be a better approach.

Fixes #8212
2025-03-15 18:25:32 -05:00
Jamil
06aa485e18 ci: Use search_domain for one resource in CI test (#8393)
- Adds a `search_domain` of `httpbin.test` in seeds
- Updates one of our DNS resources under CI test to use this
2025-03-15 13:27:22 +00:00
Jamil
7df1bf2718 feat(portal): Create pgaudit extension (#8435)
[Step
2](https://cloud.google.com/sql/docs/postgres/pg-audit#set-pgaudit-flag-values)
of the pgaudit setup guide for Google Cloud SQL. It would be good to
have detailed pg audit logs on the master application instance in case
things go wrong.

Notably, this prevents erroring out when the `pgaudit` extension is not
available, as is the case by default. Enabling the `pgaudit` extension
for our dev instance is left as a future endeavor.
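A sketch of how the extension can be created without erroring where it
isn't available (the guard is an assumption about the approach):

```
defmodule Domain.Repo.Migrations.CreatePgaudit do
  use Ecto.Migration

  def up do
    execute("""
    DO $$
    BEGIN
      -- Only create pgaudit where the server actually ships it
      IF EXISTS (SELECT 1 FROM pg_available_extensions WHERE name = 'pgaudit') THEN
        CREATE EXTENSION IF NOT EXISTS pgaudit;
      END IF;
    END
    $$;
    """)
  end

  def down do
    execute("DROP EXTENSION IF EXISTS pgaudit")
  end
end
```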

Supersedes #5442
2025-03-14 20:04:47 +00:00
Jamil
4cd4c2c6a4 fix(portal): Fix submit button spacing in settings/dns (#8440)
The submit button on the settings -> dns page has a couple UX issues
with the new search domain section:

- It's ambiguous what the `Save` is actually saving
- The spacing makes it look like it's only saving upstream resolvers

This PR introduces a simple fix that addresses the two issues by:

- Updating the button text to `Save DNS Settings`
- Increasing spacing between submit button and form elements
- Slightly decreasing spacing between the `search domain` and `upstream
resolvers` inputs


<img width="968" alt="Screenshot 2025-03-14 at 12 06 02 AM"
src="https://github.com/user-attachments/assets/651f54c8-3b5f-4747-ad3a-e2ae32eccbf0"
/>


Related #5248
2025-03-14 09:20:29 +00:00