When filters are updated for a Resource, we need to first adapt the
resource before rendering it down to the Gateway. Otherwise, the gateway
may see a Resource that does not match its expected schema.
Napkin math shows that we can save substantial memory (~3x or more) on
the API nodes as connected clients/gateways grow if we just store the
fields we need in order to keep the client and gateway state maintained
in the channel pids.
To facilitate this, we create new `Cacheable` structs that represent
their `Domain` cousins, which use byte arrays for `id`s and strip out
unused fields.
Additionally, all business logic involved with maintaining these caches
is now contained within two modules: `Domain.Cache.Client` and
`Domain.Cache.Gateway`, and type specs have been added to aid in static
analysis and code documentation.
Comprehensive testing is now added not only for the cache modules, but
for their associated channel modules as well to ensure we handle
different kinds of edge cases gracefully.
The `Events` nomenclature was renamed to `Changes` to better name what
we are doing: Change-Data-Capture.
Lastly, the following related changes are included in this PR since they
were "in the way" so to speak of getting this done:
- We save the last received LSN in each channel and drop the `change`
with a warning if we receive it twice in a row, or we receive it out of
order
- The client/gateway version compatibility calculations have been moved
to `Domain.Resources` and `Domain.Gateways` and have been simplified to
make them easier to understand and maintain going forward.
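
The LSN check from the first bullet can be sketched roughly as follows. This is a minimal illustration in Python, not the portal's Elixir code; `ChannelState` and `accept_change` are hypothetical names, and treating the LSN as a plain integer is an assumption.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("channel")


@dataclass
class ChannelState:
    # Highest LSN this channel has processed so far (hypothetical field).
    last_lsn: int = 0


def accept_change(state: ChannelState, lsn: int) -> bool:
    """Return True if the change should be processed, False if it is dropped."""
    if lsn <= state.last_lsn:
        # Equal means the same change arrived twice; lower means out of order.
        log.warning("dropping change: lsn=%d last_lsn=%d", lsn, state.last_lsn)
        return False
    state.last_lsn = lsn
    return True
```

A channel would call `accept_change/2`-style logic before touching its cache, so a duplicate or stale change never mutates state.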
Related: #10174
Fixes: #9392
Fixes: #9965
Fixes: #9501
Fixes: #10227
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This fixes a simple logic bug where we were mistakenly reacting to a
flow deletion event where flows still existed in the cache by sending
`reject_access`. This fixes that bug, and adds more comprehensive
logging to help diagnose issues like this more quickly in the future.
This PR also fixes the following issues found during the investigation:
- We were redundantly reacting to Token deletion in the channel pids.
This is unnecessary: we send a global socket disconnect from the Token
hook module instead.
- We had a bug that would crash the WAL consumer if a "global" token
(i.e. relay) was deleted or expired - these have no `account_id`.
- We now always use `min(max(all_conforming_policies_expiration),
token.expires_at)` when setting expiration on a new flow to minimize the
possibility of access churn.
- We now check to ensure the token and gateway are still undeleted when
re-authorizing a given flow. This prevents us from failing to send
`reject_access` when a token or gateway is deleted corresponding to a
flow, but the other entities would have granted access.
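
As a rough illustration of the expiration rule above (a Python sketch, not the portal's code), the flow takes the latest expiration among conforming policies, capped by the token's expiry. The function name is hypothetical, and the sketch assumes at least one conforming policy with a concrete expiration.

```python
from datetime import datetime, timezone


def flow_expires_at(policy_expirations, token_expires_at):
    """Latest conforming-policy expiration, never later than the token's expiry."""
    # max(...) picks the policy that grants access the longest, which avoids
    # re-authorizing early; min(...) ensures the flow never outlives the token.
    return min(max(policy_expirations), token_expires_at)


def at(hour):
    # Helper for the usage example below.
    return datetime(2025, 1, 1, hour, tzinfo=timezone.utc)


# The token expires at 03:00, so it caps the 05:00 policy expiration.
capped = flow_expires_at([at(1), at(5)], at(3))
```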
Related: https://firezone.statuspage.io/incidents/xrsm13tml3dh
Related: #10068
Related: #9501
The full `account` struct is only used to render the client's interface,
and doesn't need to be stored in the `client` struct when the `subject`
struct already tracks it.
Time-based policy conditions are tricky. When they authorize a flow, we
correctly tell the Gateway to remove access when the time window
expires.
However, we do nothing on the client to reset the connectivity state.
This means that when the access window was re-entered, the client would
essentially never be able to connect to the resource again until the
resource was toggled.
To fix this, we add a 1-minute check in the client channel that
re-checks allowed resources, and updates the client state with the
difference. This means that policies that have time-based conditions are
only accurate to the minute, but this is how they're presented anyhow.
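
The periodic re-check boils down to a set difference between what the client had and what it is allowed now. A minimal Python sketch, with hypothetical names and resources reduced to plain ids:

```python
def diff_resources(previous: set, current: set):
    """Return (added, removed) resource ids between two re-checks."""
    added = current - previous      # newly allowed: push to the client
    removed = previous - current    # no longer allowed: tell the client to drop
    return added, removed


# Resource "b" left its time window and "c" entered one since the last check.
added, removed = diff_resources({"a", "b"}, {"a", "c"})
```

Only the difference is sent, so a re-check that finds nothing changed produces no client traffic.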
For good measure, we also add a periodic job that runs every minute to
delete expired Flows. This propagates to the Gateway, which receives
`reject_access` if the access for a particular client-resource pair is
determined to be actually gone.
Zooming out a bit, this PR furthers the theme that:
- Client channels react to underlying resource / policy / membership
changes directly, while
- Gateway channels react primarily to flows being deleted, or the
downstream effects of a prior client authorization
Whenever a client requests a connection to a gateway, we need to generate
a preshared key that will be used for the underlying WireGuard tunnel.
When the connection setup broke or was otherwise lost _after_ the
gateway received the `authorize_flow` call, but _before_ the client
could receive the response (and initiate a tunnel), we would have to
wait until an ICE timeout occurred in order to reset state on the
gateway.
This is because the psk was not used to determine if this was a _new_
flow authorization. So the old authorization would be matched, and the
client would never be able to connect, since its tunnel was using the
new psk, and the gateway the old.
To fix this, we generate a secure random 32-byte `psk_base` on each
client and gateway. When a client wishes to connect to a gateway, we
compute the WireGuard preshared key as an HMAC over these two inputs.
This fixes the issue by ensuring that subsequent flow authorization
requests from a particular client to a particular gateway will yield the
same psk.
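
A minimal sketch of the derivation, in Python for illustration only. The text says the key is an HMAC over the two `psk_base` values; which value acts as the HMAC key, and the use of SHA-256, are assumptions here, as are the function names.

```python
import hashlib
import hmac
import secrets


def new_psk_base() -> bytes:
    """Secure random 32-byte value, generated once per client or gateway."""
    return secrets.token_bytes(32)


def derive_psk(client_psk_base: bytes, gateway_psk_base: bytes) -> bytes:
    """Deterministic 32-byte preshared key for one client/gateway pair."""
    # Assumption: gateway value as key, client value as message, SHA-256 digest.
    return hmac.new(gateway_psk_base, client_psk_base, hashlib.sha256).digest()


client, gateway = new_psk_base(), new_psk_base()
first = derive_psk(client, gateway)
retry = derive_psk(client, gateway)  # a repeated authorization yields the same key
```

Because the output depends only on the two stored values, a retried authorization cannot leave the client and gateway holding different keys.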
Related: #9999
Related: https://github.com/firezone/infra/issues/99
The `flows` table tracks authorizations we've made for a resource and
persists them, so that we can determine which authorizations are still
valid across deploys or hiccups in the control plane connections.
Before, when the "in-use" authorization for a resource was deleted, we
would have flapped the resource in the client, and sent `reject_access`
to the gateway. However, that would cause issues in the following edge
case:
- Client is currently connected to Resource A through Policy A
- Client websocket goes down
- Policy B is created for Resource A (for another actor group), and
Policy A is deleted by admin
- Client reconnects
- Client sees that its resource list is the same
- Gateway has since received `reject_access` because no new flows were
created for this client-resource combination
To prevent this from happening, we now try to "reauthorize" the flow
whenever the last cached flow is removed for a particular
client-resource pair. This avoids needing to toggle the resource on the
client since we won't have sent `reject_access` to the gateway.
Before:
- When a flow was deleted, we flapped the resource on the client, and
sent `reject_access` naively for the flow's `{client_id, resource_id}`
pair on the gateway. This resulted in lots of unneeded resource flappage
on the client whenever bulk flow deletions happened.
After:
- When a flow is deleted, we check if this is an active flow for the
client. If so, we flap the resource in order to trigger generation
of a new flow. If access was truly affected, resulting in the loss of a
resource, we push `resource_deleted` for the update that triggered
the flow deletion (for example, the resource/policy removal). On the
gateway, we only send `reject_access` if it was the last flow granting
access for a particular `client/resource` tuple.
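
The gateway-side rule ("only send `reject_access` for the last flow") can be sketched as a small cache keyed by the client/resource pair. This is an illustrative Python sketch with hypothetical names, and it leaves out the reauthorization step described earlier.

```python
from collections import defaultdict


class GatewayFlowCache:
    def __init__(self):
        # (client_id, resource_id) -> set of flow ids currently granting access
        self._flows = defaultdict(set)

    def add(self, client_id, resource_id, flow_id):
        self._flows[(client_id, resource_id)].add(flow_id)

    def delete(self, client_id, resource_id, flow_id) -> bool:
        """Remove a flow; return True when reject_access should be sent."""
        key = (client_id, resource_id)
        flows = self._flows.get(key)
        if flows is None or flow_id not in flows:
            return False  # unknown flow: nothing to revoke
        flows.discard(flow_id)
        if flows:
            return False  # another flow still grants access for this pair
        del self._flows[key]
        return True       # last flow is gone: access must be rejected
```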
Why:
- While the access state is still correct in the previous
implementation, we run the risk of pushing far too many resource
flaps to the client in an overly eager attempt to revoke access.
cc @thomaseizinger
Related:
https://firezonehq.slack.com/archives/C08FPHECLUF/p1753101115735179
When changes occur in the Firezone DB that trigger side effects, we need
some mechanism to broadcast and handle these.
Before, the system we used was:
- Each process subscribes to a myriad of topics related to data it wants
to receive. In some cases it would subscribe to new topics based on
received events from existing topics (i.e. flows in the gateway
channel), and sometimes in a loop. It would then need to be sure to
_unsubscribe_ from these topics
- Handle the side effect in the `after_commit` hook of the Ecto function
call after it completes
- Broadcast only a simple (thin) event message with a DB id
- In the receiver, use the id(s) to re-evaluate, or lookup one or many
records associated with the change
- After the lookup completes, `push` the relevant message(s) to the
LiveView, `client` pid, or `gateway` pid in their respective channel
processes
This system had a number of drawbacks ranging from scalability issues to
undesirable access bugs:
1. The `after_commit` callback, on each App node, is not globally
ordered. Since we broadcast a thin event schema and read from the DB to
hydrate each event, this meant we had a `read after write` problem in
our event architecture, leading to the potential for lost updates. Case
in point: if a policy is updated from `resource_id-1` to
`resource_id-2`, and then back to `resource_id-1`, it's possible that,
given the right amount of delay, the gateway channel will receive two
`reject_access` events for `resource_id-1`, as opposed to one for
`resource_id-1` and one for `resource_id-2`, leading to the potential
for unauthorized access.
1. It was very difficult to ensure that the correct topics were being
subscribed to and unsubscribed from, and the correct number of times,
leading to maintenance issues for other engineers.
1. We had a nasty N+1 query problem whenever memberships were added or
removed that resulted in essentially all access related to that
membership (so all Policies touching its actor group) being
re-evaluated and broadcasted. This meant that any bulk addition or
deletion of memberships would generate so many queries that they'd
time out or consume the entire connection pool.
1. We had no durability for side-effect processing. In some places, we
were iterating over many returned records to send broadcasts.
Broadcasting is not a zero-time operation: each call takes a small
amount of CPU time to copy the message into the receiver's mailbox. If
we deployed while this was happening, the state update would be lost
forever. If this was a `reject_access` for a Gateway, the Gateway would
never remove access for that particular flow.
1. On each flow authorization, we needed to hit `us-east1` not only to
"authorize" the flow, but to log it as well. This incurs latency
especially for users in other parts of the world, which happens on
_each_ connection setup to a new resource.
1. Since we read and re-authorize access due to the thin events
broadcasted from side effects, we risk hitting thundering herd problems
(see the N+1 query problem above) where a single DB change could result
in all receivers hitting the DB at once to "hydrate" their
processing.
1. If an administrator modifies the DB directly, or, if we need to run a
DB migration that involves side effects, they'll be lost, because the
side effect triggers happened in `after_commit` hooks that are only
available when querying the DB through Ecto. Manually deleting (or
resurrecting) a policy, for example, would not have updated any
connected clients or gateways with the new state.
To fix all of the above, we move to the system introduced in this PR:
- All changes are now serialized (for free) by Postgres and broadcasted
as a single event stream
- The number of topics has been reduced to just one, the `account_id` of
an account. All receivers subscribe to this one topic for the lifetime
of their pid and then only filter the events they want to act upon,
ignoring all other messages
- The events themselves have been turned into "fat" structs based on the
schemas they present. By making them properly typed, we can apply things
like the existing Policy authorizer functions to them as if we had just
fetched them from the DB.
- All flow creation now happens in memory and doesn't need to incur
a DB hit in `us-east1` to proceed.
- Since clients and gateways now track state in a push-based manner from
the DB, this means very few actual DB queries are needed to maintain
state in the channel procs, and it also means we can be smarter about
when to send `resource_deleted` and `resource_created_or_updated`
appropriately, since we can always diff between what the client _had_
access to, and what they _now_ have access to.
- All DB operations, whether they happen from the application code, a
`psql` prompt, or even via Google SQL Studio in the GCP console, will
trigger the _same_ side effects.
- We now use a replication consumer based off Postgres logical decoding
of the write-ahead log using a _durable slot_. This means that Postgres
will retain _all events_ until they are acknowledged, giving us the
ability to ensure at-least-once processing semantics for our system.
Today, the ACK is simply, "did we broadcast this event successfully".
But in the future, we can assert that replies are received before we
acknowledge the event as processed back to Postgres.
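
The at-least-once behavior in the last bullet can be illustrated with a small consumer loop. This is a Python sketch under stated assumptions: `events`, `broadcast`, and `ack` are hypothetical stand-ins for the replication stream, the PubSub broadcast, and the acknowledgement sent back to Postgres.

```python
def consume(events, broadcast, ack):
    """Acknowledge each event only after it was broadcast successfully."""
    for lsn, event in events:
        try:
            broadcast(event)
        except Exception:
            # Stop without acknowledging: the durable slot retains this event
            # and everything after it, so they are delivered again later.
            return
        ack(lsn)


acked, sent = [], []


def flaky_broadcast(event):
    if event == "policy_deleted":
        raise RuntimeError("broadcast failed")
    sent.append(event)


consume([(1, "resource_updated"), (2, "policy_deleted"), (3, "flow_deleted")],
        flaky_broadcast, acked.append)
```

Redelivery means receivers may see an event more than once, which is why duplicate handling on the receiving side matters.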
The tests in this PR have been updated to pass given the refactor.
However, since we are tracking more state now in the channel procs, it
would be a good idea to add more tests for those edge cases. That is
saved as a later PR because (1) this one is already huge, and (2) we
need to get this out to staging to smoke test everything anyhow.
Fixes: #9908
Fixes: #9909
Fixes: #9910
Fixes: #9900
Related: #9501
As a followup to #9856, after talking with @bmanifold, we determined
using the public_key as the username for TURN credentials is a safer bet
because:
- It's by definition public and therefore does not need to be obfuscated
- It's shorter-lived than the token, especially for the gateway
- It essentially represents the data plane connection for client/gateway
and naturally rotates along with the key state for those
When giving TURN credentials to clients and gateways, it's important
that they remain consistent across hiccups in the portal connection so
that relayed connections are not interrupted during a deploy, or if the
user's internet is flaky, or the GCP load balancer decides to disconnect
the client/gateway.
Prior to this PR, that was not the case because we essentially tied TURN
credentials, required for data plane packet flows, to the WebSocket
connection, a control plane element. This happened because we generated
random `expires_at` and `salt` elements on _each_ connection to the
portal.
Instead, what we do now is make these reproducible and tied to the auth
token by hashing then base64-encoding it. The expiry is tied to the
auth-token's expiry.
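
A rough sketch of what "hashing then base64-encoding" could look like, in Python for illustration. The hash function (SHA-256), the URL-safe alphabet, and the function name are assumptions; the text only states that the value is reproducible from the auth token.

```python
import base64
import hashlib


def turn_salt(auth_token: str) -> str:
    """Reproducible salt: same token in, same salt out, across reconnects."""
    digest = hashlib.sha256(auth_token.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


# Two WebSocket connections with the same token yield the same credentials input.
before_deploy = turn_salt("example-token")
after_deploy = turn_salt("example-token")
```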
Fixes #9856
This has been dead code for a long time. The feature this was meant to
support, #8353, will require a different domain model, views, and user
flows.
Related: #8353
The `expires_at` column on the `flows` table was never used outside of
the context in which the flow was created in the Client Channel. This
ephemeral state, which is created in the `Domain.Flows.authorize_flow/4`
function, is never read from the DB in any meaningful capacity, so it
can be safely removed.
The `expire_flows_for` family of functions now simply reads the needed
fields from the flows table in order to broadcast `{:expire_flow,
flow_id, client_id, resource_id}` directly to the subscribed entities.
This PR is step 1 in removing the reliance on `Flows` to manage
ephemeral access state. In a subsequent PR we will actually change the
structure of what state is kept in the channel PIDs such that reliance
on this Flows table will no longer be necessary.
Additionally, in a few places, we were referencing a Flows.Show view
that was never available in production, so this dead code has been
removed.
Lastly, the `flows` table subscription and associated hook processing
has been completely removed as it is no longer needed. We've implemented
in #9667 logic to remove publications from removed table subscriptions,
so we can expect to get a couple ingest warnings when we deploy this as
the `Hooks.Flows` processor no longer exists, and the WAL data may have
lingering flows records in the queue. These can be safely ignored.
When a client is updated, we may need to re-initialize it if "breaking"
fields are updated. If non-breaking fields are changed, such as name, we
don't need to re-initialize the client.
This PR also adds a helper `struct_from_params/2` which creates a
schema struct from WAL data in order to type cast any needed data for
convenience. This avoids having to do a DB hit - we _already have the
data from the DB_ - we just need to format and send it.
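
The idea behind such a helper can be sketched in Python as follows. This is not the Elixir implementation: the `Client` fields, the `CASTERS` table, and the assumption that WAL data arrives as a string-keyed map of string values are all hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional
from uuid import UUID


@dataclass
class Client:
    id: Optional[UUID] = None
    name: Optional[str] = None
    last_seen_at: Optional[datetime] = None


# Per-field casts from the raw WAL representation to typed values.
CASTERS = {"id": UUID, "name": str, "last_seen_at": datetime.fromisoformat}


def struct_from_params(cls, params: dict):
    """Build a typed struct from raw WAL data without querying the DB."""
    known = {f.name for f in fields(cls)}
    kwargs = {
        key: (CASTERS[key](value) if value is not None else None)
        for key, value in params.items()
        if key in known  # columns the struct does not track are ignored
    }
    return cls(**kwargs)


client = struct_from_params(Client, {
    "id": "7f3c2a10-0000-4000-8000-000000000001",
    "name": "laptop",
    "last_seen_at": "2025-01-01T12:00:00",
    "unused_column": "ignored",
})
```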
Related: #9501
Adds the `account_slug` to the gateway's `init` message. When the
account slug is changed, the gateway's socket is disconnected using the
same mechanism as gateway deletion, which causes the gateway to
reconnect immediately and receive a new `init`.
Related: #9545
We issue broadcasts and subscribes in many places throughout the portal.
To help keep the cognitive overhead low, this PR consolidates all PubSub
functionality to the `Domain.PubSub` module.
This allows for:
- better maintainability
- see all of the topics we use at a glance
- consolidate repeated functionality (saved for a future PR)
- use the module hierarchy to define function names, which feels more
intuitive when reading and sets a convention
We also introduce a `Domain.Events.Hooks` behavior to ensure all hooks
comply with this simple contract, and we also introduce a convention to
standardize on topic names using the module hierarchy defined herein.
Lastly, we add convenience functions to the Presence modules to save a
bit of duplication and chance for errors.
This will make it much easier to maintain PubSub going forward.
Related: #9501
We move the resource events to the WAL system. Notably, we no longer
need `fetch_and_update_breakable` for resource updates, so a bit of
refactoring is included to update the call sites for those.
Additionally, we need to add a `Flow.expire_flows_for_resource_id/1`
function to expire flows from the WAL system. This is now being called
in the WAL event handler. To prevent this from blocking the WAL
consumer/broadcaster, we wrap it with a Task.async. These will be
cleaned up when the lookup table for access is implemented next.
Another thing to note is that we lose the `subject` when moving from
`Flows.expire_flows_for(%Resource{}, subject)` to
`Flows.expire_flows_for_resource_id(resource_id)` when a resource is
deleted or updated by an actor since we respond to this event in the WAL
where that data isn't available. However, we don't actually _use_ the
subject when expiring flows (other than to authorize the initial resource
update), so this isn't an issue.
Related: #9501
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Brian Manifold <bmanifold@users.noreply.github.com>
Membership events are quite simple to move to the WAL:
- Only one topic is used to determine which client(s) receive updates
for which Actor(s).
- The unsubscribe was removed because it was unused.
- Notably, the N+1 query problem regarding re-evaluating all access
again after each membership is updated is still present. This will be
fixed using a lookup table in the client channel in the last PR to move
events to the WAL.
Related: https://github.com/firezone/firezone/issues/6294
Related: https://github.com/firezone/firezone/issues/8187
This PR moves Gateway events to be triggered by the WAL broadcaster.
Some things of note that are cleaned up:
- The gateway `:update` event was never received anywhere (but in a
test) and so has been removed
- The account topic has been removed as it was also never acted upon
anywhere. Presence yes, but topic no
- The group topic has also been removed as it was only used to receive
broadcasted disconnects when a group is deleted, but this was already
handled by the token deletion and so is redundant.
Why:
* Now that we have started using the `created_by_subject` field on
various tables, we no longer need to keep the
`created_by_<identity/actor>` fields. This will help remove a foreign
key reference and will be one step closer to allowing us to hard delete
data rather than soft deleting all data in order to keep foreign key
references like these.
Client updates are next on the path to moving more side effects to the
WAL broadcaster. This one has the following notable changes:
- ~~The `actor_clients` pubsub topic were only used to broadcast removal
of clients belonging to an actor; these are no longer needed since we
handle this in the individual removal event~~ EDIT: only the presence is
kept
- The `account_clients:{account_id}` pubsub and presence topic
definition has been moved to `Events.Hooks.Accounts` because these are
broadcasted using the account_id field based on account changes, and
have nothing to do with the client lifecycle
Related: #6294
Related: #8187
Similar to #9285, we move the `expire_flow` event to be broadcasted from
the WAL broadcaster.
Unrelated tests needed to be updated to not expect to receive the
broadcast, and instead check to ensure the record has been updated.
A minor bug is also fixed in the ordering of the `old_data, data`
fields.
Tested manually on dev.
Related: #6294
Related: #8187
Now that the WAL consumer has been dry running in production for some
time, we can begin moving events over to it.
We start with a relatively simple case: the account `config_changed`
event.
Since side effects now happen decoupled from the actual record updates,
testing is updated in this PR:
- We don't expect broadcasts to happen in the `accounts_test.exs` -
these context modules are now solely responsible for managing updates to
records and will no longer need to worry about side effects (in the
typical case) like subscribe and broadcast
- The Event hooks module now contains all logic related to processing
side effects for a particular account update.
The net effect is that we now have a dedicated module and tests for side
effects, starting with `accounts`.
Related: #6294
Related: #8187
Why:
* During the account sign up flow, the email of the first admin was not
being populated in the `email` column on the auth_identities table. This
was due to atoms being passed in the attrs instead of strings to the
`create_identity` function. A migration was also created to backfill the
missing emails in the `auth_identities` table.
There was a slight API change in the way LoggerJSON's configuration is
generated, so I took the time to do a little fixing and cleanup here.
Specifically, we should be using the `new/1` callback to create the
Logger config, which fixes the below exception due to missing config
keys:
```
FORMATTER CRASH: {report,[{formatter_crashed,'Elixir.LoggerJSON.Formatters.GoogleCloud'},{config,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},{log_event,#{meta => #{line => 15,pid => <0.308.0>,time => 1744145139650804,file => "lib/logger.ex",gl => <0.281.0>,domain => [elixir],application => libcluster,mfa => {'Elixir.Cluster.Logger',info,2}},msg => {string,<<"[libcluster:default] connected to :\"web@web.cluster.local\"">>},level => info}},{reason,{error,{badmatch,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},[{'Elixir.LoggerJSON.Formatters.GoogleCloud',format,2,[{file,"lib/logger_json/formatters/google_cloud.ex"},{line,148}]}]}}]}
```
Supersedes #8714
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
- Attaches the Sentry Logging hook in each of [api, web, domain]
- Removes errant Sentry logging configuration in config/config.exs
- Fixes the exception logger to default to logging exceptions, use
`skip_sentry: true` to skip
Tested successfully in dev. Hopefully the cluster behaves the same way.
Fixes #8639
- Adds a simple text input to configure search domains ("default DNS
suffix") in the Settings -> DNS page.
- Sends the `search_domain` field as part of the client's `init` message
- Fixes a minor UI alignment inconsistency for the upstream resolvers
field so that the total form width and `New resolver` button width are
the same.
<img width="1137" alt="Screenshot 2025-03-09 at 10 56 56 PM"
src="https://github.com/user-attachments/assets/a1d5a570-8eae-4aa9-8a1c-6aaeb9f4c33a"
/>
Fixes #8365
Adds the following endpoints:
- `PUT /clients/:id` for updating the `name`
- `PUT /clients/:client_id/verify` for verifying a client
- `PUT /clients/:client_id/unverify` for unverifying a client
- `GET /clients` for listing clients in an account
- `GET /clients/:id` for getting a single client
- `DELETE /clients/:id` for deleting a client
Related: #8081
If the websocket connection between a relay and the portal experiences a
temporary network split, the portal will immediately send the
disconnected id of the relay to any connected clients and gateways, and
all relayed connections (and current allocations) will be immediately
revoked by connlib.
This tight coupling is needlessly disruptive. As we've seen in staging
and production logs, relay disconnects can happen randomly, and in the
vast majority of cases immediately reconnect. Currently we see about 1-2
dozen of these **per day**.
To better account for this, we introduce a debounce mechanism in the
portal for `relays_presence` disconnects that works as follows:
- When a relay disconnects, record its `stamp_secret` (this is somewhat
tricky as we don't get this at the time of disconnect - we need to cache
it by relay_id beforehand)
- If the same `relay_id` reconnects again with the same `stamp_secret`
within `relays_presence_debounce_timeout` -> no-op
- If the same `relay_id` reconnects again with a **different**
`stamp_secret` -> disconnect immediately
- If it doesn't reconnect, **then** send the `relays_presence` with the
disconnected_id after the `relays_presence_debounce_timeout`
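
The debounce rules above can be sketched as a small state machine. This is an illustrative Python sketch with hypothetical names; it takes explicit timestamps instead of real timers so the behavior is easy to follow.

```python
class RelayDebouncer:
    def __init__(self, timeout):
        self.timeout = timeout   # stands in for relays_presence_debounce_timeout
        self.secrets = {}        # relay_id -> stamp_secret, cached while connected
        self.pending = {}        # relay_id -> time the disconnect was observed

    def connected(self, relay_id, stamp_secret):
        """Return relay ids that must be reported as disconnected immediately."""
        previous = self.secrets.get(relay_id)
        self.secrets[relay_id] = stamp_secret
        if relay_id in self.pending:
            del self.pending[relay_id]
            if previous != stamp_secret:
                return [relay_id]  # different secret: report right away
        return []                  # same secret within the window: no-op

    def disconnected(self, relay_id, now):
        # The secret is already cached; only start the debounce window here.
        self.pending[relay_id] = now

    def expired(self, now):
        """Return relay ids whose debounce window elapsed without a reconnect."""
        due = [r for r, t in self.pending.items() if now - t >= self.timeout]
        for relay_id in due:
            del self.pending[relay_id]
            self.secrets.pop(relay_id, None)
        return due
```

A periodic tick would call `expired(now)` and send `relays_presence` for whatever it returns.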
There are several ways connlib detects a relay is down:
1. Binding requests time out. These happen every 25s, so on average we
don't know a Relay is down for 12.5s + backoff timer.
2. `relays_presence` - this is currently the fastest way to detect
relays are down. With this change, the caveat is we will now detect this
with a delay of `relays_presence_debounce_timeout`.
Fixes #8301
Instead of crashing, it would make sense to log these and let the
connected entity maintain its WebSocket connection.
This should never happen in practice if we maintain our version
compatibility matrix properly, but it will help reduce the blast radius
of a channel message bug that happens to slip out into the wild.
Fixes #4679
In order to properly handle SRV and TXT records on the clients, we need
to be able to pick a Gateway using the initial query itself. After that,
we need to know the Gateway Tunnel IPs we're connecting to so we can
have the query perform the lookup.
Fixes #8281
Why:
* After merging #8267 it was discovered that there was a race condition
that allowed a `resource_create` message to end up at the Gateway
Channel process. Previously, this message would not have ever arrived,
because we were replacing Resource IDs when a breaking change was made,
but since that is no longer the case, it is possible that a connection
could be established between the time the `delete_resource` and
`create_resource` messages are sent and the `create_resource` would end
up at the Gateway Channel process. This commit adds a no-op handler to
make sure the message gets processed without throwing an error.
Why:
* Rather than using a persistent_id field in Resources/Policies, it was
decided that we should allow "breaking changes" to these entities. This
means that Resources/Policies will now be able to update all fields on
the schema without changing the primary key ID of the entity.
* This change will greatly help the API and Terraform provider
development.
@jamilbk, would you like me to put a migration in this PR to actually
get rid of all of the existing soft deleted entities?
@thomaseizinger, I tagged you on this, because I wanted to make sure
that these changes weren't going to break any expectations in the client
and/or gateways.
---------
Signed-off-by: Brian Manifold <bmanifold@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
The applications within our umbrella are all joined into a single Erlang
cluster, and logger configuration is applied already to the entire
umbrella.
As such, registering the Sentry log handler in each application's
startup routine triggers duplicate handlers to be registered for the
cluster, resulting in warnings like this in GCP:
```
Event dropped due to being a duplicate of a previously-captured event.
```
As such, we can move the log handler configuration to the top-level
`:logger` key, under the `:logger` subkey for configuring a single
handler. We then load this handler config in the `domain` app only and
it applies to the entire cluster.
This adds https://github.com/getsentry/sentry-elixir to the portal for
automatic process crash and exception trace reporting.
It also configures Logger reporting for the `warning` level and higher,
and sets the data scrubbing rules to allow all Logger metadata keys
(`logger_metadata.*` in the Sentry project settings).
Lastly, it configures automatic HTTP error reporting by tying into the
`api` and `web` endpoint modules with a custom `plug` middleware so we
get automatic reporting of unsuccessful Phoenix responses.
It is expected this will be noisy when we first deploy and we'll need to
tune it down a bit. This is the same approach used with other Sentry
platforms.
Why:
* The fallback controller in the API was not catching `{:error,
:seat_limit_reached}` being returned and was then generating a 500
response when this happened. This commit adds the condition in the
fallback controller and adds a new template for a more specific error
message in the returned JSON.
Why:
* Currently, when using the API, a user has no way of easily identifying
what identities they are pulling back as the response only includes the
`provider_identifier` which for most of our AuthProviders is an ID for
the IdP and not an email address. Along with that, when adding users to
an OIDC provider within Firezone, there is no check for whether or not
an identity has already been added with a given email address. By
creating a separate email column on the `auth_identities` table, it will
be very straightforward to know whether an email address exists for a
given identity, return it in an API response and allow the admin of a
Firezone account to track users (Identities) by email rather than IdP
identifier.
Fixes #7392