Commit Graph

106 Commits

Author SHA1 Message Date
Jamil
e98aa82e8e fix(portal): respect gateway_group_id filter in REST API (#9840)
Fixes #9815
2025-07-11 19:12:05 +00:00
Jamil
dddd1b57fc refactor(portal): remove flow_activities (#9693)
This has been dead code for a long time. The feature this was meant to
support, #8353, will require a different domain model, views, and user
flows.

Related: #8353
2025-06-27 20:40:25 +00:00
Jamil
0b09d9f2f5 refactor(portal): don't rely on flows.expires_at (#9692)
The `expires_at` column on the `flows` table was never used outside of
the context in which the flow was created in the Client Channel. This
ephemeral state, which is created in the `Domain.Flows.authorize_flow/4`
function, is never read from the DB in any meaningful capacity, so it
can be safely removed.

The `expire_flows_for` family of functions now simply reads the needed
fields from the flows table in order to broadcast `{:expire_flow,
flow_id, client_id, resource_id}` directly to the subscribed entities.

This PR is step 1 in removing the reliance on `Flows` to manage
ephemeral access state. In a subsequent PR we will actually change the
structure of what state is kept in the channel PIDs such that reliance
on this Flows table will no longer be necessary.

Additionally, in a few places, we were referencing a Flows.Show view
that was never available in production, so this dead code has been
removed.

Lastly, the `flows` table subscription and associated hook processing
has been completely removed as it is no longer needed. We've implemented
in #9667 logic to remove publications from removed table subscriptions,
so we can expect to get a couple ingest warnings when we deploy this as
the `Hooks.Flows` processor no longer exists, and the WAL data may have
lingering flows records in the queue. These can be safely ignored.
2025-06-27 18:29:12 +00:00
Jamil
343717b502 refactor(portal): broadcast client struct when updated (#9664)
When a client is updated, we may need to re-initialize it if "breaking"
fields are updated. If non-breaking fields are changed, such as name, we
don't need to re-initialize the client.

This PR also adds a helper `struct_from_params/2` which will create a
schema struct from WAL data in order to type cast any needed data for
convenience. This avoid having to do a DB hit - we _already have the
data from the DB_ - we just need to format and send it.

Related: #9501
2025-06-25 17:04:41 +00:00
Jamil
933d51e3d0 feat(portal): send account_slug in gateway init (#9653)
Adds the `account_slug` to the gateway's `init` message. When the
account slug is changed, the gateway's socket is disconnected using the
same mechanism as gateway deletion, which causes the gateway to
reconnect immediately and receive a new `init`.

Related: #9545
2025-06-24 18:35:06 +00:00
Jamil
c6545fe853 refactor(portal): consolidate pubsub functions (#9529)
We issue broadcasts and subscribes in many places throughout the portal.
To help keep the cognitive overhead low, this PR consolidates all PubSub
functionality to the `Domain.PubSub` module.

This allows for:

- better maintainability
- see all of the topics we use at a glance
- consolidate repeated functionality (saved for a future PR)
- use the module hierarchy to define function names, which feels more
intuitive when reading and sets a convention

We also introduce a `Domain.Events.Hooks` behavior to ensure all hooks
comply with this simple contract, and we also introduce a convention to
standardize on topic names using the module hierarchy defined herein.

Lastly, we add convenience functions to the Presence modules to save a
bit of duplication and chance for errors.

This will make it much easier to maintain PubSub going forward.


Related: #9501
2025-06-15 04:30:57 +00:00
Jamil
cbe33cd108 refactor(portal): move policy events to WAL (#9521)
Moves all of the policy lifecycle events to be broadcasted from the WAL
consumer.

#### Test

- [x] Enable policy
- [x] Disable policy
- [x] Delete policy
- [x] Non-breaking change
- [x] Breaking change


Related: #6294

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2025-06-14 01:10:09 +00:00
Jamil
c31f51d138 refactor(portal): move resource events to WAL (#9406)
We move the resource events to the WAL system. Notably, we no longer
need `fetch_and_update_breakable` for resource updates, so a bit of
refactoring is included to update the call sites for those.

Additionally, we need to add a `Flow.expire_flows_for_resource_id/1`
function to expire flows from the WAL system. This is now being called
in the WAL event handler. To prevent this from blocking the WAL
consumer/broadcaster, we wrap it with a Task.async. These will be
cleaned up when the lookup table for access is implemented next.

Another thing to note is that we lose the `subject` when moving from
`Flows.expire_flows_for(%Resource{}, subject)` to
`Flows.expire_flows_for_resource_id(resource_id)` when a resource is
deleted or updated by an actor since we respond to this event in the WAL
where that data isn't available. However, we don't actually _use_ the
subject when expiring flows (other than authorize the initial resource
update), so this isn't an issue.

Related: #9501

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Brian Manifold <bmanifold@users.noreply.github.com>
2025-06-11 00:12:45 +00:00
Jamil
38c1de351c refactor(portal): move membership events to WAL (#9388)
Membership events are quite simple to move to the WAL:

- Only one topic is used to determine which client(s) receive updates
for which Actor(s).
- The unsubscribe was removed because it was unused.
- Notably, the N+1 query problem regarding re-evaluating all access
again after each membership is updated is still present. This will be
fixed using a lookup table in the client channel in the last PR to move
events to the WAL.

Related: https://github.com/firezone/firezone/issues/6294
Related: https://github.com/firezone/firezone/issues/8187
2025-06-06 06:23:33 +00:00
Jamil
9c3f6e7b36 refactor(portal): don't send ip_stack for non-DNS resources (#9376)
We always return the `ip_stack` field when rendering resource for both
WebSocket and REST APIs. If the resource's type is not `:dns` then this
will be `nil`.

Related:
https://github.com/firezone/firezone/pull/9303#discussion_r2119681062
2025-06-02 23:16:49 -07:00
Jamil
6fc7d2e4e0 feat(portal): configurable ip stack for DNS resources (#9303)
Some poorly-behaved applications (e.g. mongo) will fail to connect if
they see both IPv4 and IPv6 addresses for a DNS resource, because they
will try to connect to both of them and fail the whole connection setup
if either one is not routable.

To fix this, we need to introduce a knob to allow admins to restrict DNS
resources to only A or AAAA records.


<img width="750" alt="Screenshot 2025-06-02 at 10 48 39 AM"
src="https://github.com/user-attachments/assets/4dbcb6ae-685f-43ee-b9e8-1502b365a294"
/>

<img width="1174" alt="Screenshot 2025-06-02 at 11 05 53 AM"
src="https://github.com/user-attachments/assets/02d0a4b3-e6e8-4b6d-89fa-d3d999b5811e"
/>

---

Related:
https://firezonehq.slack.com/archives/C08KPQKJZKM/p1746720923535349
Related: #9300
Fixes: #9042
2025-06-03 02:24:41 +00:00
Jamil
73c3e2d87b refactor(portal): move gateway events to WAL (#9299)
This PR moves Gateway events to be triggered by the WAL broadcaster.
Some things of note that are cleaned up:

- The gateway `:update` event was never received anywhere (but in a
test) and so has been removed
- The account topic has been removed as it was also never acted upon
anywhere. Presence yes, but topic no
- The group topic has also been removed as it was only used to receive
broadcasted disconnects when a group is deleted, but this was already
handled by the token deletion and so is redundant.
2025-06-01 16:40:28 +00:00
Brian Manifold
a51b35a6b4 refactor(portal): remove created_by_<identity/actor> columns (#9306)
Why:

* Now that we have started using the `created_by_subject` field on
various tables, we no longer need to keep the
`created_by_<identity/actor>` fields. This will help remove a foreign
key reference and will be one step closer to allowing us to hard delete
data rather than soft deleting all data in order to keep foreign key
references like these.
2025-05-30 21:06:35 +00:00
Jamil
6cea0cd6ec refactor(portal): Move client updates to WAL broadcaster (#9288)
Client updates are next on the path to moving more side effects to the
WAL broadcaster. This one has the following notable changes:

- ~~The `actor_clients` pubsub topic were only used to broadcast removal
of clients belonging to an actor; these are no longer needed since we
handle this in the individual removal event~~ EDIT: only the presence is
kept
- The `account_clients:{account_id}` pubsub and presence topic
definition has been moved to `Events.Hooks.Accounts` because these are
broadcasted using the account_id field based on account changes, and
have nothing to do with the client lifecycle


Related: #6294 
Related: #8187
2025-05-29 16:56:08 +00:00
Jamil
7c674ea21c refactor(portal): Move expire_flow to WAL broadcaster (#9286)
Similar to #9285, we move the `expire_flow` event to be broadcasted from
the WAL broadcaster.

Unrelated tests needed to be updated to not expect to receive the
broadcast, and instead check to ensure the record has been updated.

A minor bug is also fixed in the ordering of the `old_data, data`
fields.

Tested manually on dev.

Related: #6294 
Related: #8187
2025-05-29 06:35:03 +00:00
Brian Manifold
12b4a12f26 feat(portal): Add created_by_subject (#9176)
Why:

* We have decided to change the way we will do audit logging. Instead of
soft deleting data and keeping it in the table it was created in, we
will be moving to an audit trail table where various actions will be
recorded in a table/DB specifically for auditing purposes. Due to this
change we need to make sure that we don't have stale/dangling
references. One set of references we keep everywhere is
`created_by_identity_id` and `created_by_actor_id`. Those foreign key
references won't be able to be used after moving to the new audit
system. This commit will allow us to keep that info by pulling the
values and storing the data in a created_by_subject field on the record.
2025-05-20 20:03:46 +00:00
Brian Manifold
dd5a53f686 fix(portal): Fix sign_up to properly populate email (#9105)
Why:

* During the account sign up flow, the email of the first admin was not
being populated in the `email` column on the auth_identities table. This
was due to atoms being passed in the attrs instead of strings to the
`create_identity` function. A migration was also created to backfill the
missing emails in the `auth_identities` table.
2025-05-13 19:49:25 +00:00
Jamil
43d084f97f refactor(portal): Enforce internet resource site exclusion (#8448)
Finishes up the Internet Resource migration by enforcing:

- No internet resources in non-internet sites
- No regular resources in internet sites
- Removing the prompt to migrate

~~I've already migrated the existing internet resources in customer's
accounts. No one that was using the internet resource hadn't already
migrated.~~

Edit: I started to head down that path, then decided doing this here in
a data migration was going to be a better approach.

Fixes #8212
2025-03-15 18:25:32 -05:00
Brian Manifold
d133ee84b7 feat(portal): Add API rate limiting (#8417) 2025-03-13 03:21:09 +00:00
Jamil
6d527c1308 feat(portal): Search domain UI and JSON view (#8401)
- Adds a simple text input to configure search domains ("default DNS
suffix") in the Settings -> DNS page.
- Sends the `search_domain` field as part of the client's `init` message
- Fixes a minor UI alignment inconsistency for the upstream resolvers
field so that the total form width and `New resolver` button width are
the same.


<img width="1137" alt="Screenshot 2025-03-09 at 10 56 56 PM"
src="https://github.com/user-attachments/assets/a1d5a570-8eae-4aa9-8a1c-6aaeb9f4c33a"
/>



Fixes #8365
2025-03-10 17:46:40 +00:00
Jamil
c3a9bac465 feat(portal): Add client endpoints to REST API (#8355)
Adds the following endpoints:

- `PUT /clients/:id` for updating the `name`
- `PUT /clients/:client_id/verify` for verifying a client
- `PUT /clients/:client_id/unverify` for unverifying a client
- `GET /clients` for listing clients in an account
- `GET /clients/:id` for getting a single client
- `DELETE /clients/:id` for deleting a client

Related: #8081
2025-03-05 00:37:01 +00:00
Jamil
e064cf5821 fix(portal): Debounce relays_presence (#8302)
If the websocket connection between a relay and the portal experiences a
temporary network split, the portal will immediately send the
disconnected id of the relay to any connected clients and gateways, and
all relayed connections (and current allocations) will be immediately
revoked by connlib.

This tight coupling is needlessly disruptive. As we've seen in staging
and production logs, relay disconnects can happen randomly, and in the
vast majority of cases immediately reconnect. Currently we see about 1-2
dozen of these **per day**.

To better account for this, we introduce a debounce mechanism in the
portal for `relays_presence` disconnects that works as follows:

- When a relay disconnects, record its `stamp_secret` (this is somewhat
tricky as we don't get this at the time of disconnect - we need to cache
it by relay_id beforehand)
- If the same `relay_id` reconnects again with the same `stamp_secret`
within `relays_presence_debounce_timeout` -> no-op
- If the same `relay_id` reconnects again with a **different**
`stamp_secret` -> disconnect immediately
- If it doesn't reconnect, **then** send the `relays_presence` with the
disconnected_id after the `relays_presence_debounce_timeout`

There are several ways connlib detects a relay is down:

1. Binding requests time out. These happen every 25s, so on average we
don't know a Relay is down for 12.5s + backoff timer.
2. `relays_presence` - this is currently the fastest way to detect
relays are down. With this change, the caveat is we will now detect this
with a delay of `relays_presence_debounce_timer`.

Fixes #8301
2025-03-04 23:56:40 +00:00
Jamil
fee808bc62 chore(portal): Log error for unknown channel messages (#8299)
Instead of crashing, it would make sense to log these and let the
connected entity maintain its WebSocket connection.

This should never happen in practice if we maintain our version
compatibility matrix properly, but it will help reduce the blast radius
of a channel message bug that happens to slip out into the wild.

Fixes #4679
2025-03-03 21:21:39 +00:00
Jamil
cb0bf44815 chore: Remove ability to create GCP log sinks (#8298)
This has long since been removed in the Clients.
2025-02-28 20:57:21 +00:00
Jamil
e03047d549 feat(portal): Send gateway ipv4 and ipv6 to client (#8291)
In order to properly handle SRV and TXT records on the clients, we need
to be able to pick a Gateway using the initial query itself. After that,
we need to know the Gateway Tunnel IPs we're connecting to so we can
have the query perform the lookup.

Fixes #8281
2025-02-28 03:52:27 +00:00
Brian Manifold
bc150156ce fix(portal): Update gateway channel to process resource_update (#8280)
Why:

* After merging #8267 it was discovered that there was a race condition
that allowed a `resource_create` message to end up at the Gateway
Channel process. Previously, this message would not have ever arrived,
because we were replacing Resource IDs when a breaking change was made,
but since that is no longer the case, it is possible that a connection
could be established between the time the `delete_resource` and
`create_resource` messages are sent and the `create_resource` would end
up at the Gateway Channel process. This commit adds a no-op handler to
make sure the message gets processed without throwing an error.
2025-02-27 01:46:13 +00:00
Brian Manifold
d0f0de0f8d refactor(portal): Allow breaking changes in Resources/Policies (#8267)
Why:

* Rather than using a persistent_id field in Resources/Policies, it was
decided that we should allow "breaking changes" to these entities. This
means that Resources/Policies will now be able to update all fields on
the schema without changing the primary key ID of the entity.
* This change will greatly help the API and Terraform provider
development.

@jamilbk, would you like me to put a migration in this PR to actually
get rid of all of the existing soft deleted entities?

@thomaseizinger, I tagged you on this, because I wanted to make sure
that these changes weren't going to break any expectations in the client
and/or gateways.

---------

Signed-off-by: Brian Manifold <bmanifold@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-02-26 17:05:34 +00:00
Brian Manifold
eea7079776 fix(portal): Catch seat limit error in API fallback controller (#7783)
Why:

* The fallback controller in the API was not catching `{:error,
:seat_limit_reached}` being returned and was then generating a 500
response when this happened. This commit adds the condition in the
fallback controller and adds a new template for a more specific error
message in the returned JSON.
2025-01-17 00:13:45 +00:00
Brian Manifold
f114bc95cd refactor(portal): Add email as separate column on auth_identities table (#7472)
Why:

* Currently, when using the API, a user has no way of easily identifying
what identities they are pulling back as the response only includes the
`provider_identifier` which for most of our AuthProviders is an ID for
the IdP and not an email address. Along with that, when adding users to
an OIDC provider within Firezone, there is no check for whether or not
an identity has already been added with a given email address. By
creating a separate email column on the `auth_identities` table, it will
be very straight forward to know whether an email address exists for a
given identity, return it in an API response and allow the admin of a
Firezone account to track users (Identities) by email rather than IdP
identifier.

Fixes #7392
2024-12-13 17:26:47 +00:00
Brian Manifold
9711cf56c1 fix(portal): Fix update API endpoint for resources (#7493)
Why:

* The API endpoint for updating Resources was using
`Resources.fetch_resource_by_id_or_persistent_id`, however that function
was fetching all Resources, which included deleted Resources. In order
to prevent an API user from attempting to update a Resource that is
deleted, a new function was added to fetch active Resources only.

Fixes: #7492
2024-12-12 22:51:28 +00:00
Brian Manifold
06791d2d05 refactor(portal): API persistent IDs (#7182)
In order for the firezone terraform provider to work properly, the
Resources and Policies need to be able to be referenced by their
`persistent_id`, specifically in the portal API.
2024-11-07 20:45:56 +00:00
Thomas Eizinger
ce1e59c9fe feat(connlib): implement idempotent control protocol for gateway (#6941)
This PR implements the new idempotent control protocol for the gateway.
We retain backwards-compatibility with old clients to allow admins to
perform a disruption-free update to the latest version.

With this new control protocol, we are moving the responsibility of
exchanging the proxy IPs we assigned to DNS resources to a p2p protocol
between client and gateway. As a result, wildcard DNS resources only get
authorized on the first access. Accessing a new domain within the same
resource will thus no longer require a roundtrip to the portal.

Overall, users will see a greatly decreased connection setup latency. On
top of that, the new protocol will allow us to more easily implement
packet buffering which will be another UX boost for Firezone.
2024-10-18 15:59:47 +00:00
Andrew Dryga
b3c2e54460 feat(portal): New version of the WS control protocol (#6761)
TODOs:
- [x] Switch to sending messages instead of replies
- [ ] Do not hide pre-filtered resources and render them with an error
instead (in case we will want to expose that on a client later)
- [x] Figure out how to generate PSK so that it stays across WS
connections
2024-10-16 10:57:54 -06:00
Andrew Dryga
1abfa10fb7 fix(portal): UX improvements (#7013)
This PR accumulates lots of small UX fixes from #6645.

---------

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2024-10-14 11:32:44 -06:00
Andrew Dryga
3652839b1a feat(portal): Allow updating policies and resources (#6690)
Now you can "edit" any fields on the policy, when one of fields that
govern the access is changed (resource, actor group or conditions) a new
policy will be created and an old one is deleted. This will be
broadcasted to the clients right away to minimize downtime. New policy
will have it's own flows to prevent confusion while auditing. To make
experience better for external systems we added `persistent_id` that
will be the same across all versions of a given policy.

Resources work in a similar fashion but when they are replaced we will
also replace all corresponding policies.

An additional nice effect of this approach is that we also got
configuration audit log for resources and policies.

Fixes #2504
2024-09-18 13:06:05 -06:00
Andrew Dryga
e72bb05436 feat(portal): Reinit client when itself or a known group were updated (#6609)
This allows us to push a whole set of resources at once when client was
verified/unverified/updated/blocked.

Closes #6560
2024-09-05 16:51:47 -07:00
Andrew Dryga
1dae0a3ed5 fix(portal): Do not send resources not connected to any sites down to clients (#6512)
This is only possible for internet resources, any other resource will
always have at least one site connected at all times.

Closes #6510
2024-08-30 14:11:48 -06:00
Andrew Dryga
2a808292d0 feat(portal): Add blocked_tx_bytes to flow activity metrics (#6487)
Closes #4787
2024-08-29 14:21:51 -06:00
Andrew
7c6eac6af5 Hotfix: crash while rendering internet resources for gateways 2024-08-28 10:44:13 -06:00
Thomas Eizinger
35017537c7 feat(gateway): allow out-of-order allow_access requests (#6403)
Currently, the gateway requires a strict ordering of first receiving a
`request_connection` message, following by multiple `allow_access`
messages. Additionally, access can be granted as part of the initial
`request_connection` message too.

This isn't an ideal design. Setting up a new connection is infallible,
all we need to do is send our ICE credentials back to the client.
However, untangling that will require a bit more effort.

Starting with #6335, following this strict order on the client is a more
difficult. Whilst we can send them in order, it is harder to maintain
those ordering guarantees across all our systems.

To avoid this, we change the gateway to perform an upsert for its local
ACLs for a client. In case that an `allow_access` call would somehow get
to the gateway earlier, we can simply already create the `Peer` and only
set up the actual connection later.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-08-28 13:10:06 +00:00
Andrew Dryga
2d083379c6 feat(portal): Internet resources (#6299)
They will be sent in the API for connlib 1.3 and above.

I think in future we can make a whole menu section called "Internet
Security" which will be a specialized UI for the new resource type (and
now show it in Resources list) to improve the user experience around it.

Closes #5852

---------

Signed-off-by: Andrew Dryga <andrew@dryga.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-08-27 23:11:17 +00:00
Andrew Dryga
8e4a4a7b05 feat(portal): Pre-check constraint conformation on client connect (#6431)
Closes #6216
2024-08-26 15:30:46 -06:00
Andrew Dryga
25a22b4780 chore(portal): Test that we only render resources once in WS API (#6394) 2024-08-21 17:16:19 -06:00
Andrew Dryga
c922ea29e9 fix(portal): Fix DNS wildcard support for Gateways (#6270) 2024-08-12 12:54:20 -06:00
Andrew Dryga
00b93f6b82 feat(portal): Wildcard dns with backwards compatibility (#6214)
If a new resource is created that will use format not supported by
previous client versions we temporarily show a warning:
<img width="683" alt="Screenshot 2024-08-07 at 2 28 57 PM"
src="https://github.com/user-attachments/assets/bbfdfc96-0c4b-4226-93c5-bc2b5fdb9d30">

It will also be excluded from `resources` list for older clients (below
1.2).

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-08-10 18:25:24 +00:00
Andrew Dryga
bf06534caf fix(portal): Prevent races during relay selection by only using the ones connected for more than 5 seconds ago (#6111)
Closes #6099
Should push #6109 to not being needed short term.
2024-08-02 11:10:40 -06:00
Brian Manifold
bdc4d85afa fix(api): fix generated openapi spec (#6008)
(External contribution)

Hi, first thanks to @bmanifold for his awesome work! I've not yet tested
the API but here is a first PR fixing various small mistakes in the
generated openapi spec:

Schema names cannot contain spaces
Add missing path parameters in the spec
Remove duplicated endpoint for creating an identity (not sure about
that, I'll let you check)
If you want to validate the generated spec you can paste it here:
https://editor.swagger.io/ (or at the bottow of your swagger ui)

Please review commit by commit

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>
2024-07-24 15:59:15 +00:00
Brian Manifold
79c815fbbc feat(portal): Add REST API (#5579)
Why:

* In order to manage a large number of Firezone Sites, Resources,
Policies, etc... a REST API is needed as clicking through the UI is too
time consuming, as well as prone to error. By providing a REST API
Firezone customers will be able to manage things within their Firezone
accounts with code.
2024-07-20 04:20:43 +00:00
Andrew Dryga
c9a9c1864a fix(portal): Update client identity on every connection (#5697)
This identity must track the last sign in method used by the client

Closes #5633
2024-07-03 13:17:06 -06:00
Andrew Dryga
cfe777f389 fix(portal): Do not crash WebSocket when client version is invalid (#5525) 2024-06-26 18:50:43 -06:00