Commit Graph

309 Commits

Author SHA1 Message Date
Jamil
8a7f248dda fix(portal): ignore expected replication connection failures (#9003)
These are expected during deploys, so don't log them as errors. If the
Supervisor fails to start us after exhausting all attempts, it will log
an error.
2025-05-02 00:45:02 +00:00
Jamil
299fbcd096 fix(portal): Properly check background jobs (#8986)
The `background_jobs_enabled` config is an ENV var that needs to be set
for a specific configuration key. It's not set on the top-level
`:domain` config by default.

Instead, it's used to enable or disable specific modules started by the
application's Supervisor.

The `Domain.Events.ReplicationConnection` module is updated in this PR
to follow this convention.
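
A minimal sketch of what that convention might look like in the application's
supervision tree (the exact config layout and helper names here are
assumptions, not the actual Firezone code):

```
defmodule Domain.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Always-on children plus whatever background modules are enabled for this node.
    children = [Domain.Repo] ++ background_children()
    Supervisor.start_link(children, strategy: :one_for_one, name: Domain.Supervisor)
  end

  defp background_children do
    if background_jobs_enabled?(Domain.Events.ReplicationConnection) do
      [Domain.Events.ReplicationConnection]
    else
      []
    end
  end

  # The flag lives under the module's own configuration key, not the
  # top-level :domain config.
  defp background_jobs_enabled?(config_key) do
    :domain
    |> Application.get_env(config_key, [])
    |> Keyword.get(:background_jobs_enabled, false)
  end
end
```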
2025-05-01 16:32:43 +00:00
Jamil
8e054f5c74 fix(portal): Restrict WAL streaming to domain nodes only (#8956)
The `web` and `api` applications use `domain` as a dependency in their
`mix.exs`. This means by default their Supervisor will start the
Domain's supervision tree as well.

The author did not realize this at the time of implementation, and so we
now leverage the convention in place for restricting tasks to `domain`
nodes, the `background_jobs_enabled` application configuration
parameter.

We also add an info log when the replication slot is being started so we
can verify the node it's starting on.
2025-05-01 13:28:40 +00:00
Jamil
c0a670d947 fix(portal): Restart ReplicationConnection using Supervisor (#8953)
When deploying, the cluster state diverges temporarily, which allows
more than one `ReplicationConnection` process to start on the new nodes.

(One of) the old nodes still has an active slot, and we get an "object
in use" error `(Postgrex.Error) ERROR 55006 (object_in_use) replication
slot "events_slot" is active for PID 603037`.

Rather than use ReplicationConnection's restart behavior (which logs
tons of errors with Logger.error), we can use the Supervisor here
instead and keep trying to start the ReplicationConnection until it
succeeds.

Note that if the process name is registered (globally) and running,
ReplicationConnection.start_link/1 simply returns `{:ok, pid}` instead
of erroring out with `:already_running`, so eventually one of the nodes
will succeed and the remaining ones will return the globally-registered
pid.
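
A hedged sketch of what supervising the globally registered connection could
look like (the option names follow `Postgrex.ReplicationConnection.start_link/3`,
but treat the details as assumptions rather than the actual implementation):

```
defmodule Domain.Events.ReplicationConnection do
  use Postgrex.ReplicationConnection

  def start_link(connection_opts) do
    # %{} is the callback's init arg; connection options are only needed here.
    # With a :global name, a node that loses the race gets the already-running
    # pid back (per the note above) instead of crash-looping.
    Postgrex.ReplicationConnection.start_link(
      __MODULE__,
      %{},
      connection_opts ++
        [
          name: {:global, __MODULE__},
          # Let the Supervisor's restart policy drive retries instead of the
          # connection's own auto-reconnect (which logs errors on every attempt).
          auto_reconnect: false
        ]
    )
  end

  # ... replication callbacks omitted
end
```

The parent Supervisor then keeps restarting this child until the old node
releases the replication slot.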
2025-05-01 03:48:35 +00:00
Jamil
a98a9867af fix(portal): Redact entire connection_opts param (#8946)
The LoggerJSON Redactor only redacts top-level keys, so we need to
redact the entire `connection_opts` param to cover the password it
contains.

We also don't need to carry `connection_opts` in the ReplicationConnection
process state, since it's only needed for the initial connection, so we
refactor it out of the `state`.
2025-04-30 16:33:21 +00:00
Jamil
968db2ae39 feat(portal): Receive WAL events (#8909)
Firezone's control plane is a realtime, distributed system that relies
on a broadcast/subscribe system to function. In many cases, these events
are broadcasted whenever relevant data in the DB changes, such as an
actor losing access to a policy, a membership being deleted, and so
forth.

Today, this is handled in the application layer, typically happening at
the place where the relevant DB call is made (i.e. in an
`after_commit`). While this approach has worked thus far, it has several
issues:

1. We have no guarantee that the DB change will issue a broadcast. If
the application is deployed or the process crashes after the DB changes
are made but before the broadcast happens, we will have potentially
failed to update any connected clients or gateways with the changes.
2. We have no guarantee that the order of broadcasts matches the order
of DB updates. In other words, app server A could win its DB operation
against app server B but then lose the race to broadcast first.
3. If the cluster is in a bad state where broadcasts may return an error
(i.e. https://github.com/firezone/firezone/issues/8660), we will never
retry the broadcast.

To fix the above issues, we introduce a WAL logical decoder that
processes the event stream one message at a time and performs any needed work.
Serializability is guaranteed since we only process the WAL in a single,
cluster-global process, `ReplicationConnection`. Durability is also
guaranteed since we only ACK WAL segments after we've successfully
ingested the event.

This means we will only advance the position of our WAL stream after
successfully broadcasting the event.

This PR only introduces the WAL stream processing system but does not
introduce any changes to our current broadcasting behavior - that's
saved for another PR.
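
A conceptual sketch of the ack-after-ingest idea (module and helper names are
placeholders, not the actual Firezone code; the binary layout follows the
Postgres streaming replication protocol's XLogData message):

```
defmodule Domain.Events.DecoderSketch do
  # Conceptual sketch only, not the real ReplicationConnection module.
  use Postgrex.ReplicationConnection

  @epoch DateTime.to_unix(~U[2000-01-01 00:00:00Z], :microsecond)

  @impl true
  def handle_data(<<?w, _wal_start::64, wal_end::64, _clock::64, payload::binary>>, state) do
    # Decode the logical replication message and do the required work
    # (eventually: broadcast the event to subscribers).
    :ok = ingest_event(payload)

    # Only after successful ingestion do we acknowledge the LSN back to
    # Postgres, advancing the slot's confirmed flush position.
    ack = <<?r, wal_end + 1::64, wal_end + 1::64, wal_end + 1::64, current_time()::64, 0>>
    {:noreply, [ack], state}
  end

  # Placeholder for whatever processing the event needs.
  defp ingest_event(_payload), do: :ok

  defp current_time, do: System.os_time(:microsecond) - @epoch

  # ... handle_connect/1, handle_result/2, and slot/publication setup omitted.
end
```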
2025-04-29 23:53:06 -07:00
Jamil
0f300f2484 fix(portal): Prevent dupe sync adapters (#8887)
Prevents more than one sync-enabled adapter per account in order to
prepare for eventually adding a unique constraint on
`provider_identifier` for identities and groups per account.

Related: #6294

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Brian Manifold <bmanifold@users.noreply.github.com>
2025-04-22 13:58:24 +00:00
Jamil
2bbc0abc3a feat(portal): Add Oban (#8786)
Our current bespoke job system, while it's worked out well so far, has
the following shortcomings:

- No retry logic
- No robust way to guarantee job isolation / uniqueness without resorting
to row-level locking
- No support for cron-based scheduling

This PR adds the boilerplate required to get started with
[Oban](https://hexdocs.pm/oban/Oban.html), the job management system for
Elixir.
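
A minimal sketch of the kind of boilerplate this involves, assuming a
`Domain.Repo` and a single default queue; the queue, cron entry, and worker
below are illustrative rather than Firezone's actual configuration:

```
# config/config.exs (sketch)
import Config

config :domain, Oban,
  repo: Domain.Repo,
  queues: [default: 10],
  plugins: [
    {Oban.Plugins.Cron, crontab: [{"*/10 * * * *", Domain.Jobs.ExampleJob}]}
  ]
```

```
defmodule Domain.Jobs.ExampleJob do
  # `unique` covers job isolation/uniqueness without row-level locking;
  # `max_attempts` gives us retries.
  use Oban.Worker, queue: :default, max_attempts: 5, unique: [period: 60]

  @impl Oban.Worker
  def perform(%Oban.Job{args: args}) do
    # The actual work goes here.
    IO.inspect(args, label: "performing example job")
    :ok
  end
end
```

Oban itself is then started under the application's supervision tree, e.g.
`{Oban, Application.fetch_env!(:domain, Oban)}`.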
2025-04-15 03:56:49 +00:00
Jamil
6cd7616b5c refactor(portal): Expect members key to be missing when empty (#8781)
This will prevent warning spam we're currently seeing in Sentry.
2025-04-14 20:12:43 +00:00
Jamil
2f0d2462c9 fix(portal): Increase directory sync timeout to 8 hours (#8771)
Large Okta directories can take a very long time (> 1 hour) to sync.
This currently times out, preventing any entities from making it into
the database.

There are many things to address in our sync operation, but this should
hopefully resolve the immediate issue with the customer.


https://firezone-inc.sentry.io/issues/6537862651/?project=4508756715569152&query=is%3Aunresolved%20issue.priority%3A%5Bhigh%2C%20medium%5D%20Enum.to_list&referrer=issue-stream&stream_index=0
2025-04-13 17:27:15 +00:00
Jamil
649c03e290 chore(portal): Bump LoggerJSON to 7.0.0, fixing config (#8759)
There was a slight API change in the way LoggerJSON's configuration is
generated, so I took the time to do a little fixing and cleanup here.

Specifically, we should be using the `new/1` callback to create the
Logger config, which fixes the below exception caused by missing config
keys:

```
FORMATTER CRASH: {report,[{formatter_crashed,'Elixir.LoggerJSON.Formatters.GoogleCloud'},{config,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},{log_event,#{meta => #{line => 15,pid => <0.308.0>,time => 1744145139650804,file => "lib/logger.ex",gl => <0.281.0>,domain => [elixir],application => libcluster,mfa => {'Elixir.Cluster.Logger',info,2}},msg => {string,<<"[libcluster:default] connected to :\"web@web.cluster.local\"">>},level => info}},{reason,{error,{badmatch,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},[{'Elixir.LoggerJSON.Formatters.GoogleCloud',format,2,[{file,"lib/logger_json/formatters/google_cloud.ex"},{line,148}]}]}}]}
```
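
A hedged sketch of the `new/1`-based configuration (the key names mirror the
config embedded in the crash report above; treat the handler wiring as an
assumption):

```
# config/runtime.exs (sketch)
config :logger, :default_handler,
  formatter:
    LoggerJSON.Formatters.GoogleCloud.new(
      metadata: {:all_except, [:socket, :conn]},
      redactors: [
        {LoggerJSON.Redactors.RedactKeys,
         ["password", "secret", "nonce", "token", "preshared_key"]}
      ]
    )
```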

Supersedes #8714

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-11 19:00:06 -07:00
Brian Manifold
bed6a60056 fix(portal): Fetch latest Okta access_token before API call (#8745)
Why:

* The Okta IdP sync job needs to make sure it is always using the latest
access token available. If not, a long-running job may outlive the
access token it started with, which can expire mid-sync. This commit
updates the Okta API client to check for and use the latest access token
on each request to the Okta API.
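
A sketch of the idea, using `Req` purely for illustration (the HTTP client,
module, function names, and struct fields here are assumptions):

```
defmodule OktaClientSketch do
  # Illustration only: fetch the freshest access token for every request.
  def list_users(provider) do
    get(provider, "/api/v1/users")
  end

  defp get(provider, path) do
    token = fetch_latest_access_token!(provider)
    Req.get!("https://#{provider.okta_domain}#{path}", auth: {:bearer, token})
  end

  # Placeholder: the real client would reload the provider's persisted token
  # (refreshing it if necessary) rather than reuse the one it started with.
  defp fetch_latest_access_token!(provider), do: provider.access_token
end
```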
2025-04-11 21:25:07 +00:00
Jamil
d2fd57a3b6 fix(portal): Attach Sentry in each umbrella app (#8749)
- Attaches the Sentry Logging hook in each of [api, web, domain]
- Removes errant Sentry logging configuration in config/config.exs
- Fixes the exception logger to default to logging exceptions; pass
`skip_sentry: true` to skip

Tested successfully in dev. Hopefully the cluster behaves the same way.

Fixes #8639
2025-04-11 04:17:12 +00:00
Jamil
b9532bc243 revert: "Enable automatic tax calculation by default" (#8743)
This needs #8670 in order to function.

Reverts firezone/firezone#8552
2025-04-11 02:59:17 +00:00
Jamil
8ca43300cd chore(portal): Fix typo: counties -> countries (#8666) 2025-04-05 08:11:05 +00:00
Jamil
fb9f132a49 fix(portal): Interpret missing members as empty list (#8640)
The Google API will often return a missing `members` key alongside a
`200` response from their members API. The documentation here isn't
clear whether this key is expected or not, but since the sync has been
working fine up until #8608, we can only surmise that the missing key in
fact means the group has no members.

This PR updates the Google API client so that a `default_if_missing`
value can be passed in, which is returned if the API response is missing
the JSON key to fetch.

For the users, groups, and organization units fetches, we consider a
missing key to be an error and we return `{:error, :invalid_response}`
since this most likely indicates an API problem.

For the members endpoint, we consider the missing key to be the empty
set.

Additionally, this fixes a bug introduced in #8608 whereby we returned
`{:error, :retry_later}` for the newly-handled API responses, which
would have caused a "sync failed" email to be sent to the admins on the
instance.

Instead, we want to return `{:error, :invalid_response}` which will stop
the sync from progressing, and log it internally.
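
A sketch of the described `default_if_missing` behavior (module, function, and
option names are assumptions based on the description above):

```
defmodule GoogleApiClientSketch do
  # Illustration of the described behavior, not the actual client module.
  def fetch_key(%{} = body, key, opts \\ []) do
    case Map.fetch(body, key) do
      {:ok, value} ->
        {:ok, value}

      :error ->
        case Keyword.fetch(opts, :default_if_missing) do
          # e.g. the members endpoint passes `default_if_missing: []`
          {:ok, default} -> {:ok, default}
          # users, groups, and org unit fetches treat a missing key as an API problem
          :error -> {:error, :invalid_response}
        end
    end
  end
end
```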
2025-04-03 11:27:39 -07:00
Jamil
713ff1e7de chore(portal): Log problematic identity api responses (#8623)
After merging #8608, we discovered that we receive unexpected API
responses on the regular. This adds improved logging to uncover what
exactly these unexpected API responses are.
2025-04-02 14:59:16 -07:00
Jamil
f275bf70d9 fix(portal): Resurrect deleted identities and groups (#8615)
When syncing identities from an identity provider, we have logic in
place that resurrects any soft-deleted identities in order to maintain
their session history, group memberships, and any other relevant data.
Users can be temporarily suspended in their identity provider and then
resumed.

Groups, based on cursory research, can never be temporarily suspended at
the identity provider. However, that doesn't mean we won't see a group
disappear and reappear at a later point in time. This can happen due to
a temporary sync issue, or with the upcoming Group Filters PR: #8381.

This PR adds more robust testing to ensure we can in fact resurrect
identities as expected.

It also updates the group sync logic to similarly resurrect soft-deleted
groups if they are seen again in a subsequent sync.

To achieve this, we need to update the `UNIQUE CONSTRAINT` used in the
upsert clause during the sync. Before, it was possible for two (or more)
groups to exist with the same provider_identifier and provider_id, if
`deleted_at IS NOT NULL`. Now, we need to ensure that only one group
with the same `account_id, provider_id, provider_identifier` can exist,
since we want to resurrect and not recreate these.

To do this, we use a migration that does the following:

1. Ensures any potentially problematic data is permanently deleted
2. Drops the existing unique constraint
3. Recreates it, omitting `WHERE DELETED_AT IS NULL` from the partial
index.

Based on exploring the production DB data, this should not cause any
issues, but it would be a good idea to double-check before rolling this
out to prod.
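
A hedged sketch of such a migration (table, column, and index names here are
assumptions about the actual schema, and the cleanup query is illustrative):

```
defmodule Domain.Repo.Migrations.TightenGroupUniqueIndex do
  use Ecto.Migration

  def up do
    # 1. Permanently delete soft-deleted rows that would violate the stricter
    #    index (the real cleanup query may differ).
    execute("""
    DELETE FROM actor_groups
    WHERE deleted_at IS NOT NULL
      AND (account_id, provider_id, provider_identifier) IN (
        SELECT account_id, provider_id, provider_identifier
        FROM actor_groups
        GROUP BY account_id, provider_id, provider_identifier
        HAVING count(*) > 1
      )
    """)

    # 2. Drop the old partial unique index, and 3. recreate it without the
    #    `WHERE deleted_at IS NULL` filter so soft-deleted groups participate
    #    in the upsert's conflict target.
    drop index(:actor_groups, [:account_id, :provider_id, :provider_identifier])
    create unique_index(:actor_groups, [:account_id, :provider_id, :provider_identifier])
  end
end
```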


Lastly, the final missing piece to the resurrection story is Policies.
This is saved for a future PR since we need to first define the
difference between a policy that was soft-deleted via a sync job, and a
policy that was "perma" deleted by a user.

Related: #8187
2025-04-02 21:12:44 +00:00
Jamil
88c4e723a6 fix(portal): Gracefully handle dir sync error responses (#8608)
When calling the various directory sync endpoints, we had error handling
that covered a few of the possible error scenarios appropriately by
returning either `{:error, :retry_later}` or other `{:error, ...}`
tuples.

However, as we've recently learned in [this
thread](https://firezonehq.slack.com/archives/C069H865MHP/p1743521884037159),
it's possible for identity provider APIs to return all kinds of bogus
data here, and we need a more defensive approach.

The specific issue this PR addresses is the case where we receive a
`2xx` response, but without the expected JSON key in the response body.
That will result in the `list*` functions returning an empty list, which
the calling code paths then use to soft-delete all existing records of
that type in the DB.

This is wrong. If the JSON response is missing a key we're expecting, we
now log a warning and return `{:error, :retry_later}` instead. It's
currently unknown when exactly this happens and why, but with better
monitoring here we'll have a much clearer picture.
2025-04-02 19:04:43 +00:00
Jamil
8805d906aa chore(portal): Leave notes around sync frequency (#8605)
When reading through these modules, it's helpful to know that the actual
sync data update doesn't occur more often than every 10 minutes due to a
database check.
2025-04-01 18:25:33 +00:00
Jamil
936f5ddb01 chore(billing): Enable automatic tax calculation by default (#8552)
When a customer signs up for Starter or Team, we don't enable tax
calculation by default. This means customers can upgrade to Team, start
paying invoices, and we won't collect taxes.

This creates a management issue and possible tax liability since I need
to manually reconcile these.

Instead, since we have Stripe Tax configured on our account, we can
enable automatic tax calculation when the subscription is created. Any
products (Starter/Team/Enterprise) therefore in the subscription will
automatically collect tax appropriately.

In most cases in the US, the tax rate is 0. In EU transactions, for B2B
sales, the tax rate for us is also 0 (reverse charge basis). If we sell
a Team subscription to an individual, however, we need to collect VAT.

There doesn't seem to be a way to block consumer EU transactions in
Stripe, so we'll likely need to register for VAT in the EU if we cross
the reporting threshold.
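
For illustration, enabling automatic tax at subscription creation might look
like this with the `stripity_stripe` client (whether the portal uses this
client and these exact params is an assumption; `automatic_tax[enabled]` is
Stripe's documented parameter, and the IDs below are placeholders):

```
{:ok, _subscription} =
  Stripe.Subscription.create(%{
    customer: customer_id,
    items: [%{price: team_price_id}],
    # Stripe Tax then computes the correct rate per invoice (0% for most US
    # sales and EU B2B reverse charge, VAT for EU consumer sales).
    automatic_tax: %{enabled: true}
  })
```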
2025-03-31 13:23:39 +00:00
Jamil
2dbfae9ba9 fix(portal): Use old policy for broadcasting events when updated (#8550)
A regression was introduced in d0f0de0f8d
whereby we started using the updated policy record for broadcasting
the `delete_policy` and `expire_flows` events. This caused a security
issue because if the actor group changed from `Everyone` to `thomas`,
for example, we'd only expire flows and broadcast policy removal (i.e.
resource removal) events for `thomas`, and `Everyone` would still have
access granted by the old policy.

To fix this, we broadcast the destructive events to the old policy, so
that its `actor_group_id` and `resource_id` are used, and not the new
policy's.

Fixes #8549
2025-03-30 03:26:11 +00:00
Brian Manifold
3313e7377e feat(portal): Add account delete button (#8487)
Why:

* This commit will allow account admins to send a request through the
Firezone portal to schedule a deletion of their account, rather than
having the account admins email their request manually. Doing this
through the portal allows us to verify that the request actually came
from an admin of the account.
2025-03-19 18:23:32 +00:00
Jamil
595fb7efd9 refactor(portal): Rename resource_cidrs -> device_cidrs (#8482)
I was debugging some of this just now and realized our naming / comments
are incorrect here, so thought I'd open a PR to tidy things up for the
next person reading this.

Resource CIDRs actually occupy the `100.96.0.0/11` range (and IPv6
equivalent), but the portal doesn't generate these.
2025-03-19 01:54:08 +00:00
Brian Manifold
e14e5c4008 refactor(portal): Use appropriate access token for Google IdP (#8478)
Why:

* Previously, when running a directory sync with the Google Workspace
IdP adapter, if a service account had been configured but there was a
problem getting an access token for the service account, the sync job
would fall back to using a personal access token. We no longer want to
rely on any personal access token once a service account has been
configured. This commit will make sure that if a service account is
configured there is no way to fall back to any personal access token.


Fixes #8409
2025-03-18 16:46:08 +00:00
Jamil
d143d4dc89 feat(portal): Add changelog link to outdated gateway email (#8458)
It would be useful to have a link to the changelog in our outdated
gateway email.

See https://firezonehq.slack.com/archives/C069H865MHP/p1742088424077639

<img width="638" alt="Screenshot 2025-03-16 at 9 39 22 PM"
src="https://github.com/user-attachments/assets/f67b9b3e-9796-45a9-ae90-26eeabc40740"
/>
2025-03-18 02:43:06 +00:00
Jamil
4ce2f160e3 fix(portal): Allow .local for search_domains (#8472)
This apparently is explicitly used by customers. See
https://firezonehq.slack.com/archives/C08FPHECLUF/p1742221580587719?thread_ts=1741639183.188459&cid=C08FPHECLUF
2025-03-17 20:18:51 +00:00
Jamil
43d084f97f refactor(portal): Enforce internet resource site exclusion (#8448)
Finishes up the Internet Resource migration by enforcing:

- No internet resources in non-internet sites
- No regular resources in internet sites
- Removing the prompt to migrate

~~I've already migrated the existing internet resources in customers'
accounts. Everyone that was using the internet resource had already
migrated.~~

Edit: I started to head down that path, then decided doing this here in
a data migration was going to be a better approach.

Fixes #8212
2025-03-15 18:25:32 -05:00
Brian Manifold
d133ee84b7 feat(portal): Add API rate limiting (#8417) 2025-03-13 03:21:09 +00:00
Jamil
6cfe500b11 fix(portal): Add more validation to search_domain (#8392)
- Prevents `.local`
- Allows ending with `.`

https://github.com/firezone/firezone/pull/8391/files#r1985958387
2025-03-08 14:39:04 +00:00
Jamil
d723336c2a feat(portal): Support search_domain field in Account.Config (#8391)
Introduces a simple `search_domain` field embedded into our existing
`Accounts.Account.Config` embedded schema. This will be sent to clients
to append to single-label DNS queries.

UI and API changes will come in subsequent PRs: this one adds the field
and (lots of) validations only.
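
A rough sketch of what the embed looks like (the validations shown here are
illustrative and far lighter than the real ones):

```
defmodule Accounts.Account.Config do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :search_domain, :string
    # ... other existing config fields
  end

  def changeset(config, attrs) do
    config
    |> cast(attrs, [:search_domain])
    |> validate_length(:search_domain, max: 255)
    |> validate_format(:search_domain, ~r/^[a-z0-9.-]+$/i,
      message: "must be a valid domain"
    )
  end
end
```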

Related: #8365
2025-03-08 03:08:33 +00:00
Jamil
e3897aebd8 feat(portal): Add Mock sync adapter and more seeds (#8370)
- Adds more actor groups to the existing `oidc_provider`
- Configures a rand seed so our seed data is reproducible across
machines
- Formats the seeds file to allow for some refactoring in a later PR
- Adds a `Mock` identity provider adapter with sync enabled
2025-03-07 09:37:32 -08:00
Jamil
25ed48114a fix(portal): Use explicit UTC timezone for NOW() (#8374)
Fixes #8373
2025-03-06 17:59:49 +00:00
Jamil
c3a9bac465 feat(portal): Add client endpoints to REST API (#8355)
Adds the following endpoints:

- `PUT /clients/:id` for updating the `name`
- `PUT /clients/:client_id/verify` for verifying a client
- `PUT /clients/:client_id/unverify` for unverifying a client
- `GET /clients` for listing clients in an account
- `GET /clients/:id` for getting a single client
- `DELETE /clients/:id` for deleting a client

Related: #8081
2025-03-05 00:37:01 +00:00
Jamil
e064cf5821 fix(portal): Debounce relays_presence (#8302)
If the websocket connection between a relay and the portal experiences a
temporary network split, the portal will immediately send the
disconnected id of the relay to any connected clients and gateways, and
all relayed connections (and current allocations) will be immediately
revoked by connlib.

This tight coupling is needlessly disruptive. As we've seen in staging
and production logs, relay disconnects can happen randomly, and in the
vast majority of cases immediately reconnect. Currently we see about 1-2
dozen of these **per day**.

To better account for this, we introduce a debounce mechanism in the
portal for `relays_presence` disconnects that works as follows:

- When a relay disconnects, record its `stamp_secret` (this is somewhat
tricky as we don't get this at the time of disconnect - we need to cache
it by relay_id beforehand)
- If the same `relay_id` reconnects again with the same `stamp_secret`
within `relays_presence_debounce_timeout` -> no-op
- If the same `relay_id` reconnects again with a **different**
`stamp_secret` -> disconnect immediately
- If it doesn't reconnect, **then** send the `relays_presence` with the
disconnected_id after the `relays_presence_debounce_timeout`
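
A conceptual sketch of that state machine (not the actual portal code; the
timeout source, default, and helper names are assumptions):

```
defmodule RelayPresenceDebounceSketch do
  use GenServer

  def init(_), do: {:ok, %{pending: %{}}}

  def handle_info({:relay_left, relay_id, stamp_secret}, state) do
    # Default of 30s is a placeholder for relays_presence_debounce_timeout.
    timeout = Application.get_env(:domain, :relays_presence_debounce_timeout, :timer.seconds(30))
    timer = Process.send_after(self(), {:debounced_disconnect, relay_id}, timeout)
    {:noreply, put_in(state.pending[relay_id], {stamp_secret, timer})}
  end

  def handle_info({:relay_joined, relay_id, stamp_secret}, state) do
    case Map.pop(state.pending, relay_id) do
      # Rejoined with the same stamp_secret within the window: no-op.
      {{^stamp_secret, timer}, pending} ->
        Process.cancel_timer(timer)
        {:noreply, %{state | pending: pending}}

      # Rejoined with a different stamp_secret: disconnect immediately.
      {{_other_secret, timer}, pending} ->
        Process.cancel_timer(timer)
        broadcast_disconnected(relay_id)
        {:noreply, %{state | pending: pending}}

      {nil, _pending} ->
        {:noreply, state}
    end
  end

  def handle_info({:debounced_disconnect, relay_id}, state) do
    # The relay never came back: now send relays_presence with the disconnected id.
    broadcast_disconnected(relay_id)
    {:noreply, update_in(state.pending, &Map.delete(&1, relay_id))}
  end

  # Placeholder for the actual relays_presence broadcast.
  defp broadcast_disconnected(_relay_id), do: :ok
end
```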

There are several ways connlib detects a relay is down:

1. Binding requests time out. These happen every 25s, so on average it
takes 12.5s plus the backoff timer before we know a Relay is down.
2. `relays_presence` - this is currently the fastest way to detect
relays are down. With this change, the caveat is we will now detect this
with a delay of `relays_presence_debounce_timer`.

Fixes #8301
2025-03-04 23:56:40 +00:00
Jamil
cb0bf44815 chore: Remove ability to create GCP log sinks (#8298)
This has long since been removed in the Clients.
2025-02-28 20:57:21 +00:00
Jamil
d7be59707a fix(portal): Improve resource address validation (#8288)
We had a number of validation issues:

- DNS resources allowed addresses like `1.1.1.1` or `1.1.1.1/32`. These
are not valid and will cause issues during resolution.
- IP resources were allowing basically any string on `edit` due to a
logic bug in the changeset
- CIDR resources, same as above
- `*.*.*.*.google.com` and similar DNS wildcard resources were not
allowed

This PR beefs all of those up so that we have a higher degree of
certainty that our data is valid. If invalid data reaches connlib, it
will cause a panic.

This PR also introduces a migration to migrate any invalid resources
into the proper format in the DB.

Fixes #8287
2025-02-27 23:41:11 +00:00
Brian Manifold
d0f0de0f8d refactor(portal): Allow breaking changes in Resources/Policies (#8267)
Why:

* Rather than using a persistent_id field in Resources/Policies, it was
decided that we should allow "breaking changes" to these entities. This
means that Resources/Policies will now be able to update all fields on
the schema without changing the primary key ID of the entity.
* This change will greatly help the API and Terraform provider
development.

@jamilbk, would you like me to put a migration in this PR to actually
get rid of all of the existing soft deleted entities?

@thomaseizinger, I tagged you on this, because I wanted to make sure
that these changes weren't going to break any expectations in the client
and/or gateways.

---------

Signed-off-by: Brian Manifold <bmanifold@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2025-02-26 17:05:34 +00:00
Jamil
5650150b3f chore(portal): Enforce only internet resource in internet site (#8254)
Currently, it would theoretically be possible for an admin to connect
non-internet Resources to the Internet site. This PR fixes that by
enforcing that only the `internet` Resource type can belong to the
`Internet` gateway group.


Related: #6834
2025-02-25 03:45:40 +00:00
Jamil
d9a513fa54 fix(portal): optionally enable optimistic lock (#8229)
When the buffer is full, we want to update immediately, without locking.
2025-02-20 23:42:29 -08:00
Jamil
a797e350c0 fix(portal): Force update last_flushed_at for optimistic lock (#8228)
This PR fixes two issues:

1. Since we weren't updating any actual fields in the telemetry reporter
log record, it was never being updated, so optimistic locking was not
taking effect. To fix this, we use `Repo.update(force: true)`.
2. If a buffer is full, we write immediately, but we provide an empty
`%Log{}`, which causes a repetitive `the current value of last_flushed_at
is nil and will not be used as a filter for optimistic locking` warning.
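
A hedged sketch of the fix in (1), assuming the log record uses
`last_flushed_at` as the optimistic-lock field (`log`, the schema, and the
changeset details are assumptions):

```
log
|> Ecto.Changeset.change()
|> Ecto.Changeset.optimistic_lock(:last_flushed_at, fn _old -> DateTime.utc_now() end)
# force: true runs the update even when no other fields changed, so the
# optimistic-lock filter is actually exercised on every flush.
|> Repo.update(force: true)
```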
2025-02-20 23:12:17 -08:00
Jamil
a07f1725c6 chore(portal): Refactor GCP labels logger to relax sentry alerts (#8213) 2025-02-20 11:20:45 +00:00
Jamil
407085d7ec fix(portal): Add managed_by to gateway groups index (#8208)
Some customers have already picked the `Internet` name, which is making
our migrations fail.

This scopes the unique name index by `managed_by` so that our attempts
to create them succeed.
2025-02-19 15:55:51 -08:00
Jamil
28559a317f chore(portal): Optionally drop NotFoundError to sentry (#8183)
By specifying the `before_send` hook, we can easily drop events based on
their data, such as `original_exception` which contains the original
exception instance raised.

Leveraging this, we can add a `report_to_sentry` parameter to
`Web.LiveErrors.NotFound` to optionally ignore certain not found errors
from going to Sentry.
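
A sketch of the hook described above (the filter module is hypothetical, and
whether the option is `:before_send` or `:before_send_event` depends on the
sentry-elixir version in use):

```
# In config: config :sentry, before_send: {Web.SentryFilter, :filter}
defmodule Web.SentryFilter do
  # Drop events whose original exception explicitly opted out of Sentry.
  def filter(%Sentry.Event{
        original_exception: %Web.LiveErrors.NotFound{report_to_sentry: false}
      }) do
    nil
  end

  def filter(event), do: event
end
```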
2025-02-18 21:55:23 +00:00
Jamil
d452e7d1b5 fix(portal): Parse string metric datetimes (#8148)
It turns out we can sometimes receive measurements with `DateTime`
fields, and other times they're strings. 🙃
2025-02-16 14:15:31 -08:00
Jamil
311988c5a2 fix(portal): Only compute diff for metrics with both start and end times (#8147)
A fix for a nil error from #8146
2025-02-16 12:57:03 -08:00
Jamil
36b887e98e fix(portal): Don't flush metrics when intervals < 5s (#8146) 2025-02-16 11:51:10 -08:00
Jamil
d29b210a63 chore(portal): Log metrics that failed to flush (#8142)
When flushing metrics to GCP, we sometimes get the following error:

```
{400, "{\n  \"error\": {\n    \"code\": 400,\n    \"message\": \"One or more TimeSeries could not be written: timeSeries[0-51]: write for resource=gce_instance{zone:us-east1-d,instance_id:6130184649770384727} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric.\",\n    \"status\": \"INVALID_ARGUMENT\",\n    \"details\": [\n      {\n        \"@type\": \"type.googleapis.com/google.monitoring.v3.CreateTimeSeriesSummary\",\n        \"totalPointCount\": 52,\n        \"successPointCount\": 48,\n        \"errors\": [\n          {\n            \"status\": {\n              \"code\": 9\n            },\n            \"pointCount\": 4\n          }\n        ]\n      }\n    ]\n  }\n}\n"}
```

It would be helpful to know exactly which metrics are failing to flush
so we can further troubleshoot any issues.
2025-02-15 08:50:29 -08:00
Jamil
85ee37dfb3 Revert "fix(portal): Add node name key to metrics labels" (#8141)
The node_name label is already in the metrics.

Reverts firezone/firezone#8082
2025-02-15 08:47:45 -08:00
Andrew Dryga
bacb4596b7 feat(portal): Internet Sites (#6905)
Related #6834

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2025-02-15 00:34:30 +00:00