> This PR adds two configuration keys for the replication connection.
> Example use case:
> If you run two Firezone control planes on the same DB cluster (like me
> 😂), you'll need two different replication slot names.
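A minimal `runtime.exs` sketch of what per-plane slots could look like (the key and env var names here are illustrative, not necessarily the ones this PR adds):
```elixir
# Hypothetical key/env names: each control plane points its
# ReplicationConnection at its own slot and publication.
config :domain, Domain.Events.ReplicationConnection,
  replication_slot_name: System.get_env("REPLICATION_SLOT_NAME", "firezone_slot"),
  publication_name: System.get_env("REPLICATION_PUBLICATION_NAME", "firezone_publication")
```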
Related: #9395
Co-authored-by: Antoine <antoinelabarussias@gmail.com>
The `background_jobs_enabled` config is an ENV var that needs to be set
for a specific configuration key. It's not set on the top-level
`:domain` config by default.
Instead, it's used to enable/disable specific modules started by the
application's Supervisor.
The `Domain.Events.ReplicationConnection` module is updated in this PR
to follow this convention.
The `web` and `api` application use `domain` as a dependency in their
`mix.exs`. This means by default their Supervisor will start the
Domain's supervision tree as well.
The author did not realize this at the time of implementation, so we
now leverage the convention already in place for restricting tasks to
`domain` nodes: the `background_jobs_enabled` application configuration
parameter.
We also add an info log when the replication slot is started so we can
verify which node it's running on.
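A sketch of the convention (the child list and log line are illustrative; only `background_jobs_enabled` and `Domain.Events.ReplicationConnection` come from this PR):
```elixir
defmodule Domain.Application do
  use Application
  require Logger

  @impl true
  def start(_type, _args) do
    children = [Domain.Repo] ++ background_children()
    Supervisor.start_link(children, strategy: :one_for_one, name: Domain.Supervisor)
  end

  # Nodes that don't set the flag (e.g. web/api pulling :domain in as a
  # dependency) simply skip these children.
  defp background_children do
    if Application.get_env(:domain, :background_jobs_enabled, false) do
      Logger.info("Starting background jobs on #{node()}")
      [Domain.Events.ReplicationConnection]
    else
      []
    end
  end
end
```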
Turns out we are making replication overly complex by creating a
dedicated user for it. The `web` user is already privileged, and we can
reuse it since the replication system operates in the same security
context as the rest of the app.
Firezone's control plane is a realtime, distributed system that relies
on a broadcast/subscribe system to function. In many cases, these events
are broadcast whenever relevant data in the DB changes, such as an
actor losing access to a policy, a membership being deleted, and so
forth.
Today, this is handled in the application layer, typically happening at
the place where the relevant DB call is made (i.e. in an
`after_commit`). While this approach has worked thus far, it has several
issues:
1. We have no guarantee that the DB change will issue a broadcast. If
the application is deployed or the process crashes after the DB changes
are made but before the broadcast happens, we may fail to update
connected clients or gateways with the changes.
2. We have no guarantee that broadcasts will preserve the order of DB
updates. In other words, app server A could win its DB operation against
app server B, but then lose the race to broadcast first.
3. If the cluster is in a bad state where broadcasts may return an error
(e.g. https://github.com/firezone/firezone/issues/8660), we will never
retry the broadcast.
To fix the above issues, we introduce a WAL logical decoder that
processes the event stream one message at a time and performs any needed
work.
Serializability is guaranteed since we only process the WAL in a single,
cluster-global process, `ReplicationConnection`. Durability is also
guaranteed since we only ACK WAL segments after we've successfully
ingested the event.
This means we will only advance the position of our WAL stream after
successfully broadcasting the event.
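A condensed sketch of that flow, modeled on the `Postgrex.ReplicationConnection` docs (the slot and publication names and the decoding stub are illustrative; the real module is more involved):
```elixir
defmodule Domain.Events.ReplicationConnection do
  # Simplified sketch: the key property is that the standby status update
  # (the WAL ack) is only sent after the event is processed successfully.
  use Postgrex.ReplicationConnection

  @impl true
  def init(_opts), do: {:ok, %{last_lsn: 0}}

  @impl true
  def handle_connect(state) do
    query =
      "START_REPLICATION SLOT firezone LOGICAL 0/0 " <>
        "(proto_version '1', publication_names 'firezone')"

    {:stream, query, [], state}
  end

  @impl true
  # XLogData: decode and broadcast first; a crash here means no ack, so the
  # segment is re-delivered when the connection restarts.
  def handle_data(<<?w, _start::64, wal_end::64, _clock::64, data::binary>>, state) do
    :ok = process_event!(data)
    {:noreply, [standby_status_update(wal_end)], %{state | last_lsn: wal_end}}
  end

  # Primary keepalive asking for a reply: only re-ack what we have
  # already processed, never the server's current WAL end.
  def handle_data(<<?k, _wal_end::64, _clock::64, 1>>, %{last_lsn: lsn} = state) do
    {:noreply, [standby_status_update(lsn)], state}
  end

  def handle_data(_other, state), do: {:noreply, state}

  defp standby_status_update(lsn),
    do: <<?r, lsn + 1::64, lsn + 1::64, lsn + 1::64, now()::64, 0>>

  @epoch DateTime.to_unix(~U[2000-01-01 00:00:00Z], :microsecond)
  defp now(), do: System.os_time(:microsecond) - @epoch

  # Placeholder: the real code decodes pgoutput and broadcasts via PubSub.
  defp process_event!(_data), do: :ok
end
```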
This PR only introduces the WAL stream processing system but does not
introduce any changes to our current broadcasting behavior - that's
saved for another PR.
There was a slight API change in the way LoggerJSON's configuration is
generated, so I took the time to do a little fixing and cleanup here.
Specifically, we should be using the `new/1` callback to create the
Logger config which fixes the below exception due to missing config
keys:
```
FORMATTER CRASH: {report,[{formatter_crashed,'Elixir.LoggerJSON.Formatters.GoogleCloud'},{config,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},{log_event,#{meta => #{line => 15,pid => <0.308.0>,time => 1744145139650804,file => "lib/logger.ex",gl => <0.281.0>,domain => [elixir],application => libcluster,mfa => {'Elixir.Cluster.Logger',info,2}},msg => {string,<<"[libcluster:default] connected to :\"web@web.cluster.local\"">>},level => info}},{reason,{error,{badmatch,[{metadata,{all_except,[socket,conn]}},{redactors,[{'Elixir.LoggerJSON.Redactors.RedactKeys',[<<"password">>,<<"secret">>,<<"nonce">>,<<"fragment">>,<<"state">>,<<"token">>,<<"public_key">>,<<"private_key">>,<<"preshared_key">>,<<"session">>,<<"sessions">>]}]}]},[{'Elixir.LoggerJSON.Formatters.GoogleCloud',format,2,[{file,"lib/logger_json/formatters/google_cloud.ex"},{line,148}]}]}}]}
```
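A sketch of the fix (the options mirror the config in the crash report above; the redact key list is abbreviated):
```elixir
# new/1 validates the options and fills in defaults for required keys,
# unlike passing a raw keyword list as the formatter config.
formatter =
  LoggerJSON.Formatters.GoogleCloud.new(
    metadata: {:all_except, [:socket, :conn]},
    redactors: [{LoggerJSON.Redactors.RedactKeys, ["password", "secret", "token"]}]
  )

config :logger, :default_handler, formatter: formatter
```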
Supersedes #8714
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The adapter itself isn't enabled in the UI on prod, but the background
job to sync mock data was. This change prevents the job from being
started and emitting log noise into production logs.
- Adds more actor groups to the existing `oidc_provider`
- Configures a rand seed so our seed data is reproducible across
machines (see the sketch below)
- Formats the seeds file to allow for some refactoring in a later PR
- Adds a `Mock` identity provider adapter with sync enabled
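For the rand seed, a minimal sketch (the algorithm and seed values are illustrative):
```elixir
# Seeding Erlang's PRNG makes Enum.random/1 and friends deterministic,
# so the seeds file generates identical data on every machine.
:rand.seed(:exsss, {100, 101, 102})
```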
By specifying the `before_send` hook, we can easily drop events based on
their data, such as `original_exception` which contains the original
exception instance raised.
Leveraging this, we can add a `report_to_sentry` parameter to
`Web.LiveErrors.NotFound` to optionally stop certain not-found errors
from going to Sentry.
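A sketch of the hook under those assumptions (`report_to_sentry` comes from this PR; the callback module name and matching are illustrative):
```elixir
# config/runtime.exs: Sentry accepts a {module, function} hook.
config :sentry, before_send: {Web.Sentry, :before_send}

# Hypothetical callback module. Returning nil drops the event entirely.
defmodule Web.Sentry do
  def before_send(%Sentry.Event{
        original_exception: %Web.LiveErrors.NotFound{report_to_sentry: false}
      }),
      do: nil

  def before_send(event), do: event
end
```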
This allows connections to the PostgreSQL database via the standard
socket, which, as opposed to TCP sockets, allows `peer` authentication
based on local unix users. This removes the need for a password and is
much simpler to deploy when running components locally.
In the current form, `DATABASE_SOCKET_DIR` takes precedence over
hostname, if the environment variable is present. I found that
`compile_config!` somehow enforces a value to be present, which is
explicitly not what I want for some of these values (I think). I'd be
glad if anyone with more Elixir experience can guide me as to how I can
make this more idiomatic.
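A `runtime.exs` sketch of the precedence described above (the repo module and env var defaults are assumptions):
```elixir
# When DATABASE_SOCKET_DIR is set, connect over the unix socket and rely on
# peer authentication; otherwise fall back to TCP with a password.
if socket_dir = System.get_env("DATABASE_SOCKET_DIR") do
  config :domain, Domain.Repo, socket_dir: socket_dir
else
  config :domain, Domain.Repo,
    hostname: System.get_env("DATABASE_HOST", "localhost"),
    password: System.fetch_env!("DATABASE_PASSWORD")
end
```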
---------
Supersedes: #8044
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: oddlama <oddlama@oddlama.org>
This is required to run multiple components on a single machine (even if
the processes are sandboxed), since they will share a network namespace
and thus cannot bind to the same port.
Currently port `4000` is hardcoded; this PR allows it to be configured
via an environment variable.
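Roughly, in `runtime.exs` (the env var name here is illustrative):
```elixir
# Fall back to the previous hardcoded port when the variable is unset.
config :web, Web.Endpoint,
  http: [port: String.to_integer(System.get_env("PHOENIX_HTTP_PORT", "4000"))]
```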
---------
Co-authored-by: oddlama <oddlama@oddlama.org>
It appears that something is initializing the Sentry.LoggerHandler
before we try to load it when starting:
```
Invalid logger handler config: {:logger,
{:invalid_handler, {:function_not_exported, {Sentry.LoggerHandler, :log, 2}}}}
```
This doesn't seem to actually inhibit the Sentry logger at all,
presumably because it initializes just fine in the application start
callback.
Instead of defining the config in the `config/` directory, we can pass
it directly to `:logger` on start, which solves the above issue.
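A sketch of attaching the handler at application start (the handler id and options are illustrative):
```elixir
defmodule Web.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Register the handler here rather than in config/, so it is attached
    # after the :sentry application code is loaded.
    :ok =
      :logger.add_handler(:sentry, Sentry.LoggerHandler, %{
        level: :warning,
        config: %{capture_log_messages: true}
      })

    Supervisor.start_link([], strategy: :one_for_one, name: Web.Supervisor)
  end
end
```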
This adds https://github.com/getsentry/sentry-elixir to the portal for
automatic process crash and exception trace reporting.
It also configures Logger reporting for the `warning` level and higher,
and sets the data scrubbing rules to allow all Logger metadata keys
(`logger_metadata.*` in the Sentry project settings).
Lastly, it configures automatic HTTP error reporting by tying into the
`api` and `web` endpoint modules with a custom `plug` middleware so we
get automatic reporting of unsuccessful Phoenix responses.
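The standard sentry-elixir endpoint wiring looks roughly like this (the PR's custom plug may differ in detail):
```elixir
defmodule Web.Endpoint do
  # PlugCapture must wrap the whole pipeline, so it comes first.
  use Sentry.PlugCapture
  use Phoenix.Endpoint, otp_app: :web

  # Attaches request metadata to any event reported for this request.
  plug Sentry.PlugContext
  plug Web.Router
end
```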
It is expected this will be noisy when we first deploy and we'll need to
tune it down a bit. This is the same approach used with other Sentry
platforms.
This adds a feature that will email all admins in a Firezone Account
when sync errors occur with their Identity Provider.
In order to avoid spamming admins with sync error emails, the error
emails are only sent once every 24 hours. One exception: the
`sync_error_emailed_at` field is reset on a successful sync, which means
in theory that if an identity provider were flip-flopping between
successful and unsuccessful syncs, the admins could be emailed more than
once in a 24-hour period.
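A sketch of the throttle, assuming `sync_error_emailed_at` is a UTC timestamp that a successful sync resets to `nil` (the module name is hypothetical):
```elixir
defmodule Domain.Auth.SyncErrorNotifier do
  # Email on the first error, then at most once per 24 hours until a
  # successful sync resets sync_error_emailed_at to nil.
  def should_email_admins?(%{sync_error_emailed_at: nil}), do: true

  def should_email_admins?(%{sync_error_emailed_at: last_sent}) do
    DateTime.diff(DateTime.utc_now(), last_sent) >= 24 * 60 * 60
  end
end
```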
### Sample Email Message
<img width="589" alt="idp-sync-error-message"
src="https://github.com/user-attachments/assets/d7128c7c-c10d-4d02-8283-059e2f1f5db5">
They will be sent in the API for connlib 1.3 and above.
I think in the future we can make a whole menu section called "Internet
Security" which will be a specialized UI for the new resource type (and
not show it in the Resources list) to improve the user experience around
it.
Closes #5852
---------
Signed-off-by: Andrew Dryga <andrew@dryga.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Why:
* The Swagger UI is currently served from the API application. This
means that the Web application does not have access to the external URL
in the API configuration during/after compilation. Without the API
external URL, we cannot generate a proper link in the portal to the
Swagger UI. This commit refactors how the API external URL is set from
the environment variables and allows the Web app to have access to the
value of the API URL.
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Why:
* JumpCloud directory sync was requested by customers. JumpCloud only
offers the ability to use its API with an admin-level access token that
is tied to a specific user within a given JumpCloud account. This would
require Firezone customers to give us an access token with far more
permissions than needed for our directory sync. To avoid this, we've
decided to use WorkOS to provide SCIM support between JumpCloud and
WorkOS, which will allow Firezone to then easily and safely retrieve
JumpCloud directory info from WorkOS.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Why:
* As work on the portal REST API has begun, there was a need to easily
provision API tokens to allow testing of the new API endpoints being
created. Adding the API Client UI allows for this to be done very easily
and will also be used once the API is ready to be consumed by customers.
Closes #2368
Why:
* In order to allow easy testing of the billing / Stripe integration,
the staging environment needs to allow members of the Firezone team to
create new accounts while disallowing the general public from creating
accounts. The account creation override functionality allows multiple
domains to be set via an ENV variable by passing a comma-separated
string of domains.
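Parsing that variable could look like this (the env var name is illustrative):
```elixir
# Split the comma-separated allowlist, tolerating stray whitespace.
allowed_domains =
  "ACCOUNT_CREATION_ALLOWED_DOMAINS"
  |> System.get_env("")
  |> String.split(",", trim: true)
  |> Enum.map(&String.trim/1)
```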
---------
Co-authored-by: Andrew Dryga <andrew@dryga.com>
Why:
* On some clients, the web view that is opened to sign in to Firezone is
left open and ends up stuck on the Sign In page, with the LiveView
loader at the top of the page also stuck and appearing as though it is
waiting for another response. This commit adds a sign-in success page
that is displayed upon successful sign-in and shows a message letting
the user know they can close the window if needed. If the client device
is able to close the web view that was opened, the page will either be
shown very briefly or not be visible at all, due to how quickly the
redirect happens.
Closes #3608
<img width="625" alt="Screenshot 2024-02-15 at 4 30 57 PM"
src="https://github.com/firezone/firezone/assets/2646332/eb6a5df6-4a4c-4e54-b57c-5da239069ea9">
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
# Gateways
- [x] When Gateway Group is deleted all gateways should be disconnected
- [x] When Gateway Group is updated (eg. routing) broadcast to all
affected gateways to disconnect all their clients
- [x] When Gateway is deleted it should be disconnected
- [x] When Gateway Token is revoked all gateways that use it should be
disconnected
# Relays
- [x] When Relay Group is deleted all relays should be disconnected
- [x] When Relay is deleted it should be disconnected
- [x] When Relay Token is revoked all relays that use it should be
disconnected
# Clients
- [x] Remove Delete Client button, show clients using the token on the
Actors page (#2669)
- [x] When client is deleted disconnect it
- [ ] ~When Gateway is offline broadcast its status to the Clients
connected to it~
- [x] Persist `last_used_token_id` in Clients and show it in tokens UI
# Resources
- [x] When Resource is deleted it should be removed from all gateways
and clients
- [x] When Resource connection is removed the Resource should be removed
from the disconnected gateway groups
- [x] When Resource is updated (eg. traffic filters) all of its
authorizations should be removed
# Authentication
- [x] When Token is deleted related sessions are terminated
- [x] When an Actor is deleted or disabled it should be disconnected
from browser and client
- [x] When Identity is deleted its sessions should be disconnected from
browser and client
- [x] ^ Ensure the same happens for identities during IdP sync
- [x] When IdP is disabled act like all actors for it are disabled?
- [x] When IdP is deleted act like all actors for it are deleted?
# Authorization
- [x] When Policy is created clients that gain access to a resource
should get an update
- [x] When Policy is deleted we need to revoke all authorizations it has
made
- [x] When Policy is disabled we need to revoke all authorizations it
has made
- [x] When Actor Group adds or removes a user, related policies should
be re-evaluated
- [x] ^ Ensure the same happens for identities during IdP sync
# Settings
- [x] Re-send init message to Client when DNS settings change
# Code
- [x] Clear way to see all available topics and messages, do not use
binary topics any more (see the sketch below)
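One illustrative shape for that (the module and topic names are hypothetical):
```elixir
# Centralizing topic construction makes every topic discoverable in one
# place instead of ad-hoc binaries scattered around the codebase.
defmodule Domain.PubSub.Topic do
  def account(account_id), do: "accounts:#{account_id}"
  def gateway(gateway_id), do: "gateways:#{gateway_id}"
  def client(client_id), do: "clients:#{client_id}"
end
```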
---------
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
- [x] Introduce api_client actor type and code to create and
authenticate using its token
- [x] Unify Tokens usage for Relays and Gateways
- [x] Unify Tokens usage for magic links
Closes #2367
Ref #2696
- [x] make sure that session cookie for client is stored separately from
session cookie for the portal (will close #2647 and #2032)
- [x] #2622
- [ ] #2501
- [ ] show identity tokens and allow rotating/deleting them (#2138)
- [ ] #2042
- [ ] use Tokens context for Relays and Gateways to remove duplication
- [x] #2823
- [ ] Expire LiveView sockets when subject is expired
- [ ] Service Accounts UI is ambiguous now because of token identity and
actual token shown
- [ ] Limit subject permissions based on token type
Closes #2924. Now we extend the lifetime for client tokens, but not for
browsers.
Why:
* Self-hosted Relays are not going to be a part of the beta release, so
hiding the functionality in the portal keeps users from getting confused
by a feature they aren't able to use.
Closes #2178
Why:
* The traffic filter functionality is not quite ready in the system as a
whole, so the web UI now hides that section of the forms to allow for a
better end-user experience.
Renaming it back to clients to reflect service account and headless
client use cases in the terminology. Such a rename would be very painful
on live data, so it's better if we do it early on.
---------
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>