firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 18:18:55 +00:00

Jamil f379e85e9b refactor(portal): cache access state in channel pids (#9773)

When changes occur in the Firezone DB that trigger side effects, we need
some mechanism to broadcast and handle these.

Before, the system we used was:

- Each process subscribes to a myriad of topics related to data it wants
to receive. In some cases it would subscribe to new topics based on
received events from existing topics (i.e. flows in the gateway
channel), and sometimes in a loop. It would then need to be sure to
_unsubscribe_ from these topics.
- Handle the side effect in the `after_commit` hook of the Ecto function
call after it completes
- Broadcast only a simple (thin) event message with a DB id
- In the receiver, use the id(s) to re-evaluate, or look up one or many
records associated with the change
- After the lookup completes, `push` the relevant message(s) to the
LiveView, `client` pid, or `gateway` pid in their respective channel
processes

This system had a number of drawbacks ranging from scalability issues to
undesirable access bugs:

1. The `after_commit` callback, on each App node, is not globally
ordered. Since we broadcast a thin event schema and read from the DB to
hydrate each event, this meant we had a `read after write` problem in
our event architecture, leading to the potential for lost updates. Case
in point: if a policy is updated from `resource_id-1` to
`resource_id-2`, and then back to `resource_id-1`, it's possible that,
given the right amount of delay, the gateway channel will receive two
`reject_access` events for `resource_id-1`, as opposed to one for
`resource_id-1` and one for `resource_id-2`, leading to the potential
for unauthorized access.
1. It was very difficult to ensure that the correct topics were being
subscribed to and unsubscribed from, and the correct number of times,
leading to maintenance issues for other engineers.
1. We had a nasty N+1 query problem whenever memberships were added or
removed, which resulted in essentially all access related to that
membership (so all Policies touching its actor group) being
re-evaluated and rebroadcast. This meant that any bulk addition or
deletion of memberships would generate so many queries that they'd
time out or consume the entire connection pool.
1. We had no durability for side-effect processing. In some places, we
were iterating over many returned records to send broadcasts.
Broadcasting is not a zero-time operation: each call takes a small
amount of CPU time to copy the message into the receiver's mailbox. If
we deployed while this was happening, the state update would be lost
forever. If this was a `reject_access` for a Gateway, the Gateway would
never remove access for that particular flow.
1. On each flow authorization, we needed to hit `us-east1` not only to
"authorize" the flow, but to log it as well. This incurs latency
especially for users in other parts of the world, which happens on
_each_ connection setup to a new resource.
1. Since we read and re-authorize access due to the thin events
broadcasted from side effects, we risk hitting thundering herd problems
(see the N+1 query problem above) where a single DB change could result
in all receivers hitting the DB at once to "hydrate" their
processing.
1. If an administrator modifies the DB directly, or, if we need to run a
DB migration that involves side effects, they'll be lost, because the
side effect triggers happened in `after_commit` hooks that are only
available when querying the DB through Ecto. Manually deleting (or
resurrecting) a policy, for example, would not have updated any
connected clients or gateways with the new state.
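The read-after-write hazard in the first drawback can be illustrated with a toy model. This is only a sketch: the in-memory `db` dict, the `ThinEvent` and `FatEvent` classes, and the `update_policy` and `hydrate` helpers are all hypothetical names invented for illustration, not the portal's actual code. It assumes both thin events are handled only after both writes have committed.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the policies table.
db = {"policy-1": "resource_id-1"}


@dataclass(frozen=True)
class ThinEvent:
    policy_id: str  # carries only the id; the receiver must re-read the DB


@dataclass(frozen=True)
class FatEvent:
    policy_id: str
    old_resource_id: str  # state before the write
    new_resource_id: str  # state after the write


def update_policy(policy_id, new_resource_id):
    """Apply a write and return both event shapes for comparison."""
    old_resource_id = db[policy_id]
    db[policy_id] = new_resource_id
    return ThinEvent(policy_id), FatEvent(policy_id, old_resource_id, new_resource_id)


def hydrate(event):
    """Thin-event receiver: look up whatever the row holds *now*."""
    return db[event.policy_id]


# The policy moves to resource_id-2 and then back to resource_id-1.
thin1, fat1 = update_policy("policy-1", "resource_id-2")
thin2, fat2 = update_policy("policy-1", "resource_id-1")

# Delayed handling: both thin events hydrate after both writes landed,
# so the intermediate state (resource_id-2) is never observed.
thin_observed = [hydrate(thin1), hydrate(thin2)]

# Fat events carry each transition, so the receiver sees both old values.
fat_revoked = [event.old_resource_id for event in (fat1, fat2)]

print(thin_observed)
print(fat_revoked)
```

In this model the thin receiver acts on `resource_id-1` twice, while the fat events name `resource_id-1` and then `resource_id-2`, which is the distinction the policy example above describes.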


To fix all of the above, we move to the system introduced in this PR:

- All changes are now serialized (for free) by Postgres and broadcasted
as a single event stream
- The number of topics has been reduced to just one, the `account_id` of
an account. All receivers subscribe to this one topic for the lifetime
of their pid and then only filter the events they want to act upon,
ignoring all other messages
- The events themselves have been turned into "fat" structs based on the
schemas they represent. By making them properly typed, we can apply things
like the existing Policy authorizer functions to them as if we had just
fetched them from the DB.
- All flow creation now happens in memory and doesn't need to incur
a DB hit in `us-east1` to proceed.
- Since clients and gateways now track state in a push-based manner from
the DB, this means very few actual DB queries are needed to maintain
state in the channel procs, and it also means we can be smarter about
when to send `resource_deleted` and `resource_created_or_updated`
appropriately, since we can always diff between what the client _had_
access to, and what they _now_ have access to.
- All DB operations, whether they happen from the application code, a
`psql` prompt, or even via Google SQL Studio in the GCP console, will
trigger the _same_ side effects.
- We now use a replication consumer based off Postgres logical decoding
of the write-ahead log using a _durable slot_. This means that Postgres
will retain _all events_ until they are acknowledged, giving us the
ability to ensure at-least-once processing semantics for our system.
Today, the ACK is simply, "did we broadcast this event successfully?"
But in the future, we can assert that replies are received before we
acknowledge the event as processed back to Postgres.
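The idea of diffing what a client _had_ access to against what it _now_ has can be sketched as follows. This is a minimal illustration, not the channel implementation: the `diff_access` function and the dict-of-resources shape are assumptions made for the example. Only the two message names come from the description above.

```python
def diff_access(before, after):
    """Compare two {resource_id: attrs} snapshots held in channel state.

    Returns the messages a client channel would push, as (name, id) pairs.
    """
    messages = []

    # Anything the client could reach before but not now is removed.
    for resource_id in sorted(before.keys() - after.keys()):
        messages.append(("resource_deleted", resource_id))

    # Anything new, or whose attributes changed, is (re)announced.
    for resource_id in sorted(after):
        if before.get(resource_id) != after[resource_id]:
            messages.append(("resource_created_or_updated", resource_id))

    return messages


before = {"res-1": {"name": "db"}, "res-2": {"name": "wiki"}}
after = {"res-2": {"name": "wiki-v2"}, "res-3": {"name": "git"}}

messages = diff_access(before, after)
for message in messages:
    print(message)
```

Because both snapshots live in the process's own state, computing the diff requires no DB query in this sketch.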

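The at-least-once semantics of acknowledging an event only after it has been broadcast can be modelled with a toy slot. Everything here is hypothetical (`DurableSlot`, `consume`, `flaky_broadcast`, and integer LSNs are illustrative stand-ins) and none of it talks to Postgres. The simulated crash happens after the broadcast but before the ACK.

```python
class DurableSlot:
    """Toy stand-in for a durable slot: retains events until acknowledged."""

    def __init__(self, events):
        self._events = list(events)  # (lsn, payload) pairs in commit order
        self.confirmed_lsn = 0

    def pending(self):
        return [(lsn, p) for lsn, p in self._events if lsn > self.confirmed_lsn]

    def ack(self, lsn):
        self.confirmed_lsn = max(self.confirmed_lsn, lsn)


def consume(slot, broadcast):
    """Broadcast each pending event, acknowledging only after success."""
    for lsn, payload in slot.pending():
        broadcast(payload)  # may raise; the event then stays unacknowledged
        slot.ack(lsn)


delivered = []
crash = {"armed": True}


def flaky_broadcast(payload):
    delivered.append(payload)
    if payload == "b" and crash["armed"]:
        crash["armed"] = False
        # Simulate a deploy or crash after broadcasting but before the ACK.
        raise RuntimeError("node restarted before ACK")


slot = DurableSlot([(1, "a"), (2, "b"), (3, "c")])

try:
    consume(slot, flaky_broadcast)
except RuntimeError:
    pass  # the consumer restarts and resumes from the last confirmed LSN

consume(slot, flaky_broadcast)
print(delivered)
```

In this model event `b` is delivered twice and nothing is lost, which is why receivers under at-least-once delivery generally need to handle duplicates idempotently.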


The tests in this PR have been updated to pass given the refactor.
However, since we are tracking more state now in the channel procs, it
would be a good idea to add more tests for those edge cases. That is
saved as a later PR because (1) this one is already huge, and (2) we
need to get this out to staging to smoke test everything anyhow.

Fixes: #9908 
Fixes: #9909 
Fixes: #9910
Fixes: #9900 
Related: #9501

2025-07-18 22:47:18 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| .github | feat(gateway): revoke unlisted authorizations upon init (#9896) | 2025-07-17 19:04:54 +00:00 |
| docs | docs: remove outdated license notice (#9906) | 2025-07-18 14:28:08 +00:00 |
| elixir | refactor(portal): cache access state in channel pids (#9773) | 2025-07-18 22:47:18 +00:00 |
| kotlin/android | feat(connlib): add reason argument to reset API (#9878) | 2025-07-15 13:48:33 +00:00 |
| rust | chore(telemetry): don't start in local environment (#9905) | 2025-07-18 14:28:55 +00:00 |
| scripts | test: remove curl retry in favor of keep-alive (#9888) | 2025-07-16 16:17:27 +00:00 |
| swift/apple | feat(connlib): add reason argument to reset API (#9878) | 2025-07-15 13:48:33 +00:00 |
| website | feat(gateway): shutdown after 15m of portal disconnect (#9894) | 2025-07-18 05:47:30 +00:00 |
| .dockerignore | chore: move terraform/ to private repo (#9421) | 2025-06-05 19:24:06 +00:00 |
| .gitattributes | chore: set same eol for all platforms (#4316) | 2024-03-25 23:05:23 +00:00 |
| .gitignore | chore: add nix scripts (#3771) | 2024-02-27 23:56:46 +00:00 |
| .prettierignore | ci: properly ignore generated TS directory (#9383) | 2025-06-04 05:49:05 +00:00 |
| .prettierrc.json | docs: update Apple docs for standalone guidance (#7589) | 2024-12-29 21:36:30 +00:00 |
| .tool-versions | chore(portal): bump elixir 1.18.4, otp 27.3.4.1 (#9673) | 2025-06-25 18:39:20 +00:00 |
| docker-compose.yml | chore: remove pull_policy from containers (#9887) | 2025-07-16 09:15:29 +00:00 |
| LICENSE | Update LICENSE to include component license clarification for subcomponents (#1806) | 2023-07-20 21:14:38 +00:00 |

docs/README.md

A modern alternative to legacy VPNs.

Overview

Firezone is an open source platform to securely manage remote access for any-sized organization. Unlike most VPNs, Firezone takes a granular, least-privileged approach to access management with group-based policies that control access to individual applications, entire subnets, and everything in between.

Features

Firezone is:

Fast: Built on WireGuard® to be 3-4 times faster than OpenVPN.
Scalable: Deploy two or more gateways for automatic load balancing and failover.
Private: Peer-to-peer, end-to-end encrypted tunnels prevent packets from routing through our infrastructure.
Secure: Zero attack surface thanks to Firezone's holepunching tech which establishes tunnels on-the-fly at the time of access.
Open: Our entire product is open-source, allowing anyone to audit the codebase.
Flexible: Authenticate users via email, Google Workspace, Okta, Entra ID, or OIDC and sync users and groups automatically.
Simple: Deploy gateways and configure access in minutes with a snappy admin UI.

Firezone is not:

A tool for creating bi-directional mesh networks
A full-featured router or firewall
An IPSec or OpenVPN server

Contents of this repository

This is a monorepo containing the full Firezone product, marketing website, and product documentation, organized as follows:

- elixir: Control plane and internal Elixir libraries:
  - elixir/apps/web: Admin UI
  - elixir/apps/api: API for Clients, Relays and Gateways.
- rust/: Data plane and internal Rust libraries:
  - rust/gateway: Gateway - Tunnel server based on WireGuard and deployed to your infrastructure.
  - rust/relay: Relay - STUN/TURN server to facilitate holepunching.
  - rust/headless-client: Cross-platform CLI client.
  - rust/gui-client: Cross-platform GUI client.
- swift/: macOS / iOS clients.
- kotlin/: Android / ChromeOS clients.
- website/: Marketing website and product documentation.

Quickstart

The quickest way to get started with Firezone is to sign up for an account at https://app.firezone.dev/sign_up.

Once you've signed up, follow the instructions in the welcome email to get started.

Frequently asked questions (FAQ)

Can I self-host Firezone?

Our license won't stop you from self-hosting the entire Firezone product top to bottom, but our internal APIs are changing rapidly so we can't meaningfully support self-hosting Firezone in production at this time.

If you're feeling especially adventurous and want to self-host Firezone for educational or hobby purposes, follow the instructions to spin up a local development environment in CONTRIBUTING.md.

The latest published clients (on App Stores and on releases) are only guaranteed to work with the managed version of Firezone and may not work with a self-hosted portal built from this repository. This is because Apple and Google can sometimes delay updates to their app stores, and so the latest published version may not be compatible with the tip of main from this repository.

Therefore, if you're experimenting with self-hosting Firezone, you will probably want to use clients you build and distribute yourself as well.

See the READMEs in the following directories for more information on building each client:

macOS / iOS: swift/apple
Android / ChromeOS: kotlin/android
Windows / Linux: rust/gui-client

How long will 0.7 be supported?

Firezone 0.7 is currently end-of-life and has stopped receiving updates as of January 31st, 2024. It will continue to be available indefinitely from the legacy branch of this repo under the Apache 2.0 license.

How much does it cost?

We offer flexible per-seat monthly and annual plans for the cloud-managed version of Firezone, with optional invoicing for larger organizations. See our pricing page for more details.

Those experimenting with self-hosting can use Firezone for free without feature or seat limitations, but we can't provide support for self-hosted installations at this time.

Documentation

Additional documentation on general usage, troubleshooting, and configuration can be found at https://www.firezone.dev/kb.

Get Help

If you're looking for help installing, configuring, or using Firezone, check our community support options:

Discussion Forums: Ask questions, report bugs, and suggest features.
Join our Discord Server: Join live discussions, meet other users, and chat with the Firezone team.
Open a PR: Contribute a bugfix or other improvement to Firezone.

If you need help deploying or maintaining Firezone for your business, consider contacting our sales team to speak with a Firezone expert.

See all support options on our main support page.

Developing and Contributing

See CONTRIBUTING.md.

Security

See SECURITY.md.

License

Portions of this software are licensed as follows:

All content residing under the "elixir/" directory of this repository, if that directory exists, is licensed under the "Elastic License 2.0" license defined in "elixir/LICENSE".
All third party components incorporated into the Firezone Software are licensed under the original license provided by the owner of the applicable component.
Content outside of the above mentioned directories or restrictions above is available under the "Apache 2.0 License" license as defined in "LICENSE".

WireGuard® is a registered trademark of Jason A. Donenfeld.

Languages

Elixir 57.1%

Rust 29.2%

TypeScript 5.9%

Swift 3.3%

Kotlin 1.8%

Other 2.5%