Commit Graph

70 Commits

Author SHA1 Message Date
Thomas Eizinger
96536a23cf refactor(connlib): ignore relays per connection (#5631)
In a previous design of firezone, relays used to be scoped to a certain
connection. For a while now, this constraint has been lifted and all
connections can use all relays. A related, outdated concern is the idea
of STUN-only servers. Those also used to be assigned on a per-connection
basis.

By removing any use of per-connection relays and STUN-only servers, the
entire `StunBinding` concept is unused code and can thus be deleted.

To push this over the finish line, the `snownet-tests` which test the
hole-punching functionality needed to be slightly adapted to make use of
the more recently introduced API `Node::update_relays`.

Resolves: #4749.
2024-06-29 02:36:17 +00:00
Thomas Eizinger
38275ecad0 refactor(gateway): extract fn for update-device task (#5581)
Follow-up feedback from #5512.
2024-06-29 01:27:23 +00:00
Thomas Eizinger
aadb045b27 chore(connlib): batch together sending of ICE candidates (#5616)
Currently, we are sending each ICE candidate individually from the
client to the gateway and vice versa. This causes a slight delay as to
when each ICE candidate gets added on the remote ICE agent. As a result,
they all start being tested with a slight offset which causes "endpoint
hopping" whenever a connection expires as they expire just after each
other.

In addition, sending multiple messages to the portal causes unnecessary
load when establishing connections.

Finally, with #5283 we started **not** adding the server-reflexive
candidate to the local ICE agent. Because we talk to multiple relays, we
detect the same server-reflexive candidate multiple times if we are
behind a non-symmetric NAT. Not adding the server-reflexive candidate to
the ICE agent mitigated our de-duplication strategy here which means we
currently send the same candidate multiple times to a peer, causing
additional, unnecessary load.

All of this can be mitigated by batching together all our ICE candidates
together into one message.

Resolves: #3978.
2024-06-28 02:04:31 +00:00
Thomas Eizinger
1b5076fa57 fix(gateway): handle init messages during operation (#5512)
Currently, the gateway only handles an `init` message on startup. For
clients, we handle `init` messages also during operation so it only
makes sense to do the same thing for gateways.

This allows us to remove some old code from `phoenix_channel`. In
particular, the `init` function which used to wait for the `init`
message before continuing. In
https://github.com/firezone/firezone/pull/4594, we refactored
`phoenix-channel` to reconnect internally on errors. As a result, the
`connect` function became synchronous and no longer needed an `async`
context.

At the time, the gateway wasn't updated to make use of this. We can now
simplify the gateway code and resolve the outstanding TODO of handling
`init` messages during operation.
2024-06-26 00:11:07 +00:00
Reactor Scram
28378fe24e refactor(headless-client): remove FIREZONE_PACKAGE_VERSION (#5487)
Closes #5481 

With this, I can connect to the staging portal without a build.rs or any
extra env var setup

<img width="387" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/9c080b36-3a76-49c7-b706-20723697edc7">


```[tasklist]
### Next steps
- [x] Split out a refactor PR for `ConnectArgs` (#5488)
- [x] Try doing this for other Clients
- [x] Check Gateway
- [x] Check Tauri Client
- [x] Change to `app_version`
- [x] Open for review
- [ ] Use `option_env` so that `FIREZONE_PACKAGE_VERSION` can still override the Cargo.toml version for local testing
- [ ] Check Android Client
- [ ] Check Apple Client
```

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-06-21 23:06:41 +00:00
Gabi
2ea6a5d07e feat(gateway): NAT & mangling for DNS resources (#5354)
As part of #4994, the IP translation and mangling of packets to and from
DNS resources is moved to the gateway. This PR represents the
"gateway-half" of the required changes.

Eventually, the client will send a list of proxy IPs that it assigned
for a certain DNS resource. The gateway assigns each proxy IP to a real
IP and mangles outgoing and incoming traffic accordingly. There are a
number of things that we need to take care of as part of that:

- We need to implement NAT to correctly route traffic. Our NAT table
maps from source port* and destination IP to an assigned port* and real
IP. We say port* because that is only true for UDP and TCP. For ICMP, we
use the identifier.
- We need to translate between IPv4 and IPv6 in case a DNS resource e.g.
only resolves to IPv6 addresses but the client gave out an IPv4 proxy
address to the application. This translation is was added in #5364 and
is now being used here.

This PR is backwards-compatible because currently, clients don't send
any IPs to the gateway. No proxy IPs means we cannot do any translation
and thus, packets are simply routed through as is which is what the
current clients expect.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-06-19 01:15:27 +00:00
Thomas Eizinger
c4e608bd14 fix(gateway): ensure DNS resolution times out before connection (#5419)
When we attempt to establish a connection to a gateway for a DNS
resource, the gateway must resolve the requested domain name before it
can accept the connection. Currently, this timeout is set to 60s which
is much longer than the client's connection timeout.

DNS resolution is typically a very fast protocol so reducing this
timeout to 5s should be safe. In addition, we add a compile-time
assertion that this timeout must be less than the client's connection
timeout.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-06-18 22:08:49 +00:00
Reactor Scram
deefabd8f8 refactor(firezone-tunnel): move routes and DNS control out of connlib and up to the Client (#5111)
Refs #3636 (This pays down some of the technical debt from Linux DNS)
Refs #4473 (This partially fulfills it)
Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory)

As of dd6421:

- On both Linux and Windows, DNS control and IP setting (i.e.
`on_set_interface_config`) both move to the Client
- On Windows, route setting stays in `tun_windows.rs`. Route setting in
Windows requires us to know the interface index, which we don't know in
the Client code. If we could pass opaque platform-specific data between
the tunnel and the Client it would be easy.
- On Linux, route setting moves to the Client and Gateway, which
completely removes the `worker` task in `tun_linux.rs`
- Notifying systemd that we're ready moves up to the headless Client /
IPC service

```[tasklist]
### Before merging / notes
- [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device)
- [x] Fix Windows Clients
- [x] Fix Gateway
- [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068)
- [x] De-dupe consts
- [ ] ~~Add DNS control test~~ (failed)
- [ ] Smoke test Linux
- [ ] Smoke test Windows
```
2024-06-03 14:32:08 +00:00
Gabi
b3d2059cad chore(connlib): split allowed_ips into ipv4 and ipv6 in ClientOnGateway (#5160)
To encode that clients always have both ipv4 and ipv6 and they are the
only allowed source ips for any given client, into the type, we split
those into their specific fields in the `ClientOnGateway` struct and
update tests accordingly.

Furthermore, these will be used for the DNS refactor for ipv6-in-ipv4
and ipv4-in-ipv6 to set the source ip of outgoing packets, without
having to do additional routing or mappings. There will be more notes on
this on the corresponding PR #5049 .

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-05-30 05:51:44 +00:00
Gabi
361aafb746 chore(connlib): upgrade domain version from 0.9 to 0.10 (#5028) 2024-05-20 20:54:22 +00:00
Gabi
c46967e1d6 fix(connlib): resource filter deserialization (#4910)
There was an error on how resource filters were deserialized in the
gateway:

* we always assumed that there would be the ports included but the
portal sends no port down when the "all" range is allowed
* also we didn't support the resource_updated message, this fixes it,
and resources allow-list can be changes in-flight
2024-05-08 00:16:06 +00:00
Gabi
68ece0a940 feat(connlib): traffic filtering (#4779)
This implements traffic filtering on the gateway. Filters are set on the
portal, per-resource, in an allow-list manner.

If no filters exist for a given resource all packets are allowed,
otherwise only packets that matches port/protocol for the filters are
allowed, otherwise they are dropped.

Filters can be either TCP, UDP or ICMP. For the first 2 multiple ports
can be given. Furthermore, multiple filters can exists for the same
resource.

To be able to add and remove filters with the same IP/CIDR we keep
around the whole list of filters for any given peer using an ID map and
recalculate the IP each time something is added is removed.

This allows us to remove filters and simply recalculate the allowlist
for each IP.

Furthermore, for any IP, all rules apply, meaning if there are multiple
IPs that apply for a resource all port/protocol combinations for that IP
will apply.

This works well right now for DNS resources, since access is requested
by DNS name, then the resource for that DNS name will arrive at the
gateway, and the port filtering will apply given that resource(and any
other resource with the same IP).

However, since the client has no idea of the filters, it can't request
the resource access based on the port/protocol combination and we are
still using the most specific("longest match") IP. This will mean that
for overlapping CIDR resources, only the rules for the most specific
will be used, even if the gateway supports applying them all, since it
will not have the other resources. This will be solved in #4789.

It can also lead to some weirdness, let's say that you have 10.0.0.0/24
-> TCP/80 and 10.0.0.0/16 -> TCP/443 for your user.

The user tries to access 10.0.0.1, and will then only be allowed port
80. At some point the user might access 10.1.0.1 and it will be allowed
port 443. But from that point on, the user will be allowed to access 80
and 443 in 10.0.0.1 because the rules correctly work on the gateway, the
problem is the client side. Again, #4789 will fix this.

Left for next PRs (in tentative order!):

- #4792 
- #4789 

Depends on: #4773.
Resolves #2030.
Resolves #4791.

---------

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2024-05-07 19:47:49 +00:00
Thomas Eizinger
51089b89e7 feat(connlib): smoothly migrate relayed connections (#4568)
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.

This still relies on #4613 until we have #4634.

Resolves: #4548.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-20 06:16:35 +00:00
Thomas Eizinger
0f7e80642d chore(snownet): don't update remote socket from WG activity (#4615)
Resolves: #4613.
2024-04-20 00:15:19 +00:00
Thomas Eizinger
bfe07d7ebd chore(connlib): upsert relays from "init" message (#4567)
This is another step towards #4548. The portal now includes a list of
relays as part of the "init" message. Any time we receive an "init", we
will now upsert those relays based on their ID. This requires us to
change our internal bookkeeping of relays from indexing them by address
to indexing by ID.

To ensure that this works correctly, the unit tests are rewritten to use
the new `upsert_relays` API.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-15 21:30:49 +00:00
Thomas Eizinger
be1a719e2c chore(relay): perform graceful shutdown upon receiving SIGTERM (#4552)
Upon receiving a SIGTERM, we immediately disconnect from the websocket
connection to the portal and set a flag that we are shutting down.

Once we are disconnected from the portal and no longer have an active
allocations, we exit with 0. A repeated SIGTERM signal will interrupt
this process and force the relay to shutdown.

Disconnecting from the portal will (eventually) trigger a message to
clients and gateways that this relay should no longer be used. Thus,
depending on the timeout our supervisor has configured after sending
SIGTERM, the relay will continue all TURN operations until the number of
allocations drops to 0.

Currently, we also allow clients to make new allocations and refreshing
existing allocations. In the future, it may make sense to implement a
dedicated status code and refuse `ALLOCATE` and `REFRESH` messages
whilst we are shutting down.

Related: #4548.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-12 08:45:08 +00:00
Thomas Eizinger
5e871d955b chore(gateway): remove unused derives and messages (#4563) 2024-04-10 09:18:59 +00:00
Thomas Eizinger
03d89fec50 chore(relay): fail health-check with 400 on being partitioned for > 15min (#4553)
During the latest relay outage, we failed to send heartbeats to the
portal because we were busy-looping and never got to handle messages or
timers for the portal.

To mitigate this or similar bugs, we update an `Instant` every time we
send a heartbeat to the portal. In case we are actually
network-partitioned, this will cause the health-check to fail after 15
minutes. This value is the same as the partition timeout for the portal
connection itself[^1]. Very likely, we will never see a relay being
shutdown because of a failing health check in this case as it would have
already shut itself down.

An exception to this are bugs in the eventloop where we fail to interact
with the portal at all.

Resolves: #4510.

[^1]: Previously, this was unlimited.
2024-04-10 02:05:59 +00:00
Thomas Eizinger
a8201abd6e chore(connlib): remove stale code (#4562)
Reducing the number of crates as outlined in #4470 would help with
detecting this sort of unused code because we could make more things
`pub(crate)` which allows the compiler to check whether code is actually
used.

Public API items are never subject to the dead-code analysis of the
compiler because they could be used by other crates.
2024-04-10 02:12:59 +00:00
Thomas Eizinger
e169150ee7 fix(gateway): don't errenously suspend eventloop (#4486)
Within the gateway's eventloop, we MUST only return `Poll::Pending` if
`Waker`s are registered for anything that needs to happen. To ensure
that, we MUST `loop` around our the calls to `poll()` to ensure we drain
everything that is `Poll::Ready`.

Only once all sub-state machines return `Poll::Pending`, we can return
`Poll::Pending`.
2024-04-03 17:24:38 -06:00
Gabi
ee34621ee8 chore(connlib): unit tests for additional fields in messages (#4337)
Fixes #4308
2024-03-28 02:14:02 +00:00
Jamil
228389882e refactor(connlib): delay initialization of Sockets until we have a tokio runtime (#4286)
Our sockets need to be initialized within a tokio runtime context. To
achieve this, we don't actually initialize anything on `Sockets::new`.
Instead, we call `rebind` within the constructor of `Tunnel` which
already runs in a tokio context.

Fixes: #4282

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-25 22:51:35 +00:00
Thomas Eizinger
e628fa5d06 refactor(connlib): implement new FFI guidelines (#4263)
This updates connlib to follow the new guidelines described in #4262. I
only made the bare-minimum changes to the clients. With these changes
`reconnect` should only be called when the network interface actually
changed, meaning clients have to be updated to reflect that.
2024-03-23 04:13:05 +00:00
Thomas Eizinger
8c1500d03e chore(connlib): tidy up logs and docs (#4265)
Wrong / outdated docs are worse than no docs. This PR removes some of
these stale docs. We may add new docs at a later point again.
2024-03-23 00:52:24 +00:00
Thomas Eizinger
e8f2320d08 fix(gateway): answer with empty list of addresses on DNS resolution failure (#4266)
Currently, a failure during DNS resolution results in the client hanging
during the connection setup. Instead, we fall back to an empty list
which results in an empty DNS query result for the client.

That in turn will make most application consider the DNS request failed.
As far as I know, we don't currently retry these DNS requests, meaning a
user would have to sign-in and out again to fix this state.

Whilst not ideal, I think this is a better behaviour and what we
currently have where the initial connection just hangs.
2024-03-22 22:16:38 +00:00
Thomas Eizinger
2a46fce574 refactor(connlib): remove Result return values from callbacks (#4158)
Currently, an error returned by `Tunnel::poll_next_event` is only
logged. In other words, they are never fatal. This creates a tricky to
understand relationship on what kind of errors should be returned from
callbacks. Because connlib is used on multiple operating systems, it has
no idea how fatal a particular error is.

This PR removes all of these `Result` return values with the following
consequences:

- For Android, we now panic when a callback fails. This is a slight
change in behaviour. I believe that previously, any exception thrown by
a callback into Android was caught and returned as an error. Now, we
panic because in the FFI layer, we don't have any information on how
fatal the error is. For non-fatal errors, the Android app should simply
not throw an exception. The panics will cause the connlib task to be
shut down which triggers an `on_disconnect`.
- For Swift, there is no behaviour change. The FFI layer already did not
support `Result`s for those callbacks. I don't know how exceptions from
Swift are translated across the FFI layer but there is no change to what
we had before.
- For the Tauri client:
- I chose to log errors on ERROR level and continue gracefully for the
DNS resolvers.
- We panic in case the controller channel is full / closed. That should
really never happen in practice though unless we are currently shutting
down the app.

Resolves: #4064.
2024-03-20 02:09:20 +00:00
Thomas Eizinger
62e082d47a refactor(connlib): make {Client,Gateway}State SANS-IO (#4096)
Resolves: #3929.
2024-03-14 23:44:36 +00:00
Thomas Eizinger
9767bddcca feat(gateway): add HTTP health check (#4120)
This adds the same kind of HTTP health-check that is already present in
the relay to the gateway. The health-check returns 200 OK for as long as
the gateway is active. The gateway automatically shuts down on fatal
errors (like authentication failures with the portal).

To enable this, I've extracted a crate `http-health-check` that shares
this code between the relay and the gateway.

Resolves: #2465.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-03-13 21:05:21 +00:00
Thomas Eizinger
407d20d817 refactor(connlib): use phoenix-channel crate for clients (#3682)
Depends-On: #4048.
Depends-On: #4015.

Resolves: #2158.

---------

Co-authored-by: conectado <gabrielalejandro7@gmail.com>
2024-03-12 08:10:56 +00:00
Thomas Eizinger
fdb33674cd refactor(connlib): introduce LoginUrl component (#4048)
Currently, we are passing a lot of data into `Session::connect`. Half of
this data is only needed to construct the URL we will use to connect to
the portal. We can simplify this by extracting a dedicated `LoginUrl`
component that captures and validates this data early.

Not only does this reduce the number of parameters we pass to
`Session::connect`, it also reduces the number of failure cases we have
to deal with in `Session::connect`. Any time the session fails, we have
to call `onDisconnected` to inform the client. Thus, we should perform
as much validation as we can early on. In other words, once
`Session::connect` returns, the client should be able to expect that the
tunnel is starting.
2024-03-09 09:35:15 +00:00
Thomas Eizinger
4339030d03 refactor(phoenix-channel): reduce Error to fatal errors (#4015)
As part of doing https://github.com/firezone/firezone/pull/3682, we
noticed that the handling of errors up to the clients needs to
differentiate between fatal errors that require clearing the token vs
not.

Upon closer inspection of `phoenix_channel::Error`, it becomes obvious
that the current design is not good here. In particular, we handle
certain errors with retries internally but still expose those same
errors.

To make this more obvious, we reduce the public `Error` to the variants
that are actually fatal. Those can really only be three:

- HTTP client errors (those are by definition non-retryable)
- Token expired
- We have reached our max number of retries
2024-03-09 08:03:25 +00:00
Thomas Eizinger
0ed2480ac0 refactor(connlib): merge control_protocol::gateway into gateway module (#3984)
This separation doesn't really hold anymore as we already have an `impl
Tunnel` and `impl GatewayState` within `gateway.rs`. It is easier to
maintain if more gateway-specific things are in `gateway.rs`. Plus, once
we integrate the portal connection into the tunnel, we can collapse a
lot of these APIs.
2024-03-06 19:37:09 +00:00
Andrew Dryga
bfe1fb0ff4 refactor(portal): unify format of error payloads in websocket connection (#3697)
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-28 23:06:52 +00:00
Gabi
77b00b3be9 feat(connlib): support resource updates from the portal (#3754)
This PR doesn't yet provide support for the update of upstream DNS but
it does provide support for all the other resources update messages.

Should comply with the description of issue #2022 but it doesn't respond
to DNS upstream updates which is imply it should on the issue title

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-27 03:24:14 +00:00
Gabi
5edd195320 refactor(connlib): unify peer storage (#3738)
Now that we have `&mut` access everywhere in the tunnel, the remaining
shared-memory and locks are in how we store peers. To resolve this, we
introduce a new `PeerStore` that allows us to look up peers by IP and by
ID.
2024-02-26 16:07:38 +00:00
Thomas Eizinger
e766407dfb feat!(portal): return relays as plain socket addresses (#3665)
Extracted out of #3391.

We don't actually need this for #3391 though because we've added a
compatibility layer during deserialization. But, it will be good to
remove that compat layer at some point which means we have to return the
addresses as plain socket addresses. Because that is a breaking change,
I decided to extract this into a different PR.

Co-authored-by: conectado <gabrielalejandro7@gmail.com>

---------

Co-authored-by: conectado <gabrielalejandro7@gmail.com>
2024-02-21 01:31:03 +00:00
Gabi
3d3e737ba3 refactor(connlib): replace webrtc-rs with snownet (#3391)
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

Resolves: #3377.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-20 06:56:31 +00:00
Gabi
55e4fb100f fix(gateway): re-implement resource address resolution in eventloop (#3656)
Reimplements what #3654 reverted with a fix

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-02-15 20:51:59 +00:00
Reactor Scram
085351f455 revert: 3622 to fix failing DNS CI test (#3654)
Reverts #3622 I don't know why, but that change seemed to cause the
`/etc/resolv.conf` test to fail in CI and I was thinking of the "roll
back first" principle
https://cloud.google.com/blog/products/gcp/reliable-releases-and-rollbacks-cre-life-lessons

~~I also change one `ping` in CI to `until ping`. This was an earlier
attempt before I did the revert, and it seems safe to leave it in.~~
2024-02-15 19:26:34 +00:00
Thomas Eizinger
f42aa862a8 refactor(gateway): perform DNS resolution of resources in eventloop (#3622)
With #3391, constructing a new tunnel will no longer be `async` which
makes DNS resolution the only `async` component of
`set_peer_connection_request`. In general, adding resources as part of
setting up a connection is a duplicated of the logic within
`allow_access`.

We solve both of these problems at once by moving the DNS resolution out
of `connlib` into the `gateway` binary and perform it as part of the
eventloop during a connection setup.
2024-02-15 01:40:44 +00:00
Thomas Eizinger
6c2fdcfd0a chore: bump Rust version to 1.76 (#3632) 2024-02-13 17:01:22 +00:00
Thomas Eizinger
d550c9da89 refactor(connlib): remove unnecessary Serialize derive (#3595)
These messages are only deserialized, never serialized. The `derive` can
thus be removed.

Extracted from: #3391.
2024-02-07 19:54:25 +00:00
Andrew Dryga
a211f96109 feat(portal): Broadcast state changes to connected clients and gateways (#2240)
# Gateways
- [x] When Gateway Group is deleted all gateways should be disconnected
- [x] When Gateway Group is updated (eg. routing) broadcast to all
affected gateway to disconnect all the clients
- [x] When Gateway is deleted it should be disconnected
- [x] When Gateway Token is revoked all gateways that use it should be
disconnected

# Relays
- [x] When Relay Group is deleted all relays should be disconnected
- [x] When Relay is deleted it should be disconnected
- [x] When Relay Token is revoked all gateways that use it should be
disconnected

# Clients
- [x] Remove Delete Client button, show clients using the token on the
Actors page (#2669)
- [x] When client is deleted disconnect it
- [ ] ~When Gateway is offline broadcast to the Clients connected to it
it's status~
- [x] Persist `last_used_token_id` in Clients and show it in tokens UI

# Resources
- [x] When Resource is deleted it should be removed from all gateways
and clients
- [x] When Resource connection is removed it should be deleted from
removed gateway groups
- [x] When Resource is updated (eg. traffic filters) all it's
authorizations should removed

# Authentication
- [x] When Token is deleted related sessions are terminated
- [x] When an Actor is deleted or disabled it should be disconnected
from browser and client
- [x] When Identity is deleted it's sessions should be disconnected from
browser and client
- [x] ^ Ensure the same happens for identities during IdP sync
- [x] When IdP is disabled act like all actors for it are disabled?
- [x] When IdP is deleted act like all actors for it are deleted?

# Authorization
- [x] When Policy is created clients that gain access to a resource
should get an update
- [x] When Policy is deleted we need to all authorizations it's made
- [x] When Policy is disabled we need to all authorizations it's made
- [x] When Actor Group adds or removes a user, related policies should
be re-evaluated
- [x] ^ Ensure the same happens for identities during IdP sync

# Settings
- [x] Re-send init message to Client when DNS settings change

# Code
- [x] Crear way to see all available topics and messages, do not use
binary topics any more

---------

Co-authored-by: conectado <gabrielalejandro7@gmail.com>
2024-02-01 11:02:13 -06:00
Thomas Eizinger
35cdf84578 feat(gateway): don't print stacktrace upon exit (#3404)
Previously, we would print the following whenever the gateway exits:

```
2024-01-25T17:37:53.258145Z  INFO init{user_agent="Alpine Linux/3.19.0 (x86_64;6.6.11;) connlib/1.0.0" login_topic="gateway"}: phoenix_channel: Connected to portal, waiting for `init` message
2024-01-25T17:37:53.260751Z  WARN init{user_agent="Alpine Linux/3.19.0 (x86_64;6.6.11;) connlib/1.0.0" login_topic="gateway"}: phoenix_channel: Fatal client error (401 Unauthorized) in portal connection: Invalid token

Error: websocket failed

Caused by:
    HTTP error: 401 Unauthorized

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:565:25
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/result.rs:1963:27
   2: firezone_gateway::run::{{closure}}
             at /build/gateway/src/main.rs:85:26
   3: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125:9
   4: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:328:17
   5: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16:9
   6: tokio::runtime::task::core::Core<T,S>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:317:13
   7: tokio::runtime::task::harness::poll_future::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:485:19
   8: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panic/unwind_safe.rs:272:9
   9: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  10: __rust_try
  11: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  12: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  13: tokio::runtime::task::harness::poll_future
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473:18
  14: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208:27
  15: tokio::runtime::task::harness::Harness<T,S>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153:15
  16: tokio::runtime::task::raw::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:271:5
  17: tokio::runtime::task::raw::RawTask::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201:18
  18: tokio::runtime::task::LocalNotified<S>::run
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:416:9
  19: tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:576:13
  20: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  21: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  22: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:575:9
  23: tokio::runtime::scheduler::multi_thread::worker::Context::run
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:526:24
  24: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:491:21
  25: tokio::runtime::context::scoped::Scoped<T>::set
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/scoped.rs:40:9
  26: tokio::runtime::context::set_scheduler::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:26
  27: std::thread::local::LocalKey<T>::try_with
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/local.rs:270:16
  28: std::thread::local::LocalKey<T>::with
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/local.rs:246:9
  29: tokio::runtime::context::set_scheduler
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:9
  30: tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:486:9
  31: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  32: tokio::runtime::scheduler::multi_thread::worker::run
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:478:5
  33: tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:447:45
  34: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/task.rs:42:21
  35: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:328:17
  36: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16:9
  37: tokio::runtime::task::core::Core<T,S>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:317:13
  38: tokio::runtime::task::harness::poll_future::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:485:19
  39: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panic/unwind_safe.rs:272:9
  40: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  41: __rust_try
  42: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  43: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  44: tokio::runtime::task::harness::poll_future
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473:18
  45: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208:27
  46: tokio::runtime::task::harness::Harness<T,S>::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153:15
  47: tokio::runtime::task::raw::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:271:5
  48: tokio::runtime::task::raw::RawTask::poll
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201:18
  49: tokio::runtime::task::UnownedTask<S>::run
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:453:9
  50: tokio::runtime::blocking::pool::Task::run
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:159:9
  51: tokio::runtime::blocking::pool::Inner::run
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:513:17
  52: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:471:13
  53: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
  54: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/mod.rs:529:17
  55: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panic/unwind_safe.rs:272:9
  56: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  57: __rust_try
  58: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  59: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  60: std::thread::Builder::spawn_unchecked_::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/mod.rs:528:30
  61: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
  62: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  63: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  64: std::sys::unix::thread::Thread::new::thread_start
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys/unix/thread.rs:108:17
```

Now, we are just printing this:

```
2024-01-25T17:32:51.613258Z  INFO init{user_agent="Alpine Linux/3.19.0 (x86_64;6.6.11;) connlib/1.0.0" login_topic="gateway"}: phoenix_channel: Connected to portal, waiting for `init` message
2024-01-25T17:32:51.617971Z  WARN init{user_agent="Alpine Linux/3.19.0 (x86_64;6.6.11;) connlib/1.0.0" login_topic="gateway"}: phoenix_channel: Fatal client error (401 Unauthorized) in portal connection: Invalid token
2024-01-25T17:32:51.619680Z ERROR firezone_gateway: websocket failed: HTTP error: 401 Unauthorized
```

Resolves: #3401.
2024-01-26 00:12:32 +00:00
Thomas Eizinger
f9f95677d5 feat: automatically rejoin channel on portal after reconnect (#3393)
In https://github.com/firezone/firezone/pull/3364, we forgot to rejoin
the channel on the portal. Additionally, I found a way to detect the
disconnect even more quickly.
2024-01-25 02:05:15 +00:00
Thomas Eizinger
6b789d6932 feat(phoenix-channel): automatically reconnect based on provided ExponentialBackoff (#3364)
Currently, only the gateway has a reconnect logic for (transient) errors
when connecting to the portal. Instead of duplicating this for the
relay, I moved the reconnect state machine to `phoenix-channel`. This
means the relay now automatically gets it too and in the future, the
clients will also benefit from it.

As a nice benefit, this also greatly simplifies the gateway's
`Eventloop` and removes a bunch of cruft with channels.

Resolves: #2915.
2024-01-24 16:39:53 +00:00
Gabi
0629afce3a connlib: make dns request in a new task without blocking peers (#3370)
This required making `allow_access` `async` which is ugly, but we can
fix it later like we did it with `set_peer_connection_request`, but
doing this ASAP otherwise this would block the `peers_by_ip` struct and
also block the executor a bunch of times and slow everything down.

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-01-24 00:17:50 +00:00
Gabi
7233ccdc0a gateway(fix): accept nil expiration times (#3288)
Fixes #3240
2024-01-17 21:13:11 +00:00
Jamil
1251397651 fix(ios/android): Pass device name and os version as overrides over connect (#3036)
Fixes #3035 
Fixes #3037 

# Before

<img width="738" alt="Screenshot 2023-12-28 at 8 05 31 AM"
src="https://github.com/firezone/firezone/assets/167144/c7ab4d74-672c-4536-97fe-f75d8d158bfb">

<img width="546" alt="Screenshot 2023-12-28 at 6 12 30 PM"
src="https://github.com/firezone/firezone/assets/167144/1bd4ba98-d11d-4277-bd14-b0afcdf78119">

# After

<img width="742" alt="Screenshot 2023-12-28 at 10 48 31 AM"
src="https://github.com/firezone/firezone/assets/167144/96054f82-069f-47f7-862c-986455ef76c0">
<img width="744" alt="Screenshot 2023-12-28 at 6 29 37 PM"
src="https://github.com/firezone/firezone/assets/167144/4ffc19b6-7c87-4ccb-bcfe-cb0e76fe95b7">
2024-01-03 20:08:33 +00:00
Gabi
73823ecba0 Fix/firezone id handling (#2958)
fixes #2651 

Wip because firezone portal doesn't handle names longer than 8
characters yet cc @AndrewDryga
2023-12-19 15:38:27 -06:00