mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
Our `phoenix-channel` component is responsible for maintaining a WebSocket connection to the portal. In case that connection fails, we want to reconnect to it using an exponential backoff, eventually giving up after a certain amount of time. Unfortunately, the code we have today doesn't quite do that. An `ExponentialBackoff` has a setting for the `max_elapsed_time`. Regardless of how many and how often we retry something, we won't ever wait longer than this amount of time. For the Relay, this is set to 15min. For other components its indefinite (Gateway, headless-client), or very long (30 days for Android, 1 day for Apple). The point in time from which this duration is counted is when the `ExponentialBackoff` is **constructed** which translates to when we **first** connected to the portal. As a result, our backoff would immediately fail on the first error if it has been longer than `max_elapsed_time` since we first connected. For most components, this codepath is not relevant because the `max_elapsed_time` is so long. For the Relay however, that is only 15 minutes so chances are, the Relay would immediately fail (and get rebooted) on the first connection error with the portal. To fix this, we now lazily create the `ExponentialBackoff` on the first error. This bug has some interesting consequences: When a relay reboots, it looses all its state, i.e. allocations, channel bindings, available nonces etc, stamp-secret. Thus, all credentials and state that got distributed to Clients and Gateways get invalidated, causing disconnects from the Relay. We have observed these alerts in Sentry for a while and couldn't explain them. Most likely, this is the root cause for those because whilst a Relay disconnects, the portal also cannot detect its presence and pro-actively inform Clients and Gateways to no longer use this Relay.
Connlib
Firezone's connectivity library shared by all clients.
Building Connlib
You shouldn't need to build connlib directly; it's typically built as a dependency of one of the other Firezone components. See READMEs in those directories for relevant instructions.