firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 18:18:55 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	e84bdc5566	refactor(connlib): periodically record queue depths (#10242 ) Instead of recording the queue depths on every event-loop tick, we now record them once a second by setting a Gauge. Not only is that a simpler instrument to work with, it is also significantly more performant. The current version - when metrics are enabled - takes up quite a bit of CPU time. Resolves: #10237	2025-09-02 02:57:36 +00:00
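A minimal sketch of the once-a-second recording, assuming a hypothetical `Gauge` type (a stand-in for whatever metrics instrument connlib actually uses) and an equally hypothetical `QueueDepthRecorder` helper. The event loop may call `on_tick` as often as it likes, but the gauge is only written once the interval has elapsed.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a metrics gauge: it only ever holds the last value set.
#[derive(Default)]
pub struct Gauge(AtomicU64);

impl Gauge {
    pub fn set(&self, value: u64) {
        self.0.store(value, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Records a queue depth into a gauge at most once per `interval`,
/// no matter how often the event loop ticks.
pub struct QueueDepthRecorder {
    gauge: Gauge,
    interval: Duration,
    last_recorded: Option<Instant>,
}

impl QueueDepthRecorder {
    pub fn new(interval: Duration) -> Self {
        Self { gauge: Gauge::default(), interval, last_recorded: None }
    }

    /// Called on every event-loop tick; returns `true` if the gauge was updated.
    pub fn on_tick(&mut self, now: Instant, queue_depth: u64) -> bool {
        let due = match self.last_recorded {
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        };
        if due {
            self.gauge.set(queue_depth);
            self.last_recorded = Some(now);
        }
        due
    }

    pub fn gauge(&self) -> &Gauge {
        &self.gauge
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the rate limiting deterministic and easy to test.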
Thomas Eizinger	9cddfe59fa	fix(rust): don't require Internet on startup (#10264 ) With the introduction of the pre-resolved Sentry host, all Firezone clients now require Internet on startup. That is a significant usability hit that we can easily fix by simply falling back to resolving the host on-demand.	2025-09-01 01:31:05 +00:00
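A sketch of the fallback using only `std::net::ToSocketAddrs`. `IngestHost` is a hypothetical name; the real change works with the `reqwest::Client` configured for the Sentry transport rather than with this type. Resolution is attempted once at construction and, if that fails (e.g. because there is no Internet yet), lookups happen on demand instead of failing startup.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};

/// Hypothetical holder for the ingest host's addresses.
pub struct IngestHost {
    host_and_port: String,
    pre_resolved: Option<Vec<SocketAddr>>,
}

impl IngestHost {
    /// Try to resolve once at startup, but never fail: without Internet
    /// access we simply remember that pre-resolution did not work.
    pub fn new(host_and_port: &str) -> Self {
        let pre_resolved = host_and_port
            .to_socket_addrs()
            .ok()
            .map(|addrs| addrs.collect::<Vec<_>>())
            .filter(|addrs| !addrs.is_empty());

        Self { host_and_port: host_and_port.to_owned(), pre_resolved }
    }

    /// Use the pre-resolved records if we have them, otherwise resolve on demand.
    pub fn addrs(&self) -> io::Result<Vec<SocketAddr>> {
        match &self.pre_resolved {
            Some(addrs) => Ok(addrs.clone()),
            None => Ok(self.host_and_port.to_socket_addrs()?.collect()),
        }
    }
}
```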
Thomas Eizinger	544ba11f21	chore(rust): allow `too_many_arguments` repo-wide (#10236 ) We always end up allowing this lint when it pops up, so we might as well allow it for the whole repo. Most of the time, the reason for too many arguments is a borrow-checker limitation of Rust where mutable references need to be tracked explicitly.	2025-08-22 13:21:07 +00:00
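An illustration of the borrow-checker pattern, with hypothetical types: while `self.pending` is borrowed for iteration, a `&mut self` helper cannot be called, so each field that needs mutating is passed as its own `&mut` argument. Once enough such fields accumulate, a function exceeds Clippy's default `too_many_arguments` threshold of seven parameters. `MAX_PACKET_LEN` is an arbitrary value for illustration.

```rust
/// Arbitrary limit used only for this illustration.
const MAX_PACKET_LEN: u32 = 1280;

/// Hypothetical event-loop state.
pub struct EventLoop {
    pub pending: Vec<u32>,
    pub processed: u64,
    pub dropped: u64,
    pub log: Vec<String>,
}

impl EventLoop {
    pub fn new(pending: Vec<u32>) -> Self {
        Self { pending, processed: 0, dropped: 0, log: Vec::new() }
    }

    pub fn drain(&mut self) {
        // Iterating borrows `self.pending`, so calling a `&mut self` method here
        // would not compile. Borrowing the other fields individually does.
        for &len in &self.pending {
            handle_packet(len, &mut self.processed, &mut self.dropped, &mut self.log);
        }
        self.pending.clear();
    }
}

/// Every piece of mutable state is an explicit parameter.
fn handle_packet(len: u32, processed: &mut u64, dropped: &mut u64, log: &mut Vec<String>) {
    if len <= MAX_PACKET_LEN {
        *processed += 1;
    } else {
        *dropped += 1;
        log.push(format!("dropped {len}-byte packet"));
    }
}
```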
Thomas Eizinger	a109c1a2ef	feat(connlib): discard intermediate resource and TUN updates (#10223 ) Right now, the Client event-loops have a channel with 1000 items for sending new resource lists and updates to the TUN device to the host app. This is kind of unnecessary as we always only care about the last version of these. Intermediate updates that the host app doesn't process are effectively irrelevant. We've had an issue before where a bug in the portal caused us to receive many updates to resources which ended up crashing Client apps because this channel filled up. To be more resilient on this front, we refactor the Client event loop to use a `watch` channel for this. Watch channels only retain the last value that got sent into them.	2025-08-21 05:42:54 +00:00
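A std-only sketch of the "keep only the latest value" semantics that `tokio::sync::watch` provides. `Latest` is a hypothetical type and, unlike a real `watch` channel, it has no async change notification. Sending never blocks and never fills up, so a burst of updates collapses into the most recent one.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical single-slot channel: unread values are overwritten.
pub struct Latest<T> {
    slot: Arc<Mutex<Option<T>>>,
}

// Implemented by hand so that cloning a handle does not require `T: Clone`.
impl<T> Clone for Latest<T> {
    fn clone(&self) -> Self {
        Self { slot: Arc::clone(&self.slot) }
    }
}

impl<T> Latest<T> {
    pub fn new() -> Self {
        Self { slot: Arc::new(Mutex::new(None)) }
    }

    /// Never blocks and never fills up: a new value replaces any unread one.
    pub fn send(&self, value: T) {
        *self.slot.lock().unwrap() = Some(value);
    }

    /// Takes the latest value, if one arrived since the last call.
    pub fn take(&self) -> Option<T> {
        self.slot.lock().unwrap().take()
    }
}
```

Compared to a 1000-item queue, memory use is bounded by a single value and the host app only ever sees the state it actually needs.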
Thomas Eizinger	46afa52f78	feat(telemetry): pre-resolve Sentry ingest host (#10206 ) Our Sentry client needs to resolve DNS before being able to send logs or errors to the backend. Currently, this DNS resolution happens on-demand as we don't take any control of the underlying HTTP client. In addition, this will use HTTP/1.1 by default, which isn't as efficient as it could be, especially with concurrent requests. Finally, if we ever decide to proxy all Sentry traffic through our own domain, we have to take control of the underlying client anyway. To resolve all of the above, we create a custom `TransportFactory` where we reuse the existing `ReqwestHttpTransport` but provide an already configured `reqwest::Client` that always uses HTTP/2 with a pre-configured set of DNS records for the given ingest host.	2025-08-21 03:28:05 +00:00
Thomas Eizinger	4e11112d9b	feat(connlib): improve throughput on higher latencies (#10231 ) It turns out that multi-threaded access of the TUN device on the Gateway causes packet reordering, which makes the TCP congestion controller throttle the connection. Additionally, the default TX queue length of a TUN device on Linux is only 500 packets. With just a single thread and an increased TX queue length, we get a throughput of just over 1 Gbit/s for a 20ms link between Client and Gateway with basically no packet drops:
```
Connecting to host 172.20.0.110, port 5201
[ 5] local 100.79.130.70 port 49546 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[ 5]   0.00-1.00   sec   116 MBytes   977 Mbits/sec    0   6.40 MBytes
[ 5]   1.00-2.00   sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes
[ 5]   2.00-3.00   sec   134 MBytes  1.13 Gbits/sec    0   6.40 MBytes
[ 5]   3.00-4.00   sec   136 MBytes  1.14 Gbits/sec   47   6.40 MBytes
[ 5]   4.00-5.00   sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes
[ 5]   5.00-6.00   sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes
[ 5]   6.00-7.00   sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes
[ 5]   7.00-8.00   sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes
[ 5]   8.00-9.00   sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes
[ 5]   9.00-10.00  sec   138 MBytes  1.15 Gbits/sec    0   6.40 MBytes
[ 5]  10.00-11.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes
[ 5]  11.00-12.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes
[ 5]  12.00-13.00  sec   136 MBytes  1.14 Gbits/sec    0   6.40 MBytes
[ 5]  13.00-14.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes
[ 5]  14.00-15.00  sec   140 MBytes  1.17 Gbits/sec    0   6.40 MBytes
[ 5]  15.00-16.00  sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes
[ 5]  16.00-17.00  sec   137 MBytes  1.15 Gbits/sec    0   6.40 MBytes
[ 5]  17.00-18.00  sec   139 MBytes  1.17 Gbits/sec    0   6.40 MBytes
[ 5]  18.00-19.00  sec   138 MBytes  1.16 Gbits/sec    0   6.40 MBytes
[ 5]  19.00-20.00  sec   136 MBytes  1.14 Gbits/sec    0   6.40 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[ 5]   0.00-20.00  sec  2.67 GBytes  1.15 Gbits/sec   47             sender
[ 5]   0.00-20.02  sec  2.67 GBytes  1.15 Gbits/sec                  receiver

iperf Done.
```
For further debugging in the future, we are now recording the send and receive queue depths of both the TUN device and the UDP sockets. Neither of them appeared to be full in my testing, which leads me to conclude that it isn't any buffer inside Firezone that is too small here. Related: #7452 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2025-08-20 23:08:56 +00:00
Thomas Eizinger	0f2cfa2e3c	fix(rust): don't block runtime shutdown (#10204 ) By default, dropping a `tokio` runtime waits until all tasks have finished. The tasks we spawn within `connlib` can have complex dependencies with each other. To ensure that we can shut down in any case and don't hang, we apply a timeout of 1s to the runtime.	2025-08-18 01:59:03 +00:00
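`tokio` exposes a bounded shutdown as `Runtime::shutdown_timeout`, which takes a `Duration`. The sketch below is a std-only analogue with a hypothetical `Workers` type, not connlib's code: shutdown waits for completion signals until a deadline and then gives up, leaving any stragglers detached instead of hanging.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical pool of background threads that report when they finish.
pub struct Workers {
    done_tx: mpsc::Sender<()>,
    done_rx: mpsc::Receiver<()>,
    running: usize,
}

impl Workers {
    pub fn new() -> Self {
        let (done_tx, done_rx) = mpsc::channel();
        Self { done_tx, done_rx, running: 0 }
    }

    pub fn spawn(&mut self, work: impl FnOnce() + Send + 'static) {
        let done_tx = self.done_tx.clone();
        self.running += 1;
        thread::spawn(move || {
            work();
            // The receiver may already be gone if shutdown timed out; ignore that.
            let _ = done_tx.send(());
        });
    }

    /// Waits at most `timeout` for all workers and returns how many were
    /// still running when we stopped waiting.
    pub fn shutdown_timeout(self, timeout: Duration) -> usize {
        let deadline = Instant::now() + timeout;
        let mut running = self.running;
        while running > 0 {
            let remaining = deadline.saturating_duration_since(Instant::now());
            if self.done_rx.recv_timeout(remaining).is_err() {
                break; // deadline reached: stop waiting, don't hang
            }
            running -= 1;
        }
        running
    }
}
```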
Thomas Eizinger	5141817134	feat(connlib): add `reason` argument to `reset` API (#9878 ) To provide more detailed logs about why `connlib`'s network state is being reset, we add a `reason` parameter that gets logged. Resolves: #9867	2025-07-15 13:48:33 +00:00
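A sketch of the API shape with a hypothetical, heavily simplified `Tunnel` that records log lines in a `Vec` so the result is easy to inspect; connlib itself would presumably emit the reason through its regular logging instead.

```rust
/// Hypothetical stand-in for connlib's tunnel state.
#[derive(Default)]
pub struct Tunnel {
    pub connections: Vec<String>,
    pub log: Vec<String>,
}

impl Tunnel {
    /// Resets all network state; `reason` exists purely so the log explains why.
    pub fn reset(&mut self, reason: &str) {
        self.log.push(format!(
            "Resetting network state ({reason}); dropping {} connection(s)",
            self.connections.len()
        ));
        self.connections.clear();
    }
}
```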
Thomas Eizinger	2b70596636	fix(rust): only apply filter to select tracing layers (#9872 ) Applying a filter globally to the entire subscriber means it filters events for all layers. This prevents the Sentry layer from uploading DEBUG logs if configured.	2025-07-15 13:44:53 +00:00
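A small std-only model of the difference, using hypothetical `Level`, `Layer`, and `Subscriber` types (in `tracing-subscriber` itself, per-layer filtering is done with `Layer::with_filter`). A global minimum level drops a DEBUG event before any layer sees it, whereas per-layer minimums let an unfiltered Sentry layer still receive it.

```rust
/// Declaration order gives Trace < Debug < Info < Warn < Error.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

pub struct Layer {
    pub name: &'static str,
    pub min_level: Option<Level>, // `None`: no per-layer filter, sees everything
    pub received: Vec<String>,
}

pub struct Subscriber {
    pub global_min: Option<Level>,
    pub layers: Vec<Layer>,
}

impl Subscriber {
    pub fn event(&mut self, level: Level, msg: &str) {
        // A global filter rejects the event before *any* layer sees it.
        if let Some(min) = self.global_min {
            if level < min {
                return;
            }
        }
        // Per-layer filters only affect their own layer.
        for layer in &mut self.layers {
            if layer.min_level.map_or(true, |min| level >= min) {
                layer.received.push(msg.to_owned());
            }
        }
    }
}
```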
Thomas Eizinger	04499da11e	feat(telemetry): grab env and `distinct_id` from Sentry session (#9801 ) At present, our primary indicator as to whether telemetry is active is whether we have a Sentry session. For our analytics events however, we currently require passing in the Firezone ID and API URL again. This makes it difficult to send analytics events from areas of the code that don't have this information available. To still allow for that, we integrate the `analytics` module more tightly with the Sentry session. This allows us to drop two parameters from the `$identify` event and also means we now respect the `NO_TELEMETRY` setting for these events except for `new_session`. This event is sent regardless because it allows us to track how many on-prem installations of Firezone are out there.	2025-07-10 20:05:08 +00:00
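A sketch of the gating logic with hypothetical types: `$identify` takes its environment and distinct ID from the active session and is skipped entirely when there is no session, while `new_session` is built from explicit arguments and sent regardless. The struct and field names, as well as the values in the usage, are assumptions for illustration.

```rust
/// Hypothetical model of the data an active Sentry session already carries.
pub struct TelemetrySession {
    pub environment: String,
    pub distinct_id: String,
}

pub struct AnalyticsEvent {
    pub name: &'static str,
    pub environment: String,
    pub distinct_id: String,
}

/// No Sentry session means telemetry is off, so nothing is sent.
pub fn identify_event(session: Option<&TelemetrySession>) -> Option<AnalyticsEvent> {
    let session = session?;

    Some(AnalyticsEvent {
        name: "$identify",
        environment: session.environment.clone(),
        distinct_id: session.distinct_id.clone(),
    })
}

/// Always built, independent of the telemetry setting, so installations can be counted.
pub fn new_session_event(environment: &str, distinct_id: &str) -> AnalyticsEvent {
    AnalyticsEvent {
        name: "new_session",
        environment: environment.to_owned(),
        distinct_id: distinct_id.to_owned(),
    }
}
```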
Thomas Eizinger	3b972643b1	feat(rust): stream logs to Sentry when enabled in PostHog (#9635 ) Sentry has a new "Logs" feature where we can stream logs directly to Sentry. Doing this for all Clients and Gateways would be way too much data to collect though. In order to aid debugging from customer installations, we add a PostHog-managed feature flag that - if set to `true` - enables the streaming of logs to Sentry. This feature flag is evaluated every time the telemetry context is initialised: - For all FFI usages of connlib, this happens every time a new session is created. - For the Windows/Linux Tunnel service, this also happens every time we create a new session. - For the Headless Client and Gateway, it happens on startup and afterwards, every minute. The feature-flag context itself is only checked every 5 minutes though, so it might take up to 5 minutes before this takes effect. The default value - like all feature flags - is `false`. Therefore, if there is any issue with the PostHog service, we will fall back to the previous behaviour where logs are simply stored locally. Resolves: #9600	2025-06-25 16:14:14 +00:00
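A sketch of the caching and fallback behaviour for a single flag, assuming a hypothetical `FeatureFlag` type and a `fetch` closure standing in for the PostHog request. The cached value is reused for the TTL (5 minutes in the commit), and any fetch error resolves to the default of `false`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache for one boolean feature flag (e.g. "stream logs to Sentry").
pub struct FeatureFlag {
    value: bool,
    fetched_at: Option<Instant>,
    ttl: Duration,
}

impl FeatureFlag {
    pub fn new(ttl: Duration) -> Self {
        Self { value: false, fetched_at: None, ttl }
    }

    /// Returns the cached value, refreshing via `fetch` once the TTL has expired.
    /// Any fetch error keeps the flag at its default of `false`.
    pub fn get(&mut self, now: Instant, fetch: impl FnOnce() -> Result<bool, String>) -> bool {
        let stale = self
            .fetched_at
            .map_or(true, |fetched_at| now.duration_since(fetched_at) >= self.ttl);

        if stale {
            self.value = fetch().unwrap_or(false);
            self.fetched_at = Some(now);
        }
        self.value
    }
}
```

Because errors collapse to `false`, an outage of the flag service degrades to local-only logs rather than to unexpected log streaming.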
Thomas Eizinger	d376a122e4	feat(telemetry): send `account_slug` to PostHog (#9636 ) In order to more easily target customers with certain feature flags, we include the `account_slug` in the `$identify` event to PostHog. This will allow us to create Cohorts in PostHog and enable / disable feature flags for all installations of Firezone for a particular customer.	2025-06-24 09:00:24 +00:00
Thomas Eizinger	faeb958882	refactor: use UniFFI for Android FFI (#9415 ) To make our FFI layer between Android and Rust safer, we adopt the UniFFI tool from Mozilla. UniFFI allows us to create a dedicated crate (here `client-ffi`) that contains Rust structs annotated with various attributes. These attribute macros then generate code at compile time that is built into the shared object. Using a dedicated CLI from the UniFFI project, we can then generate Kotlin bindings from this shared object. The primary motivation for this effort is memory safety across the FFI boundary. Most importantly, we want to ensure that: - The session pointer is not used after it has been freed - Disconnecting the session frees the pointer - Freeing the session does not happen as part of a callback, as that triggers a cyclic dependency on the Rust side (callbacks are executed on a runtime and that runtime is dropped as part of dropping the session) To achieve all of these goals, we move away from callbacks altogether. UniFFI has great support for async functions. We leverage this support to expose a `suspend fn` to Android that returns `Event`s. These events map to the current callback functions. Internally, these events are read from a channel with a capacity of 1000 events. It is therefore not very time-critical that the app reads from this channel. `connlib` will happily continue even if the channel is full. That said, 1000 events should be more than sufficient in case the host app cannot immediately process them; we don't send events very often after all. This event-based design has major advantages: It allows us to make use of `AutoCloseable` on the Kotlin side, meaning the `session` pointer is only ever accessed as part of a `use` block and automatically closed (and therefore freed) at the end of the block. To communicate with the session, we introduce a `TunnelCommand` which represents all actions that the host app can send to `connlib`. These are passed through a channel to the `suspend fn`, which continuously listens for events and commands. Resolves: #9499 Related: #3959 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>	2025-06-17 21:48:34 +00:00
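A std-only sketch of a bounded event channel whose producer never blocks, assuming that events which don't fit are dropped (the commit only says `connlib` continues when the channel is full). `std::sync::mpsc::sync_channel` stands in for whatever async channel backs the `suspend fn`, and the `Event` variants and `EventSender` type shown here are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver, SyncSender, TrySendError};

/// Hypothetical subset of events handed to the host app.
#[derive(Debug, PartialEq)]
pub enum Event {
    ResourcesUpdated(usize),
    Disconnected(String),
}

pub struct EventSender {
    tx: SyncSender<Event>,
    dropped: usize,
}

impl EventSender {
    /// Never blocks: if the host app has fallen behind and the channel is full,
    /// the event is dropped and connlib carries on.
    pub fn emit(&mut self, event: Event) {
        match self.tx.try_send(event) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => self.dropped += 1,
            Err(TrySendError::Disconnected(_)) => {} // host app has gone away
        }
    }

    pub fn dropped(&self) -> usize {
        self.dropped
    }
}

pub fn event_channel(capacity: usize) -> (EventSender, Receiver<Event>) {
    let (tx, rx) = mpsc::sync_channel(capacity);
    (EventSender { tx, dropped: 0 }, rx)
}
```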

13 Commits