firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 18:18:55 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	a297c6dbbd	chore: differentiate between `shutdown` and `shut down` (#10494 ) In a prior code review, CoPilot flagged that we were using the noun "shutdown" as a verb in certain places. Resolves: #10425	2025-10-01 02:55:22 +00:00
Thomas Eizinger	685acdac3a	feat: add more specific component type to user-agent header (#10457 ) In order to allow the portal to more easily classify, what kind of component is connecting, we extend the `get_user_agent` header to include a component type instead of the generic `connlib/`. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2025-09-26 00:18:36 +00:00
Thomas Eizinger	aa68029a33	feat(gateway): use hickory resolver to resolve A/AAAA queries (#10373 ) At present, the Gateway performs DNS resolution for A & AAAA queries via `libc`. The `resolve` system call only provides us with the resolved IPs but not any of the metadata around the query such as TTL. As a result, we can only cache DNS queries for a static amount of time, currently 30s. It would be more correct to cache them for their TTL instead. To do so, we re-introduce `hickory-resolver` to our codebase. Deliberately, we only use it for resolving A and AAAA records on the Gateway for now. DNS resolution for SRV & TXT records happens one layer below and uses the same infrastructure as DNS resolution on the Client. Merging this is difficult however because the Gateway still supports the control protocol of 1.3.x clients. That one requires DNS resolution prior to setting up the connection of DNS resources which means it needs to happen in the event-loop of the Gateway binary and cannot be moved into the `Tunnel` where DNS resolution for Client and SRV/TXT records happen. Once we can drop support for 1.3.x clients, this Gateway's event-loop will simplify drastically which will allow us to refactor this to a more unified approach of DNS resolution. Until then, we can at least fix the hardcoded TTL by using `hickory-resolver` in the event-loop. The functionality is guarded behind a feature-flag which - as usual - is off by default (i.e. for as long as we haven't fetched the flags). The feature flag is already configured to `true` for staging and production so we can test the new behaviour. Resolves: #8232 Related: #10385	2025-09-23 06:00:16 +00:00
Thomas Eizinger	8e00870942	refactor(gateway): close connections on error (#10401 ) Previously, the Gateway would only proactively close connections to its peers when it was shut down gracefully via a SIGTERM or SIGINT signal. By copying the same design for the event-loop as I've implemented in #10400, we can now also initiate the graceful shutdown in case the event-loop exits with an error.	2025-09-20 20:55:48 +00:00
Thomas Eizinger	88e801ad97	fix(gateway): re-join topic in phoenix-channel on error (#10397 ) For whatever reason, we seem to sometimes lose the association with the "room" we are meant to be in in order to send messages to the portal. Without joining the right room, messages get dropped silently. To fix this, we re-join the room on such errors. Long-term, this will be fixed by ditching phoenix-channel in favor of simple HTTP requests. Related: #9649	2025-09-20 05:14:12 +00:00
Thomas Eizinger	90d10a8634	refactor(connlib): improve fairness of event-loop (#10347 ) The event-loop inside `Tunnel` processes input according to a certain priority. We only take input from lower priority sources when the higher priority sources are not ready. The current priorities are: - Flush all buffers - Read from UDP sockets - Read from TUN device - Read from DNS servers - Process recursive DNS queries - Check timeout The idea of this priority ordering is to keep all kinds of processing bounded and "finish" any kind of work that is on-going before taking on new work. Anything that sits in a buffer is basically done with processing and just needs to be written out to the network / device. Arriving UDP packets have already traversed the network and been encrypted on the other end, meaning they are higher priority than reading from the TUN device. Packets from the TUN device still need to be encrypted and sent to the remote. Whilst there is merit in this design, it also bears the potential of starving input sources further down if the top ones are extremely busy. To prevent this, we refactor `Io` to read from all input sources and present it to the event-loop as a batch, allowing all sources to make progress before looping around. Since this event-loop has first been conceived, we have refactored `Io` to use background threads for the UDP sockets and TUN device, meaning they will make progress by themselves anyway until the channels to the main-thread fill up. As such, there shouldn't be any latency increase in processing packets even though we are performing slightly more work per event-loop tick. This kind of batch-processing highlights a problem: Bailing out with an error midway through processing a batch leaves the remainder of the batch unprocessed, essentially dropping packets. To fix this, we introduce a new `TunnelError` type that presents a collection of errors that we encountered while processing the batch. 
This might actually also be a problem with what is currently in `main` because we are already batch-processing packets there but possibly are bailing out midway through the batch. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Mariusz Klochowicz <mariusz@klochowicz.com>	2025-09-17 23:28:36 +00:00
Thomas Eizinger	3e6094af8d	feat(linux): try to set `rmem_max` and `wmem_max` on startup (#10349 ) The default send and receive buffer sizes on Linux are too small (only ~200 KB). Checking `nstat` after an iperf run revealed that the number of dropped packets in the first interval directly correlates with the number of receive buffer errors reported by `nstat`. We already try to increase the send and receive buffer sizes for our UDP socket but unfortunately, we cannot increase them beyond what the system limits them to. To work around this, we try to set `rmem_max` and `wmem_max` during startup of the Linux headless client and Gateway. This behaviour can be disabled by setting `FIREZONE_NO_INC_BUF=true`. This doesn't work in Docker unfortunately, so we set the values manually in the CI perf tests and verify after the test that we didn't encounter any send and receive buffer errors. It is yet to be determined how we should deal with this problem for all the GUI clients. See #10350 as an issue tracking that. Unfortunately, this doesn't fix all packet drops during the first iperf interval. With this PR, we now see packet drops on the interface itself.	2025-09-17 23:05:01 +00:00
Thomas Eizinger	69afe71215	refactor(connlib): remove concept of "ReplyMessages" (#10361 ) In earlier versions of Firezone, the WebSocket protocol with the portal was using the request-response semantics built into Phoenix. This however is quite cumbersome to work with due to the polymorphic nature of the protocol design. We ended up moving away from it and instead only use one-way messages where each event directly corresponds to a message type. However, we have never removed the capability for reply messages from the `phoenix-channel` module, instead all usages just set it to `()`. We can simplify the code here by always setting this to `()`. Resolves: #7091	2025-09-17 04:10:56 +00:00
Firezone Bot	cacef44b4b	chore: publish gateway 1.4.16 (#10321 )	2025-09-10 04:50:43 +00:00
Thomas Eizinger	e84bdc5566	refactor(connlib): periodically record queue depths (#10242 ) Instead of recording the queue depths on every event-loop tick, we now record them once a second by setting a Gauge. Not only is that a simpler instrument to work with but it is significantly more performant. The current version - when metrics are enabled - takes on quite a bit of CPU time. Resolves: #10237	2025-09-02 02:57:36 +00:00
Thomas Eizinger	533f4c319b	feat(connlib): gracefully shutdown connections (#10076 ) Right now, connections cannot be actively closed in Firezone. The WireGuard tunnel and the ICE agent are coupled together, meaning only if either one of them fails will we clean up the connection. One exception here is when the Client roams. In that case, the Client simply clears its local memory completely and then re-establishes all necessary connections by re-requesting access. There are three cases where gracefully closing a connection is useful: 1. If an access authorization is revoked or expires and this was the last resource authorisation for that peer, we don't currently remove the connection on the Gateway. Instead, the Client is still able to send packets but they'll be dropped because we don't have a peer state anymore. 2. If a Gateway gets restarted due to e.g. an upgrade or other maintenance work, it loses all its connections and every Client needs to wait for the ICE timeout (~15 seconds) before it can establish a new one. 3. If a Client has its access revoked for all resources it has access to in a particular site we also don't remove this connection, even though it has become practically useless. All of these cases are fixed with this PR. Here we introduce a way to gracefully shut down a connection without forcing the other side into an ICE timeout. The graceful connection shutdown works by introducing a new "goodbye" p2p control protocol message. Like all our p2p control protocol messages, this is based on IP and therefore delivery is not guaranteed. In other words, this "goodbye" message is sent on a best-effort basis. In the case of shutdown, the Gateway will wait for all UDP packets to be flushed but will not resend them or wait for an ACK. If either end receives such a "goodbye" message, they simply remove the local peer and connection state just as if the connection would have failed due to either ICE or WireGuard.
For the Client, this means that the next packet for a resource will trigger a new access authorization request.	2025-09-01 06:30:13 +00:00
Thomas Eizinger	9cddfe59fa	fix(rust): don't require Internet on startup (#10264 ) With the introduction of the pre-resolved Sentry host, all Firezone clients now require Internet on startup. That is a significant usability hit that we can easily fix by simply falling back to resolving the host on-demand.	2025-09-01 01:31:05 +00:00
Thomas Eizinger	46afa52f78	feat(telemetry): pre-resolve Sentry ingest host (#10206 ) Our Sentry client needs to resolve DNS before being able to send logs or errors to the backend. Currently, this DNS resolution happens on-demand as we don't take any control of the underlying HTTP client. In addition, this will use HTTP/1.1 by default which isn't as efficient as it could be, especially with concurrent requests. Finally, if we decide to ever proxy all Sentry traffic through our own domain, we have to take control of the underlying client anyway. To resolve all of the above, we create a custom `TransportFactory` where we reuse the existing `ReqwestHttpTransport` but provide an already configured `reqwest::Client` that always uses HTTP/2 with a pre-configured set of DNS records for the given ingest host.	2025-08-21 03:28:05 +00:00
Thomas Eizinger	b4cbc4f33b	fix(connlib): exit phoenix-channel event-loop on error (#10229 ) We cannot poll the `PhoenixChannel` after it has returned an error, otherwise it will panic. Therefore, we exit the event-loop then. The outer event-loop also exits as soon as it receives an error from this channel so this is fine. `PhoenixChannel` only returns an error when it has irrecoverably disconnected, e.g. after the retries have been exhausted or we hit a 4xx error on the WebSocket connection. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-21 03:25:46 +00:00
Thomas Eizinger	4e11112d9b	feat(connlib): improve throughput on higher latencies (#10231 ) Turns out the multi-threaded access of the TUN device on the Gateway causes packet reordering which makes the TCP congestion controller throttle the connection. Additionally, the default TX queue length of a TUN device on Linux is only 500 packets. With just a single thread and an increased TX queue length, we get a throughput performance of just over 1 GBit/s for a 20ms link between Client and Gateway with basically no packet drops: ``` Connecting to host 172.20.0.110, port 5201 [ 5] local 100.79.130.70 port 49546 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 116 MBytes 977 Mbits/sec 0 6.40 MBytes [ 5] 1.00-2.00 sec 137 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 2.00-3.00 sec 134 MBytes 1.13 Gbits/sec 0 6.40 MBytes [ 5] 3.00-4.00 sec 136 MBytes 1.14 Gbits/sec 47 6.40 MBytes [ 5] 4.00-5.00 sec 137 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 5.00-6.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 6.00-7.00 sec 138 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 7.00-8.00 sec 138 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 8.00-9.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 9.00-10.00 sec 138 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 10.00-11.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 11.00-12.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 12.00-13.00 sec 136 MBytes 1.14 Gbits/sec 0 6.40 MBytes [ 5] 13.00-14.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 14.00-15.00 sec 140 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 15.00-16.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 16.00-17.00 sec 137 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 17.00-18.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 18.00-19.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 19.00-20.00 sec 136 MBytes 1.14 Gbits/sec 0 6.40 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-20.00 sec 
2.67 GBytes 1.15 Gbits/sec 47 sender [ 5] 0.00-20.02 sec 2.67 GBytes 1.15 Gbits/sec receiver iperf Done. ``` For further debugging in the future, we are now recording the send and receive queue depths of both the TUN device and the UDP sockets. Neither of those showed to be full in my testing which leads me to conclude that it isn't any buffer inside Firezone that is too small here. Related: #7452 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2025-08-20 23:08:56 +00:00
Thomas Eizinger	6f4242769a	refactor(connlib): move gw phoenix-channel to separate task (#10211 ) Similar to #10210, we also move the phoenix-channel to a separate task for the Gateway and connect it with channels to the event-loop. Related: #10003 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-18 14:55:02 +00:00
Firezone Bot	3e529ed36c	chore: publish gateway 1.4.15 (#10134 )	2025-08-05 17:17:25 +10:00
Thomas Eizinger	cd177a6448	fix(gateway): don't remove peer state on disconnect (#10040 ) When the connection to a Client disappears, the Gateway currently clears all state related to this peer. Whilst eagerly cleaning up memory can be good, in this case, it may lead to the Client thinking it has access to a resource when in reality it doesn't. Just because the connection to a Client failed doesn't mean their access authorizations are invalid. In case the Client reconnects, it should be able to just continue sending traffic. At the moment, this only works if the connection also failed on the Client and therefore, its view of the world in regards to "which resources do I have access to" was also reset. What we are seeing in Sentry reports though is that Clients are attempting to access these resources, thinking they have access but the Gateway denies it because it has lost the access authorization state.	2025-08-02 08:27:49 +00:00
Thomas Eizinger	69f9a03ee8	refactor(connlib): simplify `IpPacket` struct (#9795 ) With the removal of the NAT64/46 modules, we can now simplify the internals of our `IpPacket` struct. The requirements for our `IpPacket` struct are somewhat delicate. On the one hand, we don't want to be overly restrictive in our parsing / validation code because there is a lot of broken software out there that doesn't necessarily follow RFCs. Hence, we want to be as lenient as possible in what we accept. On the other hand, we do need to verify certain aspects of the packet, like the payload lengths. At the moment, we are somewhat too lenient there which causes errors on the Gateway where we have to NAT or otherwise manipulate the packets. See #9567 or #9552 for example. To fix this, we make the parsing in the `IpPacket` constructor more restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully parse its headers and validate the payload lengths. This parsing allows us to then rely on the integrity of the packet as part of the implementation. This does create several code paths that can in theory panic but in practice, should be impossible to hit. To ensure that this does in fact not happen, we also tackle an issue that is long overdue: Fuzzing. Resolves: #6667 Resolves: #9567 Resolves: #9552	2025-07-29 04:42:57 +00:00
Thomas Eizinger	5c3b15c1a9	chore(connlib): harmonise naming of IDs (#10038 ) When filtering through logs in Sentry, it is useful to narrow them down by context of a client, gateway or resource. Currently, these fields are sometimes called `client`, `cid`, `client_id` etc and the same for the Gateway and Resources. To make this filtering easier, name all of them `cid` for Client IDs, `gid` for Gateway IDs and `rid` for Resource IDs.	2025-07-29 03:33:09 +00:00
Thomas Eizinger	e9c74b1bfe	chore(connlib): treat `Invalid Argument` as unreachable hosts (#10037 ) These appear to happen on systems that e.g. don't have IPv6 support or where the destination cannot be reached. It is a bit of a catch-all but all the ones I am seeing in Sentry are false-positives. To reduce the noise a bit, we log these on DEBUG now.	2025-07-29 03:04:13 +00:00
Firezone Bot	cf40f4dd96	chore: publish gateway 1.4.14 (#10030 )	2025-07-28 06:14:07 +00:00
Thomas Eizinger	d7b9ecb60b	feat(gateway): update expiry of access authorizations on init (#9975 ) Resolves: #9971	2025-07-24 06:36:56 +00:00
Thomas Eizinger	301d2137e5	refactor(windows): share src IP cache across UDP sockets (#9976 ) When looking through customer logs, we see a lot of "Resolved best route outside of tunnel" messages. Those get logged every time we need to rerun our re-implementation of Windows' weighting algorithm as to which source interface / IP a packet should be sent from. Currently, this gets cached in every socket instance so for the peer-to-peer socket, this is only computed once per destination IP. However, for DNS queries, we make a new socket for every query. Using a new source port for DNS queries is recommended to avoid fingerprinting of DNS queries. Using a new socket also means that we need to re-run this algorithm every time we make a DNS query which is why we see this log so often. To fix this, we need to share this cache across all UDP sockets. Cache invalidation is one of the hardest problems in computer science and this instance is no different. This cache needs to be reset every time we roam as that changes the weighting of which source interface to use. To achieve this, we extend the `SocketFactory` trait with a `reset` method. This method is called whenever we roam and can then reset a shared cache inside the `UdpSocketFactory`. The "source IP resolver" function that is passed to the UDP socket now simply accesses this shared cache and inserts a new entry when it needs to resolve the IP. As an added benefit, this may speed up DNS queries on Windows a bit (although I haven't benchmarked it). It should certainly drastically reduce the amount of syscalls we make on Windows.	2025-07-24 01:36:53 +00:00
Thomas Eizinger	ecb2bbc86b	feat(gateway): allow updating expiry of access authorization (#9973 ) Resolves: #9966	2025-07-23 07:25:36 +00:00
Firezone Bot	a11983e4b3	chore: publish gateway 1.4.13 (#9969 )	2025-07-22 18:56:40 +00:00
Thomas Eizinger	c4457bf203	feat(gateway): shutdown after 15m of portal disconnect (#9894 )	2025-07-18 05:47:30 +00:00
Thomas Eizinger	3e71a91667	feat(gateway): revoke unlisted authorizations upon `init` (#9896 ) When receiving an `init` message from the portal, we will now revoke all authorizations not listed in the `authorizations` list of the `init` message. We (partly) test this by introducing a new transition in our proptests that de-authorizes a certain resource whilst the Gateway is simulated to be partitioned. It is difficult to test that we cannot make a connection once that has happened because we would have to simulate a malicious client that knows about resources / connections or ignores the "remove resource" message. Testing this is deferred to a dedicated task. We do test that we hit the code path of revoking the resource authorization and because the other resources keep working, we also test that we are at least not revoking the wrong ones. Resolves: #9892	2025-07-17 19:04:54 +00:00
Thomas Eizinger	2e0ed018ee	chore: document metrics config switches as private API (#9865 )	2025-07-14 13:53:03 +00:00
Thomas Eizinger	cecca37073	feat(gateway): allow exporting metrics to an OTEL collector (#9838 ) As a first step in preparation for sending OTEL metrics from Clients and Gateways to a cloud-hosted OTEL collector, we extend the CLI of the Gateway with configuration options to provide a gRPC endpoint to an OTEL collector. If `FIREZONE_METRICS` is set to `otel-collector` and an endpoint is configured via `OTLP_GRPC_ENDPOINT`, we will report our metrics to that collector. The future plan for extending this is such that if `FIREZONE_METRICS` is set to `otel-collector` (which will likely be the default) and no `OTLP_GRPC_ENDPOINT` is set, then we will use our own, hosted OTEL collector and report metrics IF the `export-metrics` feature-flag is set to `true`. This is a similar integration as we have done it with streaming logs to Sentry. We can therefore enable it on a similar granularity as we do with the logs and e.g. only enable it for the `firezone` account to start with. In the meantime, customers can already make use of those metrics if they'd like by using the current integration. Resolves: #1550 Related: #7419 --------- Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>	2025-07-14 03:54:38 +00:00
Thomas Eizinger	d01701148b	fix(rust): remove jemalloc (#9849 ) I am no longer able to compile `jemalloc` on my system in a debug build. It fails with the following error: ``` src/malloc_io.c: In function ‘buferror’: src/malloc_io.c:107:16: error: returning ‘char *’ from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion] 107 \| return strerror_r(err, buf, buflen); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` This appears to be a problem with modern versions of clang/gcc. I believe this started happening when I recently upgraded my system. The upstream [`jemalloc`](https://github.com/jemalloc/jemalloc) repository is now archived and thus unmaintained. I am not sure if we ever measured a significant benefit in using `jemalloc`. Related: https://github.com/servo/servo/issues/31059	2025-07-12 19:22:06 +00:00
Thomas Eizinger	d6805d7e48	chore(rust): bump to Rust 1.88 (#9714 ) Rust 1.88 has been released and brings with it a quite exciting feature: let-chains! It allows us to mix-and-match `if` and `let` expressions, therefore often reducing the "right-drift" of the relevant code, making it easier to read. Rust 1.88 also comes with a new clippy lint that warns when creating a mutable reference from an immutable pointer. Attempting to fix this revealed that this is exactly what we are doing in the eBPF kernel. Unfortunately, it doesn't seem to be possible to design this in a way that is both accepted by the borrow-checker AND by the eBPF verifier. Hence, we simply make the function `unsafe` and document for the programmer, what needs to be upheld.	2025-07-12 06:42:50 +00:00
Thomas Eizinger	04499da11e	feat(telemetry): grab env and `distinct_id` from Sentry session (#9801 ) At present, our primary indicator as to whether telemetry is active is whether we have a Sentry session. For our analytics events however, we currently require passing in the Firezone ID and API url again. This makes it difficult to send analytics events from areas of the code that don't have this information available. To still allow for that, we integrate the `analytics` module more tightly with the Sentry session. This allows us to drop two parameters from the `$identify` event and also means we now respect the `NO_TELEMETRY` setting for these events except for `new_session`. This event is sent regardless because it allows us to track, how many on-prem installations of Firezone are out there.	2025-07-10 20:05:08 +00:00
Thomas Eizinger	ec2599d545	chore(rust): simplify stream logs feature (#9780 ) Instead of conditionally enabling the `logs` feature in the Sentry client, we always enable it and control via the `tracing` integration, which events should get forwarded to Sentry. The feature-flag check accesses only shared-memory and is therefore really fast. We already re-evaluate feature flags on a timer which means this boolean will flip over automatically and logs will be streamed to Sentry.	2025-07-04 14:51:53 +00:00
Jamil	a4cf3ead0f	ci: publish gateway 1.4.12 (#9736 )	2025-07-01 14:04:21 +00:00
Jamil	699739deae	fix(docs): use sha256sum over sha256 (#9690 ) `sha256` isn't found by default on some machines.	2025-06-27 20:08:41 +00:00
Thomas Eizinger	6fc2ebe576	chore(gateway): log on startup (#9684 ) As with some of our other applications, it is useful to know when they restart and which version is running. Adding a log on INFO on startup solves this.	2025-06-26 13:59:09 +00:00
Thomas Eizinger	d5be185ae4	chore(rust): remove telemetry spans and events (#9634 ) Originally, we introduced these to gather some data from logs / warnings that we considered to be too spammy. We've since merged a burst-protection that will at most submit the same event once every 5 minutes. The data from the telemetry spans themselves have not been used at all.	2025-06-25 17:15:57 +00:00
Thomas Eizinger	3b972643b1	feat(rust): stream logs to Sentry when enabled in PostHog (#9635 ) Sentry has a new "Logs" feature where we can stream logs directly to Sentry. Doing this for all Clients and Gateways would be way too much data to collect though. In order to aid debugging from customer installations, we add a PostHog-managed feature flag that - if set to `true` - enables the streaming of logs to Sentry. This feature flag is evaluated every time the telemetry context is initialised: - For all FFI usages of connlib, this happens every time a new session is created. - For the Windows/Linux Tunnel service, this also happens every time we create a new session. - For the Headless Client and Gateway, it happens on startup and afterwards, every minute. The feature-flag context itself is only checked every 5 minutes though so it might take up to 5 minutes before this takes effect. The default value - like all feature flags - is `false`. Therefore, if there is any issue with the PostHog service, we will fallback to the previous behaviour where logs are simply stored locally. Resolves: #9600	2025-06-25 16:14:14 +00:00
Thomas Eizinger	91edd11a47	feat(gateway): send `$identify` event with account-slug (#9658 ) When we receive the `account_slug` from the portal, the Gateway now sends a `$identify` event to PostHog. This will allow us to target Gateways with feature-flags based on the account they are connected to.	2025-06-24 11:31:56 +00:00
Thomas Eizinger	a91dda139f	feat(connlib): only conditionally hash firezone ID (#9633 ) A bit of legacy that we have inherited around our Firezone ID is that the ID stored on the user's device is sha'd before being passed to the portal as the "external ID". This makes it difficult to correlate IDs in Sentry and PostHog with the data we have in the portal. For Sentry and PostHog, we submit the raw UUID stored on the user's device. As a first step in overcoming this, we embed an "external ID" in those services as well IF the provided Firezone ID is a valid UUID. This will allow us to immediately correlate those events. As a second step, we automatically generate all new Firezone IDs for the Windows and Linux Client as `hex(sha256(uuid))`. These won't parse as valid UUIDs and therefore will be submitted as is to the portal. As a third step, we update all documentation around generating Firezone IDs to use `uuidgen \| sha256` instead of just `uuidgen`. This is effectively the equivalent of (2) but for the Headless Client and Gateway where the Firezone ID can be configured via environment variables. Resolves: #9382 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2025-06-24 07:05:48 +00:00
Thomas Eizinger	950afd9b2d	chore(gateway): set account-slug in telemetry context (#9545 ) This PR adds an optional field `account_slug` to the Gateway's init message. If populated, we will use this field to set the account-slug in the telemetry context. This will allow us to know, which customers a particular Sentry issue is related to.	2025-06-23 18:52:39 +00:00
Jamil	081b075f2c	chore: bump gui, apple, gateway (#9586 ) The new publish automation still [has some kinks](https://github.com/firezone/firezone/actions/runs/15764891111) so publishing this manually.	2025-06-19 12:29:46 -07:00
Thomas Eizinger	cc50d58d8c	chore(client,gateway): log portal connection hiccups on INFO (#9557 ) These don't happen very often so are safe to log on INFO. That is the default log level and it is useful to see, why we are re-connecting to the portal.	2025-06-17 14:01:34 +00:00
Jamil	b60d77cef4	chore: publish gateway 1.4.10 (#9412 )	2025-06-05 08:55:13 +00:00
Thomas Eizinger	e05c98bfca	ci: update to new `cargo sort` release (#9354 ) The latest release now also sorts workspace dependencies, as well as different dependency sections. Keeping these things sorted reduces the chances of merge conflicts when multiple PRs edit these files.	2025-06-02 02:01:09 +00:00
Thomas Eizinger	cee4be9e24	build(deps): bump Rust dependencies (#9192 ) A mass upgrade of our Rust dependencies. Most crucially, these remove several duplicated dependencies from our tree. - The Tauri plugins have been stuck on `windows v0.60` for a while. They are now updated to use `windows v0.61` which is what the rest of our dependency tree uses. - By bumping `axum`, we can also bump `reqwest` which reduces a few more duplicated dependencies. - By removing `env_logger`, we can get rid of a few dependencies.	2025-05-22 13:15:01 +00:00
Thomas Eizinger	b7451fcdae	chore: release Gateway 1.4.9 (#9132 )	2025-05-14 06:39:03 +00:00
Thomas Eizinger	ea0ad9d089	chore(gateway): log CLI args we got invoked with (#9089 )	2025-05-12 22:10:37 +00:00
Thomas Eizinger	f965487739	chore(connlib): turn down logs for non-fatal IO errors (#9091 ) Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-05-12 11:48:40 +00:00
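
Commit 90d10a8634 above describes collecting the errors hit while processing a batch instead of bailing out at the first one, so that the rest of the batch is still processed. The following is only a minimal sketch of that idea using the standard library. `BatchError`, `handle_packet` and `process_batch` are hypothetical names for illustration and are not taken from the Firezone codebase; the real `TunnelError` type may look quite different.

```rust
use std::fmt;

/// Hypothetical stand-in for the error collection described in the commit:
/// it holds every error encountered while processing one batch.
#[derive(Debug, Default)]
struct BatchError {
    errors: Vec<String>,
}

impl fmt::Display for BatchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} error(s) in batch: {}",
            self.errors.len(),
            self.errors.join("; ")
        )
    }
}

impl std::error::Error for BatchError {}

/// Hypothetical per-packet handler: rejects empty packets, otherwise
/// reports how many bytes it handled.
fn handle_packet(packet: &[u8]) -> Result<usize, String> {
    if packet.is_empty() {
        return Err("empty packet".to_owned());
    }
    Ok(packet.len())
}

/// Processes every packet in the batch. A failing packet is recorded and
/// skipped, so the packets after it are not dropped.
fn process_batch(batch: &[Vec<u8>]) -> (usize, Result<(), BatchError>) {
    let mut bytes = 0;
    let mut failed = BatchError::default();

    for packet in batch {
        match handle_packet(packet) {
            Ok(n) => bytes += n,
            Err(e) => failed.errors.push(e),
        }
    }

    let result = if failed.errors.is_empty() {
        Ok(())
    } else {
        Err(failed)
    };

    (bytes, result)
}

fn main() {
    let batch: Vec<Vec<u8>> = vec![vec![1, 2, 3], vec![], vec![4, 5]];
    let (bytes, result) = process_batch(&batch);

    println!("processed {bytes} bytes");
    if let Err(e) = result {
        println!("{e}");
    }
}
```

Returning the byte count alongside the `Result` keeps the successful work visible to the caller even when some packets failed, which is the property the commit message is after.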


269 Commits