firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 10:18:54 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	bc2febed99	fix(connlib): use correct constant for truncating DNS responses (#7551 ) In case an upstream DNS server responds with a payload that exceeds the available buffer space of an IP packet, we need to truncate the response. Currently, this truncation uses the wrong constant to check for the maximum allowed length. Instead of the `MAX_DATAGRAM_PAYLOAD`, we actually need to check against a limit that is less than the MTU as the IP layer and the UDP layer both add an overhead. To fix this, we introduce such a constant and provide additional documentation on the remaining ones to hopefully avoid future errors.	2024-12-19 17:15:43 +00:00
Thomas Eizinger	af1834d0e5	build(deps): bump quinn-udp to 0.5.9 (#7556 ) This release disables URO/GRO on Windows entirely due to hardware / driver bugs. Related: https://github.com/quinn-rs/quinn/issues/2041.	2024-12-19 15:19:54 +00:00
Jamil	56c592a27b	fix(apple): Make clear logs and log size functions work across the IPC boundary (#7467 ) The macOS client starting in 1.4.0 uses a system extension for its network extension package type. This process runs as root and does not have access to the app's Group Container folder for reading / writing log files directly, and vice-versa. This means the tunnel now writes its logs to a different directory than the GUI app process. Since the logging functions of clearing logs, calculating their size, and exporting them assume all the logs are in the same directory, we need to introduce IPC handlers to ensure the GUI app can conveniently still perform these functions when initiated by the user. We already use the Network Extension API `sendProviderMessage` as our IPC mechanism for ad hoc, bi-directional communication through the tunnel, so we add more handlers to this mechanism to support the logging functions summarized above. In this PR we only fix the log size calculation and clear log functionality. Exporting logs is more involved and will be implemented in another dedicated PR.	2024-12-18 20:36:02 +00:00
Thomas Eizinger	a1cf409af3	fix(connlib): clear all in-flight upstream DNS queries on reset (#7552 ) When a Firezone Client roams, we reset all network connections and rebind our local sockets. Doing that enables us to start from a clean state and establish new connections to Gateways. What we are currently not clearing are in-flight DNS queries. Those are all very likely to fail because our network connection is changing. There is no point in us keeping those around. Additionally, as part of roaming, it may also be that our upstream DNS server changes and thus, we may suddenly receive a response from a DNS server that we no longer know about. Clear all in-flight DNS queries on reset solves this.	2024-12-18 20:35:30 +00:00
Thomas Eizinger	d39b6ff1b9	chore(gateway): don't log errors for untranslatable packets (#7541 ) Certain packets cannot be translated as part of NAT64/46. The RFC says to "Silently drop" those. Currently, we log all errors that happen during the translation and don't follow this guideline. Most of these "silently drop" errors are related to ICMP types that cannot be represented in the other version such as ICMPv6 Neighbor Solicitation. To fix this, we introduce a new error type in the `ip_packet` module: `ImpossibleTranslation`. For convenience reasons, we carry that one through all layers as an `anyhow::Error` and test at the very top of the event-loop, whether the root-cause of the error is such a failed translation. If so, we ignore the error and move on. This isn't as type-safe as it could be but it is much easier to implement. Additionally, the risk of a bug here (i.e. if we stop emitting this error within the IP packet translation layer) is merely that the log will pop up again. Resolves: #7516.	2024-12-18 20:35:08 +00:00
Thomas Eizinger	7df4389fa6	refactor(relay): avoid stringifying error early (#7553 ) When the portal connection in a relay fails, we currently stringify the error early. This is unnecessary and we should instead retain the full error chain for as long as possible.	2024-12-18 18:13:55 +00:00
Jamil	cf5d8d08ed	docs: List minimum supported macOS as 13 (#7545 ) macOS 12's SwiftUI framework is quite buggy, which leads to buttons and window layout bugs in the Firezone app. Since virtually no customers are on macOS 12 and it's officially EOL, we'll be dropping support for it. https://endoflife.date/macos Refs https://firezonehq.slack.com/archives/C06L41XN05T/p1734360363455569 Refs #7531	2024-12-18 17:58:02 +00:00
Thomas Eizinger	992b97e6a9	fix(connlib): bind new channel to peer if needed (#7548 ) Initially, when we receive a new candidate from a remote peer, we bind a channel for each remote address on the relay that we sampled. This ensures that every possible communication path is actually functioning. In ICE, all candidates are tried against each other, meaning the remote will attempt to send from each of their candidates to every one of ours, including our relay candidates. To allow this traffic, a channel needs to be bound first. For various reasons, an allocation might become stale or otherwise need to be invalidated. In that case, all the channel bindings are lost but there might still be an active connection that wants to utilise them. When that happens, we will see "No channel" warnings like https://firezone-inc.sentry.io/issues/6036662614/events/f8375883fd3243a4afbb27c36f253e23/. To fix this, we use the attempt to encode a message for a channel as an intent to bind a new one. This is deemed safe because wanting to encode a message to a peer as a channel data message means we want such a channel to exist. The first message here is still dropped but that is better than not establishing the channel at all.	2024-12-18 17:15:17 +00:00
Thomas Eizinger	8e0f00a3a6	fix(relay): buffer packets in case IO is busy (#7536 ) At present, the relay's event-loop simply drops a UDP packet in case the socket is not ready for writing. This is terrible for throughput because it means the encapsulated packet within the WG payload needs to be retransmitted by the source after a timeout. To avoid this, we instead buffer the packet and suspend the event loop until it has been correctly flushed out. This may still cause packet loss because the receive buffer may overflow in the meantime. However, there is nothing we can do about that because UDP itself doesn't have any backpressure. The relay listens on many sockets at once via a separate worker thread and an `mio` event-loop. In addition to the current subscription to readable events, we now also subscribe to writable events. At the very top of the relay's event-loop, we insert a `flush` function that ensures all buffered packets have been written out and - in case writing a packet fails - suspends the event-loop with a waker. If we receive a new event for write-readiness, we wake the waker which will trigger a new call to `Eventloop::poll` where we again try to flush the pending packet. We don't bother tracking exactly which socket signalled write-readiness and which sockets still have pending packets. Instead, we suspend the entire event-loop until all pending packets have been flushed. Resolves: #7519.	2024-12-18 17:01:24 +00:00
Thomas Eizinger	a80abec4ff	refactor(connlib): remove unused branch in `match` (#7550 ) When deciding what to do with a certain DNS query, we check whether the domain name in question corresponds to any of the (wildcard) DNS resource addresses. If yes, we resolve it to the resource ID of that resource. The source of those resource IDs is the `dns_resources` map. If we have looked up a `ResourceId` in that map, it is impossible for it to not be "known" which means the branch deleted in this PR is completely redundant and already covered by the catch-all branch where `maybe_resource` is `None`.	2024-12-18 15:47:15 +00:00
Thomas Eizinger	940438217c	docs(rust): fix profiling command (#7547 ) Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2024-12-18 13:01:23 +00:00
Thomas Eizinger	62dfe65679	chore(connlib): improve error messages for failed translations (#7540 )	2024-12-18 04:47:26 +00:00
Thomas Eizinger	40ff26ce1a	chore: remove commented out import (#7539 )	2024-12-17 21:00:20 +00:00
Thomas Eizinger	f3c4d461ea	ci(kotlin): remove permanently disabled job (#7538 ) There is no reason to keep this around if we are not running it. We can always look it up in Git's history.	2024-12-17 17:59:50 +00:00
Thomas Eizinger	8a1b6f26b4	fix(connlib): don't log warnings for unreachable errors (#7537 ) When a Gateway or Client is running in an environment without IPv4 or IPv6 connectivity, our initial probes for sending packets to the relays will fail with network unreachable. That isn't a very big concern and happens a lot in the wild. There is no need to report these as telemetry events. Resolves: #7514.	2024-12-17 17:59:20 +00:00
Thomas Eizinger	98be884c3a	fix(telemetry): dispose previous session when starting new one (#7542 ) For persistent applications like the IPC service, it is possible that telemetry gets initialised with different parameters depending on what the user logs in with. Currently, only the first one is persisted and all consecutive ones are ignored, leading to events that may be wrongly tagged for a certain user / environment. To fix this, we only skip the init if we are still in the same environment. Otherwise, we close the previous session and initialise a new one. Fixes: #7525.	2024-12-17 16:22:38 +00:00
Brian Manifold	1f457d2127	fix(portal): Fixing a few edge cases for identity email (#7532 )	2024-12-16 23:11:25 +00:00
Thomas Eizinger	aa8c53a20d	refactor(rust): use a buffer pool for network packets (#7489 ) In order to achieve concurrency within `connlib`, we needed to create a way for IP packets to own the piece of memory they are sitting in. This allows us to concurrently read IP packets and then batch-process them (as opposed to having a dedicated buffer and referencing it). At the moment, those IP packets are defined on the stack. With a size of ~1300 bytes that isn't very large but still causes _some_ amount of copying. We can avoid this copying by relying on a buffer pool: 1. When reading a new IP packet, we request a new buffer from the pool. 2. When the IP packet gets dropped, the buffer gets returned to the pool. This allows us to reuse an allocation for a packet once it finished processing, resulting in less CPU time spent on copying around memory. This causes us to make more _individual_ heap-allocations in the beginning: Each packet being processed by `connlib` is allocated on the heap somewhere. At some point during the lifetime of the tunnel, this will settle in an ideal state where we have allocated enough slots to cover new packets whilst also reusing memory from packets that finished processing already. The actual `IpPacket` data type is now just a pointer. As a result, the channels to and from the TUN thread (where we were holding multiple of these packets) are now significantly smaller, leading to roughly the same memory usage overall. In my local testing on Linux, the client still only uses about ~15MB of RAM even with multiple concurrent speedtests running.	2024-12-16 01:02:17 +00:00
Thomas Eizinger	8cecdc6906	fix(gui-client): ignore `ConnectResult` in wrong state (#7499 ) Similar to #7497, when we receive a `ConnectResult`, we can simply silently bail out of the function and not change our state instead of printing a loud warning.	2024-12-16 01:02:05 +00:00
Jamil	d8dda14759	docs: Appease codespell in elixir/README.md (#7528 )	2024-12-15 17:01:54 -08:00
Jamil	fe164389c1	docs: Add instructions for connecting to Cloud SQL as the `firezone` user (#7527 ) This is needed to perform index surgery. --------- Signed-off-by: Jamil <jamilbk@users.noreply.github.com>	2024-12-15 16:39:29 -08:00
Jamil	9fdfbea818	chore: fix elixir formatting (#7524 )	2024-12-15 10:50:48 -08:00
Jamil	938448a43b	fix(portal): Update existing auth_identities migration to include `provider_identifier` in the index (#7523 ) #7522 won't successfully complete on production because of the migration in this PR. So, instead, we need to modify this migration, and then manually apply the same operation to staging.	2024-12-15 10:08:25 -08:00
Jamil	d3f38a22ae	fix(portal): Add provider_identifier to identities email unique index (#7522 ) It's possible for two of the same emails to exist within the same provider, so we need to add `provider_identifier` to the unique index to enforce uniqueness properly. Refs https://firezonehq.slack.com/archives/C04HRQTFY0Z/p1734131256450379	2024-12-15 09:37:22 -08:00
Thomas Eizinger	1b04b0eb2b	fix(windows): don't warn on deleting non-existing route (#7507 ) Similar to Linux (#7502), we don't want to log an error if we cannot delete a route that doesn't exist.	2024-12-13 21:09:09 +00:00
Thomas Eizinger	0861ccaf06	chore(connlib): improve logging on missing flow (#7508 ) Normally, there should always be exactly one pending flow per resource. It appears though that it can sometimes happen that we first request a flow for a resource but by the time it is authorised, we've already cleared its local state. Regardless, this isn't a concerning error and not worth logging on WARN (which happens one layer up).	2024-12-13 18:03:53 +00:00
Thomas Eizinger	3c2c01c44c	chore(gui-client): don't warn when tray menu updates fail (#7510 ) Windows appears to randomly fail to update the tray menu. There is nothing we can do about that. Hence, we downgrade these errors to debug and make the functions infallible, reducing the complexity for the caller.	2024-12-13 17:55:00 +00:00
Thomas Eizinger	61d6eceb29	chore(connlib): downgrade warning about missing DNS servers (#7509 ) There is nothing we can do if the user doesn't have any DNS servers defined. The default log level is INFO so a user reading the logs will still come across this message in case they are trying to debug what is happening. Long term, problems like these would probably warrant some kind of notification channel from `connlib` to the GUI where we can display messages to the user.	2024-12-13 17:53:36 +00:00
Thomas Eizinger	7a33146997	chore(connlib): downgrade warning when disconnecting from relay (#7512 ) There are several reasons why we can disconnect from a relay at runtime: (1) STUN is blocked, (2) we have invalid credentials, (3) the TURN server is not protocol-conform. The first two are very much possible in production and there is nothing we can do about them. When relays reboot, their credentials change and if the Internet connection of a user / gateway gets cut, we may disconnect from the relay because the messages get lost. The last one should never happen if we are connected to our own relays. Firezone can be self-hosted so ultimately, we don't have control over what we are talking to. That error however is more of a safe-guard for `connlib` itself to disconnect from the server as soon as it detects that it is behaving weirdly. None of these reasons are worth reporting to Sentry as a problem because they aren't really fixable as such. It is more important that the user sees them in the logs if they decide to dig into them which they will still do on INFO level.	2024-12-13 17:52:59 +00:00
Brian Manifold	f114bc95cd	refactor(portal): Add email as separate column on auth_identities table (#7472 ) Why: * Currently, when using the API, a user has no way of easily identifying what identities they are pulling back as the response only includes the `provider_identifier` which for most of our AuthProviders is an ID for the IdP and not an email address. Along with that, when adding users to an OIDC provider within Firezone, there is no check for whether or not an identity has already been added with a given email address. By creating a separate email column on the `auth_identities` table, it will be very straight forward to know whether an email address exists for a given identity, return it in an API response and allow the admin of a Firezone account to track users (Identities) by email rather than IdP identifier. Fixes #7392	2024-12-13 17:26:47 +00:00
Thomas Eizinger	b63061994d	chore(headless-client): release version 1.4.0 (#7495 ) Headless Client 1.4.0 has been released (https://github.com/firezone/firezone/releases/tag/headless-client-1.4.0). This PR updates the changelog and version numbers accordingly.	2024-12-13 07:10:11 +00:00
Thomas Eizinger	b5f25da5ac	fix(gui-client): remove error about unexpected `TunnelReady` (#7497 ) The communication between the GUI client, the IPC service and `connlib` are asynchronous. As such, it may happen that the state machines run out of sync. Receiving a `TunnelReady` despite not being in the right state for that is no concern and can be handled gracefully.	2024-12-13 05:52:34 +00:00
Thomas Eizinger	7309428cae	chore(gateway): release version 1.4.2 (#7494 ) Gateway 1.4.2 has been released (https://github.com/firezone/firezone/releases/tag/gateway-1.4.2). This PR updates the changelog and version numbers accordingly.	2024-12-13 05:49:19 +00:00
Thomas Eizinger	48857d3bc8	chore(relay): downgrade allocation mismatch warn on CHANNEL_BIND (#7505 ) This code-path is handled gracefully in `connlib`, no need to issue a warning here.	2024-12-13 05:41:28 +00:00
Thomas Eizinger	73625e4669	chore(relay): don't log all AUTH errors on WARN (#7506 ) Not all authentication errors are warnings that we need to be alerted about.	2024-12-13 05:37:15 +00:00
Thomas Eizinger	5d5e5ab0b1	fix(gui-client): make tray menu refresh infallible (#7498 ) In most cases, the caller of this function already handled the case of it failing gracefully by logging. From Sentry alerts, we can see that if this fails, there isn't much we can do about it and most likely, the next refresh will work again (this has only happened a single time). Logging this on `debug` is good enough in case something doesn't work and we need to reproduce it, or something really bad happens and we need to see it in the breadcrumbs of another Sentry event.	2024-12-13 04:54:41 +00:00
Thomas Eizinger	f30cc3226d	fix(gateway): don't return error when client disconnected (#7504 ) When a client disconnects, we clear up the connection on the gateway. There might still be packets arriving from resources that we then cannot route. This isn't worth returning an error.	2024-12-13 04:54:07 +00:00
Thomas Eizinger	b5d6c27680	fix(linux): don't print error when removing non-existent route (#7502 ) We are already handling one case where we are trying to remove a route that doesn't exist. `ESRCH` is another variant of this error that manifests as "No such process". According to the Internet, this just means the route doesn't exist so we can bail out early here.	2024-12-13 04:53:22 +00:00
Thomas Eizinger	30376cd79a	fix(gateway): polish error handling in `main` (#7500 ) Currently, the Gateway logs all errors that happen when the event-loop exits on ERROR level. This creates Sentry alerts for things like "Unauthorized" errors or "404 Not found". That isn't useful to us. To mitigate this, we polish the code a bit to only log an ERROR when we actually fail to setup something during startup (like the TUN device). In all other cases, we now log a more user-friendly message on INFO but still exit with the appropriate exit code (0 on CTRL+C, 1 on any other error).	2024-12-13 04:51:58 +00:00
Thomas Eizinger	db2dd4a618	ci: pass SENTRY_AUTH_TOKEN explicit as input (#7503 ) Secrets are not accessible within actions.	2024-12-13 04:47:47 +00:00
Thomas Eizinger	951edd802a	fix(gui-client): lower log level when update check fails (#7501 )	2024-12-13 04:43:16 +00:00
Thomas Eizinger	f0c2bfa6eb	chore(gui-client): release version 1.4.0 (#7496 ) GUI Client 1.4.0 has been released (https://github.com/firezone/firezone/releases/tag/gui-client-1.4.0). This PR updates the changelog and versions accordingly.	2024-12-13 04:41:49 +00:00
Brian Manifold	9711cf56c1	fix(portal): Fix update API endpoint for resources (#7493 ) Why: * The API endpoint for updating Resources was using `Resources.fetch_resource_by_id_or_persistent_id`, however that function was fetching all Resources, which included deleted Resources. In order to prevent an API user from attempting to update a Resource that is deleted, a new function was added to fetch active Resources only. Fixes: #7492	2024-12-12 22:51:28 +00:00
Thomas Eizinger	67161afd2c	build(deps): switch to `quinn-udp` release (#7491 ) The fewer Git dependencies, the better.	2024-12-12 16:49:43 +00:00
Thomas Eizinger	da04924da1	chore(relay): downgrade log on missing allocation for REFRESH (#7490 ) Attempting to refresh an allocation is the only idempotent way in TURN to test whether one has an active allocation. As such, logging this on WARN is too aggressive. Resolves: #7481.	2024-12-12 16:48:02 +00:00
Thomas Eizinger	7a478634a8	feat(connlib): buffer packets during connection and NAT setup (#7477 ) At present, `connlib` will always drop all IP packets until a connection is established and the DNS resource NAT is created. This causes an unnecessary delay until the connection is working because we need to wait for retransmission timers of the host's network stack to resend those packets. With the new idempotent control protocol, it is now much easier to buffer these packets and send them to the gateway once the connection is established. The buffer sizes are chosen somewhat conservatively to ensure we don't consume a lot of memory. The hypothesis here is that every protocol - even if the transport layer is unreliable like UDP - will start with a handshake involving only one or at most a few packets and waiting for a reply before sending more. Thus, as long as we can set up a connection quicker than the re-transmit timer in the host's network stack, buffering those packets should result in no packet loss. Typically, setting up a new connection takes at most 500ms which should be fast enough to not trigger any re-transmits. Resolves: #3246.	2024-12-12 11:40:38 +00:00
Jamil	a7b8253766	chore(apple/xcode): Cache rust build more intelligently using build phase (#7488 ) Xcode has decent support for skipping certain build phases when input files haven't changed. This only happens for build phases within a single target, and not for entire Target dependencies. Before, we defined `Connlib` as its own bona fide build target, and then added it as a target dependency for the network extension targets. This causes Xcode to always run our `build-rust.sh` script, which takes around 30s on my M1 even when `rust/` hasn't changed. Instead, we can remove the `Connlib` target, and add a "Run script" phase to the network extension targets themselves. By configuring the input file list, Xcode will skip this phase if `rust/*/.rs`, `rust/*/.toml` and `rust/Cargo.lock` haven't changed. This makes it much faster to iterate on Swift code -- Xcode is _very_ fast when building pure Swift (sometimes under 1s).	2024-12-12 03:46:58 +00:00
Jamil	253e1a6972	fix(tauri): Bump nanoid re: CVE-2024-55565 (#7487 ) Fixes https://github.com/firezone/firezone/security/dependabot/136	2024-12-12 00:52:58 +00:00
Jamil	d775487508	fix(tauri): Bump cross-spawn re: CVE-2024-21538 (#7486 ) Fixes https://github.com/firezone/firezone/security/dependabot/129	2024-12-12 00:49:56 +00:00
dependabot[bot]	d0aef8f1d8	build(deps): Bump nanoid from 3.3.7 to 3.3.8 in /website in the npm_and_yarn group (#7485 ) Bumps the npm_and_yarn group in /website with 1 update: [nanoid](https://github.com/ai/nanoid). Updates `nanoid` from 3.3.7 to 3.3.8. Changelog for 3.3.8, sourced from nanoid's changelog (https://github.com/ai/nanoid/blob/main/CHANGELOG.md): Fixed a way to break Nano ID by passing non-integer size (by @myndzi). Commits: 3044cd5 Release 3.3.8 version; 4fe3495 Update size limit; d643045 Fix pool pollution, infinite loop (#510). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-12-11 22:47:35 +00:00
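
Commit bc2febed99 above explains that a DNS response must be checked against a limit smaller than the MTU, because the IP and UDP headers consume part of each packet. The following is a minimal sketch of that arithmetic only. The MTU value and all names (`MTU`, `MAX_UDP_PAYLOAD`, `needs_truncation`) are assumptions made for illustration and are not the constants that `connlib` defines.

```rust
// Hypothetical tunnel MTU, chosen for illustration only.
const MTU: usize = 1280;
// Fixed IPv6 header size; the larger of the two IP header sizes,
// so the resulting limit is safe for IPv4 as well.
const IPV6_HEADER_LEN: usize = 40;
// UDP header size.
const UDP_HEADER_LEN: usize = 8;

/// Largest UDP payload that still fits into a single IP packet of `MTU` bytes.
const MAX_UDP_PAYLOAD: usize = MTU - IPV6_HEADER_LEN - UDP_HEADER_LEN;

/// Returns true if a DNS response of this length would not fit and
/// therefore has to be truncated before being sent over UDP.
fn needs_truncation(dns_response_len: usize) -> bool {
    dns_response_len > MAX_UDP_PAYLOAD
}
```

Comparing against the MTU itself would accept responses that overflow the packet by up to the combined header size, which is the class of mistake the commit describes.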

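The buffer pool described in commit aa8c53a20d (request a buffer when reading a packet, return it when the packet is dropped) can be sketched with the standard library alone. This is not the implementation used in `connlib`; `BufferPool`, `Buffer`, `pull` and `idle` are hypothetical names, and a mutex-guarded `Vec` is a simplification chosen to keep the example short.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Mutex};

/// A cheaply clonable handle to a shared list of free buffers.
#[derive(Clone)]
pub struct BufferPool {
    free: Arc<Mutex<Vec<Vec<u8>>>>,
    buf_size: usize,
}

impl BufferPool {
    pub fn new(buf_size: usize) -> Self {
        Self { free: Arc::new(Mutex::new(Vec::new())), buf_size }
    }

    /// Reuses a returned buffer if one is available, otherwise allocates.
    /// A reused buffer still holds its previous contents.
    pub fn pull(&self) -> Buffer {
        let recycled = self.free.lock().unwrap().pop();
        let inner = recycled.unwrap_or_else(|| vec![0u8; self.buf_size]);
        Buffer { inner: Some(inner), pool: self.clone() }
    }

    /// Number of buffers currently waiting in the pool.
    pub fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// An owned buffer that hands its allocation back to the pool on drop.
pub struct Buffer {
    inner: Option<Vec<u8>>,
    pool: BufferPool,
}

impl Deref for Buffer {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        self.inner.as_deref().expect("buffer is present until drop")
    }
}

impl DerefMut for Buffer {
    fn deref_mut(&mut self) -> &mut [u8] {
        self.inner.as_deref_mut().expect("buffer is present until drop")
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Return the allocation instead of freeing it.
        if let Some(buf) = self.inner.take() {
            self.pool.free.lock().unwrap().push(buf);
        }
    }
}
```

Because `Buffer` owns a heap allocation, moving it through a channel only moves a pointer-sized handle plus the pool reference. That matches the commit's remark that the packet type becomes "just a pointer" and the channels get smaller.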

6131 Commits
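
Commit d39b6ff1b9 above describes checking, at the top of the event-loop, whether the root cause of an error is an `ImpossibleTranslation` and ignoring it if so. The commit does this with `anyhow::Error`. The sketch below shows the same idea using only `std::error::Error::source`, so it is an analogue rather than the code from the PR. `TranslateFailed`, `root_cause` and `should_ignore` are hypothetical names.

```rust
use std::error::Error;
use std::fmt;

/// Marker error for packets that cannot be represented in the other IP version.
#[derive(Debug)]
struct ImpossibleTranslation;

impl fmt::Display for ImpossibleTranslation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "packet cannot be translated")
    }
}

impl Error for ImpossibleTranslation {}

/// Hypothetical wrapper that a higher layer adds for context.
#[derive(Debug)]
struct TranslateFailed {
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for TranslateFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to translate packet")
    }
}

impl Error for TranslateFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Follows the `source()` chain down to the innermost error.
fn root_cause<'a>(err: &'a (dyn Error + 'static)) -> &'a (dyn Error + 'static) {
    let mut current = err;
    while let Some(next) = current.source() {
        current = next;
    }
    current
}

/// True if the error should be dropped silently instead of logged.
fn should_ignore(err: &(dyn Error + 'static)) -> bool {
    root_cause(err).is::<ImpossibleTranslation>()
}
```

As the commit notes, this check is less type-safe than matching on a dedicated error enum: if the translation layer stops emitting the marker error, the check silently stops matching and the log line reappears.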