firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-27 18:18:55 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	cf2470ba1e	test(iperf): install iptables rule inside of container (#9880 ) In Docker environments, applying iptables rules to filter container-container traffic on the Docker bridged network is not reliable, leading to direct connections being established in our relayed tests. To fix this, we insert the rules directly from the client container itself. --------- Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>	2025-07-16 10:29:33 +00:00
Thomas Eizinger	2fd56fb7ae	chore: remove `pull_policy` from containers (#9887 ) Having to pull these every time one does `docker compose up` is annoying and unnecessary.	2025-07-16 09:15:29 +00:00
Thomas Eizinger	d8ca2b4f7e	chore: fix invalid build stage in `docker-compose.yml` (#9886 ) We have since removed the `dev` stage from the Rust Dockerfile. Resolves: #9768	2025-07-16 07:01:20 +00:00
Thomas Eizinger	116b518700	fix(snownet): discard channel-data messages from old allocations (#9885 ) When we invalidate or discard an allocation, it may happen that a relay still sends channel-data messages to us. We don't recognize those and will therefore attempt to parse them as WireGuard packets, ultimately ending in a "Packet has unknown format" error. To avoid this, we check if the packet is a valid channel-data message even if we presently don't have an allocation on the relay that is sending us the packet. In those cases, we can stop processing the packet, thus avoiding these errors from being logged.	2025-07-16 05:57:44 +00:00
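The check described in this commit can be sketched as follows. This is a minimal illustration, not the `snownet` implementation: the function names are hypothetical, and the only facts relied upon are the TURN channel-data layout (2-byte channel number, 2-byte payload length, then the payload) and the channel-number range from RFC 5766. WireGuard messages start with a type byte of 1 to 4, so they never fall into that range.

```rust
/// Returns the payload length if `packet` looks like a TURN channel-data message.
///
/// Layout per RFC 5766: 2-byte channel number, 2-byte payload length, payload.
/// Channel numbers live in 0x4000..=0x7FFF (RFC 8656 narrows this to 0x4000..=0x4FFF).
fn channel_data_payload_len(packet: &[u8]) -> Option<usize> {
    if packet.len() < 4 {
        return None; // too short to even hold the header
    }

    let channel = u16::from_be_bytes([packet[0], packet[1]]);
    let length = usize::from(u16::from_be_bytes([packet[2], packet[3]]));

    if !(0x4000..=0x7FFF).contains(&channel) {
        return None; // not a channel number, e.g. a WireGuard packet
    }
    if packet.len() - 4 < length {
        return None; // declared length exceeds what we actually received
    }

    Some(length)
}

/// Hypothetical helper: a packet from a relay without a known allocation is
/// dropped silently if it is well-formed channel data.
fn should_drop_silently(packet: &[u8]) -> bool {
    channel_data_payload_len(packet).is_some()
}
```

With a helper like this, a caller could run the check before handing the packet to the WireGuard parser and return early when it matches.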
Jamil	789a3012d6	fix(portal): only process jsonb strings (#9883 ) As a followup to #9882, we need to ensure that `jsonb` columns that have value data other than strings are not decoded as jsonb. An example of when this happens is when Postgres sends an `:unchanged_toast` to indicate the data hasn't changed.	2025-07-15 18:06:13 -07:00
Jamil	cce21a8dea	fix(portal): handle `jsonb` for embedded schemas (#9882 ) In #9664, we introduced the `Domain.struct_from_params/2` function which converts a set of params containing string keys into a provided struct representing a schema module. This is used to broadcast actual structs pertaining to WAL data as opposed to simple string encodings of the data. The problem is that the function was a bit too naive and failed to properly cast embedded schemas, resulting in all embedded schemas on the root struct being `nil` or `[]`. To fix this, we need to do two things: 1. We now decode JSON/JSONB fields from binaries (strings) into actual lists and maps in the replication consumer module for downstream processors to use. 2. We update our `struct_from_params/2` function to properly cast embedded schemas from these lists and maps using Ecto.Changeset's `apply_changes` function, which uses the same logic to instantiate the schemas as if we were saving a form or API request. Lastly, tests are added to ensure this works under various scenarios, including nested embedded schemas which we use in some places. Fixes #9835 --------- Signed-off-by: Jamil <jamilbk@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-15 23:50:27 +00:00
Thomas Eizinger	cb3f4c0884	ci: fail perf & integration tests on warnings (#9875 ) We already do the same thing for our integration tests. It turns out that it wasn't working there either though. Related: #9874	2025-07-15 14:36:54 +00:00
Thomas Eizinger	29f81c64ff	fix(snownet): wake idle connection on upsert (#9879 ) When a connection is in idle-mode, it only sends a STUN request every 25 seconds. If the Client disconnects e.g. due to a network partition, it may send a new connection intent later. If the Gateway's connection is still around then because it was in idle mode, it won't send any candidates to the remote, making the Client's connection fail with "no candidates received". To alleviate this, we wake a connection out of idle mode every time it is being upserted. This ensures that the connection will fail within 15s IF the above scenario happens, allowing the Client to reconnect within a much shorter time-frame. Note that attempting to repair such a connection is likely pointless. It is much safer to discard it and let them both establish a new connection. Related: #9862	2025-07-15 14:16:27 +00:00
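The "wake on upsert" behaviour from this commit can be modelled with a small state machine. The sketch below is only an illustration under assumptions: the `Connections`, `Connection` and `Mode` types and the `u64` peer id are hypothetical and do not mirror the actual `snownet` types.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Active,
    Idle,
}

#[derive(Debug)]
struct Connection {
    mode: Mode,
}

#[derive(Default)]
struct Connections {
    by_peer: HashMap<u64, Connection>,
}

impl Connections {
    /// Creates a connection for `peer` or re-uses the existing one.
    ///
    /// An existing connection is always moved out of idle mode so that it
    /// either exchanges candidates again or fails quickly.
    /// Returns `true` if an idle connection was woken.
    fn upsert(&mut self, peer: u64) -> bool {
        match self.by_peer.get_mut(&peer) {
            Some(conn) => {
                let was_idle = conn.mode == Mode::Idle;
                conn.mode = Mode::Active;
                was_idle
            }
            None => {
                self.by_peer.insert(peer, Connection { mode: Mode::Active });
                false
            }
        }
    }

    /// Marks a connection as idle, e.g. after a period without traffic.
    fn mark_idle(&mut self, peer: u64) {
        if let Some(conn) = self.by_peer.get_mut(&peer) {
            conn.mode = Mode::Idle;
        }
    }
}
```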
Thomas Eizinger	0f1c5f2818	refactor(relay): simplify auth module (#9873 ) Whilst looking through the auth module of the relay, I noticed that we unnecessarily convert back and forth between expiry timestamps and username formats when we could just be using the already parsed version.	2025-07-15 14:14:51 +00:00
Thomas Eizinger	ffcb269c8b	chore(connlib): add "wake reason" to `poll_timeout` (#9876 ) In order to debug timer interactions, it is useful to know when and why connlib wants to be woken to perform tasks.	2025-07-15 13:58:06 +00:00
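One way to picture a `poll_timeout` that reports a wake reason is to return the earliest pending deadline together with a label. This is a sketch only: the signature and the reason strings are assumptions for illustration and are not taken from `connlib`.

```rust
use std::time::{Duration, Instant};

/// Returns the earliest pending deadline together with a label that says
/// which timer asked for the wake-up. Timers without a deadline are skipped.
fn poll_timeout(
    timers: &[(Option<Instant>, &'static str)],
) -> Option<(Instant, &'static str)> {
    timers
        .iter()
        .filter_map(|(deadline, reason)| Some(((*deadline)?, *reason)))
        .min_by_key(|(deadline, _)| *deadline)
}

/// Convenience helper for building deadlines relative to `now`.
fn in_secs(now: Instant, secs: u64) -> Option<Instant> {
    Some(now + Duration::from_secs(secs))
}
```

A caller could then log the returned label right before going to sleep, which makes timer interactions visible in the logs.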
Thomas Eizinger	5141817134	feat(connlib): add `reason` argument to `reset` API (#9878 ) In order to provide more detailed logs about why `connlib`'s network state is being reset, we add a `reason` parameter that gets logged. Resolves: #9867	2025-07-15 13:48:33 +00:00
Thomas Eizinger	2b70596636	fix(rust): only apply filter to select tracing layers (#9872 ) Applying a filter globally to the entire subscriber means it filters events for all layers. This prevents the Sentry layer from uploading DEBUG logs if configured.	2025-07-15 13:44:53 +00:00
Thomas Eizinger	cb497a7435	fix(portal): use correct password generation algorithm (#9874 ) In #9870, the password generation algorithm was broken. The correct order of the elements in the hash is: expiry, stamp_secret, salt. The relay expects this order when it re-generates the password to validate the message. Due to a different bug in our CI system, we weren't actually checking for warnings / errors in our perf-test suite: https://github.com/firezone/firezone/actions/runs/16285038111/job/45982241021#step:9:66	2025-07-15 13:39:31 +00:00
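Why the element order matters can be shown with a toy derivation. The sketch below is an assumption-heavy illustration: the log does not state which digest or encoding is used, so `DefaultHasher` from the standard library serves purely as a stand-in, and the `password` and `digest` functions are hypothetical. The only detail taken from the commit is the order: expiry, stamp_secret, salt.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest over an ordered list of parts.
/// NOT the real algorithm; it only demonstrates order sensitivity.
fn digest(parts: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for part in parts {
        part.hash(&mut hasher);
    }
    hasher.finish()
}

/// Derives a password from its elements in the order the relay expects:
/// expiry, stamp_secret, salt.
fn password(expiry: u64, stamp_secret: &str, salt: &str) -> u64 {
    let expiry = expiry.to_string();
    digest(&[expiry.as_str(), stamp_secret, salt])
}
```

Both sides compute the value independently, so any deviation in order on one side makes validation fail even though all inputs are identical.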
Thomas Eizinger	d92e997878	ci: add work-around for apple-client tag (#9877 ) The current Git tag for releases of the Apple client is out of line with the naming of the rest of the repository. Ideally, the tag would be renamed to `apple-client-X.Y.Z` as it represents the version for both the macOS and iOS client. I am not familiar enough with the redirect system on our website to confidently do this without breaking anything, so the easiest fix here is to employ the same hack we already do for Sentry where we special-case the `macos-client` tag. Resolves: #9871	2025-07-15 13:37:00 +00:00
dependabot[bot]	b9302cdc2a	build(deps): bump rustls from 0.23.28 to 0.23.29 in /rust (#9860 ) Bumps [rustls](https://github.com/rustls/rustls) from 0.23.28 to 0.23.29. Commits: - `4e0b5fe` Bump version to 0.23.29 - `b854079` Propagate context for webpki signature algorithm errors - `c84675e` key_schedule: minimise lifetime of resumption secret - `788b0df` key_schedule: erase master secret in traffic state - `d2c64f0` key_schedule: separate ops not using current secret - `e5998cd` key_schedule: add state for derivations before finish - `9620bec` tls13::key_schedule: move `KeySchedule` struct down - `373ad88` tls13::key_schedule: move `SecretKind` down - `efa2066` Improve compactness of Debug impl for extensions - `a5433a1` Correct calculation of ServerHello ECH confirmation - Additional commits viewable in the [compare view](https://github.com/rustls/rustls/compare/v/0.23.28...v/0.23.29) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-15 05:27:44 +00:00
dependabot[bot]	9ed7220520	build(deps): bump clap from 4.5.40 to 4.5.41 in /rust (#9861 ) Bumps [clap](https://github.com/clap-rs/clap) from 4.5.40 to 4.5.41. Changelog (sourced from [clap's changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)): [4.5.41] - 2025-07-09 Features: - Add `Styles::context` and `Styles::context_value` to customize the styling of `[default: value]` like notes in the `--help` Commits: - `92fcd83` chore: Release - `aca91b9` docs: Update changelog - `8434510` Merge pull request [#5869](https://redirect.github.com/clap-rs/clap/issues/5869) from tw4452852/patch-1 - `33b1fc3` fix(complete): Fix env leakage in elvish dynamic completion - `e5f1f48` chore: Release - `9466a55` docs: Update changelog - `d74b793` Merge pull request [#5865](https://redirect.github.com/clap-rs/clap/issues/5865) from gifnksm/nushell-completion-value-types - `ecbc775` fix(nu): Set argument type based on `ValueHint` - `6784054` Merge pull request [#5857](https://redirect.github.com/clap-rs/clap/issues/5857) from epage/empty - `cca5f32` test(complete): Show empty option-value behavior - Additional commits viewable in the [compare view](https://github.com/clap-rs/clap/compare/clap_complete-v4.5.40...clap_complete-v4.5.41) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-15 05:27:16 +00:00
dependabot[bot]	8dbb02e549	build(deps): bump zbus from 5.7.1 to 5.8.0 in /rust (#9863 ) Bumps [zbus](https://github.com/dbus2/zbus) from 5.7.1 to 5.8.0. Release notes (sourced from [zbus's releases](https://github.com/dbus2/zbus/releases)): 🔖 zbus 5.8.0 - ✨ `interface` macro now supports write-only properties. - ✨ Copy attributes over to `receive__changed` and `cached_` methods in `proxy`. Commits: - `7d8e935` Merge pull request [#1425](https://redirect.github.com/dbus2/zbus/issues/1425) from zeenix/zb-release - `da0ca55` 🔖 zb,zm: Release 5.8.0 - `be41117` Merge pull request [#1424](https://redirect.github.com/dbus2/zbus/issues/1424) from zeenix/zv-release - `dda4f37` 🔖 zv,zd: Release 5.6.0 - `747c645` ⬆️ micro: Update blocking to v1.6.2 ([#1423](https://redirect.github.com/dbus2/zbus/issues/1423)) - `d01e893` ⬆️ micro: Update tokio to v1.46.1 ([#1422](https://redirect.github.com/dbus2/zbus/issues/1422)) - `8250c53` ⬆️ micro: Update libfuzzer-sys to v0.4.10 ([#1421](https://redirect.github.com/dbus2/zbus/issues/1421)) - `7ab8fa6` Merge pull request [#1420](https://redirect.github.com/dbus2/zbus/issues/1420) from dbus2/renovate/tokio-1.x-lockfile - `36fde48` ⬆️ Update tokio to v1.46.0 - `f9870cd` Merge pull request [#1419](https://redirect.github.com/dbus2/zbus/issues/1419) from zeenix/fix-zv-regression - Additional commits viewable in the [compare view](https://github.com/dbus2/zbus/compare/zbus-5.7.1...zbus-5.8.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-15 05:26:42 +00:00
Brian Manifold	0d9e865ea8	feat(porat): Update portal telemetry (#9868 ) Why: * Adding more BEAM VM metrics to give us better insight as to how our BEAM cluster is running since we're in the middle of making some moderately large architectural changes to the application.	2025-07-15 02:11:59 +00:00
Jamil	17d7e29b81	fix(portal): use public key for TURN creds (#9870 ) As a followup to #9856, after talking with @bmanifold, we determined using the public_key as the username for TURN credentials is a safer bet because: - It's by definition public and therefore does not need to be obfuscated - It's shorter-lived than the token, especially for the gateway - It essentially represents the data plane connection for client/gateway and naturally rotates along with the key state for those	2025-07-15 01:48:02 +00:00
Jamil	1e577d31b9	fix(portal): use reproducible relay creds (#9857 ) When giving TURN credentials to clients and gateways, it's important that they remain consistent across hiccups in the portal connection so that relayed connections are not interrupted during a deploy, or if the user's internet is flaky, or the GCP load balancer decides to disconnect the client/gateway. Prior to this PR, that was not the case because we essentially tied TURN credentials, required for data plane packet flows, to the WebSocket connection, a control plane element. This happened because we generated random `expires_at` and `salt` elements on _each_ connection to the portal. Instead, what we do now is make these reproducible and tied to the auth token by hashing then base64-encoding it. The expiry is tied to the auth-token's expiry. Fixes #9856	2025-07-14 17:42:11 +00:00
Thomas Eizinger	2e0ed018ee	chore: document metrics config switches as private API (#9865 )	2025-07-14 13:53:03 +00:00
Thomas Eizinger	f5425ac8e4	fix(snownet): fail connection on handshake decryption errors (#9850 ) As per the WireGuard paper, `boringtun` tries to handshake with the remote peer for 90s before it gives up. This timeout is important because when a session is discarded due to e.g. missing replies, WireGuard attempts to handshake a new session. Without this timeout, we would then try to handshake a session forever. Unfortunately, `boringtun` does not distinguish a missing handshake response from a bad one. Decryption errors whilst decoding a handshake response are simply passed up to the upper layer, in our case `snownet`. I am not sure how we can actually fail to decrypt a handshake but the pattern we are seeing in customer logs is that this happens over and over again, so there is no point in having `boringtun` retry the handshake. Therefore, we immediately fail the connection when this happens. Failed connections are immediately removed, triggering the client to send a new connection-intent to the portal. Such a new connection intent will then sync-up the state between Client and Gateway so both of them use the most recent public key. Resolves: #9845	2025-07-14 13:22:23 +00:00
Thomas Eizinger	cecca37073	feat(gateway): allow exporting metrics to an OTEL collector (#9838 ) As a first step in preparation for sending OTEL metrics from Clients and Gateways to a cloud-hosted OTEL collector, we extend the CLI of the Gateway with configuration options to provide a gRPC endpoint to an OTEL collector. If `FIREZONE_METRICS` is set to `otel-collector` and an endpoint is configured via `OTLP_GRPC_ENDPOINT`, we will report our metrics to that collector. The future plan for extending this is such that if `FIREZONE_METRICS` is set to `otel-collector` (which will likely be the default) and no `OTLP_GRPC_ENDPOINT` is set, then we will use our own, hosted OTEL collector and report metrics IF the `export-metrics` feature-flag is set to `true`. This is similar to the integration we did for streaming logs to Sentry. We can therefore enable it on a similar granularity as we do with the logs and e.g. only enable it for the `firezone` account to start with. In the meantime, customers can already make use of those metrics if they'd like by using the current integration. Resolves: #1550 Related: #7419 --------- Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>	2025-07-14 03:54:38 +00:00
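The currently shipped behaviour of the two switches can be sketched as a small resolution function. The environment variable names and the `otel-collector` value come from the commit message; the `MetricsExport` enum, the function names and the example endpoint are hypothetical, and the planned fallback to a hosted collector is deliberately left out because it is not implemented yet.

```rust
use std::env;

#[derive(Debug, PartialEq, Eq)]
enum MetricsExport {
    Disabled,
    OtelCollector { grpc_endpoint: String },
}

/// Decides whether metrics are exported, based on the two settings.
/// Export only happens if the mode is `otel-collector` AND an endpoint is set.
fn resolve_metrics(
    firezone_metrics: Option<&str>,
    otlp_grpc_endpoint: Option<&str>,
) -> MetricsExport {
    match (firezone_metrics, otlp_grpc_endpoint) {
        (Some("otel-collector"), Some(endpoint)) if !endpoint.is_empty() => {
            MetricsExport::OtelCollector {
                grpc_endpoint: endpoint.to_owned(),
            }
        }
        _ => MetricsExport::Disabled,
    }
}

/// Reads both switches from the process environment.
fn resolve_metrics_from_env() -> MetricsExport {
    let metrics = env::var("FIREZONE_METRICS").ok();
    let endpoint = env::var("OTLP_GRPC_ENDPOINT").ok();

    resolve_metrics(metrics.as_deref(), endpoint.as_deref())
}
```

Keeping the decision in a pure function that takes `Option<&str>` makes it testable without touching the process environment.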
Thomas Eizinger	70e4b6572f	chore(rust): log environment when updating feature flags (#9855 ) It is useful to know which environment we've updated the feature-flags for.	2025-07-13 17:27:10 +00:00
Thomas Eizinger	eb4c54620c	chore(linux): add more error context to TUN device (#9853 ) When failing to create the TUN device, the error messages are currently pretty bare. Add a bit more context so users can more easily self-diagnose what is wrong.	2025-07-13 05:51:02 +00:00
Thomas Eizinger	8dedc44735	chore(rust): bump boringtun (#9854 ) The latest commits to our `boringtun` fork bring improved logs. Diff: `d49b63f704...5b1892f061`	2025-07-13 00:52:58 +00:00
Thomas Eizinger	66455ab0ef	feat(gateway): translate TimeExceeded ICMP messages (#9812 ) In the DNS resource NAT table, we track parts of the layer 4 protocol of the connection in order to map packets back to the correct proxy IP in case multiple DNS names resolve to the same real IP. The involvement of layer 4 means we need to perform some packet inspection in case we receive ICMP errors from an upstream router. Presently, the only ICMP error we handle here is destination unreachable. Those are generated e.g. when we are trying to contact an IPv6 address but we don't have an IPv6 egress interface. An additional error that we want to handle here is "time exceeded": Time exceeded is sent when the TTL of a packet reaches 0. Typically, TTLs are set high enough such that the packet makes it to its destination. When using tools such as `tracepath` however, the TTL is specifically only incremented one-by-one in order to resolve the exact hops a packet is taking to a destination. Without handling the time exceeded ICMP error, using `tracepath` through Firezone is broken because the packets get dropped at the DNS resource NAT. 
With this PR, we generalise the functionality of detecting destination unreachable ICMP errors to also handle time-exceeded errors, allowing tools such as `tracepath` to somewhat work: ``` ❯ sudo docker compose exec --env RUST_LOG=info -it client /bin/sh -c 'tracepath -b example.com' 1?: [LOCALHOST] pmtu 1280 1: 100.82.110.64 (100.82.110.64) 0.795ms 1: 100.82.110.64 (100.82.110.64) 0.593ms 2: example.com (100.96.0.1) 0.696ms asymm 45 3: example.com (100.96.0.1) 5.788ms asymm 45 4: example.com (100.96.0.1) 7.787ms asymm 45 5: example.com (100.96.0.1) 8.412ms asymm 45 6: example.com (100.96.0.1) 9.545ms asymm 45 7: example.com (100.96.0.1) 7.312ms asymm 45 8: example.com (100.96.0.1) 8.779ms asymm 45 9: example.com (100.96.0.1) 9.455ms asymm 45 10: example.com (100.96.0.1) 14.410ms asymm 45 11: example.com (100.96.0.1) 24.244ms asymm 45 12: example.com (100.96.0.1) 31.286ms asymm 45 13: no reply 14: example.com (100.96.0.1) 303.860ms asymm 45 15: no reply 16: example.com (100.96.0.1) 135.616ms (This broken router returned corrupted payload) asymm 45 17: no reply 18: example.com (100.96.0.1) 161.647ms asymm 45 19: no reply 20: no reply 21: no reply 22: example.com (100.96.0.1) 238.066ms reached Resume: pmtu 1280 hops 22 back 45 ``` We say "somewhat work" because due to the NAT that is in place for DNS resources, the output does not disclose the intermediary hops beyond the Gateway. Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com> --------- Co-authored-by: Antoine Labarussias <antoinelabarussias@gmail.com>	2025-07-12 21:09:48 +00:00
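The generalisation described in this commit boils down to recognising more than one ICMP error type and then inspecting the packet quoted inside the error. The sketch below assumes hypothetical type and function names; the numeric values are the standard ones (ICMPv4: type 3 destination unreachable, type 11 time exceeded; ICMPv6: type 1 destination unreachable, type 3 time exceeded), and in both protocols the quoted original packet starts 8 bytes into the ICMP message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IcmpError {
    DestinationUnreachable,
    TimeExceeded,
}

#[derive(Debug, Clone, Copy)]
enum IpVersion {
    V4,
    V6,
}

/// Maps an ICMP type to the error kinds that need to be translated.
/// Everything else (echo request/reply, etc.) yields `None`.
fn classify_icmp_error(version: IpVersion, icmp_type: u8) -> Option<IcmpError> {
    match (version, icmp_type) {
        (IpVersion::V4, 3) | (IpVersion::V6, 1) => Some(IcmpError::DestinationUnreachable),
        (IpVersion::V4, 11) | (IpVersion::V6, 3) => Some(IcmpError::TimeExceeded),
        _ => None,
    }
}

/// Returns the quoted original packet from an ICMP error message.
/// The first 8 bytes are type, code, checksum and 4 unused bytes.
fn embedded_packet(icmp_message: &[u8]) -> Option<&[u8]> {
    icmp_message.get(8..)
}
```

The layer 4 details needed for the NAT lookup (ports or ICMP identifier) would then be parsed from the slice returned by `embedded_packet`.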
Thomas Eizinger	16facd394e	chore(rust): bump str0m (#9852 ) The latest version of str0m includes a fix for a bug that would result in an immediate ICE timeout if a remote candidate was added prior to a local candidate. We mitigated this in #9793 to make Firezone overall more resilient towards sudden changes in the ICE connection state. As a defense-in-depth measure, we also fixed this issue in str0m by not transitioning to `Disconnected` if we haven't even formed any candidate pairs yet. Diff: `2153bf0385...3d6e3d2f27`	2025-07-12 20:55:07 +00:00
Thomas Eizinger	d01701148b	fix(rust): remove jemalloc (#9849 ) I am no longer able to compile `jemalloc` on my system in a debug build. It fails with the following error: ``` src/malloc_io.c: In function ‘buferror’: src/malloc_io.c:107:16: error: returning ‘char *’ from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion] 107 \| return strerror_r(err, buf, buflen); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` This appears to be a problem with modern versions of clang/gcc. I believe this started happening when I recently upgraded my system. The upstream [`jemalloc`](https://github.com/jemalloc/jemalloc) repository is now archived and thus unmaintained. I am not sure if we ever measured a significant benefit in using `jemalloc`. Related: https://github.com/servo/servo/issues/31059	2025-07-12 19:22:06 +00:00
Thomas Eizinger	47c9922131	test(connlib): don't attempt to listen on port 0 for TCP socket (#9851 )	2025-07-12 14:29:34 +00:00
Thomas Eizinger	d6805d7e48	chore(rust): bump to Rust 1.88 (#9714 ) Rust 1.88 has been released and brings with it a quite exciting feature: let-chains! It allows us to mix-and-match `if` and `let` expressions, therefore often reducing the "right-drift" of the relevant code, making it easier to read. Rust 1.88 also comes with a new clippy lint that warns when creating a mutable reference from an immutable pointer. Attempting to fix this revealed that this is exactly what we are doing in the eBPF kernel. Unfortunately, it doesn't seem to be possible to design this in a way that is both accepted by the borrow-checker AND by the eBPF verifier. Hence, we simply make the function `unsafe` and document for the programmer what needs to be upheld.	2025-07-12 06:42:50 +00:00
Jamil	e98aa82e8e	fix(portal): respect gateway_group_id filter in REST API (#9840 ) Fixes #9815	2025-07-11 19:12:05 +00:00
Jamil	12351e5985	ci: publish apple 1.5.4 clients (#9842 )	2025-07-11 16:35:25 +00:00
Thomas Eizinger	55eaa7cdc7	test(connlib): establish real TCP connections in proptests (#9814 ) With this patch, we sample a list of DNS resources on each test run and create a "TCP service" for each of their addresses. Using this list of resources, we then change the `SendTcpPayload` transition to `ConnectTcp` and establish TCP connections using `smoltcp` to these services. For now, we don't send any data on these connections but we do set the keep-alive interval to 5s, meaning `smoltcp` itself will keep these connections alive. We also set the timeout to 30s and after each transition in a test-run, we assert that all TCP sockets are still in their expected state: - `ESTABLISHED` for most of them. - `CLOSED` for all sockets where we ended up sampling an IPv4 address but the DNS resource only supports IPv6 addresses (or vice-versa). In these cases, we use the ICMP error sent by the Gateway to assert that the socket is `CLOSED`. Unfortunately, `smoltcp` currently does not handle ICMP messages for its sockets, so we have to call `abort` ourselves. Overall, this should assert that regardless of whether we roam networks, switch relays or do other kinds of things with the underlying connection, the tunneled TCP connection stays alive. In order to make this work, I had to tweak the timeouts when we are on-demand refreshing allocations. This only happens in one particular case: When we are being given new relays by the portal, we refresh all _other_ relays to make sure they are still present. In other words, all relays that we didn't remove and didn't just add but still had in-memory are refreshed. This is important for cases where we are network-partitioned from the portal whilst relays are deployed or reset their state otherwise. Instead of the previous 8s max elapsed time of the exponential backoff like we have it for other requests, we now only use a single message with a 1s timeout there. 
With the increased ICE timeout of 15s, a TCP connection with a 30s timeout would otherwise not survive such an event. This is because it takes the above mentioned 8s for us to remove a non-functioning relay, all whilst trying to establish a new connection (which also incurs its own ICE timeout then). With the reduced timeout on the on-demand refresh of 1s, we detect the disappeared relay much quicker and can immediately establish a new connection via one of the new ones. As always with reduced timeouts, this can create false-positives if the relay doesn't reply within 1s for some reason. Resolves: #9531	2025-07-11 15:10:22 +00:00
Jamil	26cfab3b88	fix(portal): reply to all wal keepalives with ack (#9828 ) The Postgres logical decoding protocol is lacking documentation and unclear about keepalive behavior when `wal_sender_timeout` is set to 0 (disabled). We have it disabled so that Postgres doesn't terminate our connection for falling too far behind. What we failed to take into account is that on some installations, Postgres _never_ requests an immediate reply (keepalive with the reply now bit set) if wal_sender_timeout is disabled. This means we would always reply with the empty message, failing to advance the position of the LSN. In this PR, we fix that to always respond to every keepalive message with a standby status update to advance the LSN position. Relevant documentation: https://www.postgresql.org/docs/current/protocol-replication.html#PROTOCOL-REPLICATION-STANDBY-STATUS-UPDATE	2025-07-11 14:32:56 +00:00
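The fix can be illustrated at the wire level. The portal itself is written in Elixir; the Rust sketch below is only a model of the message shapes from the PostgreSQL streaming replication protocol, with hypothetical function names. A primary keepalive (`k`) carries the WAL end, the server clock and a reply-requested byte; a standby status update (`r`) carries the written, flushed and applied LSNs, the client clock in microseconds since 2000-01-01, and a reply flag. Which LSN to acknowledge is left to the caller here, since the log does not spell that out.

```rust
/// Primary keepalive message: 'k', Int64 WAL end, Int64 server clock, Byte1 reply flag.
#[derive(Debug, PartialEq, Eq)]
struct Keepalive {
    wal_end: u64,
    reply_requested: bool,
}

fn parse_keepalive(msg: &[u8]) -> Option<Keepalive> {
    if msg.len() != 18 || msg[0] != b'k' {
        return None;
    }
    let mut wal_end = [0u8; 8];
    wal_end.copy_from_slice(&msg[1..9]);

    Some(Keepalive {
        wal_end: u64::from_be_bytes(wal_end),
        reply_requested: msg[17] == 1,
    })
}

/// Standby status update: 'r', written/flushed/applied LSN, client clock, reply flag.
fn standby_status_update(lsn: u64, clock_micros: i64) -> Vec<u8> {
    let mut out = Vec::with_capacity(34);
    out.push(b'r');
    out.extend_from_slice(&lsn.to_be_bytes()); // last WAL byte + 1 written
    out.extend_from_slice(&lsn.to_be_bytes()); // last WAL byte + 1 flushed
    out.extend_from_slice(&lsn.to_be_bytes()); // last WAL byte + 1 applied
    out.extend_from_slice(&clock_micros.to_be_bytes());
    out.push(0); // we do not ask the server for an immediate reply
    out
}

/// Answers EVERY keepalive with a status update, whether or not the
/// reply-requested bit is set.
fn on_keepalive(msg: &[u8], acked_lsn: u64, clock_micros: i64) -> Option<Vec<u8>> {
    let _keepalive = parse_keepalive(msg)?;

    Some(standby_status_update(acked_lsn, clock_micros))
}
```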
Thomas Eizinger	520dd0aa31	feat(gateway): respond with ICMP error for filtered packets (#9816 ) When defining a resource, a Firezone admin can define traffic filters to only allow traffic on certain TCP and/or UDP ports and/or restrict traffic on the ICMP protocol. Presently, when a packet is filtered out on the Gateway, we simply drop it. Dropping packets means the sending application can only react to timeouts and has no other means of error handling. ICMP was conceived to deal with these kinds of situations. In particular, the "destination unreachable" type has a dedicated code for filtered packets: "Communication administratively prohibited". Instead of just dropping the not-allowed packet, we now send back an ICMP error with this particular code set, thus informing the sending application that the packet did not get lost but was in fact not routed for policy reasons. When setting a traffic filter that does not allow TCP traffic, attempting to `curl` such a resource now results in the following: ``` ❯ sudo docker compose exec --env RUST_LOG=info -it client /bin/sh -c 'curl -v example.com' * Host example.com:80 was resolved. * IPv6: fd00:2021:1111:8000::, fd00:2021:1111:8000::1, fd00:2021:1111:8000::2, fd00:2021:1111:8000::3 * IPv4: 100.96.0.1, 100.96.0.2, 100.96.0.3, 100.96.0.4 * Trying [fd00:2021:1111:8000::]:80... * connect to fd00:2021:1111:8000:: port 80 from fd00:2021:1111::1e:7658 port 34560 failed: Permission denied * Trying [fd00:2021:1111:8000::1]:80... * connect to fd00:2021:1111:8000::1 port 80 from fd00:2021:1111::1e:7658 port 34828 failed: Permission denied * Trying [fd00:2021:1111:8000::2]:80... * connect to fd00:2021:1111:8000::2 port 80 from fd00:2021:1111::1e:7658 port 44314 failed: Permission denied * Trying [fd00:2021:1111:8000::3]:80... * connect to fd00:2021:1111:8000::3 port 80 from fd00:2021:1111::1e:7658 port 37628 failed: Permission denied * Trying 100.96.0.1:80... 
* connect to 100.96.0.1 port 80 from 100.66.110.26 port 53780 failed: Host is unreachable * Trying 100.96.0.2:80... * connect to 100.96.0.2 port 80 from 100.66.110.26 port 60748 failed: Host is unreachable * Trying 100.96.0.3:80... * connect to 100.96.0.3 port 80 from 100.66.110.26 port 38378 failed: Host is unreachable * Trying 100.96.0.4:80... * connect to 100.96.0.4 port 80 from 100.66.110.26 port 49866 failed: Host is unreachable * Failed to connect to example.com port 80 after 9 ms: Could not connect to server * closing connection #0 curl: (7) Failed to connect to example.com port 80 after 9 ms: Could not connect to server ```	2025-07-11 13:54:41 +00:00
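
The ICMP error described in 520dd0aa31 can be illustrated with a small sketch. It builds only the ICMPv4 message (type 3, code 13, "Communication administratively prohibited") that quotes the filtered packet, without the enclosing IP header. The function names are hypothetical and this is not the Gateway's actual implementation; the ICMPv6 counterpart (type 1, code 1) is not covered.

```rust
const ICMPV4_DEST_UNREACHABLE: u8 = 3;
const ICMPV4_CODE_ADMIN_PROHIBITED: u8 = 13;

/// Computes the Internet checksum (RFC 1071) over `data`.
fn internet_checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in data.chunks(2) {
        // Pad a trailing odd byte with zero.
        let low = if chunk.len() == 2 { chunk[1] } else { 0 };
        sum += u32::from(u16::from_be_bytes([chunk[0], low]));
    }
    // Fold the carries back into the lower 16 bits.
    while sum >> 16 != 0 {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    !(sum as u16)
}

/// Builds an ICMPv4 "destination unreachable" message with the
/// "administratively prohibited" code for the given (filtered) IPv4 packet.
/// Per RFC 792, the body quotes the original IP header plus the first 8 bytes
/// of its payload.
fn icmpv4_admin_prohibited(original_packet: &[u8]) -> Vec<u8> {
    // The IHL field holds the header length in 32-bit words; fall back to 20 bytes.
    let ihl = original_packet.first().copied().unwrap_or(0x45) & 0x0F;
    let header_len = usize::from(ihl) * 4;
    let quoted_len = original_packet.len().min(header_len + 8);

    // Type, code, checksum placeholder (2 bytes), unused (4 bytes).
    let mut msg = vec![
        ICMPV4_DEST_UNREACHABLE,
        ICMPV4_CODE_ADMIN_PROHIBITED,
        0, 0, 0, 0, 0, 0,
    ];
    msg.extend_from_slice(&original_packet[..quoted_len]);

    let checksum = internet_checksum(&msg);
    msg[2..4].copy_from_slice(&checksum.to_be_bytes());
    msg
}
```

A receiver can validate the result by running `internet_checksum` over the whole message, which yields 0 when the checksum field is correct.
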
Thomas Eizinger	fb7d780b6f	refactor(gui-client): don't hardcode IDs (#9831 ) A linter I am trying out locally suggested not hardcoding HTML IDs. TIL about React's `useId`.	2025-07-11 13:47:15 +00:00
Thomas Eizinger	06f703a0b5	feat(telemetry): log use of `map-enobufs-to-wouldblock` (#9829 ) To better track how well our `ENOBUFS` mitigation is working, we should log the use of our feature flag to PostHog. This will give us some stats on how often this is happening. That, combined with the lack of error reports, should give us good confidence in permanently enabling this behaviour.	2025-07-11 13:32:11 +00:00
Thomas Eizinger	9c4e71a68f	chore(connlib): improve error message for filtered packets (#9833 ) When a packet gets filtered because we are unable to evaluate the source protocol (i.e. TCP/UDP/ICMP), the current error message misleadingly says that the packet got filtered because the protocol is not supported. The truth however is that we were never able to apply the filter in the first place. This is a subtle difference that is quite important when debugging filtered packets. To improve this, we add an error message to the stack here.	2025-07-11 13:24:55 +00:00
Thomas Eizinger	8e5ce66810	feat(gateway): don't apply traffic filters to ICMP errors (#9834 ) Firezone uses ICMP errors to signal to client applications that e.g. a certain IP is not reachable. This happens for example if a DNS resource only resolves to IPv4 addresses yet the client application attempted to use an IPv6 proxy address to connect to it. In the presence of traffic filters for such a resource that do _not_ allow ICMP, we currently filter out these ICMP errors because - well - ICMP traffic is not allowed! However, even when ICMP traffic is allowed, we would fail to evaluate this filter because the ICMP error packet is not an ICMP echo reply and therefore doesn't have an ICMP identifier. We require this in the DNS resource NAT to identify "connections" and NAT them correctly. The same L4 component is used to evaluate the traffic filters. ICMP errors are critical to many usage scenarios and algorithms like happy-eyeballs. Dropping them usually results in weird behaviour as client applications can then only react to timeouts.	2025-07-11 13:20:37 +00:00
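
The distinction drawn in 8e5ce66810 between ICMP errors and other ICMP traffic can be sketched like this. The classification follows the ICMP specifications (RFC 1122 for ICMPv4, RFC 4443 for ICMPv6); the `Verdict` type and the function names are hypothetical and only illustrate where such a check could sit relative to filter evaluation, not how the Gateway implements it.

```rust
/// What to do with an ICMP packet before traffic filters run (hypothetical type).
#[derive(Debug, PartialEq)]
enum Verdict {
    /// ICMP errors skip the traffic filters entirely.
    BypassFilters,
    /// Everything else (e.g. echo request/reply) is subject to the filters.
    EvaluateFilters,
}

/// ICMPv4 error messages per RFC 1122: destination unreachable (3),
/// source quench (4), redirect (5), time exceeded (11), parameter problem (12).
fn is_icmpv4_error(icmp_type: u8) -> bool {
    matches!(icmp_type, 3 | 4 | 5 | 11 | 12)
}

/// ICMPv6 error messages use types 0..=127 (RFC 4443); informational
/// messages such as echo request (128) have the high bit set.
fn is_icmpv6_error(icmp_type: u8) -> bool {
    icmp_type < 128
}

/// Decides whether an ICMP packet should skip filter evaluation.
fn classify_icmp(is_ipv6: bool, icmp_type: u8) -> Verdict {
    let is_error = if is_ipv6 {
        is_icmpv6_error(icmp_type)
    } else {
        is_icmpv4_error(icmp_type)
    };
    if is_error {
        Verdict::BypassFilters
    } else {
        Verdict::EvaluateFilters
    }
}
```
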
Thomas Eizinger	a363f9e2fb	chore: migrate service ID to hex-representation (#9836 ) We aren't sending the OTEL metrics anywhere yet but it still makes sense to also use the "newer" hex-representation of the Firezone ID here as the service ID.	2025-07-11 12:03:50 +00:00
Jamil	cfcd5b3b8f	chore(portal): track more WAL monitoring info (#9826 ) When debugging WAL processing, it's helpful to know what the last replied LSN was and when the last keepalive message was received from postgres.	2025-07-10 18:30:34 -07:00
Jamil	080818c466	fix(portal): fix reply for remaining wal message (#9824 ) Missed one reply fix from #9821	2025-07-10 21:46:05 +00:00
Jamil	fb0dd36dbc	chore(portal): ignore expected libcluster issue (#9822 ) Adds another expected error message to the ignore list. We have a different (less noisy) log that will alert us if the cluster is below threshold.	2025-07-10 21:35:18 +00:00
Thomas Eizinger	04499da11e	feat(telemetry): grab env and `distinct_id` from Sentry session (#9801 ) At present, our primary indicator as to whether telemetry is active is whether we have a Sentry session. For our analytics events however, we currently require passing in the Firezone ID and API URL again. This makes it difficult to send analytics events from areas of the code that don't have this information available. To still allow for that, we integrate the `analytics` module more tightly with the Sentry session. This allows us to drop two parameters from the `$identify` event and also means we now respect the `NO_TELEMETRY` setting for these events except for `new_session`. This event is sent regardless because it allows us to track how many on-prem installations of Firezone are out there.	2025-07-10 20:05:08 +00:00
Jamil	704ff9fd7a	fix(portal): send empty reply for incoming wal messages (#9821 ) In #9733, we changed the replies of the handle_data messages which seems to have caused Postgres to not respect our acknowledgements sent in the keepalive. To fix this, we revert to sending an empty message in response to write messages.	2025-07-10 19:50:00 +00:00
Thomas Eizinger	13c8c70750	fix(connlib): treat `ENOBUFS` as `EWOULDBLOCK` (#9798 ) Socket APIs across operating systems vary in how they handle back-pressure. In most cases, a non-blocking socket should return `EWOULDBLOCK` when it cannot send a given datagram and would have to block to wait for resources to free up. It appears that macOS doesn't always behave like that. In particular, we are seeing error logs from a few users where sending a datagram fails with > No buffer space available (os error 55) Digging through `libc`, I've found that this error is known as `ENOBUFS` [0]. There are reports on the Apple developer forum [1] that recommend retrying when this error happens. It is however unclear as to whether it is entirely safe to map this error to `EWOULDBLOCK`. Other non-blocking event-loop implementations [2] appear to do that but we don't know whether it is fully correct. At present, Firezone's behaviour here is to drop the packet. This means the host networking stack has to fall back to running into a timeout and re-send the packet. This very likely negatively impacts the UX for the users hitting this. In order to validate this assumption, we implement a feature-flag. This allows us to ship this code but switch back to the old behaviour, should it negatively impact how Firezone behaves. In particular, if the assumption that mapping `ENOBUFS` to `EWOULDBLOCK` is safe turns out wrong and `kqueue` does in fact not signal readiness when more buffers are available, then we may have missing wake-ups which would lead to a further delay in datagrams being sent. [0]: `8e6f36c6ba/src/unix/bsd/apple/mod.rs (L2998)` [1]: https://developer.apple.com/forums/thread/42334 [2]: `aac866f399/src/unix/stream.c (L820)`	2025-07-10 17:51:16 +00:00
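
The error mapping described in 13c8c70750 could look roughly like the sketch below. It uses only the standard library; the function name and the `flag_enabled` parameter are hypothetical stand-ins for the feature flag, and the constant hard-codes the macOS value from the quoted error (os error 55). Real code would more likely use `libc::ENOBUFS`, since the numeric value differs per platform.

```rust
use std::io;

/// `ENOBUFS` on Apple platforms, matching "os error 55" from the commit message.
/// Assumption: hard-coded for illustration; the value is platform-specific.
const ENOBUFS_MACOS: i32 = 55;

/// Maps `ENOBUFS` to `WouldBlock` when the (hypothetical) feature flag is enabled,
/// so that callers treat it as back-pressure and retry once the socket is ready
/// instead of dropping the datagram.
fn map_enobufs_to_wouldblock(err: io::Error, flag_enabled: bool) -> io::Error {
    if flag_enabled && err.raw_os_error() == Some(ENOBUFS_MACOS) {
        // Keep the original error as the source for diagnostics.
        return io::Error::new(io::ErrorKind::WouldBlock, err);
    }
    err
}
```

With the flag disabled the error passes through unchanged, which corresponds to the old behaviour of dropping the packet.
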
dependabot[bot]	eb6830daa2	build(deps): bump flowbite-react from 0.11.7 to 0.11.8 in /rust/gui-client in the flowbite group (#9754 ) Bumps the flowbite group in /rust/gui-client with 1 update: [flowbite-react](https://github.com/themesberg/flowbite-react/tree/HEAD/packages/ui). Updates `flowbite-react` from 0.11.7 to 0.11.8. Release notes for flowbite-react@0.11.8 (sourced from https://github.com/themesberg/flowbite-react/releases): - fix(Datepicker): switch hardcoded color `cyan` -> `primary` (themesberg/flowbite-react#1579, `d44648d`, by @SutuSebastian) - Improve table pagination (themesberg/flowbite-react#1567, by @jfacoustic, a first-time contributor) Full changelog: https://github.com/themesberg/flowbite-react/compare/flowbite-react@0.11.7...flowbite-react@0.11.8 Commits: - `557d233` Version Packages (#1580) - `d44648d` fix(Datepicker): switch hardcoded color `cyan` -> `primary` (#1579) - `c6f235c` Improve table pagination (#1567) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-10 16:57:12 +00:00
Thomas Eizinger	7689402c50	chore(snownet): print packets of unknown format (#9818 ) When receiving UDP packets that we cannot decode, we log an error. To identify whether we might have bugs in our decoding logic, we now also log the hex-encoding of the packet at DEBUG level for further analysis.	2025-07-10 15:11:54 +00:00
Thomas Eizinger	b4b50b5615	fix(gui-client): move `tslink` metadata (#9817 ) A recent release of `tslink` now supports configuration via the `package.metadata` table, which resolves a warning about an "unknown key" that we have been seeing for a while.	2025-07-10 14:54:50 +00:00

7790 Commits