firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-03-22 03:41:56 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	ed5285268d	refactor: merge `on_update_routes` and `on_set_interface_config` (#7699 ) For a while now, `connlib` has been calling these two callbacks right after each other because the internal event already bundles all the information about the TUN device. With this PR, we merge the two callback functions also in layers above `connlib` itself. Resolves: #6182.	2025-01-08 18:26:40 +00:00
Thomas Eizinger	3a8c6c7182	chore(connlib): assert that we don't emit `WouldBlock` errors (#7696 ) When file descriptors like sockets or the TUN device are opened in non-blocking mode, operations that would block emit the `WouldBlock` IO error. These errors _should_ be translated into `Poll::Pending` and have a waker registered that gets called whenever the operation should be attempted again. Therefore, we should _never_ see these IO errors. Previously, the implementation of the tunnel's event-loop did not yet properly handle this backpressure and instead sometimes dropped packets when it should have suspended. This has since been fixed, but the branch introduced back then, which simply ignored `io::ErrorKind::WouldBlock` errors, had remained. Changing this to a debug-assert will alert us whenever we accidentally break this without altering the behaviour of the release binary.	2025-01-08 14:11:01 +00:00
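The translation described in 3a8c6c7182 can be sketched with the standard library alone. This is a minimal illustration, not connlib's code: `into_poll` and `handle_io_error` are hypothetical names, and waker registration is left out.

```rust
use std::io;
use std::task::Poll;

/// Translates the result of a non-blocking IO operation into a `Poll`.
///
/// `WouldBlock` means "try again later", so it becomes `Poll::Pending`.
/// A real implementation must also register a waker before returning
/// `Pending`; that part is omitted in this sketch.
fn into_poll<T>(result: io::Result<T>) -> Poll<io::Result<T>> {
    match result {
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => Poll::Pending,
        other => Poll::Ready(other),
    }
}

/// Hypothetical call-site further up the stack: by the time an error arrives
/// here, `WouldBlock` should already have been turned into `Pending`.
fn handle_io_error(e: &io::Error) {
    debug_assert!(
        e.kind() != io::ErrorKind::WouldBlock,
        "`WouldBlock` must be translated into `Poll::Pending`"
    );
    // Release builds compile the assertion out and behave as before.
}
```

Because `debug_assert!` is only active in debug builds, a regression shows up in tests and development without changing what the release binary does.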
Thomas Eizinger	8a1b6f26b4	fix(connlib): don't log warnings for unreachable errors (#7537 ) When a Gateway or Client is running in an environment without IPv4 or IPv6 connectivity, our initial probes for sending packets to the relays will fail with network unreachable. That isn't a very big concern and happens a lot in the wild. There is no need to report these as telemetry events. Resolves: #7514.	2024-12-17 17:59:20 +00:00
Thomas Eizinger	7e38d3caee	chore(connlib): downgrade warning about failed flow (#7480 )	2024-12-11 19:01:37 +00:00
Thomas Eizinger	b802021cc4	feat(connlib): implement idempotent control protocol for client (#6942 ) Building on top of the gateway PR (#6941), this PR transitions the clients to the new control protocol. Clients are not backwards-compatible with old gateways. As a result, a certain customer environment MUST have at least one gateway with the above PR running in order for clients to be able to establish connections. With this transition, Clients send explicit events to Gateways whenever they assign IPs to a DNS resource name. The actual assignment only happens once and the IPs then remain stable for the duration of the client session. When the Gateway receives such an event, it will perform a DNS resolution of the requested domain name and set up the NAT between the assigned proxy IPs and the IPs the domain actually resolves to. In order to support self-healing of any problems that happen during this process, the client will send an "Assigned IPs" event every time it receives a DNS query for a particular domain. This in turn will trigger another DNS resolution on the Gateway. Effectively, this means that DNS queries for DNS resources propagate to the Gateway, triggering a DNS resolution there. In case the domain resolves to the same set of IPs, no state is changed to ensure existing connections are not interrupted. With this new functionality in place, we can delete the old logic around detecting "expired" IPs. This is considered a bugfix as this logic isn't currently working as intended. It has been observed multiple times that the Gateway can loop on this behaviour and resolve the same domain over and over again. The only theoretical "incompatibility" here is that pre-1.4.0 clients won't have access to this functionality of triggering DNS refreshes on a 1.4.2+ Gateway. However, as soon as this PR merges, we expect all admins to have already upgraded to a 1.4.0+ Gateway anyway which already mandates clients to be on 1.4.0+. Resolves: #7391. 
Resolves: #6828.	2024-12-04 12:05:35 +00:00
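The "no state is changed if the domain resolves to the same set of IPs" rule from b802021cc4 could look roughly like the following on the Gateway side. `ResolvedDomains` and `on_resolved` are made-up names for illustration; the real Gateway state is certainly richer.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::IpAddr;

/// Hypothetical gateway-side state: the IPs each DNS resource currently resolves to.
#[derive(Default)]
struct ResolvedDomains {
    by_domain: BTreeMap<String, BTreeSet<IpAddr>>,
}

impl ResolvedDomains {
    /// Records the outcome of a fresh DNS resolution.
    ///
    /// Returns `true` only if the set of IPs changed and the NAT would need
    /// to be rebuilt. Repeated "Assigned IPs" events that resolve to the same
    /// IPs are no-ops, so existing connections are left alone.
    fn on_resolved(&mut self, domain: &str, ips: BTreeSet<IpAddr>) -> bool {
        if self.by_domain.get(domain) == Some(&ips) {
            return false;
        }
        self.by_domain.insert(domain.to_owned(), ips);
        true
    }
}
```

Handling the event this way is what makes it safe for the client to resend it on every DNS query: the second and later deliveries have no effect unless the resolution result actually differs.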
Thomas Eizinger	dd6b52b236	chore(rust): share edition key via workspace table (#7451 )	2024-12-03 00:28:06 +00:00
Thomas Eizinger	2c26fc9c0e	ci: lint Rust dependencies using `cargo deny` (#7390 ) One of Rust's promises is "if it compiles, it works". However, there are certain situations in which this isn't true. In particular, when using dynamic typing patterns where trait objects are downcast to concrete types, having two versions of the same dependency can silently break things. This happened in #7379 where I forgot to patch a certain Sentry dependency. A similar problem exists with our `tracing-stackdriver` dependency (see #7241). Lastly, duplicate dependencies increase the compile-times of a project, so we should aim for having as few duplicate versions of a particular dependency as possible in our dependency graph. This PR introduces `cargo deny`, a linter for Rust dependencies. In addition to linting for duplicate dependencies, it also enforces that all dependencies are compatible with an allow-list of licenses and it warns when a dependency is referred to from multiple crates without introducing a workspace dependency. Thanks to existing tooling (https://github.com/mainmatter/cargo-autoinherit), transitioning all dependencies to workspace dependencies was quite easy. Resolves: #7241.	2024-11-22 00:17:28 +00:00
Thomas Eizinger	de35bb067e	fix(telemetry): don't embed errors values in `telemetry_event!` (#7366 ) Due to https://github.com/getsentry/sentry-rust/issues/702, errors which are embedded as `tracing::Value` unfortunately get silently discarded when reported as part of Sentry "Event"s and not "Exception"s. The design idea of these telemetry events is that they aren't fatal errors so we don't need to treat them with the highest priority. They may also appear quite often, so to save performance and bandwidth, we sample them at a rate of 1% at creation time. In order to not lose the context of these errors, we instead format them into the message. This makes them completely identical to the `debug!` logs which we have on every call-site of `telemetry_event!` which prompted me to make that implicit as part of creating the `telemetry_event!`. Resolves: #7343.	2024-11-18 18:17:08 +00:00
Thomas Eizinger	3cf5cbb989	chore(connlib): only send some tunnel errors to Sentry (#7340 ) Errors from the tunnel can potentially happen on a per-packet basis. In order to not flood Sentry, reduce the log-level down to `debug` and only report 1% of all errors. We did the same thing for the gateway in #7299.	2024-11-14 02:32:37 +00:00
Thomas Eizinger	4d2dc3dfcb	refactor(connlib): don't rely on DNS when reconnecting to portal (#7289 ) Currently, `connlib` uses the feature of "known hosts" to provide DNS functionality for some domains even without any network connectivity. This is primarily used to ensure that when we reconnect to the portal, we can resolve the domain name which allows us to then create network connections. With recent changes to how our phoenix-channel implementation works, this is actually no longer necessary. Currently, we re-resolve the domain every time we connect, even though we already resolved them when connecting to it for the first time. This step is unnecessary and we can simply directly use the previously resolved IP addresses for the portal domain.	2024-11-08 05:06:42 +00:00
dependabot[bot]	a2828a217b	build(deps): Bump thiserror from 1.0.64 to 1.0.68 in /rust (#7260 ) Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.64 to 1.0.68. Release notes (sourced from thiserror's releases, https://github.com/dtolnay/thiserror/releases): 1.0.68: Handle incomplete expressions more robustly in format arguments, such as while code is being typed (thiserror #341, #344). 1.0.67: Improve expression syntax support inside format arguments (thiserror #335, #337, #339, #340). 1.0.66: Improve compile error on malformed format attribute (thiserror #327). 1.0.65: Ensure OUT_DIR is left with deterministic contents after build script execution (thiserror #325). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-11-04 19:06:15 +00:00
Thomas Eizinger	f7a388345b	fix(connlib): reconnect in case we lose all relays (#7164 ) During normal operation, we should never lose connectivity to the set of assigned relays in a client or gateway. In the presence of odd network conditions and partitions however, it is possible that we disconnect from a relay that is in fact only temporarily unavailable. Without an explicit mechanism to retrieve new relays, this means that both clients and gateways can end up with no relays at all. For clients, this can be fixed by either roaming or signing out and in again. For gateways, this can only be fixed by a restart! Without connected relays, no connections can be established. With #7163, we will at least be able to still establish direct connections. Yet, that isn't good enough and we need a mechanism for restoring full connectivity in such a case. When creating a new connection, we already sample one of our relays and assign it to this particular connection. This ensures that we don't create an excessive amount of candidates for each individual connection. Currently, this selection is allowed to be silently fallible. With this PR, we make this a hard-error and bubble up the error all the way to the client's and gateway's event-loop. There, we initiate a reconnect to the portal as a compensating action. Reconnecting to the portal means we will receive another `init` message that allows us to reconnect the relays. Due to the nature of this implementation, this fix may only apply with a certain delay from when we actually lost connectivity to the last relay. However, this design has the advantage that we don't have to introduce an additional state within `snownet`: Connections now simply fail to establish and the next one soon after _should_ succeed again because we will have received a new `init` message.	2024-10-29 01:01:47 +00:00
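The control flow of f7a388345b, where an empty relay set becomes a hard error that the event-loop answers with a portal reconnect, can be sketched as follows. All names here (`NoRelays`, `Action`, `sample_relay`, `on_new_connection`) are hypothetical, and `seed` stands in for a random number so the sketch needs no dependencies.

```rust
use std::fmt;

/// Hypothetical error for "we have no relays left".
#[derive(Debug, PartialEq)]
struct NoRelays;

impl fmt::Display for NoRelays {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "no relays available")
    }
}

impl std::error::Error for NoRelays {}

/// What the event-loop decides to do next (illustrative).
#[derive(Debug, PartialEq)]
enum Action {
    UseRelay(String),
    ReconnectToPortal,
}

/// Picks one relay for a new connection. An empty set is a hard error
/// instead of silently continuing without a relay.
fn sample_relay(relays: &[String], seed: usize) -> Result<&String, NoRelays> {
    if relays.is_empty() {
        return Err(NoRelays);
    }
    Ok(&relays[seed % relays.len()])
}

/// Event-loop side: a missing relay triggers a reconnect to the portal, which
/// leads to a new `init` message and therefore a fresh set of relays.
fn on_new_connection(relays: &[String], seed: usize) -> Action {
    match sample_relay(relays, seed) {
        Ok(relay) => Action::UseRelay(relay.clone()),
        Err(NoRelays) => Action::ReconnectToPortal,
    }
}
```

As the commit message notes, the recovery only kicks in when the next connection is attempted, which is why the fix may apply with some delay.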
Thomas Eizinger	2ca91a3b1a	chore(connlib): remove old `mock` feature (#7142 ) This is so stale, it definitely needs to go in the bin.	2024-10-23 16:30:15 +00:00
Thomas Eizinger	ee30368970	refactor(connlib): simplify error handling on crash (#7134 ) The `fmt::Display` implementation of `tokio::task::JoinError` already does exactly what we do here: Extracting the panic message if there is one. Thus, we can simplify this code by just moving the `JoinError` into the `DisconnectError` as its source.	2024-10-23 16:13:39 +00:00
Thomas Eizinger	990324b2ec	chore(rust): enable `sentry-tracing` integration (#7105 ) Using the `sentry-tracing` integration, we can automatically capture events based on what we log via `tracing`. The mapping is defined as follows: - ERROR: Gets captured as a fatal error - WARN: Gets captured as a message - INFO: Gets captured as a breadcrumb - `_`: Does not get captured at all If telemetry isn't active / configured, this integration does nothing. It is therefore safe to just always enable it.	2024-10-22 23:23:49 +00:00
Thomas Eizinger	73eebd2c4d	refactor(rust): consistently record errors as `tracing::Value` (#7104 ) Our logging library, `tracing`, supports structured logging. This is useful because it preserves more than just the string representation of a value and thus allows the active logging backend(s) to capture more information for a particular value. In the case of errors, this is especially useful because it allows us to capture the sources of a particular error. Unfortunately, recording an error as a tracing value is a bit cumbersome because `tracing::Value` is only implemented for `&dyn std::error::Error`. Casting an error to this is quite verbose. To make it easier, we introduce two utility functions in `firezone-logging`: - `std_dyn_err` - `anyhow_dyn_err` Tracking errors as correct `tracing::Value`s will be especially helpful once we enable Sentry's `tracing` integration: https://docs.rs/sentry-tracing/latest/sentry_tracing/#tracking-errors	2024-10-22 04:46:26 +00:00
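To illustrate why 73eebd2c4d bothers with the cast: a `&dyn std::error::Error` lets a backend walk the chain of sources, which a formatted string cannot. The name `std_dyn_err` is taken from the commit message, but the signature below is an assumption, and `error_chain` and `ConnectFailed` exist only for this sketch.

```rust
use std::error::Error;
use std::fmt;

/// Casts a concrete error to the trait object that structured logging expects.
/// Signature assumed; only the name appears in the commit message.
fn std_dyn_err(e: &(impl Error + 'static)) -> &(dyn Error + 'static) {
    e
}

/// Collects the error and all of its sources. This is the extra information
/// a logging backend gains over a plain string.
fn error_chain(e: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = vec![e.to_string()];
    let mut current = e.source();
    while let Some(source) = current {
        chain.push(source.to_string());
        current = source.source();
    }
    chain
}

/// Hypothetical error type with an underlying cause.
#[derive(Debug)]
struct ConnectFailed(std::num::ParseIntError);

impl fmt::Display for ConnectFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to connect")
    }
}

impl Error for ConnectFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}
```

Without such a helper, each call-site would have to spell out `&err as &dyn std::error::Error`, which is the verbosity the commit refers to.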
Thomas Eizinger	dbe618c080	refactor(connlib): expose `&mut TRoleState` for direct access (#7026 ) Currently, we have a lot of stupid code to forward data from the `{Client,Gateway}Tunnel` interface to `{Client,Gateway}State`. Recent refactorings such as #6919 made it possible to get rid of this forwarding layer by directly exposing `&mut TRoleState`. To maintain some type-privacy, several functions are made generic to accept `impl Into` or `impl TryInto`.	2024-10-15 01:05:35 +00:00
Thomas Eizinger	02b0e1dc8d	chore: don't report authentication errors to sentry (#6948 ) Do we want to track 401s in sentry? If we see a lot of them, something is likely wrong but I guess there is some level of 401s that users will just run into. Is there a way of marking these as "might not be a really bad error"? --------- Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>	2024-10-08 06:26:39 +00:00
Thomas Eizinger	2d4818e007	refactor(connlib): rotate tunnel private key on `reset` (#6909 ) With the new control protocol specified in #6461, the client will no longer initiate new connections. Instead, the credentials are generated deterministically by the portal based on the gateway's and the client's public key. For as long as they use the same public key, they also have the same in-memory state which makes creating connections idempotent. What we didn't consider in the new design at first is that when clients roam, they discard all connections but keep the same private key. As a result, the portal would generate the same ICE credentials which means the gateway thinks it can reuse the existing connection when new flows get authorized. The client however discarded all connections (and rotated its ports and maybe IPs), meaning the previous candidates sent to the gateway are no longer valid and connectivity fails. We fix this by also rotating the private keys upon reset. Rotating the keys itself isn't enough, we also need to propagate the new public key all the way "over" to the phoenix channel component which lives separately from connlib's data plane. To achieve this, we change `PhoenixChannel` to now start in the "disconnected" state and require an explicit `connect` call. In addition, the `LoginUrl` constructed by various components now acts merely as a "prototype", which may require additional data to construct a fully valid URL. In the case of client and gateway, this is the public key of the `Node`. This additional parameter needs to be passed to `PhoenixChannel` in the `connect` call, thus forming a type-safe contract that ensures we never attempt to connect without providing a public key. For the relay, this doesn't apply. Lastly, this allows us to tidy up the code a bit by: a) generating the `Node`'s private key from the existing RNG b) removing `ConnectArgs` which only had two members left Related: #6461. Related: #6732.	2024-10-07 22:28:51 +00:00
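The "type-safe contract" from 2d4818e007 boils down to making the public key a required argument of `connect`. A minimal sketch, assuming simplified stand-ins for `LoginUrl`, `PhoenixChannel` and the key type; the `public_key` query parameter name and the `State` enum are invented for illustration.

```rust
/// A URL "prototype" that still lacks the public key (hypothetical shape).
struct LoginUrl {
    base: String,
}

/// Stand-in for the node's public key.
struct PublicKey([u8; 4]);

#[derive(Debug, PartialEq)]
enum State {
    Disconnected,
    Connecting { url: String },
}

struct PhoenixChannel {
    url_prototype: LoginUrl,
    state: State,
}

impl PhoenixChannel {
    /// Channels start out disconnected; nothing happens until `connect` is called.
    fn disconnected(url_prototype: LoginUrl) -> Self {
        PhoenixChannel { url_prototype, state: State::Disconnected }
    }

    /// The public key is a required argument, so a connection attempt without
    /// one does not compile. Calling this again after a key rotation builds a
    /// URL with the new key.
    fn connect(&mut self, key: &PublicKey) {
        let hex: String = key.0.iter().map(|b| format!("{:02x}", b)).collect();
        let url = format!("{}?public_key={}", self.url_prototype.base, hex);
        self.state = State::Connecting { url };
    }
}
```

Since the complete URL is only assembled inside `connect`, rotating the key on `reset` and reconnecting cannot accidentally reuse a URL that carries the old key.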
Thomas Eizinger	be250f1e00	refactor(connlib): repurpose `connlib-shared` as `connlib-model` (#6919 ) The `connlib-shared` crate has become a bit of a dependency magnet without a clear purpose. It hosts utilities like `get_user_agent`, messages for the client and gateway to communicate with the portal and domain types like `ResourceId`. To create a better dependency structure in our workspace, we repurpose `connlib-shared` as a `connlib-model` crate. Its purpose is to host domain-specific model types that multiple crates may want to use. For that purpose, we rename the `callbacks::ResourceDescription` type to `ResourceView`, designating that this is a _view_ onto a resource as seen by `connlib`. The message types which currently double up as connlib-internal model thus become an implementation detail of `firezone-tunnel` and shouldn't be used for anything else. --------- Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com> Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-10-03 14:47:58 +00:00
Reactor Scram	05a2b28d9f	feat(rust/gui-client): add sentry.io error reporting (#6782 ) Refs #6138 Sentry is always enabled for now. In the near future we'll make it opt-out per device and opt-in per org (see #6138 for details) - Replaces the `crash_handling` module - Catches panics in GUI process, tunnel daemon, and Headless Client - Added a couple "breadcrumbs" to play with that feature - User ID is not set yet - Environment is set to the API URL, e.g. `wss://api.firezone.dev` - Reports panics from the connlib async task - Release should be automatically pulled from the Cargo version which we automatically set in the version Makefile Example screenshot of sentry.io with a caught panic: <img width="861" alt="image" src="https://github.com/user-attachments/assets/c5188d86-10d0-4d94-b503-3fba51a21a90">	2024-09-27 16:34:54 +00:00
Thomas Eizinger	5b391a9c66	chore(connlib): remove outdated log (#6808 ) This log is currently printed after we receive the `init` message from the client. It is a left-over from early days of connlib where receiving `init` itself already triggered all kinds of actions. These days, we are mostly just updating state. In addition, `init` can be received multiple times during a client's session which is somewhat confusing when you see multiple "Firezone started" logs.	2024-09-25 22:21:14 +00:00
Thomas Eizinger	480a065bf8	chore(connlib): mitigate WARN logs from phoenix-channel (#6759 ) Merging #6708 had an unintended side-effect that we are seeing a lot of WARN logs from phoenix-channel because we can no longer parse the response from gateways. We didn't do anything with these responses but gateways are sending them for backwards-compatibility reasons. To not confuse ourselves while debugging, we revert the client-side bit of #6708 to remove these warnings.	2024-09-18 20:36:04 +00:00
Thomas Eizinger	5ae06a7b8c	chore(gateway): remove domain response (breaks < 1.1.0 clients) (#6708 ) Prior to version 1.1.0, clients did not have an embedded DNS resolver and relied on the gateway for DNS resolution. In that design, the gateway responded with the IPs that the domain resolved to. Our next iteration of the control protocol (#6461) will decouple the details of how DNS works from the flow-authorization. As a result, we will need to be able to establish a flow for a DNS resource without knowing which concrete domain the client is going to access. Without a concrete domain, we cannot send anything back to these old clients, meaning we unfortunately have to break compatibility with < 1.1.0 clients as part of implementing the new control protocol.	2024-09-18 14:12:46 +00:00
Thomas Eizinger	a9f515a453	chore(rust): use `#[expect]` instead of `#[allow]` (#6692 ) The `expect` attribute is similar to `allow` in that it will silence a particular lint. Unlike `allow` however, `expect` will fail as soon as the lint is no longer emitted. This ensures we don't end up with stale `allow` attributes in our codebase. Additionally, it provides a way of adding a `reason` to document why the lint is being suppressed.	2024-09-16 13:51:12 +00:00
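A small side-by-side of the two attributes from a9f515a453. This assumes a toolchain where `#[expect]` is stable (Rust 1.81 or later); the functions are made up for the example.

```rust
/// `#[allow]` silences the lint for good, even after the code stops triggering it.
#[allow(unused_variables)]
fn with_allow(input: u32) -> u32 {
    let scratch = input * 2; // unused, lint silenced
    input
}

/// `#[expect]` silences the lint as well, but once `scratch` is used or removed
/// the compiler reports `unfulfilled_lint_expectations` (a warning by default,
/// an error when warnings are denied).
#[expect(unused_variables, reason = "placeholder for an upcoming feature")]
fn with_expect(input: u32) -> u32 {
    let scratch = input * 2; // unused, lint expected
    input
}
```

Whether an unfulfilled expectation actually fails the build depends on how warnings are treated, for example via `-D warnings` in CI.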
Thomas Eizinger	c02b5a6333	test(connlib): assert expected routes (#6611 ) When CIDR resources get added or removed, we need to update the routing table on the clients to redirect traffic for these resources to the TUN device. Currently, this is done in a separate event from the remaining `TunConfig` tracked in `connlib`. Having this in a separate event means it is hard to diff, whether anything meaningful changed about the TUN device. Additionally, changes to these routes are currently not tested in `tunnel_test`. Not having this code tested already caused bugs previously, such as #6387. To fix these things, we: - Add the IPv4 and IPv6 routes to the `TunConfig` tracked in `connlib` - Track the expected routes in `RefClient` - Assert that we don't emit `TunConfigUpdated` events without any actual changes Fixes: #6423.	2024-09-09 19:44:47 +00:00
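The "don't emit `TunConfigUpdated` without actual changes" property from c02b5a6333 follows naturally once routes are part of a comparable config value. A sketch under the assumption of simplified field types; `TunState` and `maybe_update_tun_config` are hypothetical names.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// What is actually configured on the TUN device (fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
struct TunConfig {
    ip: IpAddr,
    dns_servers: BTreeSet<IpAddr>,
    /// Routes as `(network address, prefix length)` pairs.
    routes: BTreeSet<(IpAddr, u8)>,
}

#[derive(Debug, PartialEq)]
enum Event {
    TunConfigUpdated(TunConfig),
}

#[derive(Default)]
struct TunState {
    tun_config: Option<TunConfig>,
}

impl TunState {
    /// Stores the new config and returns an event only if something changed.
    fn maybe_update_tun_config(&mut self, new: TunConfig) -> Option<Event> {
        if self.tun_config.as_ref() == Some(&new) {
            return None;
        }
        self.tun_config = Some(new.clone());
        Some(Event::TunConfigUpdated(new))
    }
}
```

Keeping routes inside the same struct means a single equality check covers addresses, DNS servers and routes, which is what makes the diff easy.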
Thomas Eizinger	e3688a475e	refactor(connlib): only buffer 1 unsent packet if socket is busy (#6563 ) Currently, we buffer UDP packets whenever the socket is busy and try to flush them out at a later point. This requires allocations and is tricky to get right. In order to solve both of these problems, we refactor `snownet` to return us an `EncryptedPacket` instead of a `Transmit`. An `EncryptedPacket` is an indirection-abstraction that can be turned into a `Transmit` given an `EncryptBuffer`. This combination of types allows us to hold on to the `EncryptedPacket` (which does not contain any references itself) in the `io` component whilst we are waiting for the socket to be ready to send again. This means we will immediately suspend the event loop in case the socket is no longer ready for sending and resend the datagram in the `EncryptBuffer` once we get re-polled.	2024-09-04 16:59:33 +00:00
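The indirection in e3688a475e can be pictured as a packet descriptor that stores offsets rather than references. The type names come from the commit message; the fields, `Io`, `park` and `take_pending` are assumptions made for this sketch.

```rust
use std::net::SocketAddr;

/// Scratch space that encrypted payloads are written into.
struct EncryptBuffer {
    bytes: Vec<u8>,
}

/// Describes where a packet lives inside the buffer. It holds no references,
/// so it can be stored while the socket is busy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct EncryptedPacket {
    dst: SocketAddr,
    start: usize,
    len: usize,
}

/// A datagram that is ready to be handed to the socket.
#[derive(Debug, PartialEq)]
struct Transmit<'a> {
    dst: SocketAddr,
    payload: &'a [u8],
}

impl EncryptedPacket {
    /// Borrows the buffer only at the moment we actually send.
    fn to_transmit<'a>(&self, buffer: &'a EncryptBuffer) -> Transmit<'a> {
        Transmit {
            dst: self.dst,
            payload: &buffer.bytes[self.start..self.start + self.len],
        }
    }
}

/// IO component that holds at most one unsent packet.
#[derive(Default)]
struct Io {
    pending: Option<EncryptedPacket>,
}

impl Io {
    /// Called when the socket is busy: remember the packet and suspend.
    fn park(&mut self, packet: EncryptedPacket) {
        debug_assert!(self.pending.is_none(), "event loop should be suspended");
        self.pending = Some(packet);
    }

    /// Called once we get re-polled and the socket is writable again.
    fn take_pending<'a>(&mut self, buffer: &'a EncryptBuffer) -> Option<Transmit<'a>> {
        let packet = self.pending.take()?;
        Some(packet.to_transmit(buffer))
    }
}
```

Because the pending slot is a plain `Option` of a `Copy` type, no allocation is needed while waiting for the socket.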
Thomas Eizinger	216f6efc5f	chore(connlib): track dedicated `TunConfig` (#6427 ) Currently, `connlib` tracks the `Interface` as it is given to it by the portal. This includes the tunnel IP addresses plus the upstream servers. Upstream servers however only take effect when they are defined. Without upstream DNS servers, `connlib` uses the system-defined DNS servers. In that case, the `Interface` no longer accurately represents what we actually configure on the TUN device. To fix this, we introduce a dedicated `TunConfig` struct that tracks what is actually set on the interface. This also allows us to track whether or not we need to re-emit this configuration after a change. Related: #6423. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2024-08-29 02:32:41 +00:00
Thomas Eizinger	d042addc5f	refactor(connlib): model "routing table updates" (#6436 ) Upon receiving packets for a resource that we are not connected to, connlib emits a "connection intent" to the portal. In case there are gateways online for this resource, the portal sends us a "connection details" event. Currently, this is handled in a `create_or_reuse_connection` function. What the current name doesn't capture is that this message is essentially an update to connlib's "routing table", i.e. which gateway in which site to use for the given resource. If we move this concern to the forefront of the design, whether or not we will make a new connection or reuse an existing one kind of becomes secondary. Re-framing the way we handle these messages makes it more natural to design it in an asynchronous way, i.e. set its return type to `()` and schedule events to be emitted. The translation of `Request::NewConnection` is more or less 1-to-1 with the introduction of `ClientEvent::RequestConnection`. The translation of `Request::ReuseConnection` turns into the also renamed `ClientEvent::RequestAccess`. This captures better what we need to do: When we have an existing connection, we need to request access for it, otherwise the gateway will drop all packets we send to this resource. The motivation for this refactoring is #6335. Buffering the initial packets while establishing a new connection opens up a race condition where we may send `RequestAccess` before the gateway has processed `RequestConnection`. In order to avoid this, we need to be able to locally buffer our `RequestAccess` messages and wait until the gateway has confirmed our connection.	2024-08-27 04:17:19 +00:00
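The re-framing in d042addc5f, where the handler returns `()` and queues an event instead of returning a request, might be sketched like this. The `ClientEvent` variant names come from the commit message; their fields, `RoutingState` and its methods are invented for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

#[derive(Debug, PartialEq)]
enum ClientEvent {
    /// No connection to this gateway yet: ask for a new one.
    RequestConnection { resource: String, gateway: String },
    /// Already connected: only ask the gateway to allow this resource.
    RequestAccess { resource: String, gateway: String },
}

#[derive(Default)]
struct RoutingState {
    /// Which gateway to use for which resource.
    routing_table: BTreeMap<String, String>,
    connected_gateways: BTreeSet<String>,
    pending_events: VecDeque<ClientEvent>,
}

impl RoutingState {
    /// Handles "connection details" from the portal. The primary effect is the
    /// routing table update; the event to emit is scheduled, not returned.
    fn on_routing_table_update(&mut self, resource: &str, gateway: &str) {
        self.routing_table.insert(resource.to_owned(), gateway.to_owned());

        let (resource, gateway) = (resource.to_owned(), gateway.to_owned());
        let event = if self.connected_gateways.contains(&gateway) {
            ClientEvent::RequestAccess { resource, gateway }
        } else {
            ClientEvent::RequestConnection { resource, gateway }
        };
        self.pending_events.push_back(event);
    }

    /// The event-loop drains scheduled events at its own pace.
    fn poll_event(&mut self) -> Option<ClientEvent> {
        self.pending_events.pop_front()
    }
}
```

Having a local queue is also what would make it possible to hold back `RequestAccess` until the gateway has confirmed the connection, as the commit message motivates.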
Reactor Scram	482ded889e	fix(rust/gui-client): throw error when failing to connect to Firezone (#6409 ) Refs #6389 ```[tasklist] - [x] Update changelog - [x] Update manual test cases ``` This changes the behavior from "fail silently" to "fail loudly" so at least the user knows something is wrong and they can restart Firezone after they gain Internet. <img width="439" alt="image" src="https://github.com/user-attachments/assets/d5bbac66-9a5f-40a6-8b4c-71d8ab8abd6d"> <img width="554" alt="image" src="https://github.com/user-attachments/assets/bcee1f87-bd29-4a44-b41f-a01217e3248e">	2024-08-23 15:37:57 +00:00
Thomas Eizinger	a1049b7d78	feat(connlib): suspend if we don't have UDP sockets (#6398 ) Previously, failing to bind to any interfaces was a hard-error. In reality and in `connlib`'s current state, this is quite unlikely because machines will at least have a loopback interface that we will bind to. However, with #6382 in the pipeline, it may be more likely that we actually end up with no functional UDP sockets. Furthermore, we are considering to extend those connectivity checks in the future. Thus, it is important that the case of "no available UDP sockets" is gracefully handled. Instead of failing with a hard-error, we now suspend `connlib's` network stack. The connectivity to the portal is unaffected by this and we will still also receive commands from the client application like `reset`. When we receive a `reset`, we attempt to rebind the sockets and thus retry connectivity. Because we are suspending the entire eventloop, this won't send any messages or trigger any timers whatsoever. For example, if we hypothetically started up without network interfaces, this is now the log output: ``` 2024-08-22T01:50:42.170101Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.2.0-2-gc8eed5938-modified" 2024-08-22T01:50:42.178777Z DEBUG phoenix_channel: Connecting to portal host=api.firez.one user_agent=NixOS/24.5.0 connlib/1.2.1 (x86_64; 6.8.12) 2024-08-22T01:50:42.178978Z DEBUG firezone_headless_client::dns_control::linux: Deactivating DNS control... 2024-08-22T01:50:42.180691Z ERROR firezone_tunnel::sockets: No available UDP sockets 2024-08-22T01:50:42.197098Z INFO firezone_tunnel::device_channel: Initializing TUN device name=tun-firezone 2024-08-22T01:50:42.197165Z DEBUG firezone_tunnel::client: Unable to update DNS servesr without interface configuration 2024-08-22T01:50:42.453988Z DEBUG tungstenite::handshake::client: Client handshake done. 
2024-08-22T01:50:42.454161Z INFO phoenix_channel: Connected to portal host=api.firez.one 2024-08-22T01:50:42.676825Z DEBUG firezone_tunnel::client: Updating DNS servers mapping={fd00:2021:1111:8000:100:100:111:0 <> [2606:4700:4700::1111]:53, 100.100.111.1 <> 1.1.1.1:53} 2024-08-22T01:50:42.677084Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`) 2024-08-22T01:50:42.677173Z INFO firezone_tunnel::client: Activating resource name=.slack.com address=.slack.com sites=Vultr Stable (Latest Release Gateways) 2024-08-22T01:50:42.677223Z INFO firezone_tunnel::client: Activating resource name=.slack-edge.com address=*.slack-edge.com sites=Vultr Stable (Latest Release Gateways) 2024-08-22T01:50:42.677283Z INFO firezone_tunnel::client: Activating resource name=.spotify.com address=*.spotify.com sites=AWS Dev (Gateways track `main`) 2024-08-22T01:50:42.677345Z INFO firezone_tunnel::client: Activating resource name=.github.com address=.github.com sites=AWS Dev (Gateways track `main`) 2024-08-22T01:50:42.677418Z INFO firezone_tunnel::client: Activating resource name=whatismyip.com address=.whatismyip.com sites=AWS Dev (Gateways track `main`) 2024-08-22T01:50:42.677489Z INFO firezone_tunnel::client: Activating resource name=ifconfig.net address=ifconfig.net sites=Vultr Stable (Latest Release Gateways) 2024-08-22T01:50:42.677538Z INFO firezone_tunnel::client: Activating resource name=.google.com address=.google.com sites=AWS Dev (Gateways track `main`) 2024-08-22T01:50:42.677632Z INFO firezone_tunnel::client: Activating resource name=.fastmail.com address=**.fastmail.com sites=AWS Dev (Gateways track `main`) 2024-08-22T01:50:42.677682Z INFO firezone_tunnel::client: Activating resource name=speed.cloudflare.com address=speed.cloudflare.com sites=Vultr Stable (Latest Release Gateways) 2024-08-22T01:50:42.678212Z INFO snownet::node: Added new TURN server rid=b6fc4d73-9c8e-44df-a941-da7d2134cb70 address=Dual { v4: 
34.40.133.55:3478, v6: [2600:1900:40b0:1504:0:97::]:3478 } 2024-08-22T01:50:42.678322Z INFO snownet::node: Added new TURN server rid=c818b11a-d0cc-4f2a-bb88-473d8298a885 address=Dual { v4: 34.81.229.132:3478, v6: [2600:1900:4030:b0d9:0:9b::]:3478 } 2024-08-22T01:50:42.678365Z INFO connlib_client_shared::eventloop: Firezone Started! ``` After this, nothing will happen other than receiving messages from the portal or the client app. Related: #6382. Related: #6385.	2024-08-22 04:15:31 +00:00
Thomas Eizinger	16da501a7d	feat(connlib): remember recently connected gateways (#6361 ) Previously, `connlib` would only send the currently connected gateways to the portal upon a new connection intent. With our introduced idle connection timeout, this could result in the portal choosing a different gateway upon reconnecting to the resource. To fix this, we introduce an LRU cache with at most 100 entries. Iteration over the LRU cache happens in MRU order, meaning a recently connected gateway will be at the front of the list. We assume that this list is processed in order and thus still prefer gateways that we are still connected to. Related: #6347.	2024-08-21 07:11:36 +00:00
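The bounded list with MRU iteration order described in the commit above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not `connlib`'s actual implementation; `RecentGateways`, `GatewayId`, `touch` and `iter_mru` are hypothetical names chosen for this sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a gateway identifier; the real type may differ.
pub type GatewayId = u64;

/// A tiny LRU list that iterates in most-recently-used (MRU) order.
pub struct RecentGateways {
    capacity: usize,
    // Front = most recently connected, back = least recently connected.
    entries: VecDeque<GatewayId>,
}

impl RecentGateways {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: VecDeque::with_capacity(capacity),
        }
    }

    /// Records a (re)connection to `id` by moving it to the front.
    pub fn touch(&mut self, id: GatewayId) {
        // Drop any existing entry so each gateway appears at most once.
        self.entries.retain(|existing| *existing != id);
        self.entries.push_front(id);
        // Evict the least recently used entries once we exceed the capacity.
        self.entries.truncate(self.capacity);
    }

    /// Iterates from the most recently to the least recently connected gateway.
    pub fn iter_mru(&self) -> impl Iterator<Item = GatewayId> + '_ {
        self.entries.iter().copied()
    }
}
```

With a capacity of 100, as in the commit message, a consumer that processes `iter_mru()` in order sees the most recently connected gateways first.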
Thomas Eizinger	3b56664e02	test(rust): ensure deterministic proptests (#6319 ) For quite a while now, we have been making extensive use of property-based testing to ensure `connlib` works as intended. The idea of proptests is that - given a certain seed - we deterministically sample test inputs and assert properties on a given function. If the test fails, `proptest` prints the seed which can then be added to a regressions file to iterate on the test case and fix it. It is quite obvious that non-determinism in how the test input gets generated is no bueno and reduces the value we get out of these tests a fair bit. The `HashMap` and `HashSet` data structures are known to be non-deterministic in their iteration order. This causes non-determinism during the input generation because we make use of a lot of maps and sets to gradually build up the test input. We fix all uses of `HashMap` and `HashSet` by replacing them with `BTreeMap` and `BTreeSet`. To ensure this doesn't regress, we refactor `tunnel_test` to not make use of proptest's macros and instead, we initialise and run the test ourselves. This allows us to dump the sampled state and transitions into a file per test run. In CI, we then run a 2nd iteration of all regression tests and compare the sampled state and transitions with the previous run. They must match byte-for-byte. Finally, to discourage use of non-deterministic iteration, we ban the use of the iteration functions on `HashMap` and `HashSet` across the codebase. This doesn't catch iteration in a `for`-loop but it is better than not linting against it at all. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-08-16 23:15:58 +00:00
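The difference in iteration order that motivates the commit above can be illustrated with a small standalone sketch. The function names are hypothetical and unrelated to `tunnel_test`; only the standard-library behaviour is shown.

```rust
use std::collections::{BTreeMap, HashMap};

/// Collects the values of a `BTreeMap`. Iteration is always sorted by key,
/// so the result is the same on every run for the same input.
pub fn ordered_names(input: &[(u32, &str)]) -> Vec<String> {
    let map: BTreeMap<u32, &str> = input.iter().copied().collect();
    map.values().map(|name| name.to_string()).collect()
}

/// Collects the values of a `HashMap`. The default hasher is randomly seeded,
/// so the order of the result is unspecified and may differ between runs.
pub fn unordered_names(input: &[(u32, &str)]) -> Vec<String> {
    let map: HashMap<u32, &str> = input.iter().copied().collect();
    map.values().map(|name| name.to_string()).collect()
}
```

Any test input built from `unordered_names` could change between two runs with the same seed, whereas `ordered_names` always yields the values in key order.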
Thomas Eizinger	d399e65246	build(deps): bump tokio-tungstenite to 0.23 (#5509 ) With the upgrade to 0.23, `tokio-tungstenite` pulls in `rustls` 0.23 which supports multiple crypto providers. By default, this uses the `aws-lc-rs` provider. The previous default was `ring`. This PR bumps the necessary versions and installs the `ring` crypto provider at the beginning of each application, before connlib starts. We try and do this as early as possible to make it obvious that it only needs to happen once per process. Resolves: #5380.	2024-08-15 06:02:17 +00:00
Thomas Eizinger	7642f37d56	refactor: thin out `connlib-shared` (#6256 ) Most of `connlib-shared` exists only for historical reasons. The `Tunnel` has since been decoupled from the `Callbacks` and most error variants on `ConnlibError` are not actually used. This allows us to move a few things around and trim down `ConnlibError` to just the variants that actually cause a call to `on_disconnect`. Moving everything related to `proptest`s to `firezone-tunnel` also requires us to delete the specialisation for printing IDs in a shorter format during the tests. That is a bit unfortunate but was always kind of a hack. I'd rather make progress on getting rid of `connlib-shared` though and perhaps re-introduce that feature once the messages are fully moved into the tunnel. Related: #4470.	2024-08-12 22:57:06 +00:00
Thomas Eizinger	bed625a312	chore(rust): make logging more ergonomic (#6237 ) Setting up a logger is something that pretty much every entrypoint needs to do, be it a test, a shared library embedded in another app or a standalone application. Thus, it makes sense to introduce a dedicated crate that bundles together everything about how we want to do logging. This allows us to introduce convenience functions like `firezone_logging::test` which allow you to construct a logger for a test as a one-liner. Crucially though, introducing `firezone-logging` gives us a place to store a default log directive that silences very noisy crates. When looking into a problem, it is common to start by simply setting the log-filter to `debug`. Without further action, this floods the output with logs from crates like `netlink_proto` on Linux. It is very unlikely that those are the logs that you want to see. Without a preset filter, the only alternative here is to explicitly turn off the log filter for `netlink_proto` by typing something like `RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues with customers, this is annoying. Log filters can be overridden, i.e. a 2nd filter that matches the exact same scope overrides a previous one. Thus, with this design it is still possible to activate certain logs at runtime, even if they have been silenced by default. I'd expect `firezone-logging` to attract more functionality in the future. For example, we want to support re-loading of log-filters on other platforms. Additionally, where logs get stored could also be defined in this crate. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-08-10 05:17:03 +00:00
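The override rule mentioned in the commit above (a later directive for the same scope replaces an earlier one) can be modelled with a deliberately simplified sketch. This is not how `firezone-logging` or the underlying filter library is implemented; `parse_directives` and `combined` are hypothetical helpers, and a bare word is assumed to always be a global level.

```rust
use std::collections::BTreeMap;

/// Parses a comma-separated filter such as `netlink_proto=off,debug`.
/// Keys are targets (the empty string stands for the global default),
/// values are levels. Simplifying assumption: a bare word is a level.
pub fn parse_directives(filter: &str) -> BTreeMap<String, String> {
    let mut directives = BTreeMap::new();
    for directive in filter.split(',').map(str::trim).filter(|d| !d.is_empty()) {
        let (target, level) = match directive.split_once('=') {
            Some((target, level)) => (target, level),
            None => ("", directive), // bare level, e.g. `debug`
        };
        // Inserting again for the same target replaces the earlier level.
        directives.insert(target.to_string(), level.to_string());
    }
    directives
}

/// Applies the runtime filter on top of a preset default filter.
pub fn combined(default_filter: &str, runtime_filter: &str) -> BTreeMap<String, String> {
    parse_directives(&format!("{default_filter},{runtime_filter}"))
}
```

In this model, a preset of `netlink_proto=off` combined with a runtime filter of `debug,netlink_proto=trace` leaves `netlink_proto` at `trace`, which mirrors the "still possible to activate certain logs at runtime" behaviour described above.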
Thomas Eizinger	128d0eb407	feat(connlib): transparently forward non-resources DNS queries (#6181 ) Currently, `connlib` depends on `hickory-resolver` to perform DNS queries for non-resources. This is unnecessary. Instead of buffering the original UDP DNS query, consulting hickory to resolve the name and mapping the response back, we can simply take the UDP payload and send it via our protected socket directly to the original upstream DNS server. This ensures `connlib` is as transparent as possible for DNS queries for non-resources. Additionally, it removes a lot of error handling and other cruft that we currently have to perform because we are using hickory. For example, hickory will automatically retry a DNS query after a certain timeout. However, the OS / client talking to `connlib` will also retry after a certain timeout because it is making DNS queries over an unreliable transport (UDP). It is thus unnecessary for us to do that internally. To correctly test this change, our test-suite needed some refactoring. Specifically, DNS servers are now modelled as dedicated `Host`s that can receive (UDP) traffic. Lastly, we can remove our dependency on `hickory-proto` and `hickory-resolver` everywhere and only use `domain` for parsing DNS messages. Resolves: #6141. Related: #6033. Related: #4800. (Impossible to happen with this design)	2024-08-07 08:54:49 +00:00
Gabi	a2d849087a	feat(android): add setDisabledResources FFI (#6166 ) Builds on top of #6164. Part of the effort towards https://github.com/firezone/firezone/issues/6074. Prepares connlib to call `setDisabledResources` from Android. Furthermore, we add a `disablable` parameter for resources which defaults to false for now; in the future, the portal will set it for the internet resource, and later it may be used for other resources. The `disablable` parameter only affects the UI.	2024-08-05 22:43:27 +00:00
Thomas Eizinger	fc4b8c7b46	refactor: rename `reconnect` to `reset` (#6057 ) Connection roaming within `connlib` has changed a fair bit since we introduced the `reconnect` function. The new implementation is basically a hard-reset of all state within `connlib`. Renaming this function across all layers makes this more obvious. Resolves: #6038.	2024-07-28 07:41:45 +00:00
Thomas Eizinger	59014a9622	refactor(connlib): encapsulate UDP and TCP sockets (#6028 ) As part of debugging full-route tunneling on Windows, we discovered that we need to always explicitly choose the interface through which we want to send packets, otherwise Windows may cause a routing loop by routing our packets back into the TUN device. We already have a `SocketFactory` abstraction in `connlib` that is used by each platform to customise the setup of each socket to prevent routing loops. So far, this abstraction directly returns tokio sockets which don't allow us to intercept the actual sending of packets. For some of our traffic, i.e. the UDP packets exchanged with relays, we don't specify a source address. To make full-route work on Windows, we need to intercept these packets and explicitly set the source address. To achieve that, we introduce dedicated `TcpSocket` and `UdpSocket` structs within `socket-factory`. With this in place, we will be able to add Windows-conditional code that looks up and sets the source address of outgoing UDP packets. For TCP sockets, the lookup will happen prior to connecting to the address and will be used to bind to the correct interface. Related: #2667. Related: #5955.	2024-07-25 04:28:46 +00:00
dependabot[bot]	dae90d81e1	build(deps): bump opentelemetry dependencies (#6003 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-07-24 17:45:42 +00:00
dependabot[bot]	7be47f2c6e	build(deps): Bump url from 2.5.0 to 2.5.2 in /rust (#6002 ) Bumps [url](https://github.com/servo/rust-url) from 2.5.0 to 2.5.2. <details> <summary>Commits</summary> <ul> <li><a href="`54346fa288`"><code>54346fa</code></a> Revert "Reimplement idna on top of ICU4X" (<a href="https://redirect.github.com/servo/rust-url/issues/946">#946</a>)</li> <li><a href="`dcfbed3e90`"><code>dcfbed3</code></a> Update idna to 1.0.1 (<a href="https://redirect.github.com/servo/rust-url/issues/945">#945</a>)</li> <li><a href="`467ef63969`"><code>467ef63</code></a> fix panic on <code>xn--55555577</code> (<a href="https://redirect.github.com/servo/rust-url/issues/940">#940</a>)</li> <li><a href="`3d6dbbb1df`"><code>3d6dbbb</code></a> Reimplement idna on top of ICU4X (<a href="https://redirect.github.com/servo/rust-url/issues/923">#923</a>)</li> <li><a href="`de947abf89`"><code>de947ab</code></a> Document possible replacements of the base URL (<a href="https://redirect.github.com/servo/rust-url/issues/926">#926</a>)</li> <li><a href="`8b8431bbe1`"><code>8b8431b</code></a> docs: document SyntaxViolation variants, remove bare URLs (<a href="https://redirect.github.com/servo/rust-url/issues/924">#924</a>)</li> <li><a href="`fd042e003f`"><code>fd042e0</code></a> Non-special URLs can have their paths erased (<a href="https://redirect.github.com/servo/rust-url/issues/921">#921</a>)</li> <li><a href="`49eea1c2eb`"><code>49eea1c</code></a> Fix multiple issues on wasm32: (<a href="https://redirect.github.com/servo/rust-url/issues/886">#886</a>)</li> <li><a href="`a4dd58be59`"><code>a4dd58b</code></a> Fix lint (<a href="https://redirect.github.com/servo/rust-url/issues/920">#920</a>)</li> <li><a href="`73803fa780`"><code>73803fa</code></a> Update URLs (<a href="https://redirect.github.com/servo/rust-url/issues/916">#916</a>)</li> <li>Additional commits viewable in <a href="https://github.com/servo/rust-url/compare/v2.5.0...v2.5.2">compare view</a></li> </ul> </details> <br 
/> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-07-24 07:28:00 +00:00
Thomas Eizinger	50d6b865a1	refactor(connlib): move `Tun` implementations out of `firezone-tunnel` (#5903 ) The different implementations of `Tun` are the last platform-specific code within `firezone-tunnel`. By introducing a dedicated crate and a `Tun` trait, we can move this code into (platform-specific) leaf crates: - `connlib-client-android` - `connlib-client-apple` - `firezone-bin-shared` Related: #4473. --------- Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>	2024-07-24 01:10:50 +00:00
Thomas Eizinger	67ffa7017e	fix(connlib): make iteration of maps and sets deterministic (#5943 ) For `tunnel_test`, it is very important that each execution of a set of state transitions is completely deterministic, otherwise the shrinking behaviour does not work. Iterating over `HashMap` and `HashSet` is non-deterministic. To fix this, we convert several maps and sets to `BTreeMap`s and `BTreeSet`s.	2024-07-22 21:35:39 +00:00
Thomas Eizinger	4937291d23	refactor(connlib): deal with resources one at a time (#5886 ) The two primary users of the `add_resources` and `remove_resources` are the client's eventloop and the `tunnel_test`. Both of them only ever pass a single resource at a time. It is thus simpler to remove the inner loop from within `ClientState` and simply process a single resource at a time.	2024-07-18 04:59:12 +00:00
Thomas Eizinger	da52c66023	refactor(clients): init `PhoenixChannel` in upper layers (#5884 ) This represents a step towards #3837. Eventually, we'd like the abstractions of `Session` and `Eventloop` to go away entirely. For that, we need to thin them out. The introduction of `ConnectArgs` was already a hint that we are passing a lot of data across layers that we shouldn't. To avoid that, we can simply initialise `PhoenixChannel` earlier and thus each callsite can specify the desired configuration directly. I've left `ConnectArgs` intact to keep the diff small.	2024-07-18 02:08:38 +00:00
Thomas Eizinger	58db5f0639	refactor(connlib): remove `Callbacks` from `Tunnel` (#5885 ) Following the removal of the return type from the callback functions in #5839, we can now move the use of the `Callbacks` one layer up the stack and decouple them entirely from the `Tunnel`. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Gabi <gabrielalejandro7@gmail.com>	2024-07-16 21:00:40 +00:00
Gabi	5b0aaa6f81	fix(connlib): protect all sockets from routing loops (#5797 ) Currently, only connlib's UDP sockets for sending and receiving STUN & WireGuard traffic are protected from routing loops. This was done via the `Sockets::with_protect` function. Connlib has additional sockets though: - A TCP socket to the portal. - UDP & TCP sockets for DNS resolution via hickory. Both of these can incur routing loops on certain platforms which becomes evident as we try to implement #2667. To fix this, we generalise the idea of "protecting" a socket via a `SocketFactory` abstraction. By allowing the different platforms to provide a specialised `SocketFactory`, anything Linux-based can give special treatment to the socket before handing it to connlib. As an additional benefit, this allows us to remove the `Sockets` abstraction from connlib's API again because we can now initialise it internally via the provided `SocketFactory` for UDP sockets. --------- Signed-off-by: Gabi <gabrielalejandro7@gmail.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-07-16 00:40:05 +00:00
Thomas Eizinger	a4a8221b8b	refactor(connlib): explicitly initialise `Tun` (#5839 ) Connlib's routing logic and networking code is entirely platform agnostic. The only platform-specific bit is how we interact with the TUN device. From connlib's perspective though, all it needs is an interface for reading and writing. How the device gets initialised and updated is client-business. For the most part, this is the same on all platforms: We call callbacks and the client updates the state accordingly. The only annoying bit here is that Android recreates the TUN interface on every update and thus our old file descriptor is invalid. The current design works around this by returning the new file descriptor on Android. This is a problematic design for several reasons: - It forces the callback handler to finish synchronously, halting connlib until this is complete. - The synchronous nature also means we cannot replace the callbacks with events as events don't have a return value. To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves the business of how the `Tun` device is created up to the client. The clients are already platform-specific so this makes sense. In a future iteration, we can move all the various `Tun` implementations all the way up to the client-specific crates, thus co-locating the platform-specific code. Initialising `Tun` from the outside surfaces another issue: The routes are still set via the `Tun` handle on Windows. To fix this, we introduce a `make_tun` function on `TunDeviceManager` so that it can remember the interface index on Windows, which lets us move the setting of routes to `TunDeviceManager`. This simplifies several of connlib's APIs which are now infallible. Resolves: #4473. --------- Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com> Co-authored-by: conectado <gabrielalejandro7@gmail.com>	2024-07-12 23:54:15 +00:00
Thomas Eizinger	a4714d6de3	chore(connlib): print error after panicking (#5854 )	2024-07-12 14:30:11 +00:00

195 Commits