With the new control protocol specified in #6461, the client will no
longer initiate new connections. Instead, the credentials are generated
deterministically by the portal based on the gateway's and the client's
public key. For as long as they use the same public key, they also have
the same in-memory state which makes creating connections idempotent.
What we didn't consider in the new design at first is that when clients
roam, they discard all connections but keep the same private key. As a
result, the portal would generate the same ICE credentials which means
the gateway thinks it can reuse the existing connection when new flows
get authorized. The client, however, has discarded all connections (and rotated its ports and possibly IPs), meaning the candidates it previously sent to the gateway are no longer valid and connectivity fails.
We fix this by also rotating the private keys upon reset. Rotating the keys by itself isn't enough: we also need to propagate the new public key all the way "over" to the phoenix channel component, which lives separately from connlib's data plane.
To achieve this, we change `PhoenixChannel` to now start in the
"disconnected" state and require an explicit `connect` call. In
addition, the `LoginUrl` constructed by various components now acts
merely as a "prototype", which may require additional data to construct
a fully valid URL. In the case of client and gateway, this is the public
key of the `Node`. This additional parameter needs to be passed to
`PhoenixChannel` in the `connect` call, thus forming a type-safe
contract that ensures we never attempt to connect without providing a
public key.
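As a rough sketch (the names and fields below are simplified and are not the actual `phoenix-channel` API), the contract looks something like this:

```rust
// Simplified sketch: a URL "prototype" that can only be completed with a
// public key, and a channel that can only connect by supplying one.
struct LoginUrlPrototype {
    base: String,
}

impl LoginUrlPrototype {
    fn finish(self, public_key: &str) -> String {
        format!("{}?public_key={public_key}", self.base)
    }
}

enum PhoenixChannel {
    Disconnected { url: LoginUrlPrototype },
    Connected { url: String },
}

impl PhoenixChannel {
    fn disconnected(url: LoginUrlPrototype) -> Self {
        PhoenixChannel::Disconnected { url }
    }

    /// There is no way to reach the `Connected` state without a public key.
    fn connect(self, public_key: &str) -> Self {
        match self {
            PhoenixChannel::Disconnected { url } => PhoenixChannel::Connected {
                url: url.finish(public_key),
            },
            already_connected => already_connected,
        }
    }
}
```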
For the relay, this doesn't apply.
Lastly, this allows us to tidy up the code a bit by:
a) generating the `Node`'s private key from the existing RNG
b) removing `ConnectArgs` which only had two members left
Related: #6461.
Related: #6732.
The `connlib-shared` crate has become a bit of a dependency magnet
without a clear purpose. It hosts utilities like `get_user_agent`,
messages for the client and gateway to communicate with the portal and
domain types like `ResourceId`.
To create a better dependency structure in our workspace, we repurpose
`connlib-shared` as a `connlib-model` crate. Its purpose is to host
domain-specific model types that multiple crates may want to use. For
that purpose, we rename the `callbacks::ResourceDescription` type to
`ResourceView`, designating that this is a _view_ onto a resource as
seen by `connlib`. The message types which currently double up as
connlib-internal model thus become an implementation detail of
`firezone-tunnel` and shouldn't be used for anything else.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
At the moment, the mapping of proxy IPs to the resolved IPs of a DNS
resource happens at the same time as the "authorisation" that the client
is allowed to talk to that resource. This is somewhat convoluted
because:
- Mapping proxy IPs to resolved IPs only needs to happen for DNS
resources, yet it is called for all resources (and internally skipped).
- Wildcard DNS resources only need to be authorised once, after which
the client is allowed to communicate with any domain matching the
wildcard address.
- The code that models resources within `ClientOnGateway` doesn't
differentiate between resource types at all.
With #6461, the authorisation of a resource will be completely decoupled
from the domain resolution for a particular domain of a DNS resource. To
make that easier to implement, we re-model the internals of
`ClientOnGateway` to differentiate the various resource types. Instead
of holding a single vec of addresses, the IPs are now indexed by the
respective domain. For CIDR resources, we only hold a single address
anyway and for the Internet Resource, the IP networks are static.
This new model implies that allowing a resource that has already been allowed is treated as an update, and the filters get re-calculated.
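To illustrate the shape of the new model, here is a hypothetical sketch (assuming the `ipnet` crate; the real `ClientOnGateway` internals differ):

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::IpAddr;

enum ResourceOnGateway {
    // DNS resources index their resolved IPs by the respective domain.
    Dns {
        addresses: BTreeMap<String, BTreeSet<IpAddr>>,
    },
    // CIDR resources only ever cover a single network.
    Cidr { network: ipnet::IpNet },
    // The Internet resource covers a fixed, static set of networks.
    Internet { networks: Vec<ipnet::IpNet> },
}
```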
Currently, `connlib` is entirely single-threaded. This allows us to
reuse a single buffer for processing IP packets and makes reasoning about the packet-processing code very simple. Being single-threaded also means
we can only make use of a single CPU core and all operations have to be
sequential.
Analyzing `connlib` using `perf` shows that we spend 26% of our CPU time
writing packets to the TUN interface [0]. Because we are
single-threaded, `connlib` cannot do anything else during this time. If
we could offload the writing of these packets to a different thread,
`connlib` could already process the next packet while the current one is
writing.
Packets that we send to the TUN interface arrive as encrypted WireGuard packets over UDP and get decrypted into a (currently shared) buffer. Moving the writing to a different thread implies that we need more of these buffers for the next packet(s) to be decrypted into.
To avoid IP fragmentation, we set the maximum IP MTU to 1280 bytes on
the TUN interface. That actually isn't very big and easily fits into a
stackframe. The default stack size for threads is 2MB [1].
Instead of creating more buffers and cycling through them, we can also
simply stack-allocate our IP packets. This incurs some overhead from
copying packets but it is only ~3.5% [2] (This was measured without a
separate thread). With stack-allocated packets, almost all lifetime annotations go away, which in itself is already a welcome ergonomics boost. Stack-allocated packets also mean we can simply spawn a new thread for the packet processing. This thread is connected to connlib's main thread with two channels. The capacity of 1000 packets will at most consume an additional 3.5 MB of memory, which is fine even on our most-constrained devices such as iOS.
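A minimal sketch of the idea (the channel capacity matches the number above; everything else is illustrative and not connlib's actual code):

```rust
use std::sync::mpsc;

const MTU: usize = 1280;

// An IP packet fits comfortably on the stack because the MTU is capped at 1280.
struct IpPacket {
    buf: [u8; MTU],
    len: usize,
}

fn spawn_tun_writer() -> mpsc::SyncSender<IpPacket> {
    // The bounded capacity puts a hard limit on the extra memory used by
    // packets queued for the TUN-writer thread.
    let (tx, rx) = mpsc::sync_channel::<IpPacket>(1000);

    std::thread::spawn(move || {
        for packet in rx {
            // Write `&packet.buf[..packet.len]` to the TUN device here.
            let _ = &packet.buf[..packet.len];
        }
    });

    tx
}
```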
[0]: https://share.firefox.dev/3z78CzD
[1]: https://doc.rust-lang.org/std/thread/#stack-size
[2]: https://share.firefox.dev/3Bf4zla
Resolves: #6653.
Resolves: #5541.
Merging #6708 had the unintended side-effect that we are now seeing a lot of WARN logs from phoenix-channel because we can no longer parse the responses from gateways. We didn't do anything with these responses, but gateways keep sending them for backwards-compatibility reasons.
To not confuse ourselves while debugging, we revert the client-side bit
of #6708 to remove these warnings.
Prior to version 1.1.0, clients did not have an embedded DNS resolver
and relied on the gateway for DNS resolution. In that design, the
gateway responded with the IPs that the domain resolved to.
Our next iteration of the control protocol (#6461) will decouple the
details of how DNS works from the flow-authorization. As a result, we
will need to be able to establish a flow for a DNS resource without
knowing which concrete domain the client is going to access.
Without a concrete domain, we cannot send anything back to these old
clients, meaning we unfortunately have to break compatibility with <
1.1.0 clients as part of implementing the new control protocol.
Currently, the gateway requires a strict ordering of first receiving a `request_connection` message, followed by multiple `allow_access` messages. Additionally, access can be granted as part of the initial `request_connection` message too.
This isn't an ideal design. Setting up a new connection is infallible: all we need to do is send our ICE credentials back to the client. However, untangling that will require a bit more effort.
However, untangling that will require a bit more effort.
Starting with #6335, following this strict order on the client is more difficult. Whilst we can send them in order, it is harder to maintain
those ordering guarantees across all our systems.
To avoid this, we change the gateway to perform an upsert for its local ACLs for a client. If an `allow_access` message somehow reaches the gateway first, we simply create the `Peer` right away and only set up the actual connection later.
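The upsert itself is conceptually simple; a sketch with made-up types and IDs:

```rust
use std::collections::HashMap;

struct Peer {
    allowed_resources: Vec<u64>,
}

fn allow_access(peers: &mut HashMap<u64, Peer>, client_id: u64, resource_id: u64) {
    // Whichever message arrives first creates the `Peer`; later messages
    // merely extend its ACL.
    peers
        .entry(client_id)
        .or_insert_with(|| Peer {
            allowed_resources: Vec::new(),
        })
        .allowed_resources
        .push(resource_id);
}
```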
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Previously, failing to bind to any interface was a hard error. In reality and in `connlib`'s current state, this is quite unlikely because machines will at least have a loopback interface that we will bind to.
However, with #6382 in the pipeline, it may be more likely that we actually end up with no functional UDP sockets. Furthermore, we are considering extending those connectivity checks in the future.
Thus, it is important that the case of "no available UDP sockets" is handled gracefully.
Instead of failing with a hard error, we now suspend `connlib`'s network stack. Connectivity to the portal is unaffected by this, and we will still receive commands from the client application such as `reset`.
When we receive a `reset`, we attempt to rebind the sockets and thus
retry connectivity.
Because we are suspending the entire eventloop, this won't send any
messages or trigger any timers whatsoever. For example, if we
hypothetically started up without network interfaces, this is now the
log output:
```
2024-08-22T01:50:42.170101Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.2.0-2-gc8eed5938-modified"
2024-08-22T01:50:42.178777Z DEBUG phoenix_channel: Connecting to portal host=api.firez.one user_agent=NixOS/24.5.0 connlib/1.2.1 (x86_64; 6.8.12)
2024-08-22T01:50:42.178978Z DEBUG firezone_headless_client::dns_control::linux: Deactivating DNS control...
2024-08-22T01:50:42.180691Z ERROR firezone_tunnel::sockets: No available UDP sockets
2024-08-22T01:50:42.197098Z INFO firezone_tunnel::device_channel: Initializing TUN device name=tun-firezone
2024-08-22T01:50:42.197165Z DEBUG firezone_tunnel::client: Unable to update DNS servesr without interface configuration
2024-08-22T01:50:42.453988Z DEBUG tungstenite::handshake::client: Client handshake done.
2024-08-22T01:50:42.454161Z INFO phoenix_channel: Connected to portal host=api.firez.one
2024-08-22T01:50:42.676825Z DEBUG firezone_tunnel::client: Updating DNS servers mapping={fd00:2021:1111:8000:100:100:111:0 <> [2606:4700:4700::1111]:53, 100.100.111.1 <> 1.1.1.1:53}
2024-08-22T01:50:42.677084Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677173Z INFO firezone_tunnel::client: Activating resource name=*.slack.com address=**.slack.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677223Z INFO firezone_tunnel::client: Activating resource name=*.slack-edge.com address=**.slack-edge.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677283Z INFO firezone_tunnel::client: Activating resource name=*.spotify.com address=**.spotify.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677345Z INFO firezone_tunnel::client: Activating resource name=*.github.com address=**.github.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677418Z INFO firezone_tunnel::client: Activating resource name=whatismyip.com address=**.whatismyip.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677489Z INFO firezone_tunnel::client: Activating resource name=ifconfig.net address=ifconfig.net sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.677538Z INFO firezone_tunnel::client: Activating resource name=*.google.com address=**.google.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677632Z INFO firezone_tunnel::client: Activating resource name=*.fastmail.com address=**.fastmail.com sites=AWS Dev (Gateways track `main`)
2024-08-22T01:50:42.677682Z INFO firezone_tunnel::client: Activating resource name=speed.cloudflare.com address=speed.cloudflare.com sites=Vultr Stable (Latest Release Gateways)
2024-08-22T01:50:42.678212Z INFO snownet::node: Added new TURN server rid=b6fc4d73-9c8e-44df-a941-da7d2134cb70 address=Dual { v4: 34.40.133.55:3478, v6: [2600:1900:40b0:1504:0:97::]:3478 }
2024-08-22T01:50:42.678322Z INFO snownet::node: Added new TURN server rid=c818b11a-d0cc-4f2a-bb88-473d8298a885 address=Dual { v4: 34.81.229.132:3478, v6: [2600:1900:4030:b0d9:0:9b::]:3478 }
2024-08-22T01:50:42.678365Z INFO connlib_client_shared::eventloop: Firezone Started!
```
After this, nothing will happen other than receiving messages from the portal or the client app.
Related: #6382.
Related: #6385.
With the upgrade to 0.23, `tokio-tungstenite` pulls in a newer `rustls` which supports multiple crypto providers. By default, this uses the `aws-lc-rs` provider. The previous default was `ring`.
This PR bumps the necessary versions and installs the `ring` crypto provider at the beginning of each application, before connlib starts. We do this as early as possible to make it obvious that it only needs to happen once per process.
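Assuming rustls 0.23+ with the `ring` feature enabled, installing the provider looks roughly like this:

```rust
fn install_crypto_provider() {
    // Must run once, at the very start of the process, before any TLS use.
    rustls::crypto::ring::default_provider()
        .install_default()
        .expect("crypto provider can only be installed once per process");
}
```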
Resolves: #5380.
Most of `connlib-shared` exists only for historical reasons. The
`Tunnel` has since been decoupled from the `Callbacks` and most error
variants on `ConnlibError` are not actually used.
This allows us to move a few things around and trim down `ConnlibError`
to just the variants that actually cause a call to `on_disconnect`.
Moving everything related to `proptest`s to `firezone-tunnel` also
requires us to delete the specialisation for printing IDs in a shorter
format during the tests. That is a bit unfortunate but was always kind
of a hack. I'd rather make progress on getting rid of `connlib-shared`
though and perhaps re-introduce that feature once the messages are fully
moved into the tunnel.
Related: #4470.
Now that we have the `bin-shared` crate, it is easy to move the
health-check functionality into there. That allows us to get rid of a
crate which makes navigating the workspace a bit easier.
Setting up a logger is something that pretty much every entrypoint needs to do, be it a test, a shared library embedded in another app or a standalone application. Thus, it makes sense to introduce a dedicated crate that bundles together everything about how we want to do logging.
This allows us to introduce convenience functions like `firezone_logging::test` which let you construct a logger for a test as a one-liner.
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.
Log filters can be overridden, i.e. a second filter that matches the exact same scope overrides a previous one. Thus, with this design it is still possible to activate certain logs at runtime, even if they are silenced by default.
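A sketch of how such a preset could be combined with a user-supplied filter, assuming `tracing-subscriber`'s `EnvFilter` (the directive names are just examples):

```rust
use tracing_subscriber::EnvFilter;

fn build_filter(user_directives: &str) -> EnvFilter {
    // The preset comes first; the user's directives come last and therefore
    // win if they target the exact same scope (e.g. "netlink_proto=trace").
    EnvFilter::new(format!("netlink_proto=off,{user_directives}"))
}
```

With this, a plain `RUST_LOG=debug` no longer floods the output, while `RUST_LOG=netlink_proto=trace,debug` still re-enables those logs on demand.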
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
The `firezone-bin-shared` crate is meant to house non-tunnel related
things. That allows it to compile in parallel to everything else. It
currently only depends on `connlib-shared` to access the `DEFAULT_MTU`
constant. We can remove that by requiring the MTU as a ctor parameter of
`TunDeviceManager`.
A longer write-up of the intended dependency structure is in #4470.
On the gateway, the only packets we are interested in receiving on the
TUN device are the ones destined for clients. To achieve this, we
specifically set routes for the reserved IP ranges on our interface.
Multicast packets such as MLDv2 get sent on all interfaces and cause unnecessary noise in our logs. Thus, as a defense-in-depth measure, we
drop all packets outside of the IP ranges reserved for our clients.
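The check itself boils down to something like this (the ranges below are placeholders rather than the actual reserved ranges, and this assumes the `ipnet` crate):

```rust
use std::net::IpAddr;

fn is_destined_for_client(dst: IpAddr) -> bool {
    // Placeholder ranges; the real values are the ones routed to clients.
    let clients_v4: ipnet::Ipv4Net = "100.64.0.0/11".parse().unwrap();
    let clients_v6: ipnet::Ipv6Net = "fd00:2021:1111::/48".parse().unwrap();

    match dst {
        IpAddr::V4(ip) => clients_v4.contains(&ip),
        IpAddr::V6(ip) => clients_v6.contains(&ip),
    }
}

// Packets read from the TUN device for which this returns `false` are dropped.
```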
The different implementations of `Tun` are the last platform-specific
code within `firezone-tunnel`. By introducing a dedicated crate and a
`Tun` trait, we can move this code into (platform-specific) leaf crates:
- `connlib-client-android`
- `connlib-client-apple`
- `firezone-bin-shared`
Related: #4473.
---------
Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>
For `tunnel_test`, it is very important that each execution of a set of
state transitions is completely deterministic, otherwise the shrinking
behaviour does not work.
Iterating over `HashMap` and `HashSet` is non-deterministic. To fix
this, we convert several maps and sets to `BTreeMap`s and `BTreeSet`s.
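For illustration, the property we rely on: `BTreeMap` iterates in key order on every run, whereas `HashMap` iteration order can differ between executions.

```rust
use std::collections::BTreeMap;

fn keys_in_order() -> Vec<&'static str> {
    let map = BTreeMap::from([("b", 2), ("a", 1), ("c", 3)]);
    map.keys().copied().collect() // always ["a", "b", "c"]
}
```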
This represents a step towards #3837. Eventually, we'd like the
abstractions of `Session` and `Eventloop` to go away entirely. For that,
we need to thin them out.
The introduction of `ConnectArgs` was already a hint that we are passing
a lot of data across layers that we shouldn't. To avoid that, we can
simply initialise `PhoenixChannel` earlier and thus each callsite can
specify the desired configuration directly.
I've left `ConnectArgs` intact to keep the diff small.
Following the removal of the return type from the callback functions in
#5839, we can now move the use of the `Callbacks` one layer up the stack
and decouple them entirely from the `Tunnel`.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
Currently, only connlib's UDP sockets for sending and receiving STUN &
WireGuard traffic are protected from routing loops. This is done via the `Sockets::with_protect` function. Connlib has additional sockets
though:
- A TCP socket to the portal.
- UDP & TCP sockets for DNS resolution via hickory.
Both of these can incur routing loops on certain platforms which becomes
evident as we try to implement #2667.
To fix this, we generalise the idea of "protecting" a socket via a
`SocketFactory` abstraction. By allowing the different platforms to
provide a specialised `SocketFactory`, anything Linux-based can give
special treatment to the socket before handing it to connlib.
As an additional benefit, this allows us to remove the `Sockets`
abstraction from connlib's API again because we can now initialise it
internally via the provided `SocketFactory` for UDP sockets.
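One way to express the abstraction (a sketch; the real connlib definition may differ):

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::sync::Arc;

type SocketFactory = Arc<dyn Fn(SocketAddr) -> io::Result<UdpSocket> + Send + Sync>;

fn default_factory() -> SocketFactory {
    Arc::new(|addr| {
        // A platform-specific factory would "protect" the socket here, e.g.
        // mark it so that its traffic bypasses the tunnel interface.
        UdpSocket::bind(addr)
    })
}
```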
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Connlib's routing logic and networking code is entirely platform
agnostic. The only platform-specific bit is how we interact with the TUN
device. From connlib's perspective though, all it needs is an interface
for reading and writing. How the device gets initialised and updated is
client-business.
For the most part, this is the same on all platforms: We call callbacks
and the client updates the state accordingly. The only annoying bit here
is that Android recreates the TUN interface on every update and thus our
old file descriptor is invalid. The current design works around this by
returning the new file descriptor on Android. This is a problematic
design for several reasons:
- It forces the callback handler to finish synchronously, halting connlib until it is complete.
- The synchronous nature also means we cannot replace the callbacks with
events as events don't have a return value.
To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves
the business of how the `Tun` device is created up to the client. The
clients are already platform-specific so this makes sense. In a future
iteration, we can move all the various `Tun` implementations all the way
up to the client-specific crates, thus co-locating the platform-specific
code.
Initialising `Tun` from the outside surfaces another issue: the routes are still set via the `Tun` handle on Windows. To fix this, we introduce a `make_tun` function on `TunDeviceManager` so that it can remember the interface index on Windows, allowing us to move the setting of routes to `TunDeviceManager`.
This simplifies several of connlib's APIs which are now infallible.
Resolves: #4473.
---------
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
The `TunDeviceManager` is a component that the leaf-nodes of our
dependency tree need: the binaries. Thus, it is misplaced in the
`connlib-shared` crate which is at the very bottom of the dependency
tree.
This is necessary to allow the `TunDeviceManager` to actually construct
a `Tun` (which currently lives in `firezone-tunnel`).
Related: #5839.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
In a previous design of firezone, relays used to be scoped to a certain
connection. For a while now, this constraint has been lifted and all
connections can use all relays. A related, outdated concern is the idea
of STUN-only servers. Those also used to be assigned on a per-connection
basis.
By removing any use of per-connection relays and STUN-only servers, the
entire `StunBinding` concept is unused code and can thus be deleted.
To push this over the finish line, the `snownet-tests` which test the
hole-punching functionality needed to be slightly adapted to make use of
the more recently introduced API `Node::update_relays`.
Resolves: #4749.
Currently, we are sending each ICE candidate individually from the client to the gateway and vice versa. This causes a slight delay in when each ICE candidate gets added to the remote ICE agent. As a result, they all start being tested with a slight offset, which causes "endpoint hopping" whenever a connection expires, because the candidates expire just after each other.
In addition, sending multiple messages to the portal causes unnecessary
load when establishing connections.
Finally, with #5283 we started **not** adding the server-reflexive candidate to the local ICE agent. Because we talk to multiple relays, we detect the same server-reflexive candidate multiple times if we are behind a non-symmetric NAT. Not adding the server-reflexive candidate to the ICE agent bypassed our de-duplication strategy here, which means we currently send the same candidate multiple times to a peer, causing additional, unnecessary load.
All of this can be mitigated by batching all our ICE candidates together into one message.
Resolves: #3978.
Currently, the gateway only handles an `init` message on startup. For clients, we also handle `init` messages during operation, so it makes sense to do the same thing for gateways.
This allows us to remove some old code from `phoenix_channel`. In
particular, the `init` function which used to wait for the `init`
message before continuing. In
https://github.com/firezone/firezone/pull/4594, we refactored
`phoenix-channel` to reconnect internally on errors. As a result, the
`connect` function became synchronous and no longer needed an `async`
context.
At the time, the gateway wasn't updated to make use of this. We can now
simplify the gateway code and resolve the outstanding TODO of handling
`init` messages during operation.
Closes #5481
With this, I can connect to the staging portal without a build.rs or any
extra env var setup
<img width="387" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/9c080b36-3a76-49c7-b706-20723697edc7">
```[tasklist]
### Next steps
- [x] Split out a refactor PR for `ConnectArgs` (#5488)
- [x] Try doing this for other Clients
- [x] Check Gateway
- [x] Check Tauri Client
- [x] Change to `app_version`
- [x] Open for review
- [ ] Use `option_env` so that `FIREZONE_PACKAGE_VERSION` can still override the Cargo.toml version for local testing
- [ ] Check Android Client
- [ ] Check Apple Client
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
As part of #4994, the IP translation and mangling of packets to and from
DNS resources is moved to the gateway. This PR represents the
"gateway-half" of the required changes.
Eventually, the client will send a list of proxy IPs that it assigned
for a certain DNS resource. The gateway assigns each proxy IP to a real
IP and mangles outgoing and incoming traffic accordingly. There are a
number of things that we need to take care of as part of that:
- We need to implement NAT to correctly route traffic. Our NAT table maps from source port* and destination IP to an assigned port* and real IP. We say port* because that is only true for UDP and TCP; for ICMP, we use the identifier instead (see the sketch after this list).
- We need to translate between IPv4 and IPv6 in case a DNS resource e.g. only resolves to IPv6 addresses but the client gave out an IPv4 proxy address to the application. This translation was added in #5364 and is now being used here.
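A hypothetical shape for such a NAT table (the field names are made up):

```rust
use std::collections::HashMap;
use std::net::IpAddr;

#[derive(PartialEq, Eq, Hash)]
struct Outside {
    source_port: u16, // the ICMP identifier for ICMP packets
    proxy_ip: IpAddr, // the destination IP the application used
}

struct Inside {
    assigned_port: u16, // the ICMP identifier for ICMP packets
    real_ip: IpAddr,    // a resolved IP of the DNS resource
}

type NatTable = HashMap<Outside, Inside>;
```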
This PR is backwards-compatible because currently, clients don't send
any IPs to the gateway. No proxy IPs means we cannot do any translation
and thus, packets are simply routed through as is which is what the
current clients expect.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
When we attempt to establish a connection to a gateway for a DNS
resource, the gateway must resolve the requested domain name before it
can accept the connection. Currently, this timeout is set to 60s which
is much longer than the client's connection timeout.
DNS resolution is typically a very fast protocol so reducing this
timeout to 5s should be safe. In addition, we add a compile-time
assertion that this timeout must be less than the client's connection
timeout.
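The compile-time assertion can be as simple as a `const` block along these lines (the constant names are made up):

```rust
use std::time::Duration;

const DNS_RESOLUTION_TIMEOUT: Duration = Duration::from_secs(5);
const CLIENT_CONNECTION_TIMEOUT: Duration = Duration::from_secs(60);

// Fails the build if the DNS timeout ever grows past the connection timeout.
const _: () = assert!(
    DNS_RESOLUTION_TIMEOUT.as_secs() < CLIENT_CONNECTION_TIMEOUT.as_secs(),
    "DNS resolution must time out before the client gives up on the connection"
);
```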
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Refs #3636 (This pays down some of the technical debt from Linux DNS)
Refs #4473 (This partially fulfills it)
Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory)
As of dd6421:
- On both Linux and Windows, DNS control and IP setting (i.e.
`on_set_interface_config`) both move to the Client
- On Windows, route setting stays in `tun_windows.rs`. Route setting in
Windows requires us to know the interface index, which we don't know in
the Client code. If we could pass opaque platform-specific data between
the tunnel and the Client it would be easy.
- On Linux, route setting moves to the Client and Gateway, which
completely removes the `worker` task in `tun_linux.rs`
- Notifying systemd that we're ready moves up to the headless Client /
IPC service
```[tasklist]
### Before merging / notes
- [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device)
- [x] Fix Windows Clients
- [x] Fix Gateway
- [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068)
- [x] De-dupe consts
- [ ] ~~Add DNS control test~~ (failed)
- [ ] Smoke test Linux
- [ ] Smoke test Windows
```
To encode in the type that clients always have both an IPv4 and an IPv6 address, and that these are the only allowed source IPs for any given client, we split them into dedicated fields in the `ClientOnGateway` struct and update the tests accordingly.
Furthermore, these will be used in the DNS refactor for IPv6-in-IPv4 and IPv4-in-IPv6 to set the source IP of outgoing packets without having to do additional routing or mappings. There will be more notes on this in the corresponding PR #5049.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
There was an error in how resource filters were deserialized in the gateway:
* we always assumed that ports would be included, but the portal sends no port down when the "all" range is allowed
* we also didn't support the `resource_updated` message; this fixes it, so the resource allow-list can be changed in-flight
This implements traffic filtering on the gateway. Filters are set on the
portal, per-resource, in an allow-list manner.
If no filters exist for a given resource, all packets are allowed; otherwise, only packets that match the port/protocol of one of the filters are allowed and everything else is dropped.
Filters can be either TCP, UDP or ICMP. For the first two, multiple ports can be given. Furthermore, multiple filters can exist for the same resource.
To be able to add and remove filters for the same IP/CIDR, we keep the whole list of filters for any given peer in a map indexed by ID and recalculate the filters for each IP whenever something is added or removed.
This allows us to remove filters and simply recalculate the allow-list for each IP.
Furthermore, for any given IP all rules apply, meaning that if multiple resources map to the same IP, all port/protocol combinations for that IP will apply.
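A sketch of what evaluating such filters could look like (this is not the portal's actual wire format; 6/17/1/58 are the IP protocol numbers for TCP, UDP, ICMP and ICMPv6):

```rust
use std::ops::RangeInclusive;

enum Filter {
    Tcp { ports: Vec<RangeInclusive<u16>> },
    Udp { ports: Vec<RangeInclusive<u16>> },
    Icmp,
}

fn is_allowed(filters: &[Filter], protocol: u8, dst_port: Option<u16>) -> bool {
    // No filters at all means the resource allows every packet.
    if filters.is_empty() {
        return true;
    }

    // A packet is allowed if *any* filter for the resource matches it.
    filters.iter().any(|f| match (f, protocol, dst_port) {
        (Filter::Tcp { ports }, 6, Some(p)) | (Filter::Udp { ports }, 17, Some(p)) => {
            ports.iter().any(|range| range.contains(&p))
        }
        (Filter::Icmp, 1 | 58, _) => true,
        _ => false,
    })
}
```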
This works well right now for DNS resources: since access is requested by DNS name, the resource for that DNS name will arrive at the gateway, and the port filtering will apply for that resource (and any other resource with the same IP).
However, since the client has no idea of the filters, it can't request resource access based on the port/protocol combination, and we are still using the most specific ("longest match") IP. This means that for overlapping CIDR resources, only the rules for the most specific one will be used, even if the gateway supports applying them all, since it will not have the other resources. This will be solved in #4789.
It can also lead to some weirdness. Let's say you have 10.0.0.0/24 -> TCP/80 and 10.0.0.0/16 -> TCP/443 for your user. The user tries to access 10.0.0.1 and will then only be allowed port 80. At some point the user might access 10.0.1.1 and will be allowed port 443. But from that point on, the user will be allowed to access both 80 and 443 on 10.0.0.1, because the rules work correctly on the gateway; the problem is the client side. Again, #4789 will fix this.
Left for next PRs (in tentative order!):
- #4792
- #4789
Depends on: #4773.
Resolves #2030.
Resolves #4791.
---------
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
This still relies on #4613 until we have #4634.
Resolves: #4548.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
This is another step towards #4548. The portal now includes a list of
relays as part of the "init" message. Any time we receive an "init", we
will now upsert those relays based on their ID. This requires us to
change our internal bookkeeping of relays from indexing them by address
to indexing by ID.
To ensure that this works correctly, the unit tests are rewritten to use
the new `upsert_relays` API.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Upon receiving a SIGTERM, we immediately disconnect from the websocket connection to the portal and set a flag that we are shutting down.
Once we are disconnected from the portal and no longer have any active allocations, we exit with 0. A repeated SIGTERM signal will interrupt this process and force the relay to shut down.
Disconnecting from the portal will (eventually) trigger a message to
clients and gateways that this relay should no longer be used. Thus,
depending on the timeout our supervisor has configured after sending
SIGTERM, the relay will continue all TURN operations until the number of
allocations drops to 0.
Currently, we also allow clients to make new allocations and to refresh existing ones. In the future, it may make sense to implement a dedicated status code and refuse `ALLOCATE` and `REFRESH` messages whilst we are shutting down.
Related: #4548.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
During the latest relay outage, we failed to send heartbeats to the
portal because we were busy-looping and never got to handle messages or
timers for the portal.
To mitigate this or similar bugs, we update an `Instant` every time we
send a heartbeat to the portal. In case we are actually
network-partitioned, this will cause the health-check to fail after 15
minutes. This value is the same as the partition timeout for the portal
connection itself[^1]. Very likely, we will never see a relay being shut down because of a failing health check in this case, as it would have already shut itself down.
Exceptions to this are bugs in the eventloop where we fail to interact with the portal at all.
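The mechanism itself is tiny; simplified:

```rust
use std::time::{Duration, Instant};

const PARTITION_TIMEOUT: Duration = Duration::from_secs(15 * 60);

struct Health {
    last_heartbeat: Instant,
}

impl Health {
    fn heartbeat_sent(&mut self) {
        self.last_heartbeat = Instant::now();
    }

    fn is_healthy(&self) -> bool {
        // Fails once we haven't managed to send a heartbeat for 15 minutes,
        // regardless of why (busy-looping, network partition, ...).
        self.last_heartbeat.elapsed() < PARTITION_TIMEOUT
    }
}
```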
Resolves: #4510.
[^1]: Previously, this was unlimited.
Reducing the number of crates as outlined in #4470 would help with
detecting this sort of unused code because we could make more things
`pub(crate)` which allows the compiler to check whether code is actually
used.
Public API items are never subject to the dead-code analysis of the
compiler because they could be used by other crates.
Within the gateway's eventloop, we MUST only return `Poll::Pending` if
`Waker`s are registered for anything that needs to happen. To ensure
that, we MUST `loop` around our calls to `poll()` so that we drain everything that is `Poll::Ready`.
Only once all sub-state machines return `Poll::Pending` can we return `Poll::Pending`.
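In pseudo-Rust, the rule reads like this (the single sub-state machine below is a stand-in for the tunnel, the portal channel, timers and so on):

```rust
use std::task::{Context, Poll};

fn poll_eventloop(cx: &mut Context<'_>) -> Poll<()> {
    loop {
        let mut made_progress = false;

        if poll_sub_state_machine(cx).is_ready() {
            made_progress = true;
        }

        // Only when *every* sub-state machine returned `Poll::Pending`
        // (and thus registered a waker) may we return `Poll::Pending`.
        if !made_progress {
            return Poll::Pending;
        }
    }
}

fn poll_sub_state_machine(_cx: &mut Context<'_>) -> Poll<()> {
    Poll::Pending
}
```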
Our sockets need to be initialized within a tokio runtime context. To
achieve this, we don't actually initialize anything on `Sockets::new`.
Instead, we call `rebind` within the constructor of `Tunnel` which
already runs in a tokio context.
Fixes: #4282
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
This updates connlib to follow the new guidelines described in #4262. I
only made the bare-minimum changes to the clients. With these changes
`reconnect` should only be called when the network interface actually
changed, meaning clients have to be updated to reflect that.
Currently, a failure during DNS resolution results in the client hanging
during the connection setup. Instead, we fall back to an empty list
which results in an empty DNS query result for the client.
That in turn will make most applications consider the DNS request failed.
As far as I know, we don't currently retry these DNS requests, meaning a user would have to sign out and in again to fix this state.
Whilst not ideal, I think this is better behaviour than what we currently have, where the initial connection just hangs.
Currently, an error returned by `Tunnel::poll_next_event` is only
logged. In other words, such errors are never fatal. This creates a tricky-to-understand relationship around which kinds of errors should be returned from callbacks. Because connlib is used on multiple operating systems, it has
no idea how fatal a particular error is.
This PR removes all of these `Result` return values with the following
consequences:
- For Android, we now panic when a callback fails. This is a slight
change in behaviour. I believe that previously, any exception thrown by
a callback into Android was caught and returned as an error. Now, we
panic because in the FFI layer, we don't have any information on how
fatal the error is. For non-fatal errors, the Android app should simply
not throw an exception. The panics will cause the connlib task to be
shut down which triggers an `on_disconnect`.
- For Swift, there is no behaviour change. The FFI layer already did not
support `Result`s for those callbacks. I don't know how exceptions from
Swift are translated across the FFI layer but there is no change to what
we had before.
- For the Tauri client:
- I chose to log errors on ERROR level and continue gracefully for the
DNS resolvers.
- We panic in case the controller channel is full / closed. That should
really never happen in practice though unless we are currently shutting
down the app.
Resolves: #4064.