firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-01-28 02:18:50 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	037a2e64b6	fix(connlib): attempt to detect runtime shutdown within TUN task (#7605 ) Reading and writing to the TUN device within `connlib` happens in a separate thread. The task running within these threads is connected to the rest of `connlib` via channels. When the application shuts down, these threads also need to exit. Currently, we attempt to detect this from within the task when these channels close. It appears that there is a race condition here because we first attempt to read from the TUN device before reading from the channels. We treat read & write errors on the TUN device as non-fatal so we loop around and attempt to read from it again, causing an infinite-loop and log spam. To fix this, we swap the order in which we evaluate the two concurrent tasks: The first task to be polled is now the channel for outbound packets and only if that one is empty, we attempt to read new packets from the TUN device. This is also better from a backpressure point of view: We should attempt to flush out our local buffers of already processed packets before taking on "new work". As a defense-in-depth strategy, we also attempt to detect the particular error from the tokio runtime when it is being shut down and exit the task. Resolves: #7601. Related: https://github.com/tokio-rs/tokio/issues/7056.	2025-01-05 20:41:24 +00:00
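A minimal, std-only sketch of the new evaluation order. The task drains the outbound channel first and only reads from the TUN device when that channel is empty. A closed channel is treated as shutdown. This sketch assumes a blocking `std::sync::mpsc` channel in place of connlib's async channels and tokio runtime. `Step` and `next_step` are hypothetical names, not connlib's API.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// What the (hypothetical) TUN task should do next.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// Write this already-processed packet to the TUN device first.
    WritePacket(Vec<u8>),
    /// Nothing left to flush: now it is fine to read new packets from the device.
    ReadFromDevice,
    /// The rest of connlib dropped its sender, so the task should exit.
    Shutdown,
}

/// Polls the outbound channel *before* the device, as described above.
/// Queued packets are still returned after the sender is dropped;
/// `Shutdown` is only reported once the channel is both closed and empty.
pub fn next_step(outbound: &Receiver<Vec<u8>>) -> Step {
    match outbound.try_recv() {
        Ok(packet) => Step::WritePacket(packet),
        Err(TryRecvError::Empty) => Step::ReadFromDevice,
        Err(TryRecvError::Disconnected) => Step::Shutdown,
    }
}
```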
Thomas Eizinger	26824fb3c7	fix(gateway): check if we run with correct permissions (#7565 ) The gateway needs either the `CAP_NET_ADMIN` capability or to run as `root` in order to access the TUN device and configure routes via `netlink`. Running without either leads to "Permission denied" errors at runtime. It is good to fail early in these kinds of situations. By checking for this capability early on during startup, these errors should no longer surface later. As a bonus, we won't receive (unactionable) Sentry alerts. Resolves: #7559. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-12-29 21:45:56 +00:00
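One way to perform such a startup check on Linux is to inspect the effective capability set that the kernel reports in `/proc/self/status`. This is an assumption: the commit does not say how the gateway checks. `CAP_NET_ADMIN` is bit 12 in `linux/capability.h`, and `root` normally has every bit set. `has_net_admin` is a hypothetical helper that parses the file contents, so it can be tested without touching `/proc`.

```rust
/// Bit index of CAP_NET_ADMIN in Linux capability sets (linux/capability.h).
const CAP_NET_ADMIN: u32 = 12;

/// Inspects the `CapEff:` line of a `/proc/<pid>/status` dump.
/// Returns `Some(true)` if CAP_NET_ADMIN is effective, `None` if the
/// line is missing or not valid hex.
pub fn has_net_admin(proc_status: &str) -> Option<bool> {
    let hex = proc_status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))?
        .trim();
    let caps = u64::from_str_radix(hex, 16).ok()?;
    Some(caps & (1u64 << CAP_NET_ADMIN) != 0)
}
```

At startup, the contents would come from `std::fs::read_to_string("/proc/self/status")`. A `Some(false)` result would be turned into an early, descriptive error instead of a later "Permission denied".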
Thomas Eizinger	e7cc0e5eef	fix(linux): don't fail on unsupported IP version (#7583 ) Firezone always attempts to handle IPv4 and IPv6. On Linux systems without an IPv6 stack, attempts to add an IPv6 route may fail with "Not supported (os error 95)". We don't need the IPv6 routes on those systems as we will never receive IPv6 traffic. Therefore, we can safely ignore these errors and not log them.	2024-12-25 11:09:22 +00:00
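A sketch of the "ignore only this specific failure" rule. Errno 95 is `EOPNOTSUPP` in Linux's generic errno numbering, which x86_64 and aarch64 use. The sketch assumes the route error surfaces as a `std::io::Error`; the real code may receive a netlink error type instead. `is_ignorable_add_route_error` is a hypothetical name.

```rust
use std::io;
use std::net::IpAddr;

/// `EOPNOTSUPP` ("Operation not supported") on common Linux architectures.
const EOPNOTSUPP: i32 = 95;

/// Only an IPv6 route failing with EOPNOTSUPP is swallowed: on a host
/// without an IPv6 stack, no IPv6 traffic can arrive anyway.
pub fn is_ignorable_add_route_error(dst: IpAddr, err: &io::Error) -> bool {
    dst.is_ipv6() && err.raw_os_error() == Some(EOPNOTSUPP)
}
```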
Thomas Eizinger	1b04b0eb2b	fix(windows): don't warn on deleting non-existing route (#7507 ) Similar to Linux (#7502), we don't want to log an error if we cannot delete a route that doesn't exist.	2024-12-13 21:09:09 +00:00
Thomas Eizinger	b5d6c27680	fix(linux): don't print error when removing non-existent route (#7502 ) We are already handling one case where we are trying to remove a route that doesn't exist. `ESRCH` is another variant of this error that manifests as "No such process". According to the Internet, this just means the route doesn't exist so we can bail out early here.	2024-12-13 04:53:22 +00:00
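A sketch of treating `ESRCH` (errno 3, "No such process") as "route already gone" when removing a route. It assumes the failure is visible as a raw OS error on a `std::io::Error`. `RemoveOutcome` and `classify_remove` are hypothetical names. The other "route doesn't exist" case the commit mentions is not shown because the message does not name it.

```rust
use std::io;

/// `ESRCH` ("No such process") on Linux.
const ESRCH: i32 = 3;

/// Outcome of a best-effort route removal.
#[derive(Debug, PartialEq)]
pub enum RemoveOutcome {
    Removed,
    /// Nothing to do: the route did not exist, so no error is logged.
    AlreadyGone,
}

pub fn classify_remove(result: io::Result<()>) -> io::Result<RemoveOutcome> {
    match result {
        Ok(()) => Ok(RemoveOutcome::Removed),
        Err(e) if e.raw_os_error() == Some(ESRCH) => Ok(RemoveOutcome::AlreadyGone),
        Err(e) => Err(e),
    }
}
```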
Thomas Eizinger	90cf191a7c	feat(linux): multi-threaded TUN device operations (#7449 ) ## Context At present, we only have a single thread that reads and writes to the TUN device on all platforms. On Linux, it is possible to open the file descriptor of a TUN device multiple times by setting the `IFF_MULTI_QUEUE` option using `ioctl`. Using multi-queue, we can then spawn multiple threads that concurrently read and write to the TUN device. This is critical for achieving a better throughput. ## Solution `IFF_MULTI_QUEUE` is a Linux-only thing and therefore only applies to headless-client, GUI-client on Linux and the Gateway (it may also be possible on Android, I haven't tried). As such, we need to first change our internal abstractions a bit to move the creation of the TUN thread to the `Tun` abstraction itself. For this, we change the interface of `Tun` to the following: - `poll_recv_many`: An API, inspired by tokio's `mpsc::Receiver` where multiple items in a channel can be batch-received. - `poll_send_ready`: Mimics the API of `Sink` to check whether more items can be written. - `send`: Mimics the API of `Sink` to actually send an item. With these APIs in place, we can implement various (performance) improvements for the different platforms. - On Linux, this allows us to spawn multiple threads to read and write from the TUN device and send all packets into the same channel. The `Io` component of `connlib` then uses `poll_recv_many` to read batches of up to 100 packets at once. This ties in well with #7210 because we can then use GSO to send the encrypted packets in single syscalls to the OS. - On Windows, we already have a dedicated recv thread because `WinTun`'s most-convenient API uses blocking IO. As such, we can now also tie into that by batch-receiving from this channel. - In addition to using multiple threads, this API now also uses correct readiness checks on Linux, Darwin and Android to uphold backpressure in case we cannot write to the TUN device. 
## Configuration Local testing has shown that 2 threads give the best performance for a local `iperf3` run. I suspect this is because there is only so much traffic that a single application (i.e. `iperf3`) can generate. With more than 2 threads, the throughput actually drops drastically because `connlib`'s main thread is too busy with lock contention and triggering `Waker`s for the TUN threads (which mostly idle around if there are 4+ of them). I've made it configurable on the Gateway though so we can experiment with this during concurrent speedtests etc. In addition, switching `connlib` to a single-threaded tokio runtime further increased the throughput. I suspect this is due to less task / context switching. ## Results Local testing with `iperf3` shows some very promising results. We now achieve a throughput of roughly 2 Gbit/s (1.98 Gbit/s on average, peaking at 2.34 Gbit/s). ``` Connecting to host 172.20.0.110, port 5201 Reverse mode, remote host 172.20.0.110 is sending [ 5] local 100.80.159.34 port 57040 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 274 MBytes 2.30 Gbits/sec [ 5] 1.00-2.00 sec 279 MBytes 2.34 Gbits/sec [ 5] 2.00-3.00 sec 216 MBytes 1.82 Gbits/sec [ 5] 3.00-4.00 sec 224 MBytes 1.88 Gbits/sec [ 5] 4.00-5.00 sec 234 MBytes 1.96 Gbits/sec [ 5] 5.00-6.00 sec 238 MBytes 2.00 Gbits/sec [ 5] 6.00-7.00 sec 229 MBytes 1.92 Gbits/sec [ 5] 7.00-8.00 sec 222 MBytes 1.86 Gbits/sec [ 5] 8.00-9.00 sec 223 MBytes 1.87 Gbits/sec [ 5] 9.00-10.00 sec 217 MBytes 1.82 Gbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 2.30 GBytes 1.98 Gbits/sec 22247 sender [ 5] 0.00-10.00 sec 2.30 GBytes 1.98 Gbits/sec receiver iperf Done. 
``` This is a pretty solid improvement over what is in `main`: ``` Connecting to host 172.20.0.110, port 5201 [ 5] local 100.65.159.3 port 56970 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 90.4 MBytes 758 Mbits/sec 1800 106 KBytes [ 5] 1.00-2.00 sec 93.4 MBytes 783 Mbits/sec 1550 51.6 KBytes [ 5] 2.00-3.00 sec 92.6 MBytes 777 Mbits/sec 1350 76.8 KBytes [ 5] 3.00-4.00 sec 92.9 MBytes 779 Mbits/sec 1800 56.4 KBytes [ 5] 4.00-5.00 sec 93.4 MBytes 783 Mbits/sec 1650 69.6 KBytes [ 5] 5.00-6.00 sec 90.6 MBytes 760 Mbits/sec 1500 73.2 KBytes [ 5] 6.00-7.00 sec 87.6 MBytes 735 Mbits/sec 1400 76.8 KBytes [ 5] 7.00-8.00 sec 92.6 MBytes 777 Mbits/sec 1600 82.7 KBytes [ 5] 8.00-9.00 sec 91.1 MBytes 764 Mbits/sec 1500 70.8 KBytes [ 5] 9.00-10.00 sec 92.0 MBytes 771 Mbits/sec 1550 85.1 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 917 MBytes 769 Mbits/sec 15700 sender [ 5] 0.00-10.00 sec 916 MBytes 768 Mbits/sec receiver iperf Done. ```	2024-12-05 00:18:20 +00:00
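A std-only sketch of the batch-receive idea behind `poll_recv_many`. It follows the shape of tokio's `mpsc::Receiver::recv_many` (append up to `limit` items to a `Vec`, return how many were added, return `0` once the channel is closed). Here, blocking `std::sync::mpsc` plus plain threads stand in for connlib's async channel and TUN threads. `recv_many` below is a hypothetical helper, not connlib's implementation.

```rust
use std::sync::mpsc::Receiver;

/// Blocks until at least one packet is available, then drains further
/// packets without blocking, up to `limit` in total. Returns how many
/// packets were appended to `buf`; `0` means every sender (e.g. every
/// TUN reader thread) has exited.
pub fn recv_many(rx: &Receiver<Vec<u8>>, buf: &mut Vec<Vec<u8>>, limit: usize) -> usize {
    if limit == 0 {
        return 0;
    }
    let Ok(first) = rx.recv() else {
        return 0; // channel closed
    };
    buf.push(first);
    let mut received = 1;
    while received < limit {
        match rx.try_recv() {
            Ok(packet) => {
                buf.push(packet);
                received += 1;
            }
            Err(_) => break, // empty for now, or closed
        }
    }
    received
}
```

Because each reader thread holds a clone of the same sender, the consumer sees a single stream regardless of thread count. The `Io` component's batch size of 100 would be the `limit` argument.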
Thomas Eizinger	4f92a0d7ca	refactor(gui-client): tidy up GUI controller code (#7444 ) This PR intends to be a pure refactoring, i.e. no behaviour change. It simplifies a few aspects of the GUI controller event-loop by getting rid of the `select!` macro. We also remove some indirection of the `gui_controller::Builder`.	2024-12-02 20:07:44 +00:00
Thomas Eizinger	0a6554122a	feat(connlib): utilise GSO for UDP sockets (#7210 ) ## Context At present, `connlib` sends UDP packets one at a time. Sending a packet requires us to make a syscall which is quite expensive. Under load, i.e. during a speedtest, syscalls account for over 50% of our CPU time [0]. In order to improve this situation, we need to somehow make use of GSO (generic segmentation offload). With GSO, we can send multiple packets to the same destination in a single syscall. The tricky question here is: how can we have multiple UDP packets ready at once so we can send them in a single syscall? Our TUN interface only feeds us packets one at a time and `connlib`'s state machine is single-threaded. Additionally, we currently only have a single `EncryptBuffer` in which the to-be-sent datagram sits. ## 1. Stack-allocating encrypted IP packets As a first step, we get rid of the single `EncryptBuffer` and instead stack-allocate each encrypted IP packet. Due to our small MTU, these packets are only around 1300 bytes. Stack-allocating them requires a few `memcpy`s, but those cost only a single-digit percentage of CPU time. That is nothing compared to how much time we are spending on UDP syscalls. With the `EncryptBuffer` out of the way, we can now "freely" move around the `EncryptedPacket` structs and - technically - we can have multiple of them at the same time. ## 2. Implementing GSO The GSO interface allows you to pass multiple packets of the same length and for the same destination in a single syscall, meaning we cannot just batch up arbitrary UDP packets. Counterintuitively, making use of GSO requires us to do more copying: In particular, we change the interface of `Io` such that "sending" a packet essentially performs a lookup of a `BytesMut` buffer by destination and packet length and appends the payload to that buffer. ## 3. 
Batch-read IP packets In order to actually perform GSO, we need to process more than a single IP packet in one event-loop tick. We achieve this by batch-reading up to 50 IP packets from the mpsc-channel that connects `connlib`'s main event-loop with the dedicated thread that reads and writes to the TUN device. These reads and writes happen concurrently with `connlib`'s packet processing. Thus, it is likely that by the time `connlib` is ready to process another IP packet, multiple have been read from the device and are sitting in the channel. Batch-processing these IP packets means that the buffers in our `GsoQueue` are more likely to contain more than a single datagram. Imagine you are running a file upload. The OS will send many packets to the same destination IP, most likely at the maximum MTU, to the TUN device. It is likely that we read 10-20 of these packets in one batch (i.e. within a single "tick" of the event-loop). All packets will be appended to the same buffer in the `GsoQueue` and on the next event-loop tick, they will all be flushed out in a single syscall. ## Results Overall, this results in a significant reduction of syscalls for sending UDP messages. In [1], we spend only a total of 16% of our CPU time in `udpv6_sendmsg` whereas in [0] (main), we spent a total of 34%. Do note that these numbers are relative to the total CPU time spent per program run and thus can't be compared directly (i.e. you cannot just do 34 - 16 and say we now spend 18% less time sending UDP packets). Nevertheless, this appears to be a great improvement. In terms of throughput, we achieve a ~60% improvement in our benchmark suite. That one is running on localhost though, so the improvement may not carry over to the same degree in a real network. [0]: https://share.firefox.dev/4hvoPju [1]: https://share.firefox.dev/4frhCPv	2024-12-02 01:09:44 +00:00
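A minimal sketch of the `GsoQueue` idea: datagrams are grouped by (destination, length), so each group can later be handed to the kernel as one GSO send. `Vec<u8>` stands in for the `BytesMut` buffers, and the struct layout is an assumption rather than connlib's code. The actual `sendmsg` with a UDP GSO segment size is omitted. Linux also accepts a shorter final segment, but this sketch follows the commit and keys strictly by length.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical stand-in for connlib's `GsoQueue`: payloads for the same
/// destination and of the same length are concatenated into one buffer.
#[derive(Default)]
pub struct GsoQueue {
    buffers: HashMap<(SocketAddr, usize), Vec<u8>>,
}

impl GsoQueue {
    /// "Sending" is just a lookup by (destination, length) plus an append.
    pub fn enqueue(&mut self, dst: SocketAddr, payload: &[u8]) {
        self.buffers
            .entry((dst, payload.len()))
            .or_default()
            .extend_from_slice(payload);
    }

    /// Drains every batch as (destination, segment size, contiguous payloads);
    /// each entry would become a single syscall.
    pub fn drain(&mut self) -> Vec<(SocketAddr, usize, Vec<u8>)> {
        self.buffers
            .drain()
            .map(|((dst, segment_size), buf)| (dst, segment_size, buf))
            .collect()
    }
}
```

In the file-upload scenario above, ten 1300-byte packets to the same peer end up as one 13000-byte buffer with a segment size of 1300.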
Thomas Eizinger	c6e7e6192e	build(rust): bump Rust to 1.83 (#7409 ) Rust 1.83 comes with a bunch of new lints for elidable lifetimes. Those also trigger in the generated code of `derivative`. That crate is actually unmaintained so we replace our usages of it with `derive_more`.	2024-11-29 01:04:06 +00:00
Thomas Eizinger	24f7ba530d	refactor(gui-client): add more context to connection failures (#7364 ) Adding more context to these errors makes it easier to identify which of the operations fails. In addition, we remove some usages of the "log and return" anti-pattern to avoid duplicate reports of the same issue.	2024-11-18 18:16:16 +00:00
Thomas Eizinger	48ba2869a8	chore(rust): ban the use of `.unwrap` except in tests (#7319 ) Using the clippy lint `unwrap_used`, we can automatically lint against all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a few results actually. In most cases, they are invariants that can't actually be hit. For these, we change them to `Option`. In other cases, they can actually be hit. For example, if the user supplies an invalid log-filter. Activating this lint ensures the compiler will yell at us every time we use `.unwrap` to double-check whether we do indeed want to panic here. Resolves: #7292.	2024-11-13 03:59:22 +00:00
Thomas Eizinger	ad4eea29ff	chore(rust): don't panic in fallible functions (#7298 ) "Just let it crash" is terrible advice for software that is shipped to end users. Where possible, we should use proper error handling and only fail the current function / task that is active, e.g. drop a particular packet instead of failing all of connlib. We more or less already do that. Activating the clippy lint `unwrap_in_result` surfaced a few more places where we panic despite being in a function that is fallible already. These cases can easily be converted to not panic and return an error instead.	2024-11-11 23:55:23 +00:00
Thomas Eizinger	e261cb3c27	chore: remove `git_version!` (#7270 ) Reading the Git version requires the entire Git repository to be present, including all tags. The tags are only created _after_ the artifact is being built, when we publish the release. Therefore, these tags are never included in the actual released binary. For Sentry, we use the `CARGO_PKG_VERSION` variable instead. This doesn't tell us whether somebody built a client from source and then used it so there could be some confusion in Sentry events. It is quite unlikely that this happens though so for the majority of Sentry alerts, this will give us the correct version. For the Android client, we also depend on the `GITHUB_SHA` env variable at compile-time. We do the same thing for the GUI client here. Resolves: #6925.	2024-11-07 22:56:17 +00:00
Thomas Eizinger	78ebad13ab	chore(rust): log more errors as `tracing::Value`s (#7208 ) Logging these as structured values gives us a better stacktrace in Sentry (assuming the errors themselves make proper use of defining an error-chain).	2024-11-05 14:36:47 +00:00
Thomas Eizinger	7213eb823d	fix(rust): fallback to `CARGO_PKG_VERSION` if git is unavailable (#7188 ) When building inside a docker container, like we do for the headless-client and gateway, the `.git` directory is not available. Thus, determining our current version fails and it gets reported as "unknown". We now also use this version for Sentry, where "unknown" is not very helpful because all errors get categorised under the same version. In case somebody builds a gateway / client from source, we will have the full version available. Most users will use our docker containers though, meaning the reported version will always correspond to a full release. Resolves: #7184.	2024-10-30 17:42:44 +00:00
Thomas Eizinger	73eebd2c4d	refactor(rust): consistently record errors as `tracing::Value` (#7104 ) Our logging library, `tracing`, supports structured logging. This is useful because it preserves more than just the string representation of a value and thus allows the active logging backend(s) to capture more information for a particular value. In the case of errors, this is especially useful because it allows us to capture the sources of a particular error. Unfortunately, recording an error as a tracing value is a bit cumbersome because `tracing::Value` is only implemented for `&dyn std::error::Error`. Casting an error to this is quite verbose. To make it easier, we introduce two utility functions in `firezone-logging`: - `std_dyn_err` - `anyhow_dyn_err` Tracking errors as correct `tracing::Value`s will be especially helpful once we enable Sentry's `tracing` integration: https://docs.rs/sentry-tracing/latest/sentry_tracing/#tracking-errors	2024-10-22 04:46:26 +00:00
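A std-only sketch of what a `std_dyn_err`-style helper boils down to. It coerces a concrete error into `&(dyn Error + 'static)` so that the whole `source()` chain stays reachable by whoever records it. The commit only names `std_dyn_err`, so the body shown is an assumption. `ConnectError` and `error_chain` are hypothetical helpers that demonstrate the preserved chain. In real code, the coerced reference would be passed as a field to a `tracing` macro.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Assumed shape of the helper: a plain unsizing coercion.
pub fn std_dyn_err(e: &(impl Error + 'static)) -> &(dyn Error + 'static) {
    e
}

/// Walks the `source()` chain, as a structured logging backend could.
pub fn error_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut chain = vec![err.to_string()];
    let mut source = err.source();
    while let Some(cause) = source {
        chain.push(cause.to_string());
        source = cause.source();
    }
    chain
}

/// Hypothetical error that wraps a lower-level cause.
#[derive(Debug)]
pub struct ConnectError(pub io::Error);

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to connect to portal")
    }
}

impl Error for ConnectError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}
```

Recording only `err.to_string()` would keep just the first entry of that chain. This is the information loss the commit avoids.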
Reactor Scram	9b93fc2a2c	fix(rust/client/windows): set our DNS resolvers on our interface (#6931 ) Closes #6777	2024-10-07 15:03:22 +00:00
Thomas Eizinger	4ae29c604c	fix(windows): only consider online adapters (#6810 ) When deciding which interface we are going to use for connecting to the portal API, we need to filter through all adapters on Windows and exclude our own TUN adapter to avoid routing loops. In addition, we also need to filter for only online adapters, otherwise we might pick one that is not actually routable. Resolves: #6802.	2024-09-25 21:19:15 +00:00
Thomas Eizinger	a9f515a453	chore(rust): use `#[expect]` instead of `#[allow]` (#6692 ) The `expect` attribute is similar to `allow` in that it will silence a particular lint. In addition to `allow` however, `expect` will fail as soon as the lint is no longer emitted. This ensures we don't end up with stale `allow` attributes in our codebase. Additionally, it provides a way of adding a `reason` to document, why the lint is being suppressed.	2024-09-16 13:51:12 +00:00
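A small illustration of `#[expect]` with a `reason` (stable since Rust 1.81). `encode_header` is a hypothetical function. Its unused `flags` parameter triggers `unused_variables`, which the attribute suppresses exactly like `#[allow]` would. Unlike `#[allow]`, rustc reports `unfulfilled_lint_expectations` as soon as `flags` starts being used, so the attribute cannot go stale silently.

```rust
/// Hypothetical 2-byte header encoder; `flags` is accepted but not encoded yet.
#[expect(unused_variables, reason = "`flags` is reserved for a future header version")]
pub fn encode_header(version: u8, flags: u8) -> [u8; 2] {
    [version, 0]
}
```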
Reactor Scram	54b6222722	fix(client/windows): set MTU even if IPv6 is disabled (#6681 ) Refs #6547, this fixes a similar error message but it's not the same exact issue. When IPv6 is disabled on a system, our call to set the MTU was failing with error code 0x80070490. This patch allows some of the MTU-related syscalls to fail with a warning log. To replicate the issue, run this command to set a registry value to disable IPv6, then reboot the system: `reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters" /v DisabledComponents /t REG_DWORD /d 255 /f` --------- Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-09-13 17:43:21 +00:00
Reactor Scram	5a44151bba	test(bin-shared): improve network notifier test (#6676 ) On Windows, the network notifier always notifies once at startup. We make the DNS notifier and Linux match this behavior, and we assert it in the unit test. Part of a yak shave towards removing Tauri.	2024-09-13 14:53:13 +00:00
Reactor Scram	ece8f7a5b7	test(rust/client): add unit test for DNS and network change notifiers (#6635 ) Part of a yak shave towards removing Tauri. --------- Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-09-12 15:59:13 +00:00
Reactor Scram	f507a01f9f	fix(windows): prevent routing loops for TCP connections (#6584 ) In #6032, we attempted to fix routing loops for Windows and did so successfully for UDP packets. For TCP sockets, we believed that binding the socket to an interface is enough to prevent routing loops. This assumption is wrong. > On Windows, a call to bind() affects card selection only incoming traffic, not outgoing traffic. > > Thus, on a client running in a multi-homed system (i.e., more than one interface card), it's the network stack that selects the card to use, and it makes its selection based solely on the destination IP, which in turn is based on the routing table. A call to bind() will not affect the choice of the card in any way. On most of our testing machines, this problem didn't surface, but it turns out that on some machines, especially with WiFi cards, there is a conflict between the routes added on the system. In particular, with the Internet resource active, we also add a catch-all route that we _want_ to have the highest priority, i.e. Windows SHOULD send all traffic to our TUN device, except for traffic that we generate, like TCP connections to the portal or UDP packets sent to gateways, relays or DNS servers. It appears that on some systems, mostly with Ethernet adapters, Windows picks the "correct" interface for our socket and sends traffic via that but on other systems, it doesn't. TCP sockets are only used for the WebSocket connection to the portal. Without that one, Firezone completely stops working because we can't send any control messages. To reliably fix this issue, we need to add a dedicated route for the target IP of each TCP socket that is more specific than the Internet resource route (`0.0.0.0/0`) but otherwise identical. We do this as part of creating a new TCP socket. This route is for the _default_ interface and thus doesn't get automatically removed when Firezone exits. 
We implement a RAII guard that attempts to drop the route on a best-effort basis. Despite this RAII guard, this route can linger around in case Firezone is being forced to exit or exits in otherwise unclean ways. To avoid lingering routes, we always delete all routing table entries matching the IP of the portal just before we are about to add one. Fixes: #6591. [0]: https://forums.codeguru.com/showthread.php?487139-Socket-binding-with-routing-table&s=a31637836c1bf7f0bc71c1955e47bdf9&p=1891235#post1891235 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Foo Bar <foo@bar.com> Co-authored-by: conectado <gabrielalejandro7@gmail.com>	2024-09-05 06:17:28 +00:00
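An in-memory sketch of the RAII guard combined with the "delete stale entries before adding" step. A `Vec<IpAddr>` stands in for the Windows routing table, with each entry modelling the host route (`/32` or `/128`) to the portal. `RouteGuard` and `RouteTable` are hypothetical names. The real code calls into the OS, which can fail; this sketch ignores that.

```rust
use std::cell::RefCell;
use std::net::IpAddr;
use std::rc::Rc;

/// In-memory stand-in for the OS routing table (host routes only).
pub type RouteTable = Rc<RefCell<Vec<IpAddr>>>;

/// Owns the host route to `dst` for as long as the TCP socket lives.
pub struct RouteGuard {
    table: RouteTable,
    dst: IpAddr,
}

impl RouteGuard {
    pub fn add(table: &RouteTable, dst: IpAddr) -> Self {
        {
            let mut routes = table.borrow_mut();
            // First remove anything left behind by an earlier, unclean exit.
            routes.retain(|route| *route != dst);
            routes.push(dst);
        }
        RouteGuard { table: Rc::clone(table), dst }
    }
}

impl Drop for RouteGuard {
    fn drop(&mut self) {
        // Best-effort: never runs if the process is killed, hence the cleanup in `add`.
        self.table.borrow_mut().retain(|route| *route != self.dst);
    }
}
```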
Reactor Scram	df3067a8f8	chore(rust/windows): more detailed error log for `wintun::Adapter::create` (#6596 ) Without this we don't log the `std::io::ErrorKind`, which is useful to know. Refs #6547	2024-09-05 02:43:25 +00:00
Thomas Eizinger	e3688a475e	refactor(connlib): only buffer 1 unsent packet if socket is busy (#6563 ) Currently, we buffer UDP packets whenever the socket is busy and try to flush them out at a later point. This requires allocations and is tricky to get right. In order to solve both of these problems, we refactor `snownet` to return us an `EncryptedPacket` instead of a `Transmit`. An `EncryptedPacket` is an indirection-abstraction that can be turned into a `Transmit` given an `EncryptBuffer`. This combination of types allows us to hold on to the `EncryptedPacket` (which does not contain any references itself) in the `io` component whilst we are waiting for the socket to be ready to send again. This means we will immediately suspend the event loop in case the socket is no longer ready for sending and resend the datagram in the `EncryptBuffer` once we get re-polled.	2024-09-04 16:59:33 +00:00
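A simplified sketch of the "park at most one datagram" behaviour. Raw `Vec<u8>` datagrams stand in for the `EncryptedPacket` + `EncryptBuffer` pair. `TrySend` and `OneSlotSender` are hypothetical names, not `snownet`'s or `io`'s API. Returning `Ok(false)` from `flush` models suspending the event loop until the socket is ready again.

```rust
use std::io;

/// Minimal non-blocking datagram socket (hypothetical trait).
pub trait TrySend {
    fn try_send(&mut self, datagram: &[u8]) -> io::Result<()>;
}

/// Parks at most one datagram while the socket reports `WouldBlock`.
pub struct OneSlotSender<S: TrySend> {
    pub socket: S,
    pending: Option<Vec<u8>>,
}

impl<S: TrySend> OneSlotSender<S> {
    pub fn new(socket: S) -> Self {
        Self { socket, pending: None }
    }

    /// Retries the parked datagram. `Ok(false)` means the socket is still busy
    /// and the caller should stop producing packets until it is re-polled.
    pub fn flush(&mut self) -> io::Result<bool> {
        let Some(datagram) = self.pending.take() else {
            return Ok(true);
        };
        match self.socket.try_send(&datagram) {
            Ok(()) => Ok(true),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                self.pending = Some(datagram);
                Ok(false)
            }
            Err(e) => Err(e),
        }
    }

    /// Sends a datagram; only call this after `flush` returned `Ok(true)`.
    pub fn send(&mut self, datagram: Vec<u8>) -> io::Result<()> {
        debug_assert!(self.pending.is_none(), "slot already occupied");
        match self.socket.try_send(&datagram) {
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                self.pending = Some(datagram);
                Ok(())
            }
            other => other,
        }
    }
}
```

Compared with an unbounded queue, the single slot needs no further allocations and naturally applies backpressure to the rest of the event loop.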
Reactor Scram	d7810ef9c0	chore(rust/gui-client/windows): update `windows` to 0.58 (#6565 ) Updates `windows` crates to 0.58 without the bug in #6551. Supersedes #6556. The bug was calling `try_send()?` on an MPSC channel of capacity 1, which would bail out of the worker thread if we got 2 DNS change notifications faster than the controller task / thread could process the first one.	2024-09-03 04:18:46 +00:00
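A sketch of making `try_send` on a capacity-1 channel safe for a change notifier. A full channel means a notification is already pending, so dropping the new one loses nothing; only a disconnected receiver should end the worker. `std::sync::mpsc::sync_channel` stands in for whatever channel the client actually uses, and `notify` is a hypothetical name.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Signals "DNS settings changed". Bursts of changes are coalesced into the
/// single pending notification instead of aborting the worker thread.
pub fn notify(tx: &SyncSender<()>) -> Result<(), TrySendError<()>> {
    match tx.try_send(()) {
        Ok(()) | Err(TrySendError::Full(())) => Ok(()),
        Err(e @ TrySendError::Disconnected(())) => Err(e),
    }
}
```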
Reactor Scram	1505b699e5	fix(client/windows): Revert "chore(rust/gui-client/windows): update `windows` to 0.58 (#6506 )" (#6555 ) This reverts commit `d8f25f9bf8`. #6506 broke the Clients and I guess I didn't do any manual smoke test, so I didn't catch it. I have leads for a permanent fix in #6551 but I don't want to leave `main` broken since it will screw up bisects.	2024-09-02 20:25:10 +00:00
Reactor Scram	d8f25f9bf8	chore(rust/gui-client/windows): update `windows` to 0.58 (#6506 ) Supersedes #5913 This required a big refactor because `HANDLE` is no longer `Send` and was never supposed to be. So we add a worker thread for listening to DNS changes, since that requires us to hold a `HANDLE` across `await` points and I couldn't find any simpler way to do it. I could add integration tests for this in a future PR that prove the notifiers work by poking the registry or setting DNS servers and seeing if we pick up the changes on time. But setting DNS servers without the tunnel up may be tricky, so I left it out of scope for this PR.	2024-09-02 18:00:45 +00:00
Thomas Eizinger	4c30d78cda	fix: refer to correct tag in git-version (#6334 ) The output of `git describe` always refers to the last tag that it can find. This leads to confusing versions being printed such as: ``` 2024-08-19T00:24:08.983891Z INFO firezone_headless_client: arch="x86_64" git_version="gateway-1.1.5-30-gf82fee162-modified" ``` Note that this is code running in the headless-client and it refers to the gateway tag. Whilst not wrong from git's PoV, it is certainly confusing. We can fix this by providing a glob-pattern to `git describe` via `--match`. This makes git ignore any other tags and print a version identifier that refers to the current program: ``` 2024-08-19T00:39:48.634191Z INFO firezone_headless_client: arch="x86_64" git_version="headless-client-1.1.7-31-ga08a3411d-modified" ```	2024-08-19 22:42:15 +00:00
Thomas Eizinger	c51cf096ae	build(rust): avoid unnecessary rebuilds (#6321 ) Parsing the current Git version within `firezone-bin-shared` means this crate (and all its dependents) need to be rebuilt every time one makes a commit, even if none of the code actually changes. To avoid this whilst still allowing `firezone-bin-shared` to export a useful, shared function, we export a macro instead that can be called from the respective crates that need the Git version. This means only those binaries will be marked as dirty and rebuilds of e.g. unit tests don't need to rebuild these workspace crates.	2024-08-16 15:30:04 +00:00
Thomas Eizinger	0abbf6bba9	refactor(rust): inline `http-health-check` crate into `bin-shared` (#6258 ) Now that we have the `bin-shared` crate, it is easy to move the health-check functionality into there. That allows us to get rid of a crate which makes navigating the workspace a bit easier.	2024-08-12 16:44:52 +00:00
Thomas Eizinger	bed625a312	chore(rust): make logging more ergonomic (#6237 ) Setting up a logger is something that pretty much every entrypoint needs to do, be it a test, a shared library embedded in another app or a standalone application. Thus, it makes sense to introduce a dedicated crate that bundles together everything about how we want to do logging. This allows us to introduce convenience functions like `firezone_logging::test` which allow you to construct a logger for a test as a one-liner. Crucially though, introducing `firezone-logging` gives us a place to store a default log directive that silences very noisy crates. When looking into a problem, it is common to start by simply setting the log-filter to `debug`. Without further action, this floods the output with logs from crates like `netlink_proto` on Linux. It is very unlikely that those are the logs that you want to see. Without a preset filter, the only alternative here is to explicitly turn off the logs for `netlink_proto` by typing something like `RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues with customers, this is annoying. Log filters can be overridden, i.e. a 2nd filter that matches the exact same scope overrides a previous one. Thus, with this design it is still possible to activate certain logs at runtime, even if they have been silenced by default. I'd expect `firezone-logging` to attract more functionality in the future. For example, we want to support re-loading of log-filters on other platforms. Additionally, where logs get stored could also be defined in this crate. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-08-10 05:17:03 +00:00
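A sketch of the "defaults first, user directives last" composition. According to the commit message, a later directive for the same scope overrides an earlier one, so appending the user's `RUST_LOG` value keeps the defaults overridable. `DEFAULT_DIRECTIVES` and its contents are hypothetical. The combined string would then be handed to the actual filter parser (for example, `tracing-subscriber`'s `EnvFilter`), which is not shown here.

```rust
/// Hypothetical default directives that silence noisy crates.
const DEFAULT_DIRECTIVES: &str = "netlink_proto=warn";

/// Prepends the defaults so that a user directive for the same target,
/// which comes later in the string, takes precedence.
pub fn effective_filter(user: &str) -> String {
    if user.trim().is_empty() {
        DEFAULT_DIRECTIVES.to_owned()
    } else {
        format!("{},{}", DEFAULT_DIRECTIVES, user)
    }
}
```

With this, a plain `debug` no longer floods the output with `netlink_proto` logs. Typing `netlink_proto=trace` explicitly still re-enables them.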
Thomas Eizinger	a87728b791	chore: remove `connlib-shared` dependency from `bin-shared` (#6229 ) The `firezone-bin-shared` crate is meant to house non-tunnel related things. That allows it to compile in parallel to everything else. It currently only depends on `connlib-shared` to access the `DEFAULT_MTU` constant. We can remove that by requiring the MTU as a ctor parameter of `TunDeviceManager`. A longer write-up of the intended dependency structure is in #4470.	2024-08-10 03:58:10 +00:00
Reactor Scram	68d934ee59	refactor(headless-client): remove unnecessary layering (#6211 ) Refs #5754 The IPC service is still layered, but moving it around is more difficult than moving the headless Client.	2024-08-09 14:10:21 +00:00
Reactor Scram	5eb2bba47b	feat(headless-client): use `systemd-resolved` DNS control by default (#6163 ) Closes #5063, supersedes #5850 Other refactors and changes made as part of this: - Adds the ability to disable DNS control on Windows - Removes the spooky-action-at-a-distance `from_env` functions that used to be buried in `tunnel` - `FIREZONE_DNS_CONTROL` is now a regular `clap` argument again --------- Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-08-06 18:16:51 +00:00
Gabi	5841f297a5	fix(gateway): prevent routing loops (#6096 ) Under some unusual conditions, routing loops might occur on the gateway too. This fixes them and doesn't do any harm. This could be the cause behind [these logs](https://github.com/firezone/firezone/issues/6067#issuecomment-2259081958)	2024-07-30 22:29:38 +00:00
Reactor Scram	e6cbb5fa8a	feat(gui-client/linux): network roaming (#5978 ) Closes #5846 Will be moved down to the IPC service eventually. The goal for connection roaming is not for totally transparent "Change Wi-Fi networks without dropping SSH" handoffs, but just for Firezone to re-connect itself as quickly as possible so that everything above us can re-connect as quickly as it times out, and won't be hung up with a broken tunnel.	2024-07-30 16:01:45 +00:00
Gabi	c3a45f53df	fix(connlib): prevent routing loops on windows (#6032 ) In `connlib`, traffic is sent through sockets in one of three ways: 1. Direct p2p traffic between clients and gateways: For these, we always explicitly set the source IP (and thus interface). 2. UDP traffic to the relays: For these, we let the OS pick an appropriate source interface. 3. WebSocket traffic over TCP to the portal: For this too, we let the OS pick the source interface. For (2) and (3), it is possible to run into routing loops, depending on the routes that we have configured on the TUN device. On Linux, we can prevent routing loops by marking a socket [0] and repeating the mark when we add routes [1]. Packets sent via a marked socket won't be routed by a rule that contains this mark. On Android, we can do something similar by "protecting" a socket via a syscall on the Java side [2]. On Windows, routing works slightly differently. There, the source interface is determined based on a computed metric [3] [4]. To prevent routing loops on Windows, we thus need to find the "next best" interface after our TUN interface. We can achieve this with a combination of several syscalls: 1. List all interfaces on the machine. 2. Ask Windows for the best route on each interface, except our TUN interface. 3. Sort by Windows' routing metric and pick the lowest one (lower is better). Thanks to the `SocketFactory` abstraction that we introduced previously, integrating this into `connlib` isn't too difficult: 1. For TCP sockets, we simply resolve the best route after creating the socket and then bind it to that local interface. That way, all packets will always go via that interface, regardless of which routes are present on our TUN interface. 2. UDP is connection-less so we need to decide per packet which interface to use. "Pick the best interface for me" is modelled in `connlib` via the `DatagramOut::src` field being `None`. 
- To ensure those packets don't cause a routing loop, we introduce a "source IP resolver" for our `UdpSocket`. This function gets called every time we need to send a packet without a source IP. - For improved performance, we cache these results. The Windows client uses this source IP resolver to use the above devised strategy to find a suitable source IP. - In case the source IP resolution fails, we don't send the packet. This is important, otherwise, the kernel might choose our TUN interface again and trigger a routing loop. The last remark to make here is that this also works for connection roaming. The TCP socket gets thrown away when we reconnect to the portal. Thus, the new socket will pick the new best interface as it is re-created. The UDP sockets also get thrown away as part of roaming. That clears the above cache which is what we want: Upon roaming, the best interface for a given destination IP will likely have changed. [0]: `59014a9622/rust/headless-client/src/linux.rs (L19-L29)` [1]: `59014a9622/rust/bin-shared/src/tun_device_manager/linux.rs (L204-L224)` [2]: `59014a9622/rust/connlib/clients/android/src/lib.rs (L535-L549)` [3]: https://learn.microsoft.com/en-us/previous-versions/technet-magazine/cc137807(v=msdn.10)?redirectedfrom=MSDN [4]: https://learn.microsoft.com/en-us/windows-server/networking/technologies/network-subsystem/net-sub-interface-metric Fixes: #5955. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-07-29 22:25:42 +00:00
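A platform-neutral sketch of the selection strategy and the per-destination cache. `Candidate`, `best_source` and `SourceIpCache` are hypothetical names. The candidates would come from per-interface best-route queries to the OS, which are not shown. A `None` result models the "don't send the packet" rule, and `clear` models dropping the cache on roaming.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical result of asking the OS for the best route on one interface.
#[derive(Clone, Debug)]
pub struct Candidate {
    pub if_index: u32,
    pub src: IpAddr,
    pub metric: u32,
}

/// Picks the lowest-metric route that does not use our own TUN interface.
pub fn best_source(candidates: &[Candidate], tun_index: u32) -> Option<IpAddr> {
    candidates
        .iter()
        .filter(|c| c.if_index != tun_index)
        .min_by_key(|c| c.metric)
        .map(|c| c.src)
}

/// Caches the resolved source IP per destination.
#[derive(Default)]
pub struct SourceIpCache {
    by_dst: HashMap<IpAddr, IpAddr>,
}

impl SourceIpCache {
    pub fn resolve(
        &mut self,
        dst: IpAddr,
        lookup: impl FnOnce(IpAddr) -> Option<IpAddr>,
    ) -> Option<IpAddr> {
        if let Some(src) = self.by_dst.get(&dst) {
            return Some(*src);
        }
        // On failure: neither cache nor send, so the kernel never falls back to the TUN device.
        let src = lookup(dst)?;
        self.by_dst.insert(dst, src);
        Some(src)
    }

    /// Called on roaming: the best interface has likely changed.
    pub fn clear(&mut self) {
        self.by_dst.clear();
    }
}
```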
Reactor Scram	05e3a38701	refactor(bin-shared): remove CommonArgs (#6068 ) Closes #6025 It was only used in the Gateway, so we inline it there and remove `clap` as a dep for ~~that crate~~ `bin-shared`	2024-07-26 21:48:09 +00:00
Gabi	a39b853bc1	fix(windows,linux): ensure `set_routes` is idempotent (#6051 ) Windows may delete the default route during roaming. To prevent this from causing problems, we make `set_routes` add all routes regardless of the previously stored ones. The known routes are only used to compute which routes are to be removed. For Linux we do the same to make it consistent across platforms. This also gives us the chance to not clear the cache when IPs are set: since all routes are now always added, they will always be re-added when roaming. Overall, this more closely aligns Linux and Windows with how Firezone works on Apple and Android. There, we always remove all routes and set new ones. Removing routes happens very rarely (only when CIDR resources are deactivated); thus, avoiding removing and re-adding all routes is still deemed worthwhile. With the new implementation, the new routes are guaranteed to always take effect while `set_routes` remains idempotent. --------- Signed-off-by: Gabi <gabrielalejandro7@gmail.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-07-26 05:13:58 +00:00
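A minimal sketch of the planning step behind an idempotent `set_routes`, assuming routes can be represented as CIDR strings and that re-adding a route which already exists is harmless. `plan_set_routes` is a hypothetical helper, not the function from this commit: every desired route is added on each call, and the previously known routes are consulted only to compute what to delete.

```rust
use std::collections::BTreeSet;

/// Hypothetical planning step for an idempotent `set_routes`.
/// Returns `(to_add, to_remove)`: every desired route is (re-)added on every
/// call, so a route the OS silently deleted (e.g. the default route during
/// roaming on Windows) comes back. The previously known routes are consulted
/// only to find routes that are no longer desired.
fn plan_set_routes<'a>(
    known: &'a BTreeSet<String>,
    desired: &'a BTreeSet<String>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let to_add = desired.iter().map(String::as_str).collect();
    let to_remove = known.difference(desired).map(String::as_str).collect();
    (to_add, to_remove)
}
```

Calling the planner twice with the same desired set produces the same additions and no removals, which is the idempotency property the commit is after.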
Reactor Scram	cc1478adc2	feat(headless-client/windows): add DNS change / network change listening to the Headless Client (#6022 ) Note that for GUI Clients, listening is still done by the GUI process, not the IPC service. Yak shave towards #5846. This allows for faster dev cycles since I won't have to compile all the GUI stuff. Some changes in here were extracted from other draft PRs. Changes: - Remove `thiserror` that was never matched on - Don't return the DNS resolvers from the notifier directly; just send a notification and allow the caller to check the resolvers itself if needed - Rename `DnsListener` to `DnsNotifier` - Rename `Worker` to `NetworkNotifier` - Remove `unwrap_or_default` when getting resolvers. I don't know why it's there; if there's a good reason, then it should be handled inside the function, not in the caller ```[tasklist] ### Tasks - [x] Rename `Listener` to `Notifier` - [x] (not needed) ~~Support `/etc/resolv.conf` DNS control method too?~~ ```	2024-07-25 15:45:22 +00:00
Reactor Scram	82b8de4c9c	refactor(client/windows): de-dupe wintun.dll (#6020 ) Closes #5977 Refactored some other stuff to make this work. Also removed a redundant impl of `ensure_dll` in a benchmark.	2024-07-25 14:28:35 +00:00
Thomas Eizinger	50d6b865a1	refactor(connlib): move `Tun` implementations out of `firezone-tunnel` (#5903 ) The different implementations of `Tun` are the last platform-specific code within `firezone-tunnel`. By introducing a dedicated crate and a `Tun` trait, we can move this code into (platform-specific) leaf crates: - `connlib-client-android` - `connlib-client-apple` - `firezone-bin-shared` Related: #4473. --------- Co-authored-by: Not Applicable <ReactorScram@users.noreply.github.com>	2024-07-24 01:10:50 +00:00
Gabi	5b0aaa6f81	fix(connlib): protect all sockets from routing loops (#5797 ) Currently, only connlib's UDP sockets for sending and receiving STUN & WireGuard traffic are protected from routing loops. This was done via the `Sockets::with_protect` function. Connlib has additional sockets though: - A TCP socket to the portal. - UDP & TCP sockets for DNS resolution via hickory. Both of these can incur routing loops on certain platforms, which becomes evident as we try to implement #2667. To fix this, we generalise the idea of "protecting" a socket via a `SocketFactory` abstraction. By allowing the different platforms to provide a specialised `SocketFactory`, anything Linux-based can give special treatment to the socket before handing it to connlib. As an additional benefit, this allows us to remove the `Sockets` abstraction from connlib's API again because we can now initialise it internally via the provided `SocketFactory` for UDP sockets. --------- Signed-off-by: Gabi <gabrielalejandro7@gmail.com> Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-07-16 00:40:05 +00:00
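One way to picture the `SocketFactory` idea is the Rust sketch below. The trait shape, `protected` and `udp_socket` are assumptions made for illustration and are not connlib's actual API: a factory is anything that turns an address into a socket, and a platform wraps the default factory with a hook (for example, setting a socket mark on Linux or protecting the socket on Android) that runs before the socket is handed to connlib. If the hook fails, no socket is returned.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Hypothetical shape of a socket factory: connlib asks for a socket for an
/// address and the platform decides how to create it.
trait SocketFactory<S> {
    fn make(&self, addr: &SocketAddr) -> io::Result<S>;
}

/// Any closure or function with the right signature can act as a factory.
impl<S, F> SocketFactory<S> for F
where
    F: Fn(&SocketAddr) -> io::Result<S>,
{
    fn make(&self, addr: &SocketAddr) -> io::Result<S> {
        self(addr)
    }
}

/// Default factory: a plain UDP socket bound to `addr`.
fn udp_socket(addr: &SocketAddr) -> io::Result<UdpSocket> {
    UdpSocket::bind(addr)
}

/// Wraps a factory with a platform-specific `protect` hook that runs on every
/// new socket before it is handed out. If the hook fails, the socket is
/// dropped and the error is returned instead.
fn protected<S, F, P>(inner: F, protect: P) -> impl Fn(&SocketAddr) -> io::Result<S>
where
    F: Fn(&SocketAddr) -> io::Result<S>,
    P: Fn(&S) -> io::Result<()>,
{
    move |addr: &SocketAddr| {
        let socket = inner(addr)?;
        protect(&socket)?;
        Ok(socket)
    }
}
```

Keeping the hook outside connlib means connlib only ever sees "a factory", while each platform decides what protecting a socket means for it.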
Thomas Eizinger	a4a8221b8b	refactor(connlib): explicitly initialise `Tun` (#5839 ) Connlib's routing logic and networking code is entirely platform-agnostic. The only platform-specific bit is how we interact with the TUN device. From connlib's perspective though, all it needs is an interface for reading and writing. How the device gets initialised and updated is the client's business. For the most part, this is the same on all platforms: We call callbacks and the client updates the state accordingly. The only annoying bit here is that Android recreates the TUN interface on every update and thus our old file descriptor is invalid. The current design works around this by returning the new file descriptor on Android. This is a problematic design for several reasons: - It forces the callback handler to finish synchronously, halting connlib until this is complete. - The synchronous nature also means we cannot replace the callbacks with events as events don't have a return value. To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves the business of how the `Tun` device is created up to the client. The clients are already platform-specific so this makes sense. In a future iteration, we can move all the various `Tun` implementations all the way up to the client-specific crates, thus co-locating the platform-specific code. Initialising `Tun` from the outside surfaces another issue: The routes are still set via the `Tun` handle on Windows. To fix this, we introduce a `make_tun` function on `TunDeviceManager` so that it can remember the interface index on Windows and take over the setting of routes. This simplifies several of connlib's APIs which are now infallible. Resolves: #4473. --------- Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com> Co-authored-by: conectado <gabrielalejandro7@gmail.com>	2024-07-12 23:54:15 +00:00
Thomas Eizinger	960ce80680	refactor(connlib): move `TunDeviceManager` into `firezone-bin-shared` (#5843 ) The `TunDeviceManager` is a component that the leaf-nodes of our dependency tree need: the binaries. Thus, it is misplaced in the `connlib-shared` crate which is at the very bottom of the dependency tree. This is necessary to allow the `TunDeviceManager` to actually construct a `Tun` (which currently lives in `firezone-tunnel`). Related: #5839. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-07-11 23:42:33 +00:00

46 Commits