firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-04-05 07:06:08 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	5268756b60	feat(connlib): add placeholder for Internet Resource (#5900 ) In preparation for #2667, we add an `internet` variant to our list of possible resource types. This is backwards-compatible with existing clients and ensures that, once the portal starts sending Internet resources to clients, they won't fail to deserialise these messages. The portal will have a version check to not send this to older clients anyway but the sooner we can land this, the better. It simplifies the initial development as we start preparing for the next client release. Adding new fields to a JSON message is always backwards-compatible so we can extend this later with whatever we need.	2024-07-18 04:28:02 +00:00
Thomas Eizinger	4f4134b000	test(connlib): model gateway <> site <> resource relationship (#5871 ) Currently, the relationship between gateways, sites and resources is modeled in an ad-hoc fashion within `tunnel_test`. The correct relationship is: - The portal knows about all sites. - A resource can only be added for an existing site. - One or more gateways belong to a single site. To express this relationship in `tunnel_test`, we first sample between 1 and 3 sites. Then we sample between 1 and 3 gateways and assign them a site each. When adding new resources, we sample a site that the resource belongs to. Upon a connection intent, we sample a gateway from all gateways that belong to the site that the resource is defined in. In addition, this patch-set removes multi-site resources from the `tunnel_test`. As far as connlib's routing logic is concerned, we route packets to a resource on a selected gateway. How the portal selected the site of the gateway doesn't matter to connlib and thus doesn't need to be covered in these tests.	2024-07-17 22:41:47 +00:00
Gabi	7e963f74ca	chore(connlib): performance improvement for picking cidr resources (#5891 ) Extracted from #5840. Cleans up IP generation and improves the performance of picking a host within an IP range by doing some math instead of iterating through the IP range.	2024-07-17 06:24:34 +00:00
Thomas Eizinger	14abda01fd	refactor(connlib): polish DNS resource matching (#5866 ) In preparation for implementing #5056, I familiarized myself with the current code and ended up implementing a couple of refactorings.	2024-07-15 23:56:48 +00:00
Thomas Eizinger	a4a8221b8b	refactor(connlib): explicitly initialise `Tun` (#5839 ) Connlib's routing logic and networking code are entirely platform-agnostic. The only platform-specific bit is how we interact with the TUN device. From connlib's perspective though, all it needs is an interface for reading and writing. How the device gets initialised and updated is client-business. For the most part, this is the same on all platforms: We call callbacks and the client updates the state accordingly. The only annoying bit here is that Android recreates the TUN interface on every update and thus our old file descriptor is invalid. The current design works around this by returning the new file descriptor on Android. This is a problematic design for several reasons: - It forces the callback handler to finish synchronously, halting connlib until this is complete. - The synchronous nature also means we cannot replace the callbacks with events as events don't have a return value. To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves the business of how the `Tun` device is created up to the client. The clients are already platform-specific so this makes sense. In a future iteration, we can move all the various `Tun` implementations all the way up to the client-specific crates, thus co-locating the platform-specific code. Initialising `Tun` from the outside surfaces another issue: The routes are still set via the `Tun` handle on Windows. To fix this, we introduce a `make_tun` function on `TunDeviceManager` so that it can remember the interface index on Windows, which lets us move the setting of routes to `TunDeviceManager`. This simplifies several of connlib's APIs which are now infallible. Resolves: #4473. --------- Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com> Co-authored-by: conectado <gabrielalejandro7@gmail.com>	2024-07-12 23:54:15 +00:00
Thomas Eizinger	960ce80680	refactor(connlib): move `TunDeviceManager` into `firezone-bin-shared` (#5843 ) The `TunDeviceManager` is a component that the leaf-nodes of our dependency tree need: the binaries. Thus, it is misplaced in the `connlib-shared` crate which is at the very bottom of the dependency tree. This is necessary to allow the `TunDeviceManager` to actually construct a `Tun` (which currently lives in `firezone-tunnel`). Related: #5839. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-07-11 23:42:33 +00:00
Thomas Eizinger	2013d6a2bf	chore(connlib): improve logging (#5836 ) Currently, the logging of fields in spans for encapsulate and decapsulate operations is a bit inconsistent between client and gateway. Logging the `from` field for every message is actually quite redundant because most of these logs are emitted within `snownet`'s `Allocation` which can add its own span to indicate, which relay we are talking to. For most other operations, it is much more useful to log the connection ID instead of IPs. This should make the logs a bit more succinct.	2024-07-11 23:38:19 +00:00
Thomas Eizinger	08182913a5	refactor(connlib): remove `CidrV4` and `CidrV6` types from callbacks (#5842 ) These are only necessary for the Android and Apple client. Other clients should not need to bother with these custom types. Required-for: #5843.	2024-07-11 14:25:26 +00:00
Thomas Eizinger	f39a57fa50	refactor(connlib): remove cyclic `From` impls (#5837 ) We have several representations of `ResourceDescription` within connlib. The ones within the `callbacks` module are meant for _presentation_ to the clients and thus contain additional information like the site status. The `From` impls deleted within the PR are only used within tests. We can rewrite those tests by asserting on the presented data instead. This is better because it means information about resources only flows in one direction: From connlib to the clients.	2024-07-11 14:21:33 +00:00
Reactor Scram	78f1c7c519	test(firezone-tunnel/windows): Test Windows upload speed in CI (#5607 ) Closes #5601 It looks like we can hit 100+ Mbps in theory. This covers Wintun, Tokio, and Windows OS overhead. It doesn't cover the cryptography or anything in connlib itself. The code is kinda messy but I'm not sure how to clean it up so I'll just leave it for review. This test should fail if there are any regressions in #5598. It fails if any packet is dropped or if the speed is under 100 Mbps. ```[tasklist] ### Tasks - [x] Use `ip_packet::make` - [x] Switch to `cargo bench` - [x] Extract windows ARM PR - [x] Clean up wintun.dll install code - [x] Re-request review ```	2024-07-10 19:09:45 +00:00
Thomas Eizinger	0e6ac2040c	test(connlib): use two relays in `tunnel_test` (#5804 ) With the introduction of a routing table in #5786, we can very easily introduce an additional relay to `tunnel_test`. In production, we are always given two relays and thus, this mimics the production setup more closely.	2024-07-09 23:47:35 +00:00
Thomas Eizinger	d15c43b6f2	test(connlib): render IDs as hex u128 (#5803 ) This is a bit of a hack because features should never change behaviour. Unfortunately, we can't use `cfg(test)` here because the proptests live in a different crate and thus for the tests, we import the crate using `cfg(not(test))`. Our `proptest` feature is really only meant to be activated during testing so I think this is fine for now. The benefit is that the test logs are much more terse because proptest will shrink the IDs to `0`, `1` etc. With the upcoming addition of multiple gateways and multiple relays, we will have a lot more IDs in the logs. Thus, it is important that they stay legible.	2024-07-09 14:23:37 +00:00
Thomas Eizinger	9caca475dc	test(connlib): introduce routing table to `tunnel_test` (#5786 ) Currently, `tunnel_test` uses a rather naive approach when dispatching `Transmit`s. In particular, it checks client, gateway and relay separately whether they "want" a certain packet. In a real network, these packets are routed based on their IP. To mimic something similar, we introduce a `Host` abstraction that wraps each component: client, gateway and relay. Additionally, we introduce a `RoutingTable` where we can add and remove hosts. With these things in place, routing a `Transmit` is as easy as looking up the destination IP in the routing table and dispatching to the corresponding host. Our hosts are type-safe: client, gateway and relay have different types. Thus, we abstract over them using a `HostId` in order to know, which host a certain message is for. Following these patches, we can easily introduce multiple gateways and relays to this test by simply making more entries in this routing table. This will increase the test coverage of connlib. Lastly, this patch massively increases the performance of `tunnel_test`. It turns out that previously, we spent a lot of CPU cycles accessing "random" IPs from very large iterators. With this patch, we take a limited range of 100 IPs that we sample from, thus drastically increasing performance of this test. The configured 1000 testcases execute in 3s on my machine now (with opt-level 1 which is what we use in CI). --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2024-07-09 01:48:54 +00:00
Reactor Scram	f6e99752ec	fix(client): flush the OS' DNS cache whenever resources change (#5700 ) Closes #5052 On my dev VMs: - systemd-resolved = 15 ms to flush - Windows = 600 ms to flush I tested with the headless Clients on Linux and Windows and it fixes the issue. On Windows I didn't replicate the issue with the GUI Client, on Linux this patch also fixes it for the GUI Client.	2024-07-03 21:14:43 +00:00
Jamil	8655b711db	fix(connlib): Don't use `operatingSystemVersionString` on Apple OSes (#5628 ) The [HTTP 1.1 RFC](https://datatracker.ietf.org/doc/html/rfc2616) states that HTTP headers should be US-ASCII. This is not the case when the macOS Client is run from a host that has a non-English language selected as its system default due to the way we build the user agent. This PR fixes that by normalizing how we build the user agent by more granularly selecting which fields compose it, and not just relying on OS-provided version strings that may contain non-ASCII characters. fixes https://github.com/firezone/firezone/issues/5467 --------- Signed-off-by: Jamil <jamilbk@users.noreply.github.com>	2024-06-28 21:59:02 +00:00
Thomas Eizinger	6c842de83c	refactor(connlib): don't re-initialise `Tun` on config updates (#5392 ) Currently, connlib re-initialises the TUN device on Linux every time its configuration gets updated such as when roaming from one network to another. This is unnecessary. Instead, we can adopt the same approach as already used on MacOS, iOS and Windows and only initialise it if it doesn't exist yet. Doing so surfaces an interesting bug. Currently, attempting to re-initialise the TUN device fails with a warning: > connlib_client_shared::eventloop: Failed to set interface on tunnel: Resource busy (os error 16) See https://github.com/firezone/firezone/actions/runs/9656570163/job/26634409346#step:7:103 for an example. As a consequence, we never actually trigger the `on_set_interface_config` callback and thus never actually set the new IPs on the TUN device. Now that we _are_ calling this callback, we execute `TunDeviceManager::set_ips` which first clears all IPs from the device and then attaches the new ones. A consequence of this is that the Linux kernel will clear all routes associated with the device. This clashes with an optimisation we have in `TunDeviceManager` where we remember the previously set routes and don't set new ones if they are the same. This `HashSet` needs to be cleared upon setting new IPs in order to actually set the new routes correctly afterwards. Without that, we stop receiving traffic on the TUN device.	2024-06-25 22:30:31 +00:00
Thomas Eizinger	409039afde	chore(connlib): improve error messages in `TunDeviceManager` (#5530 )	2024-06-25 14:09:48 +00:00
Thomas Eizinger	bd989d4416	chore(connlib): improve logging for `set_routes` on Linux (#5529 ) Logging the routes in the span and in an event creates duplicate information so we remove the former. Additionally, we add a debug log in case we short-circuit the function.	2024-06-25 14:09:06 +00:00
Thomas Eizinger	eec0652abe	chore(connlib): shrink "packet not allowed" log (#5476 ) All allowed IPs can be a fair few which clutters the log. Remove the `HashSet` from the error and also remove the stuttering; the error already says "Packet not allowed".	2024-06-25 01:16:29 +00:00
Gabi	aea03a490c	feat(connlib): clients make use of DNS mangling on gateways (#5049 ) This PR is the "client-side" of things for #4994. Up until now, when a user wanted to connect to a DNS resource, we would establish a connection to the gateway and pass along the domain we are trying to access. The gateway would resolve that domain and send the response back to the client, allowing them to finally send a DNS response. Now, we instantly assign and respond with 4x A and 4x AAAA records to any query for one of our DNS resources. Upon the first IP packet for one of these "proxy IPs", we select a gateway, establish a connection and send our proxy IPs along. The gateway then performs the necessary mangling and NATing of all packets. See #5354 for details. Resolves: #4994. Resolves: #5491. --------- Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-06-24 23:42:15 +00:00
Reactor Scram	28378fe24e	refactor(headless-client): remove FIREZONE_PACKAGE_VERSION (#5487 ) Closes #5481 With this, I can connect to the staging portal without a build.rs or any extra env var setup. [Screenshot omitted] ```[tasklist] ### Next steps - [x] Split out a refactor PR for `ConnectArgs` (#5488) - [x] Try doing this for other Clients - [x] Check Gateway - [x] Check Tauri Client - [x] Change to `app_version` - [x] Open for review - [ ] Use `option_env` so that `FIREZONE_PACKAGE_VERSION` can still override the Cargo.toml version for local testing - [ ] Check Android Client - [ ] Check Apple Client ``` --------- Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>	2024-06-21 23:06:41 +00:00
Thomas Eizinger	14785eba9f	chore(connlib): tune logs around proxy IPs and DNS resources (#5439 ) Adds and tunes some logs around creating, using and disassociating proxy IPs for DNS resources.	2024-06-20 03:52:08 +00:00
Gabi	95f13c89c6	fix(connlib): don't treat pending connections as errors (#5433 ) When a user sends the first packet to a resource, we generate a "connection intent" and ask the portal which gateway to use for this resource. This process is throttled to only generate a new intent every 2s. Once we know which gateway to use for a certain resource, we initiate a connection via snownet. This involves an OFFER-ANSWER handshake with the gateway. A connection for which we have sent an offer and have not yet received an answer is what we call a "pending connection". In case the connection setup takes longer than 2s, we will generate another connection intent which can point to the same gateway that we are currently setting up a connection with. Currently, encountering a "pending connection" during another connection setup is treated as an error which results in some state being cleaned-up / removed. This is where the bug surfaces: If we remove the state for a resource as a result of a 2nd connection intent and then receive the response of the first one, we will be left with no state that knows about this resource. We fix this by refactoring `create_or_reuse_connection` to be atomic with regard to its state changes: All checks that fail the function are moved to the top which means there is no state to clean up in case of an error. Additionally, we model the case of a "pending connection" using an `Option` to not flood the logs with "pending connection" warnings as those are expected during normal operation. Fixes: #5385	2024-06-19 02:04:09 +00:00
Gabi	2ea6a5d07e	feat(gateway): NAT & mangling for DNS resources (#5354 ) As part of #4994, the IP translation and mangling of packets to and from DNS resources is moved to the gateway. This PR represents the "gateway-half" of the required changes. Eventually, the client will send a list of proxy IPs that it assigned for a certain DNS resource. The gateway assigns each proxy IP to a real IP and mangles outgoing and incoming traffic accordingly. There are a number of things that we need to take care of as part of that: - We need to implement NAT to correctly route traffic. Our NAT table maps from source port* and destination IP to an assigned port* and real IP. We say port* because that is only true for UDP and TCP. For ICMP, we use the identifier. - We need to translate between IPv4 and IPv6 in case a DNS resource e.g. only resolves to IPv6 addresses but the client gave out an IPv4 proxy address to the application. This translation was added in #5364 and is now being used here. This PR is backwards-compatible because currently, clients don't send any IPs to the gateway. No proxy IPs means we cannot do any translation and thus, packets are simply routed through as is which is what the current clients expect. --------- Co-authored-by: Thomas Eizinger <thomas@eizinger.io>	2024-06-19 01:15:27 +00:00
Gabi	75faf25050	fix(connlib): accept null address_descriptions (#5366 ) Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>	2024-06-14 17:21:38 +00:00
Thomas Eizinger	489a14a0ed	test(connlib): directly sample from state instead of indexing (#5332 ) Currently, we use `sample::Index` and `sample::Selector` to deterministically select parts of our state. Originally, this was done because I did not yet fully understand, how `proptest-state-machine` works. The available transitions are always sampled from the current state, meaning we can directly use `sample::select` to pick an element like an IP address from a list. This has several advantages: - The transitions are more readable when debug-printed because they now contain the actual data that is being used. - I _think_ this results in better shrinking because `sample::select` will perform a binary search for the problematic value. - We can more easily implement transitions that _remove_ state. Currently, we cannot remove things from the `ReferenceState` because the system-under-test would also have to index into the `ReferenceState` as part of executing its transition. By directly embedding all necessary information in the transition, this is much simpler.	2024-06-13 00:07:02 +00:00
Jamil	7c5c7a856a	fix: Use correct component versions by overriding from FIREZONE_PACKAGE_VERSION (#5344 ) Now that #4397 is complete, we need a way to bake in the desired component version so that it's reported properly to the portal. This PR adds a global override, "FIREZONE_PACKAGE_VERSION" that can be optionally set to bake the version in. If left blank, the behavior is unchanged, "CARGO_PKG_VERSION" is used instead, which is populated from `connlib-shared`'s Cargo.toml. ## Problem [Two screenshots omitted]	2024-06-12 22:09:48 +00:00
Thomas Eizinger	d0efc55918	test(connlib): reduce number of local rejections (#5221 ) To make proptests efficient, it is important to generate the set of possible test cases algorithmically instead of filtering through randomly generated values. This PR makes the strategies for upstream DNS servers and IP networks more efficient by removing the filtering.	2024-06-05 21:44:19 +00:00
Thomas Eizinger	3f3ea96ca7	test(connlib): generate resources with wildcard and `?` addresses (#5209 ) Currently, `tunnel_test` only tests DNS resources with fully-qualified domain names. Firezone also supports wildcard domains in the forms of `*.example.com` and `?.example.com`. To include these in the tests, we generate a bunch of DNS records that include various subdomains for such wildcard DNS resources. When sampling DNS queries, we already take them from the pool of global DNS records which now also includes these subdomains, thus nothing else needed to be changed to support testing these resources.	2024-06-05 06:54:08 +00:00
Reactor Scram	deefabd8f8	refactor(firezone-tunnel): move routes and DNS control out of connlib and up to the Client (#5111 ) Refs #3636 (This pays down some of the technical debt from Linux DNS) Refs #4473 (This partially fulfills it) Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory) As of dd6421: - On both Linux and Windows, DNS control and IP setting (i.e. `on_set_interface_config`) both move to the Client - On Windows, route setting stays in `tun_windows.rs`. Route setting in Windows requires us to know the interface index, which we don't know in the Client code. If we could pass opaque platform-specific data between the tunnel and the Client it would be easy. - On Linux, route setting moves to the Client and Gateway, which completely removes the `worker` task in `tun_linux.rs` - Notifying systemd that we're ready moves up to the headless Client / IPC service ```[tasklist] ### Before merging / notes - [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device) - [x] Fix Windows Clients - [x] Fix Gateway - [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068) - [x] De-dupe consts - [ ] ~~Add DNS control test~~ (failed) - [ ] Smoke test Linux - [ ] Smoke test Windows ```	2024-06-03 14:32:08 +00:00
Thomas Eizinger	ce929e1204	test(connlib): resolve DNS resources in `tunnel_test` (#5083 ) Currently, `tunnel_test` only sends ICMPs to CIDR resources. We also want to test certain properties in regards to DNS resources. In particular, we want to test: - Given a DNS resource, can we query it for an IP? - Can we send an ICMP packet to the resolved IP? - Is the mapping of proxy IP to upstream IP stable? To achieve this, we sample a list of `IpAddr` whenever we add a DNS resource to the state. We also add the transition `SendQueryToDnsResource`. As the name suggests, this one simulates a DNS query coming from the system for one of our resources. We simulate A and AAAA queries and take note of the addresses that connlib returns to us for the queries. Lastly, as part of `SendICMPPacketToResource`, we now may also sample from a list of IPs that connlib gave us for a domain and send an ICMP packet to that one. There is one caveat in this test that I'd like to point out: At the moment, the exact mapping of proxy IP to real IP is an implementation detail of connlib. As a result, I don't know which proxy IP I need to use in order to ping a particular "real" IP. This presents an issue in the assertions: Upon the first ICMP packet, I cannot assert what the expected destination is. Instead, I need to "remember" it. In case we send another ICMP packet to the same resource and happen to sample the same proxy IP, we can then assert that the mapping did not change.	2024-05-31 04:44:30 +00:00
Thomas Eizinger	974eb95dc5	test(connlib): reduce number of sites to 3 (#5152 ) Generating up to 10 can be quite verbose in the output. I think 3 should also be enough to hit all codepaths that need to deal with more than 1.	2024-05-29 02:00:27 +00:00
Thomas Eizinger	fbc13f6946	test(connlib): generate actual domain names as inputs (#5146 ) Extracted out of #5083.	2024-05-29 00:51:16 +00:00
Reactor Scram	2fb8d9199b	feat(gui-client): add resource details to linux and windows clients (#5142 ) Refs #3514 ```[tasklist] ### Issues - [x] Add special case if `address_description` is empty - [x] Submenus aren't showing up in GNOME - [ ] Accelerator keys don't seem to work on Linux or Windows - [ ] Can't get a Resource in staging to automatically open a URL even though other Resources can do this - [ ] Accelerator for Settings isn't even displayed on Linux - [ ] Submenus spawn halfway off-screen in KDE ``` # Linux ## GNOME menu height issue This happens when the menu, including an opened submenu, is taller than the screen. GNOME doesn't seem to scroll the root menu at all, so the "Quit" option gets cut off at the default low resolution of my VMs. It does allow submenus to scroll... but it computes their viewport size based on how much spare space there is between the height of the screen and the height of the root menu. So if the root menu is too big, you don't get to see the Resource submenus. What a mess. If we put all the Resources into their own submenu it might work, but that's a big deviation from other Clients. We can probably live with it for now if a typical customer has, say, 10 Resources and a 1080p screen. More Resources or smaller screens will be a problem. Long-term we're replacing all this anyway. 
[Two screenshots omitted] ## No activity [Screenshot omitted] ## Gateway connected [Screenshot omitted] # Windows ## No activity [Screenshot omitted] ## Gateway connected [Screenshot omitted]	2024-05-28 23:42:03 +00:00
Thomas Eizinger	92676f0f53	test(connlib): simulate IO in state machine tests (#4728 ) This is similar to #4097 and #4585 but for the entire `ClientState` and `GatewayState`. We also do it in the context of a property-based test with the vision that we can deterministically explore a large space of state transitions and see where our main property breaks: Being able to send an ICMP packet from the client to the gateway. In other words, we now correctly pass all the `Transmit`s back and forth between the components as if they would receive it from the network. Due to the nature of property-based tests, this already exercises a very large input space. For example, if the client does not have an IPv6 socket and the gateway doesn't have an IPv4 socket, this test already checks whether we then correctly fall back to using a relay (because the allocation we make on the relay is the only network path where the STUN requests pass through). What this does not (yet) do is set up a proper network topology. The `dispatch_transmit` function will happily "route" a `Transmit` from e.g. the client to the gateway even if they are in different subnets. In other words, these tests assume that the actual network itself works and we can exchange UDP packets between the components. For now, we only send ICMPs to CIDR resources. As a next step, we can extend this to DNS resources by sending DNS queries for our DNS resources and then sending an ICMP to the resolved IP.	2024-05-22 23:10:58 +00:00
Thomas Eizinger	49a965a686	chore(connlib): remove unused `ConnlibError::Snownet` variant (#5078 )	2024-05-22 04:39:48 +00:00
Reactor Scram	b510041494	chore(connlib): fix copy-paste typo in comment about DNS (#5053 ) Closes #5051	2024-05-21 18:15:20 +00:00
Gabi	361aafb746	chore(connlib): upgrade domain version from 0.9 to 0.10 (#5028 )	2024-05-20 20:54:22 +00:00
Gabi	a7d35cd5f1	feat(connlib): report resource status to client (#4931 ) This PR introduces a site's `Status`. It is used to report the status (unknown, online or offline) to the client, mostly as a hint to users about what is wrong with a connection. These are the criteria for an online or offline resource: * If all sites related to a resource are offline, the resource is considered offline, since there's no gateway that can respond to that resource's connection * If any site is online, the resource is online, since that same peer can be used to reach that resource * Any other case is unknown Right now resources are single-site so it doesn't matter too much, but tracking online/offline per-site instead of per-gateway or per-resource seems like the better long-term solution. The way to "find out" the site's status is: * If a response to a connection details request is offline, all sites related to that resource must be offline, otherwise there would've been a gateway in the response * At the point we connect to a gateway, the site that corresponds to that gateway must be online * When a connection to a peer stops, it's considered unknown again Fixes #4738	2024-05-15 15:33:04 +00:00
Gabi	c46967e1d6	fix(connlib): resource filter deserialization (#4910 ) There was an error in how resource filters were deserialized in the gateway: * we always assumed that the ports would be included, but the portal sends no port down when the "all" range is allowed * also, we didn't support the resource_updated message; this fixes it, and a resource's allow-list can now be changed in-flight	2024-05-08 00:16:06 +00:00
Gabi	0c7c96dd07	chore(connlib): pass to client new fields (#4900 ) Fixes #4885	2024-05-07 21:14:29 +00:00
Gabi	68ece0a940	feat(connlib): traffic filtering (#4779 ) This implements traffic filtering on the gateway. Filters are set on the portal, per-resource, in an allow-list manner. If no filters exist for a given resource, all packets are allowed; otherwise, only packets that match the port/protocol of the filters are allowed and all others are dropped. Filters can be either TCP, UDP or ICMP. For the first 2, multiple ports can be given. Furthermore, multiple filters can exist for the same resource. To be able to add and remove filters with the same IP/CIDR, we keep around the whole list of filters for any given peer using an ID map and recalculate the IP each time something is added or removed. This allows us to remove filters and simply recalculate the allowlist for each IP. Furthermore, for any IP, all rules apply, meaning if there are multiple IPs that apply for a resource, all port/protocol combinations for that IP will apply. This works well right now for DNS resources, since access is requested by DNS name, then the resource for that DNS name will arrive at the gateway, and the port filtering will apply given that resource (and any other resource with the same IP). However, since the client has no idea of the filters, it can't request the resource access based on the port/protocol combination and we are still using the most specific ("longest match") IP. This will mean that for overlapping CIDR resources, only the rules for the most specific will be used, even if the gateway supports applying them all, since it will not have the other resources. This will be solved in #4789. It can also lead to some weirdness: let's say that you have 10.0.0.0/24 -> TCP/80 and 10.0.0.0/16 -> TCP/443 for your user. The user tries to access 10.0.0.1, and will then only be allowed port 80. At some point the user might access 10.1.0.1 and it will be allowed port 443. 
But from that point on, the user will be allowed to access 80 and 443 in 10.0.0.1 because the rules correctly work on the gateway, the problem is the client side. Again, #4789 will fix this. Left for next PRs (in tentative order!): - #4792 - #4789 Depends on: #4773. Resolves #2030. Resolves #4791. --------- Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>	2024-05-07 19:47:49 +00:00
Reactor Scram	a011a443e7	fix(headless-client): clean up and exit gracefully when `on_disconnect` called (#4785) Calling `std::process::exit` won't let the DNS deactivation code run. For some control methods (systemd-resolved) this doesn't matter. For etc-resolvconf and Windows, we are responsible for cleaning up DNS. ```[tasklist] - [x] Replicate the issue - [x] Fix it - [x] Remove the fault injection code ``` Closes #4784	2024-04-25 22:48:45 +00:00
Thomas Eizinger	51089b89e7	feat(connlib): smoothly migrate relayed connections (#4568 ) Whenever we receive a `relays_presence` message from the portal, we invalidate the candidates of all now disconnected relays and make allocations on the new ones. This triggers signalling of new candidates to the remote party and migrates the connection to the newly nominated socket. This still relies on #4613 until we have #4634. Resolves: #4548. --------- Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-04-20 06:16:35 +00:00
Thomas Eizinger	0f7e80642d	chore(snownet): don't update remote socket from WG activity (#4615 ) Resolves: #4613.	2024-04-20 00:15:19 +00:00
Thomas Eizinger	95219376b9	test(connlib): assert connection intents using property-based state machine test (#4597) Opening this in a basic version that asserts sending of connection intents to resource IPs. To do this, we add some boilerplate that sets up the state machine test in general. Together with the [work](`d575dc3866/rust/connlib/snownet/tests/lib.rs (L296-L824)`) that I've done on the `snownet` tests, this can then be extended to describe the entire state machine of connlib and let `proptest` search for inputs & combinations that break stuff. Some more `Transition`s that I'd expect we can implement: - Add DNS resource - Reconnect (i.e. roam networks) - Remove resource The public API of `Tunnel` isn't actually very large: We add and remove resources, set upstream DNS servers and call `reconnect`. I think the bet here is that we can implement the reference state machine in a very simple way. For example, once we have added a resource and handled the connection-intent, we should be able to send an ICMP packet through the tunnel. I've already worked out how to pass `Transmit`s back and forth between relay, client and gateway (see linked `snownet` tests above). If we port that to this state machine test, we can actually exercise all the code paths that are required to encapsulate / decapsulate those packets whilst asserting against something simple like "packet pops out at the other end". Because the setup of the test is also a proptest-strategy, we can even add the network topology as a variable by configuring the `Firewall` (see `snownet` tests) dynamically with or without blocking rules and thus force the entire tunnel through an (in-memory) relay. Related: #4589.	2024-04-19 02:31:08 +00:00
Thomas Eizinger	bfe07d7ebd	chore(connlib): upsert relays from "init" message (#4567 ) This is another step towards #4548. The portal now includes a list of relays as part of the "init" message. Any time we receive an "init", we will now upsert those relays based on their ID. This requires us to change our internal bookkeeping of relays from indexing them by address to indexing by ID. To ensure that this works correctly, the unit tests are rewritten to use the new `upsert_relays` API. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2024-04-15 21:30:49 +00:00
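The switch from address-indexed to ID-indexed bookkeeping in the commit above could look roughly like this. Only the name `upsert_relays` comes from the commit message; the signature, the `RelayId` and `Relays` types, and the choice to return the changed IDs are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical relay identifier as assigned by the portal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RelayId(pub u64);

/// Relays indexed by ID, so a relay whose address changes is updated
/// in place instead of showing up as a second entry.
#[derive(Default)]
pub struct Relays {
    by_id: HashMap<RelayId, SocketAddr>,
}

impl Relays {
    /// Inserts new relays and updates existing ones (assumed signature).
    /// Returns the IDs that were new or whose address changed.
    pub fn upsert_relays(
        &mut self,
        relays: impl IntoIterator<Item = (RelayId, SocketAddr)>,
    ) -> Vec<RelayId> {
        let mut changed = Vec::new();
        for (id, addr) in relays {
            // `insert` returns the previous address, if there was one.
            if self.by_id.insert(id, addr) != Some(addr) {
                changed.push(id);
            }
        }
        changed
    }

    /// Number of distinct relays currently known.
    pub fn len(&self) -> usize {
        self.by_id.len()
    }
}
```

Receiving the same "init" payload twice is then a no-op, while a relay that reappears under a new address replaces its old entry.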
Reactor Scram	53968063a5	fix(windows): patch some DNS leaks (#4530 ) Fixes #4488 ```[tasklist] # Before merging - [x] There's one call site that won't compile on Linux. Make this cross-platform. - [x] Does the rule get removed every time when you quit gracefully? - [x] Will this NRPT rule prevent connlib from re-resolving the portal IP if it needs to? - [x] Test network switching. Does this work worse, better, or the same? - [ ] Is the Windows DNS cache flushed exactly when it needs to be? ``` - After connlib connects to the portal, we add an NRPT rule asking Windows to send all DNS queries to our sentinels. This should also be called whenever the interface is re-configured, which might change the sentinel IPs - When exiting gracefully, we delete the rule to restore normal DNS behavior without having to back up and restore the other IPs - We also delete the rule at startup so that if Firezone crashes or misbehaves, restarting it should restore normal DNS - We also flush the system-wide DNS cache whenever we claim different routes. This may flush too often, and it may also miss some flushes that we should do. It needs double-checking. - There is still a gap when changing networks, DNS can leak there, but I don't think it's worse than before.	2024-04-15 21:10:30 +00:00
Reactor Scram	2c9b6c9b3a	refactor(headless-client): use Tokio codec instead of hand-rolled length-delimited codec (#4606 ) The ongoing yak shave towards #3713 Closes #4514 and saves about 30 lines of code, thanks for the suggestion Thomas	2024-04-15 15:19:33 +00:00
Thomas Eizinger	5e1e31b782	refactor(connlib): add property-based tests for adding and removing of resources (#4503 ) Also includes some refactoring around how we update DNS servers and the interface config to allow for some tidy up of those tests. Resolves: #4355.	2024-04-11 06:29:35 +00:00

135 Commits