Commit Graph

135 Commits

Author SHA1 Message Date
Thomas Eizinger
5268756b60 feat(connlib): add placeholder for Internet Resource (#5900)
In preparation for #2667, we add an `internet` variant to our list of
possible resource types. This is backwards-compatible with existing
clients and ensures that, once the portal starts sending Internet
resources to clients, they won't fail to deserialise these messages.

The portal will have a version check so that it does not send this to older
clients anyway, but the sooner we can land this, the better. It simplifies the
initial development as we start preparing for the next client release.

Adding new fields to a JSON message is always backwards-compatible so we
can extend this later with whatever we need.
2024-07-18 04:28:02 +00:00
Thomas Eizinger
4f4134b000 test(connlib): model gateway <> site <> resource relationship (#5871)
Currently, the relationship between gateways, sites and resources is
modeled in an ad-hoc fashion within `tunnel_test`. The correct
relationship is:

- The portal knows about all sites.
- A resource can only be added for an existing site.
- One or more gateways belong to a single site.

To express this relationship in `tunnel_test`, we first sample between 1
and 3 sites. Then we sample between 1 and 3 gateways and assign them a
site each. When adding new resources, we sample a site that the resource
belongs to. Upon a connection intent, we sample a gateway from all
gateways that belong to the site that the resource is defined in.
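The sampling relationships described above can be modelled roughly as follows. This is a minimal std-only sketch, not the actual `tunnel_test` code: the ID newtypes, `PortalModel` and its methods are hypothetical, and proptest's sampling is replaced by a plain `index` argument so that the example stays deterministic.

```rust
use std::collections::BTreeMap;

// Hypothetical ID newtypes; the real `tunnel_test` has its own ID types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct SiteId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct GatewayId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ResourceId(pub u32);

/// Reference model of the portal's view of sites, gateways and resources.
#[derive(Default)]
pub struct PortalModel {
    sites: Vec<SiteId>,
    gateways: BTreeMap<GatewayId, SiteId>,   // every gateway belongs to exactly one site
    resources: BTreeMap<ResourceId, SiteId>, // every resource is defined in exactly one site
}

impl PortalModel {
    pub fn add_site(&mut self, site: SiteId) {
        if !self.sites.contains(&site) {
            self.sites.push(site);
        }
    }

    /// Gateways can only join a site the portal knows about.
    pub fn add_gateway(&mut self, gateway: GatewayId, site: SiteId) -> bool {
        if !self.sites.contains(&site) {
            return false;
        }
        self.gateways.insert(gateway, site);
        true
    }

    /// Resources can only be added to an existing site.
    pub fn add_resource(&mut self, resource: ResourceId, site: SiteId) -> bool {
        if !self.sites.contains(&site) {
            return false;
        }
        self.resources.insert(resource, site);
        true
    }

    /// Gateways that may serve a connection intent for `resource`.
    pub fn eligible_gateways(&self, resource: ResourceId) -> Vec<GatewayId> {
        let Some(site) = self.resources.get(&resource) else {
            return Vec::new();
        };
        self.gateways
            .iter()
            .filter(|(_, gw_site)| *gw_site == site)
            .map(|(gateway, _)| *gateway)
            .collect()
    }

    /// Deterministic stand-in for proptest's sampling: `index` selects one eligible gateway.
    pub fn pick_gateway(&self, resource: ResourceId, index: usize) -> Option<GatewayId> {
        let eligible = self.eligible_gateways(resource);
        if eligible.is_empty() {
            None
        } else {
            Some(eligible[index % eligible.len()])
        }
    }
}
```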

In addition, this patch-set removes multi-site resources from the
`tunnel_test`. As far as connlib's routing logic is concerned, we route
packets to a resource on a selected gateway. How the portal selected the
site of the gateway doesn't matter to connlib and thus doesn't need to
be covered in these tests.
2024-07-17 22:41:47 +00:00
Gabi
7e963f74ca chore(connlib): performance improvement for picking cidr resources (#5891)
Extracted from #5840.

Cleans up how we generate IPs and improves the performance of picking a
host within an IP range by computing its address arithmetically instead
of iterating through the range.
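A minimal sketch of the arithmetic approach for IPv4, assuming the range is given as a network address plus prefix length (`nth_host` is a hypothetical helper, not connlib's API). Instead of walking an iterator to the n-th element, we add an offset to the network address, which is O(1) regardless of the range size.

```rust
use std::net::Ipv4Addr;

/// Returns the `n`-th address (wrapping around) of `network/prefix` in constant time.
pub fn nth_host(network: Ipv4Addr, prefix: u8, n: u64) -> Ipv4Addr {
    assert!(prefix <= 32, "invalid IPv4 prefix length");
    let size: u64 = 1u64 << (32 - prefix); // number of addresses in the range
    // Shifting a u32 by 32 would overflow, so /0 gets an explicit all-zero mask.
    let mask: u32 = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    let base = u32::from(network) & mask; // normalise to the network address
    let offset = (n % size) as u32; // always < size, so it fits in u32
    Ipv4Addr::from(base + offset) // cannot overflow: base is aligned to `size`
}
```

The same idea carries over to IPv6 by using `u128` instead of `u32`.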
2024-07-17 06:24:34 +00:00
Thomas Eizinger
14abda01fd refactor(connlib): polish DNS resource matching (#5866)
In preparation for implementing #5056, I familiarized myself with the
current code and ended up implementing a couple of refactorings.
2024-07-15 23:56:48 +00:00
Thomas Eizinger
a4a8221b8b refactor(connlib): explicitly initialise Tun (#5839)
Connlib's routing logic and networking code are entirely
platform-agnostic. The only platform-specific bit is how we interact with the TUN
device. From connlib's perspective though, all it needs is an interface
for reading and writing. How the device gets initialised and updated is
the client's business.

For the most part, this is the same on all platforms: We call callbacks
and the client updates the state accordingly. The only annoying bit here
is that Android recreates the TUN interface on every update and thus our
old file descriptor is invalid. The current design works around this by
returning the new file descriptor on Android. This is a problematic
design for several reasons:

- It forces the callback handler to finish synchronously, halting
connlib until it completes.
- The synchronous nature also means we cannot replace the callbacks with
events as events don't have a return value.

To fix this, we introduce a new `set_tun` method on `Tunnel`. This moves
the business of how the `Tun` device is created up to the client. The
clients are already platform-specific so this makes sense. In a future
iteration, we can move all the various `Tun` implementations all the way
up to the client-specific crates, thus co-locating the platform-specific
code.
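To illustrate the shape of this API, here is a hedged sketch. The `Tun` trait with `read`/`write` and the `MemoryTun` type are assumptions for illustration; only the names `Tunnel`, `Tun` and `set_tun` come from this change. The point is that `set_tun` takes an already-created device and returns nothing, so no callback has to hand a file descriptor back.

```rust
use std::io;

/// All connlib needs from a TUN device: a way to read and write packets.
/// (Hypothetical trait shape for illustration.)
pub trait Tun {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
    fn write(&mut self, packet: &[u8]) -> io::Result<usize>;
}

pub struct Tunnel {
    tun: Option<Box<dyn Tun>>,
}

impl Tunnel {
    pub fn new() -> Self {
        Self { tun: None }
    }

    /// The client creates the (platform-specific) device and hands it over.
    /// Replacing the device drops the old one, e.g. a stale Android file descriptor.
    pub fn set_tun(&mut self, tun: Box<dyn Tun>) {
        self.tun = Some(tun);
    }

    pub fn write_packet(&mut self, packet: &[u8]) -> io::Result<usize> {
        match self.tun.as_mut() {
            Some(tun) => tun.write(packet),
            None => Err(io::Error::new(io::ErrorKind::NotConnected, "no TUN device set")),
        }
    }
}

/// In-memory device standing in for a platform-specific implementation.
#[derive(Default)]
pub struct MemoryTun {
    pub written: Vec<Vec<u8>>,
}

impl Tun for MemoryTun {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Ok(0) // nothing to read in this sketch
    }

    fn write(&mut self, packet: &[u8]) -> io::Result<usize> {
        self.written.push(packet.to_vec());
        Ok(packet.len())
    }
}
```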

Initialising `Tun` from the outside surfaces another issue: The routes
are still set via the `Tun` handle on Windows. To fix this, we introduce
a `make_tun` function on `TunDeviceManager` so that it can remember
the interface index on Windows and take over setting the routes.

This simplifies several of connlib's APIs, which are now infallible.

Resolves: #4473.

---------

Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: conectado <gabrielalejandro7@gmail.com>
2024-07-12 23:54:15 +00:00
Thomas Eizinger
960ce80680 refactor(connlib): move TunDeviceManager into firezone-bin-shared (#5843)
The `TunDeviceManager` is a component that the leaf-nodes of our
dependency tree need: the binaries. Thus, it is misplaced in the
`connlib-shared` crate which is at the very bottom of the dependency
tree.

This is necessary to allow the `TunDeviceManager` to actually construct
a `Tun` (which currently lives in `firezone-tunnel`).

Related: #5839.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-07-11 23:42:33 +00:00
Thomas Eizinger
2013d6a2bf chore(connlib): improve logging (#5836)
Currently, the logging of fields in spans for encapsulate and
decapsulate operations is a bit inconsistent between client and gateway.
Logging the `from` field for every message is actually quite redundant
because most of these logs are emitted within `snownet`'s `Allocation`,
which can add its own span to indicate which relay we are talking to.

For most other operations, it is much more useful to log the connection
ID instead of IPs.

This should make the logs a bit more succinct.
2024-07-11 23:38:19 +00:00
Thomas Eizinger
08182913a5 refactor(connlib): remove CidrV4 and CidrV6 types from callbacks (#5842)
These are only necessary for the Android and Apple clients. Other clients
should not need to bother with these custom types.

Required-for: #5843.
2024-07-11 14:25:26 +00:00
Thomas Eizinger
f39a57fa50 refactor(connlib): remove cyclic From impls (#5837)
We have several representations of `ResourceDescription` within connlib.
The ones within the `callbacks` module are meant for _presentation_ to
the clients and thus contain additional information like the site
status.

The `From` impls deleted within the PR are only used within tests. We
can rewrite those tests by asserting on the presented data instead.

This is better because it means information about resources only flows
in one direction: From connlib to the clients.
2024-07-11 14:21:33 +00:00
Reactor Scram
78f1c7c519 test(firezone-tunnel/windows): Test Windows upload speed in CI (#5607)
Closes #5601
It looks like we can hit 100+ Mbps in theory. This covers Wintun, Tokio,
and Windows OS overhead. It doesn't cover the cryptography or anything
in connlib itself.

The code is kinda messy but I'm not sure how to clean it up so I'll just
leave it for review.

This test should fail if there are any regressions in #5598.

It fails if any packet is dropped or if the speed is under 100 Mbps.

```[tasklist]
### Tasks
- [x] Use `ip_packet::make`
- [x] Switch to `cargo bench`
- [x] Extract windows ARM PR
- [x] Clean up wintun.dll install code
- [x] Re-request review
```
2024-07-10 19:09:45 +00:00
Thomas Eizinger
0e6ac2040c test(connlib): use two relays in tunnel_test (#5804)
With the introduction of a routing table in #5786, we can very easily
introduce an additional relay to `tunnel_test`. In production, we are
always given two relays and thus, this mimics the production setup more
closely.
2024-07-09 23:47:35 +00:00
Thomas Eizinger
d15c43b6f2 test(connlib): render IDs as hex u128 (#5803)
This is a bit of a hack because features should never change behaviour.
Unfortunately, we can't use `cfg(test)` here because the proptests live
in a different crate and thus for the tests, we import the crate using
`cfg(not(test))`.

Our `proptest` feature is really only meant to be activated during
testing so I think this is fine for now.

The benefit is that the test logs are much more terse because proptest
will shrink the IDs to `0`, `1` etc. With the upcoming addition of
multiple gateways and multiple relays, we will have a lot more IDs in
the logs. Thus, it is important that they stay legible.
2024-07-09 14:23:37 +00:00
Thomas Eizinger
9caca475dc test(connlib): introduce routing table to tunnel_test (#5786)
Currently, `tunnel_test` uses a rather naive approach when dispatching
`Transmit`s. In particular, it separately asks the client, gateway and
relay whether they "want" a certain packet. In a real network,
these packets are routed based on their IP.

To mimic something similar, we introduce a `Host` abstraction that wraps
each component: client, gateway and relay. Additionally, we introduce a
`RoutingTable` where we can add and remove hosts. With these things in
place, routing a `Transmit` is as easy as looking up the destination IP
in the routing table and dispatching to the corresponding host.

Our hosts are type-safe: client, gateway and relay have different types.
Thus, we abstract over them using a `HostId` in order to know which
host a certain message is for. Following these patches, we can easily
introduce multiple gateways and relays to this test by simply making
more entries in this routing table. This will increase the test coverage
of connlib.
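A simplified sketch of such a routing table, assuming a single `HashMap` from IP to host. The `HostId` variants and the `add_host`/`remove_host`/`host_by_ip` methods are hypothetical, not the actual `tunnel_test` types.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Type-erased handle for the differently-typed hosts (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum HostId {
    Client(u32),
    Gateway(u32),
    Relay(u32),
}

#[derive(Default)]
pub struct RoutingTable {
    routes: HashMap<IpAddr, HostId>,
}

impl RoutingTable {
    /// Registers all IPs of a host; refuses if any of them is already taken.
    pub fn add_host(&mut self, id: HostId, ips: &[IpAddr]) -> bool {
        if ips.iter().any(|ip| self.routes.contains_key(ip)) {
            return false;
        }
        for ip in ips {
            self.routes.insert(*ip, id);
        }
        true
    }

    /// Removes every route pointing at `id`.
    pub fn remove_host(&mut self, id: HostId) {
        self.routes.retain(|_, host| *host != id);
    }

    /// Dispatching a `Transmit` is a lookup of its destination IP.
    pub fn host_by_ip(&self, dst: IpAddr) -> Option<HostId> {
        self.routes.get(&dst).copied()
    }
}
```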

Lastly, this patch massively increases the performance of `tunnel_test`.
It turns out that previously, we spent a lot of CPU cycles accessing
"random" IPs from very large iterators. With this patch, we take a
limited range of 100 IPs that we sample from, thus drastically
increasing performance of this test. The configured 1000 testcases
execute in 3s on my machine now (with opt-level 1 which is what we use
in CI).

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
2024-07-09 01:48:54 +00:00
Reactor Scram
f6e99752ec fix(client): flush the OS' DNS cache whenever resources change (#5700)
Closes #5052

On my dev VMs:
- systemd-resolved = 15 ms to flush
- Windows = 600 ms to flush

I tested with the headless Clients on Linux and Windows and it fixes the
issue. On Windows, I didn't replicate the issue with the GUI Client; on
Linux, this patch also fixes it for the GUI Client.
2024-07-03 21:14:43 +00:00
Jamil
8655b711db fix(connlib): Don't use operatingSystemVersionString on Apple OSes (#5628)
The [HTTP 1.1 RFC](https://datatracker.ietf.org/doc/html/rfc2616) states
that HTTP headers should be US-ASCII. This is not the case when the
macOS Client is run from a host that has a non-English language selected
as its system default due to the way we build the user agent.

This PR fixes that by normalizing how we build the user agent: we more
granularly select which fields compose it instead of relying on
OS-provided version strings that may contain non-ASCII characters.
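A sketch of the idea in Rust. The actual fix lives in the Apple client, and the fields and the `os/version component/version` format below are assumptions. Building the string from numeric version components, rather than from a localized version string, keeps it ASCII; the final filter is an extra safety net.

```rust
/// Builds a user agent from individually selected, structured fields
/// (hypothetical field choice and format).
pub fn user_agent(
    os_name: &str,
    os_version: (u64, u64, u64),
    component: &str,
    app_version: &str,
) -> String {
    let (major, minor, patch) = os_version;
    let ua = format!("{os_name}/{major}.{minor}.{patch} {component}/{app_version}");
    // HTTP/1.1 header values should be US-ASCII: defensively drop anything else.
    ua.chars().filter(|c| c.is_ascii() && !c.is_ascii_control()).collect()
}
```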

fixes https://github.com/firezone/firezone/issues/5467

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
2024-06-28 21:59:02 +00:00
Thomas Eizinger
6c842de83c refactor(connlib): don't re-initialise Tun on config updates (#5392)
Currently, connlib re-initialises the TUN device on Linux every time its
configuration gets updated such as when roaming from one network to
another. This is unnecessary. Instead, we can adopt the same approach as
already used on macOS, iOS and Windows and only initialise it if it
doesn't exist yet.

Doing so surfaces an interesting bug. Currently, attempting to
re-initialise the TUN device fails with a warning:

> connlib_client_shared::eventloop: Failed to set interface on tunnel: Resource busy (os error 16)

See
https://github.com/firezone/firezone/actions/runs/9656570163/job/26634409346#step:7:103
for an example. As a consequence, we never actually trigger the
`on_set_interface_config` callback and thus never actually set the new
IPs on the TUN device.

Now that we _are_ calling this callback, we execute
`TunDeviceManager::set_ips` which first clears all IPs from the device
and then attaches the new ones. A consequence of this is that the Linux
kernel will clear all routes associated with the device. This clashes
with an optimisation we have in `TunDeviceManager` where we remember the
previously set routes and don't set new ones if they are the same.

This `HashSet` needs to be cleared upon setting new IPs in order to
actually set the new routes correctly afterwards. Without that, we stop
receiving traffic on the TUN device.
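The interaction between `set_ips` and the route cache can be sketched with an OS-free mock. This is not the real `TunDeviceManager`: routes are plain strings and removing stale routes is omitted, but it shows why the cached `HashSet` must be cleared when the IPs are reset.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical, OS-free stand-in for `TunDeviceManager`: instead of talking to
/// the kernel, it records which routes it would have installed.
#[derive(Default)]
pub struct TunDeviceManager {
    ips: Vec<IpAddr>,
    /// Routes we believe are currently installed (CIDRs as strings for brevity).
    routes: HashSet<String>,
    /// Every route actually pushed to the "OS", for inspection.
    pub installed: Vec<String>,
}

impl TunDeviceManager {
    pub fn set_ips(&mut self, ips: Vec<IpAddr>) {
        // Clearing and re-adding the IPs makes Linux drop all routes of the device,
        // so our cache of "already set" routes is stale and must be reset as well.
        self.ips = ips;
        self.routes.clear();
    }

    pub fn ips(&self) -> &[IpAddr] {
        &self.ips
    }

    pub fn set_routes(&mut self, routes: HashSet<String>) {
        if routes == self.routes {
            return; // short-circuit: nothing changed
        }
        // Removal of stale routes is omitted; we only install the missing ones.
        for route in routes.difference(&self.routes) {
            self.installed.push(route.clone());
        }
        self.routes = routes;
    }
}
```

Without the `clear()` in `set_ips`, the second `set_routes` call after roaming would short-circuit and the device would end up without routes.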
2024-06-25 22:30:31 +00:00
Thomas Eizinger
409039afde chore(connlib): improve error messages in TunDeviceManager (#5530) 2024-06-25 14:09:48 +00:00
Thomas Eizinger
bd989d4416 chore(connlib): improve logging for set_routes on Linux (#5529)
Logging the routes in the span and in an event creates duplicate
information so we remove the former. Additionally, we add a debug log in
case we short-circuit the function.
2024-06-25 14:09:06 +00:00
Thomas Eizinger
eec0652abe chore(connlib): shrink "packet not allowed" log (#5476)
The set of allowed IPs can be quite large, which clutters the log. Remove the
`HashSet` from the error and also remove the stuttering; the error
already says "Packet not allowed".
2024-06-25 01:16:29 +00:00
Gabi
aea03a490c feat(connlib): clients make use of DNS mangling on gateways (#5049)
This PR is the "client-side" of things for #4994. Up until now, when a
user wanted to connect to a DNS resource, we would establish a
connection to the gateway and pass along the domain we are trying to
access. The gateway would resolve that domain and send the response back
to the client, allowing it to finally send a DNS response.

Now, we instantly assign and respond with 4x A and 4x AAAA records to
any query for one of our DNS resources. Upon the first IP packet for one
of these "proxy IPs", we select a gateway, establish a connection and
send our proxy IPs along. The gateway then performs the necessary
mangling and NATing of all packets. See #5354 for details.
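A minimal sketch of handing out proxy IPs up front, assuming a simple counter-based allocator; `ProxyIps`, its methods and the base addresses used in any example are hypothetical. Each domain immediately gets 4 IPv4 and 4 IPv6 addresses, repeated queries return the same set, and a reverse lookup tells us which DNS resource the first packet to a proxy IP belongs to.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical allocator answering A/AAAA queries for DNS resources immediately.
pub struct ProxyIps {
    v4_base: Ipv4Addr,
    v6_base: Ipv6Addr,
    next: u32, // next free offset; a real allocator would bound and recycle the range
    by_domain: HashMap<String, Vec<IpAddr>>,
}

impl ProxyIps {
    pub fn new(v4_base: Ipv4Addr, v6_base: Ipv6Addr) -> Self {
        Self { v4_base, v6_base, next: 0, by_domain: HashMap::new() }
    }

    /// Returns 4 IPv4 followed by 4 IPv6 proxy IPs for `domain`, assigning them on first use.
    pub fn assign(&mut self, domain: &str) -> Vec<IpAddr> {
        if let Some(ips) = self.by_domain.get(domain) {
            return ips.clone(); // stable: repeated queries get the same answer
        }
        let start = self.next;
        self.next += 4;
        let v4 = u32::from(self.v4_base);
        let v6 = u128::from(self.v6_base);
        let ips: Vec<IpAddr> = (start..start + 4)
            .map(|i| IpAddr::V4(Ipv4Addr::from(v4 + i)))
            .chain((start..start + 4).map(|i| IpAddr::V6(Ipv6Addr::from(v6 + u128::from(i)))))
            .collect();
        self.by_domain.insert(domain.to_string(), ips.clone());
        ips
    }

    /// On the first packet to a proxy IP, find out which DNS resource it belongs to.
    pub fn domain_for(&self, ip: IpAddr) -> Option<&str> {
        self.by_domain
            .iter()
            .find(|(_, ips)| ips.contains(&ip))
            .map(|(domain, _)| domain.as_str())
    }
}
```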

Resolves: #4994.
Resolves: #5491.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-06-24 23:42:15 +00:00
Reactor Scram
28378fe24e refactor(headless-client): remove FIREZONE_PACKAGE_VERSION (#5487)
Closes #5481 

With this, I can connect to the staging portal without a build.rs or any
extra env var setup.

<img width="387" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/9c080b36-3a76-49c7-b706-20723697edc7">


```[tasklist]
### Next steps
- [x] Split out a refactor PR for `ConnectArgs` (#5488)
- [x] Try doing this for other Clients
- [x] Check Gateway
- [x] Check Tauri Client
- [x] Change to `app_version`
- [x] Open for review
- [ ] Use `option_env` so that `FIREZONE_PACKAGE_VERSION` can still override the Cargo.toml version for local testing
- [ ] Check Android Client
- [ ] Check Apple Client
```

---------

Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
2024-06-21 23:06:41 +00:00
Thomas Eizinger
14785eba9f chore(connlib): tune logs around proxy IPs and DNS resources (#5439)
Adds and tunes some logs around creating, using and disassociating proxy
IPs for DNS resources.
2024-06-20 03:52:08 +00:00
Gabi
95f13c89c6 fix(connlib): don't treat pending connections as errors (#5433)
When a user sends the first packet to a resource, we generate a
"connection intent" and ask the portal which gateway to use for
this resource. This process is throttled to only generate a new intent
every 2s.

Once we know which gateway to use for a certain resource, we initiate a
connection via snownet. This involves an OFFER-ANSWER handshake with the
gateway. A connection for which we have sent an offer and have not yet
received an answer is what we call a "pending connection".

In case the connection setup takes longer than 2s, we will generate
another connection intent which can point to the same gateway that we
are currently setting up a connection with.

Currently, encountering a "pending connection" during another connection
setup is treated as an error which results in some state being
cleaned-up / removed. This is where the bug surfaces: If we remove the
state for a resource as a result of a 2nd connection intent and then
receive the response of the first one, we will be left with no state
that knows about this resource.

We fix this by refactoring `create_or_reuse_connection` to be atomic with
regard to its state changes: All checks that fail the function are
moved to the top, which means there is no state to clean up in case of an
error. Additionally, we model the case of a "pending connection" using
an `Option` to not flood the logs with "pending connection" warnings, as
those are expected during normal operation.
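The "checks first, then mutate" structure and the `Option` for pending connections can be sketched as follows. Only the name `create_or_reuse_connection` comes from the PR; the state layout and the `Offer` and `Error` types are hypothetical simplifications.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GatewayId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ResourceId(pub u32);

#[derive(Debug, PartialEq)]
pub enum Error {
    UnknownResource(ResourceId),
}

/// Hypothetical stand-in for the offer we would send to the gateway.
#[derive(Debug, PartialEq)]
pub struct Offer {
    pub gateway: GatewayId,
}

#[derive(Default)]
pub struct ClientState {
    resources: HashSet<ResourceId>,
    pending: HashSet<GatewayId>, // offer sent, no answer yet
    resource_gateway: HashMap<ResourceId, GatewayId>,
}

impl ClientState {
    pub fn add_resource(&mut self, id: ResourceId) {
        self.resources.insert(id);
    }

    /// All checks that can fail run before any state is touched, so an `Err` never
    /// leaves anything to clean up. `Ok(None)` means a connection is already pending,
    /// which is expected (e.g. a 2nd intent after the 2s throttle) and not an error.
    pub fn create_or_reuse_connection(
        &mut self,
        resource: ResourceId,
        gateway: GatewayId,
    ) -> Result<Option<Offer>, Error> {
        if !self.resources.contains(&resource) {
            return Err(Error::UnknownResource(resource));
        }

        // Infallible from here on.
        self.resource_gateway.insert(resource, gateway);
        if !self.pending.insert(gateway) {
            return Ok(None);
        }
        Ok(Some(Offer { gateway }))
    }

    pub fn gateway_for(&self, resource: ResourceId) -> Option<GatewayId> {
        self.resource_gateway.get(&resource).copied()
    }
}
```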

Fixes: #5385
2024-06-19 02:04:09 +00:00
Gabi
2ea6a5d07e feat(gateway): NAT & mangling for DNS resources (#5354)
As part of #4994, the IP translation and mangling of packets to and from
DNS resources is moved to the gateway. This PR represents the
"gateway-half" of the required changes.

Eventually, the client will send a list of proxy IPs that it assigned
for a certain DNS resource. The gateway assigns each proxy IP to a real
IP and mangles outgoing and incoming traffic accordingly. There are a
number of things that we need to take care of as part of that:

- We need to implement NAT to correctly route traffic. Our NAT table
maps from source port* and destination IP to an assigned port* and real
IP. We say port* because that is only true for UDP and TCP. For ICMP, we
use the identifier.
- We need to translate between IPv4 and IPv6 in case a DNS resource e.g.
only resolves to IPv6 addresses but the client gave out an IPv4 proxy
address to the application. This translation was added in #5364 and
is now being used here.
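A simplified sketch of the NAT table described in the first bullet, assuming one table per client and ignoring the remote port of replies. Packet rewriting and the IPv4/IPv6 header translation are not shown; `NatTable` and `Protocol` are hypothetical names.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// The "port*": the real port for TCP/UDP, the identifier for ICMP.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Protocol {
    Tcp(u16),
    Udp(u16),
    Icmp(u16),
}

impl Protocol {
    fn with_value(self, value: u16) -> Self {
        match self {
            Protocol::Tcp(_) => Protocol::Tcp(value),
            Protocol::Udp(_) => Protocol::Udp(value),
            Protocol::Icmp(_) => Protocol::Icmp(value),
        }
    }
}

/// Hypothetical NAT table: (source port*, proxy IP) <-> (assigned port*, real IP).
#[derive(Default)]
pub struct NatTable {
    outbound: HashMap<(Protocol, IpAddr), (Protocol, IpAddr)>,
    inbound: HashMap<(Protocol, IpAddr), (Protocol, IpAddr)>,
    next: u16, // naive port*/identifier allocator; exhaustion is not handled here
}

impl NatTable {
    /// Client -> resource: swap the proxy IP for the real IP and pick a port*.
    pub fn translate_outgoing(&mut self, src: Protocol, proxy_ip: IpAddr, real_ip: IpAddr) -> (Protocol, IpAddr) {
        if let Some(&mapped) = self.outbound.get(&(src, proxy_ip)) {
            return mapped;
        }
        let assigned = loop {
            let candidate = src.with_value(self.next);
            self.next = self.next.wrapping_add(1);
            if !self.inbound.contains_key(&(candidate, real_ip)) {
                break candidate;
            }
        };
        self.outbound.insert((src, proxy_ip), (assigned, real_ip));
        self.inbound.insert((assigned, real_ip), (src, proxy_ip));
        (assigned, real_ip)
    }

    /// Resource -> client: map a reply back to the original port* and proxy IP.
    pub fn translate_incoming(&self, dst: Protocol, real_ip: IpAddr) -> Option<(Protocol, IpAddr)> {
        self.inbound.get(&(dst, real_ip)).copied()
    }
}
```

In this example, an IPv4 proxy IP mapping to an IPv6 real IP is exactly the case where the second bullet's translation kicks in.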

This PR is backwards-compatible because currently, clients don't send
any IPs to the gateway. No proxy IPs means we cannot do any translation
and thus, packets are simply routed through as is which is what the
current clients expect.

---------

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
2024-06-19 01:15:27 +00:00
Gabi
75faf25050 fix(connlib): accept null address_descriptions (#5366)
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2024-06-14 17:21:38 +00:00
Thomas Eizinger
489a14a0ed test(connlib): directly sample from state instead of indexing (#5332)
Currently, we use `sample::Index` and `sample::Selector` to
deterministically select parts of our state. Originally, this was done
because I did not yet fully understand how `proptest-state-machine`
works.

The available transitions are always sampled from the current state,
meaning we can directly use `sample::select` to pick an element like an
IP address from a list. This has several advantages:

- The transitions are more readable when debug-printed because they now
contain the actual data that is being used.
- I _think_ this results in better shrinking because `sample::select`
will perform a binary search for the problematic value.
- We can more easily implement transitions that _remove_ state.
Currently, we cannot remove things from the `ReferenceState` because the
system-under-test would also have to index into the `ReferenceState` as
part of executing its transition. By directly embedding all necessary
information in the transition, this is much simpler.
2024-06-13 00:07:02 +00:00
Jamil
7c5c7a856a fix: Use correct component versions by overriding from FIREZONE_PACKAGE_VERSION (#5344)
Now that #4397 is complete, we need a way to bake in the desired
component version so that it's reported properly to the portal.

This PR adds a global override, "FIREZONE_PACKAGE_VERSION", that can be
optionally set to bake the version in. If left blank, the behavior is
unchanged: "CARGO_PKG_VERSION" is used instead, which is populated from
`connlib-shared`'s Cargo.toml.
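A sketch of the fallback logic, assuming the override is read at compile time with `option_env!` (which yields `None` for unset variables). `select_version` and `component_version` are hypothetical helpers; treating an empty value as "not set" mirrors the "left blank" wording above.

```rust
/// An explicit, non-empty override wins; otherwise fall back to the crate version.
pub fn select_version(override_: Option<&'static str>, fallback: &'static str) -> &'static str {
    match override_ {
        Some(v) if !v.trim().is_empty() => v,
        _ => fallback,
    }
}

pub fn component_version() -> &'static str {
    // Both values are resolved at compile time. Cargo always sets CARGO_PKG_VERSION;
    // the "unknown" fallback only matters when building without Cargo.
    select_version(
        option_env!("FIREZONE_PACKAGE_VERSION"),
        option_env!("CARGO_PKG_VERSION").unwrap_or("unknown"),
    )
}
```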

## Problem

<img width="520" alt="Screenshot 2024-06-12 at 11 34 45 AM"
src="https://github.com/firezone/firezone/assets/167144/b04fcbe5-dcba-4a0d-b93f-7abd923b4f04">
<img width="439" alt="Screenshot 2024-06-12 at 11 34 36 AM"
src="https://github.com/firezone/firezone/assets/167144/7b1828fe-4073-4a1f-8cbd-5e55ba241745">
2024-06-12 22:09:48 +00:00
Thomas Eizinger
d0efc55918 test(connlib): reduce number of local rejections (#5221)
To make proptests efficient, it is important to generate the set of
possible test cases algorithmically instead of filtering through
randomly generated values.

This PR makes the strategies for upstream DNS servers and IP networks
more efficient by removing the filtering.
2024-06-05 21:44:19 +00:00
Thomas Eizinger
3f3ea96ca7 test(connlib): generate resources with wildcard and ? addresses (#5209)
Currently, `tunnel_test` only tests DNS resources with fully-qualified
domain names. Firezone also supports wildcard domains in the form of
`*.example.com` and `?.example.com`.

To include these in the tests, we generate a bunch of DNS records that
include various subdomains for such wildcard DNS resources.

When sampling DNS queries, we already take them from the pool of global
DNS records which now also includes these subdomains, thus nothing else
needed to be changed to support testing these resources.
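For reference, a sketch of wildcard matching. The semantics are an assumption for illustration and should be checked against connlib: `*.` is taken to match zero or more leading labels and `?.` zero or one; any other address must match exactly (case-insensitive, ignoring a trailing dot). `matches` is a hypothetical helper.

```rust
/// ASSUMED semantics: `*.base` matches `base` plus any number of extra labels,
/// `?.base` matches `base` plus at most one extra label.
pub fn matches(pattern: &str, name: &str) -> bool {
    let pattern = pattern.trim_end_matches('.').to_ascii_lowercase();
    let name = name.trim_end_matches('.').to_ascii_lowercase();

    let (max_extra_labels, base) = if let Some(base) = pattern.strip_prefix("*.") {
        (usize::MAX, base)
    } else if let Some(base) = pattern.strip_prefix("?.") {
        (1, base)
    } else {
        return pattern == name; // fully-qualified resource: exact match only
    };

    if name == base {
        return true;
    }
    // `name` must end in ".<base>"; this rejects e.g. "notexample.com".
    match name.strip_suffix(base).and_then(|prefix| prefix.strip_suffix('.')) {
        Some(prefix) if !prefix.is_empty() => prefix.split('.').count() <= max_extra_labels,
        _ => false,
    }
}
```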
2024-06-05 06:54:08 +00:00
Reactor Scram
deefabd8f8 refactor(firezone-tunnel): move routes and DNS control out of connlib and up to the Client (#5111)
Refs #3636 (This pays down some of the technical debt from Linux DNS)
Refs #4473 (This partially fulfills it)
Refs #5068 (This is needed to make `FIREZONE_DNS_CONTROL` mandatory)

As of dd6421:

- On both Linux and Windows, DNS control and IP setting (i.e.
`on_set_interface_config`) both move to the Client
- On Windows, route setting stays in `tun_windows.rs`. Route setting in
Windows requires us to know the interface index, which we don't know in
the Client code. If we could pass opaque platform-specific data between
the tunnel and the Client it would be easy.
- On Linux, route setting moves to the Client and Gateway, which
completely removes the `worker` task in `tun_linux.rs`
- Notifying systemd that we're ready moves up to the headless Client /
IPC service

```[tasklist]
### Before merging / notes
- [x] Does DNS roaming work on Linux on `main`? I don't see where it hooks up. I think I only set up DNS in `Tun::new` (Yes, the `Tun` gets recreated every time we reconfigure the device)
- [x] Fix Windows Clients
- [x] Fix Gateway
- [x] Make sure connlib doesn't get the DNS control method from the env var (will be fixed in #5068)
- [x] De-dupe consts
- [ ] ~~Add DNS control test~~ (failed)
- [ ] Smoke test Linux
- [ ] Smoke test Windows
```
2024-06-03 14:32:08 +00:00
Thomas Eizinger
ce929e1204 test(connlib): resolve DNS resources in tunnel_test (#5083)
Currently, `tunnel_test` only sends ICMPs to CIDR resources. We also
want to test certain properties with regard to DNS resources. In
particular, we want to test:

- Given a DNS resource, can we query it for an IP?
- Can we send an ICMP packet to the resolved IP?
- Is the mapping of proxy IP to upstream IP stable?

To achieve this, we sample a list of `IpAddr` whenever we add a DNS
resource to the state. We also add the transition
`SendQueryToDnsResource`. As the name suggests, this one simulates a DNS
query coming from the system for one of our resources. We simulate A and
AAAA queries and take note of the addresses that connlib returns to us
for the queries.

Lastly, as part of `SendICMPPacketToResource`, we now may also sample
from a list of IPs that connlib gave us for a domain and send an ICMP
packet to that one.

There is one caveat in this test that I'd like to point out: At the
moment, the exact mapping of proxy IP to real IP is an implementation
detail of connlib. As a result, I don't know which proxy IP I need to
use in order to ping a particular "real" IP. This presents an issue in
the assertions: Upon the first ICMP packet, I cannot assert what the
expected destination is. Instead, I need to "remember" it. In case we
send another ICMP packet to the same resource and happen to sample the
same proxy IP, we can then assert that the mapping did not change.
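The "remember and compare" assertion can be expressed with a map from proxy IP to the first upstream IP we observed for it. `MappingObserver` is a hypothetical helper, not part of the actual test.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Records the first proxy -> upstream mapping seen and checks later ones against it.
#[derive(Default)]
pub struct MappingObserver {
    seen: HashMap<IpAddr, IpAddr>,
}

impl MappingObserver {
    /// Returns `false` if `proxy` was previously routed to a different upstream IP.
    pub fn observe(&mut self, proxy: IpAddr, upstream: IpAddr) -> bool {
        *self.seen.entry(proxy).or_insert(upstream) == upstream
    }
}
```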
2024-05-31 04:44:30 +00:00
Thomas Eizinger
974eb95dc5 test(connlib): reduce number of sites to 3 (#5152)
Generating up to 10 can be quite verbose in the output. I think 3 should
also be enough to hit all codepaths that need to deal with more than 1.
2024-05-29 02:00:27 +00:00
Thomas Eizinger
fbc13f6946 test(connlib): generate actual domain names as inputs (#5146)
Extracted out of #5083.
2024-05-29 00:51:16 +00:00
Reactor Scram
2fb8d9199b feat(gui-client): add resource details to linux and windows clients (#5142)
Refs #3514 

```[tasklist]
### Issues
- [x] Add special case if `address_description` is empty
- [x] Submenus aren't showing up in GNOME
- [ ] Accelerator keys don't seem to work on Linux or Windows
- [ ] Can't get a Resource in staging to automatically open a URL even though other Resources can do this
- [ ] Accelerator for Settings isn't even displayed on Linux
- [ ] Submenus spawn halfway off-screen in KDE
```

# Linux

## GNOME menu height issue

This happens when the menu, including an opened submenu, is taller than
the screen. GNOME doesn't seem to scroll the root menu at all, so the
"Quit" option gets cut off at the default low resolution of my VMs. It
does allow submenus to scroll... but it computes their viewport size
based on how much spare space there is between the height of the screen
and the height of the root menu. So if the root menu is too big, you
don't get to see the Resource submenus.

What a mess.

If we put all the Resources into their own submenu it might work, but
that's a big deviation from other Clients. We can probably live with it
for now if a typical customer has, say, 10 Resources and a 1080p screen.
More Resources or smaller screens will be a problem.

Long-term we're replacing all this anyway.

<img width="386" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/bb2e0677-372a-441b-805c-2d6714d245e6">

<img width="372" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/3bbdf2f3-1231-4488-a293-61c373ca0021">

## No activity

<img width="381" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/d50533bf-686e-44e0-ba01-fe1b6ef745cf">

## Gateway connected

<img width="508" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/e5e0b5e4-153a-4d03-a6a1-f8f2da7bf442">

# Windows

## No activity

<img width="568" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/046e9786-278f-4a2c-a1c8-7c536fcb8442">

## Gateway connected

<img width="562" alt="image"
src="https://github.com/firezone/firezone/assets/13400041/5484810a-e766-43a6-8245-191181c08d5b">
2024-05-28 23:42:03 +00:00
Thomas Eizinger
92676f0f53 test(connlib): simulate IO in state machine tests (#4728)
This is similar to #4097 and #4585 but for the entire `ClientState` and
`GatewayState`. We also do it in the context of a property-based test
with the vision that we can deterministically explore a large space of
state transitions and see where our main property breaks: Being able to
send an ICMP packet from the client to the gateway.

In other words, we now correctly pass all the `Transmit`s back and forth
between the components as if they would receive it from the network. Due
to the nature of property-based tests, this already exercises a very
large input space. For example, if the client does not have an IPv6
socket and the gateway doesn't have an IPv4 socket, this test already
checks whether we then correctly fall back to using a relay (because the
allocation we make on the relay is the only network path where the STUN
requests pass through).

What this does not (yet) do is set up a proper network topology. The
`dispatch_transmit` function will happily "route" a `Transmit` from e.g.
the client to the gateway even if they are in different subnets. In
other words, these tests assume that the actual network itself works and
we can exchange UDP packets between the components.

For now, we only send ICMPs to CIDR resources. As a next step, we can
extend this to DNS resources by sending DNS queries for our DNS
resources and then sending an ICMP to the resolved IP.
2024-05-22 23:10:58 +00:00
Thomas Eizinger
49a965a686 chore(connlib): remove unused ConnlibError::Snownet variant (#5078) 2024-05-22 04:39:48 +00:00
Reactor Scram
b510041494 chore(connlib): fix copy-paste typo in comment about DNS (#5053)
Closes #5051
2024-05-21 18:15:20 +00:00
Gabi
361aafb746 chore(connlib): upgrade domain version from 0.9 to 0.10 (#5028) 2024-05-20 20:54:22 +00:00
Gabi
a7d35cd5f1 feat(connlib): report resource status to client (#4931)
This PR introduces a site `Status`, which is used to report to the client
whether a site is unknown, online or offline, mostly as a hint to
users as to what's wrong with a connection.

These are the criteria for an online or offline resource:

* If all sites related to a resource are offline the resource is
considered offline, since there's no gateway that can respond to that
resource's connection
* If any site is online the resource is online, since that same peer can
be used to reach that resource
* Any other case is unknown
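The criteria above translate into a small fold over the statuses of a resource's sites. This is a sketch with a hypothetical `resource_status` function; treating a resource without any sites as unknown is an assumption based on "any other case".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Unknown,
    Online,
    Offline,
}

/// Combines the statuses of all sites a resource belongs to.
pub fn resource_status(sites: &[Status]) -> Status {
    if sites.iter().any(|s| *s == Status::Online) {
        Status::Online // any online site has a gateway that can serve the resource
    } else if !sites.is_empty() && sites.iter().all(|s| *s == Status::Offline) {
        Status::Offline // no gateway anywhere can respond
    } else {
        Status::Unknown
    }
}
```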

Right now resources are single site so it doesn't matter too much but
tracking online/offline per-site instead of per-gateway or resource
seems like the better long-term solution.

The way to "find out" the site's status is:

* If the response to a connection details request is offline, all sites related to
that resource must be offline; otherwise there would've been a gateway in
the response
* At the point we connect to a gateway, the site that corresponds to
that gateway must be online
* When a connection to a peer stops, its site is considered unknown again

Fixes #4738
2024-05-15 15:33:04 +00:00
Gabi
c46967e1d6 fix(connlib): resource filter deserialization (#4910)
There was an error in how resource filters were deserialized in the
gateway:

* we always assumed that the ports would be included, but the
portal sends no ports when the "all" range is allowed
* we also didn't support the `resource_updated` message; this fixes that,
so a resource's allow-list can now be changed in-flight
2024-05-08 00:16:06 +00:00
Gabi
0c7c96dd07 chore(connlib): pass to client new fields (#4900)
Fixes #4885
2024-05-07 21:14:29 +00:00
Gabi
68ece0a940 feat(connlib): traffic filtering (#4779)
This implements traffic filtering on the gateway. Filters are set on the
portal, per-resource, in an allow-list manner.

If no filters exist for a given resource, all packets are allowed;
otherwise, only packets that match the port/protocol of a filter are
allowed and all others are dropped.

Filters can be either TCP, UDP or ICMP. For the first two, multiple ports
can be given. Furthermore, multiple filters can exist for the same
resource.

To be able to add and remove filters with the same IP/CIDR, we keep
around the whole list of filters for any given peer using an ID map and
recalculate the allow-list per IP each time something is added or removed.

This allows us to remove filters and simply recalculate the allow-list
for each IP.

Furthermore, for any IP, all rules apply, meaning if there are multiple
IPs that apply for a resource all port/protocol combinations for that IP
will apply.
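A sketch of evaluating such an allow-list for a single packet. The `Filter` shape (`ports: None` meaning the whole port range) and `is_allowed` are assumptions for illustration, not the gateway's actual types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Protocol {
    Tcp,
    Udp,
    Icmp,
}

/// One allow-list entry; `ports` holds inclusive ranges, `None` = all ports.
#[derive(Debug, Clone)]
pub struct Filter {
    pub protocol: Protocol,
    pub ports: Option<Vec<(u16, u16)>>,
}

/// Allow-list semantics: no filters means everything is allowed; otherwise a
/// packet must match at least one filter. `port` is ignored for ICMP.
pub fn is_allowed(filters: &[Filter], protocol: Protocol, port: u16) -> bool {
    if filters.is_empty() {
        return true;
    }
    filters.iter().any(|f| {
        f.protocol == protocol
            && (protocol == Protocol::Icmp
                || match &f.ports {
                    None => true,
                    Some(ranges) => ranges.iter().any(|&(lo, hi)| (lo..=hi).contains(&port)),
                })
    })
}
```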

This works well right now for DNS resources, since access is requested
by DNS name, then the resource for that DNS name will arrive at the
gateway, and the port filtering will apply given that resource (and any
other resource with the same IP).

However, since the client has no idea of the filters, it can't request
the resource access based on the port/protocol combination and we are
still using the most specific ("longest match") IP. This will mean that
for overlapping CIDR resources, only the rules for the most specific
will be used, even if the gateway supports applying them all, since it
will not have the other resources. This will be solved in #4789.

It can also lead to some weirdness, let's say that you have 10.0.0.0/24
-> TCP/80 and 10.0.0.0/16 -> TCP/443 for your user.

The user tries to access 10.0.0.1 and will then only be allowed port
80. At some point, the user might access 10.0.1.1, which is covered only
by the /16, and will be allowed port 443. From that point on, the user
will be allowed to access both 80 and 443 on 10.0.0.1, because the rules
work correctly on the gateway; the problem is on the client side.
Again, #4789 will fix this.
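
The mismatch between the client's longest-prefix match and the
gateway's union of rules can be reproduced with a small sketch. All
types and functions are hypothetical, each resource is reduced to a
single TCP port, and the second destination is 10.0.1.1, an address
inside 10.0.0.0/16 but outside 10.0.0.0/24.

```rust
use std::collections::BTreeSet;
use std::net::Ipv4Addr;

/// Hypothetical CIDR resource reduced to a single allowed TCP port.
#[derive(Debug, Clone)]
struct Resource {
    addr: Ipv4Addr,
    prefix: u8, // assumed to be <= 32
    tcp_port: u16,
}

impl Resource {
    fn contains(&self, ip: Ipv4Addr) -> bool {
        let mask = if self.prefix == 0 { 0 } else { u32::MAX << (32 - self.prefix) };
        (u32::from(self.addr) & mask) == (u32::from(ip) & mask)
    }
}

/// Client: requests access only to the most specific ("longest match") resource.
fn longest_match(resources: &[Resource], ip: Ipv4Addr) -> Option<&Resource> {
    resources.iter().filter(|r| r.contains(ip)).max_by_key(|r| r.prefix)
}

/// Gateway: applies the rules of every granted resource that covers `ip`.
fn gateway_allowed(granted: &[Resource], ip: Ipv4Addr) -> BTreeSet<u16> {
    granted.iter().filter(|r| r.contains(ip)).map(|r| r.tcp_port).collect()
}

/// Replays a sequence of accessed destinations and returns the ports the
/// gateway allows for `probe` after each access.
fn replay(client: &[Resource], destinations: &[Ipv4Addr], probe: Ipv4Addr) -> Vec<BTreeSet<u16>> {
    let mut granted: Vec<Resource> = Vec::new();
    destinations
        .iter()
        .map(|&dst| {
            if let Some(resource) = longest_match(client, dst) {
                granted.push(resource.clone());
            }
            gateway_allowed(&granted, probe)
        })
        .collect()
}
```

Replaying 10.0.0.1 and then 10.0.1.1 with 10.0.0.1 as the probe yields
`{80}` after the first access and `{80, 443}` after the second, which is
the behavior described above.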

Left for next PRs (in tentative order!):

- #4792 
- #4789 

Depends on: #4773.
Resolves #2030.
Resolves #4791.

---------

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
2024-05-07 19:47:49 +00:00
Reactor Scram
a011a443e7 fix(headless-client): clean up and exit gracefully when on_disconnect called (#4785)
Calling `std::process::exit` won't let the DNS deactivation code run.
For some control methods (systemd-resolved) this doesn't matter. For
etc-resolvconf and Windows, we are responsible for cleaning up DNS.
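
A minimal sketch of why returning beats `std::process::exit` here:
`std::process::exit` does not run destructors for values still on the
stack, so any cleanup tied to `Drop` is skipped. `DnsGuard` and
`run_session` are hypothetical names, and a `Cell<bool>` stands in for
the real DNS deactivation.

```rust
use std::cell::Cell;

/// Hypothetical guard that reverts DNS configuration when dropped.
struct DnsGuard<'a> {
    restored: &'a Cell<bool>,
}

impl Drop for DnsGuard<'_> {
    fn drop(&mut self) {
        // Stand-in for the real DNS deactivation (e.g. restoring resolv.conf).
        self.restored.set(true);
    }
}

/// Hypothetical session: when the tunnel reports a disconnect, return an
/// error instead of calling `std::process::exit`, so `DnsGuard::drop` runs.
fn run_session(restored: &Cell<bool>, disconnect_reason: &str) -> Result<(), String> {
    // `_guard` (not `_`) keeps the guard alive until the end of this scope.
    let _guard = DnsGuard { restored };
    // ... the tunnel would run here until `on_disconnect` fires ...
    Err(format!("disconnected: {disconnect_reason}"))
}
```

`main` can then turn the returned error into a non-zero
`std::process::ExitCode` after `run_session` has returned and the guard
has already been dropped.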

```[tasklist]
- [x] Replicate the issue
- [x] Fix it
- [x] Remove the fault injection code
```

Closes #4784
2024-04-25 22:48:45 +00:00
Thomas Eizinger
51089b89e7 feat(connlib): smoothly migrate relayed connections (#4568)
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
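
Roughly, handling a `relays_presence` message amounts to a diff against
the current allocations. The sketch below assumes the message carries a
list of disconnected and a list of connected relay IDs; `Allocations`,
`RelayId` and `on_relays_presence` are hypothetical, and candidates are
plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical relay identifier.
type RelayId = u32;

/// Candidates gathered through each relay we hold an allocation on.
#[derive(Default)]
struct Allocations {
    candidates: HashMap<RelayId, Vec<String>>,
}

impl Allocations {
    /// Returns the candidates to invalidate (to be signalled to the remote
    /// party) and the relays to make new allocations on.
    fn on_relays_presence(
        &mut self,
        disconnected: &[RelayId],
        connected: &[RelayId],
    ) -> (Vec<String>, Vec<RelayId>) {
        let mut invalidated = Vec::new();
        for id in disconnected {
            if let Some(candidates) = self.candidates.remove(id) {
                invalidated.extend(candidates);
            }
        }

        let mut to_allocate = Vec::new();
        for &id in connected {
            if !self.candidates.contains_key(&id) {
                // Empty until the allocation completes and yields candidates.
                self.candidates.insert(id, Vec::new());
                to_allocate.push(id);
            }
        }
        (invalidated, to_allocate)
    }
}
```

The caller would signal the invalidated candidates to the remote party
and start allocations on the returned relays; the candidates those
allocations produce are signalled as usual, which lets the connection
migrate to the newly nominated socket.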

This still relies on #4613 until we have #4634.

Resolves: #4548.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-20 06:16:35 +00:00
Thomas Eizinger
0f7e80642d chore(snownet): don't update remote socket from WG activity (#4615)
Resolves: #4613.
2024-04-20 00:15:19 +00:00
Thomas Eizinger
95219376b9 test(connlib): assert connection intents using property-based state machine test (#4597)
Opening this as a basic version that asserts the sending of connection
intents to resource IPs. To do this, we add some boilerplate that sets
up the state machine test in general. Together with the
[work](d575dc3866/rust/connlib/snownet/tests/lib.rs (L296-L824))
that I've done on the `snownet` tests, this can then be extended to
describe the entire state machine of connlib and let `proptest`
search for inputs & combinations that break stuff.

Some more `Transition`s that I'd expect we can implement:

- Add DNS resource
- Reconnect (i.e. roam networks)
- Remove resource

The public API of `Tunnel` isn't actually very large: We add and remove
resources, set upstream DNS servers and call `reconnect`. I think the
bet here is that we can implement the reference state machine in a very
simple way. For example, once we have added a resource and handled the
connection-intent, we should be able to send an ICMP packet through the
tunnel. I've already worked out how to pass `Transmit`s back and forth
between relay, client and gateway (see linked `snownet` tests above). If
we port that to this state machine test, we can actually exercise all
the code paths that are required to encapsulate / decapsulate those
packets whilst asserting against something simple like "packet pops out
at the other end".
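
The idea of checking the system against a simple reference state
machine can be sketched without `proptest`. In this hypothetical
example, `TunnelStub` stands in for `Tunnel`, a small xorshift generator
stands in for proptest's strategies (so there is no shrinking), and the
only observable output is the connection intent emitted for a packet
sent to a resource IP.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;

#[derive(Debug, Clone, Copy)]
enum Transition {
    AddResource(Ipv4Addr),
    RemoveResource(Ipv4Addr),
    SendPacket(Ipv4Addr),
}

/// Reference model: just the set of resource IPs. A packet to one of them
/// should produce a connection intent for that IP.
#[derive(Default)]
struct Reference {
    resources: HashSet<Ipv4Addr>,
}

impl Reference {
    fn apply(&mut self, t: Transition) -> Option<Ipv4Addr> {
        match t {
            Transition::AddResource(ip) => {
                self.resources.insert(ip);
                None
            }
            Transition::RemoveResource(ip) => {
                self.resources.remove(&ip);
                None
            }
            Transition::SendPacket(ip) => self.resources.contains(&ip).then_some(ip),
        }
    }
}

/// Hypothetical stand-in for the system under test (not connlib's `Tunnel`).
#[derive(Default)]
struct TunnelStub {
    routes: Vec<Ipv4Addr>,
}

impl TunnelStub {
    fn apply(&mut self, t: Transition) -> Option<Ipv4Addr> {
        match t {
            Transition::AddResource(ip) => {
                if !self.routes.contains(&ip) {
                    self.routes.push(ip);
                }
                None
            }
            Transition::RemoveResource(ip) => {
                self.routes.retain(|r| *r != ip);
                None
            }
            Transition::SendPacket(ip) => self.routes.iter().copied().find(|r| *r == ip),
        }
    }
}

/// Deterministic xorshift64 generator standing in for proptest strategies.
fn transitions(mut seed: u64, n: usize) -> Vec<Transition> {
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        // Only 4 candidate IPs, so adds, removes and packets often collide.
        let ip = Ipv4Addr::new(10, 0, 0, (seed % 4) as u8);
        out.push(match (seed >> 8) % 3 {
            0 => Transition::AddResource(ip),
            1 => Transition::RemoveResource(ip),
            _ => Transition::SendPacket(ip),
        });
    }
    out
}

/// Applies the same transitions to both and reports the first divergence.
fn check(seq: &[Transition]) -> Result<(), usize> {
    let mut reference = Reference::default();
    let mut sut = TunnelStub::default();
    for (i, &t) in seq.iter().enumerate() {
        if reference.apply(t) != sut.apply(t) {
            return Err(i);
        }
    }
    Ok(())
}
```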

Because the setup of the test is also a proptest-strategy, we can even
add the network topology as a variable by configuring the `Firewall`
(see `snownet` tests) dynamically with or without blocking rules and
thus force the entire tunnel through an (in-memory) relay.

Related: #4589.
2024-04-19 02:31:08 +00:00
Thomas Eizinger
bfe07d7ebd chore(connlib): upsert relays from "init" message (#4567)
This is another step towards #4548. The portal now includes a list of
relays as part of the "init" message. Any time we receive an "init", we
will now upsert those relays based on their ID. This requires us to
change our internal bookkeeping of relays from indexing them by address
to indexing by ID.
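
Indexing relays by ID makes the upsert straightforward, as this sketch
illustrates. `Relay`, `Relays` and `upsert` are hypothetical and only
mirror the idea behind `upsert_relays`; its real signature may differ.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical relay entry as it might appear in an "init" message.
#[derive(Debug, Clone)]
struct Relay {
    id: String,
    addr: SocketAddr,
}

#[derive(Default)]
struct Relays {
    by_id: HashMap<String, SocketAddr>,
}

impl Relays {
    /// Inserts unknown relays and updates the address of known ones.
    fn upsert(&mut self, relays: impl IntoIterator<Item = Relay>) {
        for relay in relays {
            self.by_id.insert(relay.id, relay.addr);
        }
    }
}
```

Keyed by address, a relay that reappears with a new address would leave
a stale entry next to a seemingly unrelated new one; keyed by ID, its
entry is simply updated.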

To ensure that this works correctly, the unit tests are rewritten to use
the new `upsert_relays` API.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
2024-04-15 21:30:49 +00:00
Reactor Scram
53968063a5 fix(windows): patch some DNS leaks (#4530)
Fixes #4488 

```[tasklist]
# Before merging
- [x] There's one call site that won't compile on Linux. Make this cross-platform.
- [x] Does the rule get removed every time when you quit gracefully?
- [x] Will this NRPT rule prevent connlib from re-resolving the portal IP if it needs to?
- [x] Test network switching. Does this work worse, better, or the same?
- [ ] Is the Windows DNS cache flushed exactly when it needs to be?
```

- After connlib connects to the portal, we add an NRPT rule asking
Windows to send **all** DNS queries to our sentinels. The rule should also
be re-added whenever the interface is re-configured, since that might change
the sentinel IPs
- When exiting gracefully, we delete the rule to restore normal DNS
behavior without having to back up and restore the other IPs
- We also delete the rule at startup so that if Firezone crashes or
misbehaves, restarting it should restore normal DNS
- We also flush the system-wide DNS cache whenever we claim different
routes. This may flush too often, and it may also miss some flushes that
we should do. It needs double-checking.
- There is still a gap when changing networks where DNS can leak, but I
don't think it's worse than before.
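
The order of operations in the list above can be summarized as an event
loop. This is a sketch only: `DnsControl`, `Event`, `run` and `Recorder`
are hypothetical, the NRPT and cache-flush steps are recorded rather than
performed, and 192.0.2.1 is a documentation address, not a real
sentinel IP.

```rust
use std::net::IpAddr;

/// Hypothetical abstraction over the Windows-specific steps.
trait DnsControl {
    fn delete_nrpt_rule(&mut self);
    fn add_nrpt_rule(&mut self, sentinels: &[IpAddr]);
    fn flush_dns_cache(&mut self);
}

enum Event {
    ConnectedToPortal(Vec<IpAddr>),
    InterfaceReconfigured(Vec<IpAddr>),
    RoutesChanged,
    GracefulExit,
}

fn run(dns: &mut impl DnsControl, events: impl IntoIterator<Item = Event>) {
    // Remove any rule a crashed previous run may have left behind.
    dns.delete_nrpt_rule();
    for event in events {
        match event {
            // Sentinel IPs may change on reconfiguration, so (re-)add the rule.
            Event::ConnectedToPortal(sentinels) | Event::InterfaceReconfigured(sentinels) => {
                dns.add_nrpt_rule(&sentinels)
            }
            Event::RoutesChanged => dns.flush_dns_cache(),
            Event::GracefulExit => {
                // Restores normal DNS without backing up the other settings.
                dns.delete_nrpt_rule();
                return;
            }
        }
    }
}

/// Test double that records calls instead of touching the OS.
#[derive(Default)]
struct Recorder {
    calls: Vec<String>,
}

impl DnsControl for Recorder {
    fn delete_nrpt_rule(&mut self) {
        self.calls.push("delete".into());
    }
    fn add_nrpt_rule(&mut self, sentinels: &[IpAddr]) {
        let ips: Vec<String> = sentinels.iter().map(|ip| ip.to_string()).collect();
        self.calls.push(format!("add {}", ips.join(",")));
    }
    fn flush_dns_cache(&mut self) {
        self.calls.push("flush".into());
    }
}
```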
2024-04-15 21:10:30 +00:00
Reactor Scram
2c9b6c9b3a refactor(headless-client): use Tokio codec instead of hand-rolled length-delimited codec (#4606)
The ongoing yak shave towards #3713

Closes #4514 and saves about 30 lines of code. Thanks for the suggestion,
Thomas!
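
For reference, `tokio_util::codec::LengthDelimitedCodec` with its
default settings frames each message as a 4-byte big-endian length
prefix followed by the payload, with an 8 MiB maximum frame length. The
std-only sketch below writes and reads that format; `write_frame`,
`read_frame` and `MAX_FRAME_LEN` are illustrative names, not the
headless client's code.

```rust
use std::io::{self, Read, Write};

/// 8 MiB, the default maximum frame length of `LengthDelimitedCodec`.
const MAX_FRAME_LEN: usize = 8 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian payload length, then the payload.
fn write_frame(w: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame written by `write_frame`.
fn read_frame(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header) as usize;
    if len > MAX_FRAME_LEN {
        // Refuse to allocate for an oversized or corrupt length prefix.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```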
2024-04-15 15:19:33 +00:00
Thomas Eizinger
5e1e31b782 refactor(connlib): add property-based tests for adding and removing of resources (#4503)
Also includes some refactoring around how we update DNS servers and the
interface config to allow for some tidying up of those tests.
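
A property-based test can be approximated without extra crates by
checking every input up to a small size. In this sketch, which
enumerates exhaustively rather than sampling randomly like `proptest`,
`ResourceTable`, `sequences` and `check` are hypothetical. The
properties are that adding resources is idempotent and that removing
everything that was added leaves the table empty.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical resource table keyed by resource ID (not connlib's type).
#[derive(Default, Debug, PartialEq)]
struct ResourceTable {
    by_id: BTreeMap<u8, String>,
}

impl ResourceTable {
    fn add(&mut self, id: u8) {
        self.by_id.insert(id, format!("10.0.0.{id}/32"));
    }
    fn remove(&mut self, id: u8) {
        self.by_id.remove(&id);
    }
}

/// All sequences of up to `max_len` IDs drawn from `0..ids`.
fn sequences(ids: u8, max_len: usize) -> Vec<Vec<u8>> {
    let mut all = vec![Vec::new()];
    let mut frontier: Vec<Vec<u8>> = vec![Vec::new()];
    for _ in 0..max_len {
        let mut next = Vec::new();
        for seq in &frontier {
            for id in 0..ids {
                let mut extended = seq.clone();
                extended.push(id);
                next.push(extended);
            }
        }
        all.extend(next.iter().cloned());
        frontier = next;
    }
    all
}

/// 1. After adding, the table holds exactly the distinct IDs.
/// 2. Removing them again, in reverse order, leaves the table empty.
fn check(seq: &[u8]) -> bool {
    let mut table = ResourceTable::default();
    for &id in seq {
        table.add(id);
    }
    let distinct: BTreeSet<u8> = seq.iter().copied().collect();
    let keys: BTreeSet<u8> = table.by_id.keys().copied().collect();
    let add_ok = keys == distinct;
    for &id in seq.iter().rev() {
        table.remove(id);
    }
    add_ok && table == ResourceTable::default()
}
```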

Resolves: #4355.
2024-04-11 06:29:35 +00:00