mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
With this patch, we sample a list of DNS resources on each test run and create a "TCP service" for each of their addresses. Using this list of resources, we then change the `SendTcpPayload` transition to `ConnectTcp` and establish TCP connections using `smoltcp` to these services. For now, we don't send any data on these connections but we do set the keep-alive interval to 5s, meaning `smoltcp` itself will keep these connections alive. We also set the timeout to 30s and after each transition in a test-run, we assert that all TCP sockets are still in their expected state: - `ESTABLISHED` for most of them. - `CLOSED` for all sockets where we ended up sampling an IPv4 address but the DNS resource only supports IPv6 addresses (or vice-versa). In these cases, we use the ICMP error to sent by the Gateway to assert that the socket is `CLOSED`. Unfortunately, `smoltcp` currently does not handle ICMP messages for its sockets, so we have to call `abort` ourselves. Overall, this should assert that regardless of whether we roam networks, switch relays or do other kind of stuff with the underlying connection, the tunneled TCP connection stays alive. In order to make this work, I had to tweak the timeouts when we are on-demand refreshing allocations. This only happens in one particular case: When we are being given new relays by the portal, we refresh all _other_ relays to make sure they are still present. In other words, all relays that we didn't remove and didn't just add but still had in-memory are refreshed. This is important for cases where we are network-partitioned from the portal whilst relays are deployed or reset their state otherwise. Instead of the previous 8s max elapsed time of the exponential backoff like we have it for other requests, we now only use a single message with a 1s timeout there. With the increased ICE timeout of 15s, a TCP connection with a 30s timeout would otherwise not survive such an event. This is because it takes the above mentioned 8s for us to remove a non-functioning relay, all whilst trying to establish a new connection (which also incurs its own ICE timeout then). With the reduced timeout on the on-demand refresh of 1s, we detect the disappeared relay much quicker and can immediately establish a new connection via one of the new ones. As always with reduced timeouts, this can create false-positives if the relay doesn't reply within 1s for some reason. Resolves: #9531
Connlib
Firezone's connectivity library shared by all clients.
Building Connlib
You shouldn't need to build connlib directly; it's typically built as a dependency of one of the other Firezone components. See READMEs in those directories for relevant instructions.