mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
## Context The Gateway implements a stateful NAT that translates the destination IP and source protocol of every packet that targets a DNS resource IP. This is necessary because the IPs for DNS resources are generated on the client without actually performing a DNS lookup, instead it always generates 4 IPv4 and 4 IPv6 addresses. On the Gateway, these IPs are then assigned in a round-robin fashion to the actual IPs that the domain resolves to, necessitating a NAT64/46 translation in case a domain only resolves to IPs of one family. A domain may resolve to a set of IPs but not all of these IPs may be routable. Whilst an arguably poor practise of the domain administrator, routing problems can occur for all kinds of reasons and are well handled on the wider Internet. When an IP packet cannot be routed further, the current routing node generates an ICMP error describing the routing failure and sends it back to the original sender. ICMP is a layer 4 protocol itself, same as TCP and UDP. As such, sending out a UDP packet may result in receiving an ICMP response. In order to allow the sender to learn, which packet failed to route, the ICMP error embeds parts of the original packet in its payload [0] [1]. The Gateway's NAT table uses parts of the layer 4 protocol as part of its key; the UDP and TCP source port and the ICMP echo request identifier (further referred to as "source protocol"). An ICMP error message doesn't have any of these, meaning the lookup in the NAT table currently fails and the ICMP error is silently dropped. A lot of software implements a happy-eyeballs approach and probs for IPv6 and IPv4 connectivity simulataneously. The absence of the ICMP errors confuses that algorithm as it detects the packet loss and starts retransmits instead of giving up. ## Solution Upon receiving an ICMP error on the Gateway, we now extract the partially embedded packet in the ICMP error payload. We use the destination IP and source protocol of _that_ packet for the lookup in the NAT table. This returns us the original (client-assigned) destination IP and source protocol. In order for the Gateway's NAT to be transparent, we need to patch the packet embedded in the ICMP error to use the original destination and source protocol. We also have to account for the fact that the original packet may have been translated with NAT64/46 and translate it back. Finally, we generate an ICMP error with the appropriate code and embed the patched packet in its payload. ## Test implementation To test that this works for all kind of combinations, we extend `tunnel_test` to sample a list of unreachable IPs from all IPs sampled for DNS resources. Upon receiving a packet for one of these IPs, the Gateway will send an ICMP error back instead of invoking its regular echo reply logic. On the client-side, upon receiving an ICMP error, we extract the originally failed packet from the body and treat it as a successful response. This may seem a bit hacky at first but is actually how operating systems would treat ICMP errors as well. For example, a `TcpSocket::connect` call (triggering a TCP SYN packet) may fail with an IO error if we receive an ICMP error packet. Thus, in a way, the original packet got answered, just not with what we expected. In addition, by treating these ICMP errors as responses to the original packet, we automatically perform other assertions on them, like ensuring that they come from the right IP address, that there are no unexpected packets etc. ## Test alternatives It is tricky to solve this in other ways in the test suite because at the time of generating a packet for a DNS resource, we don't know the actual IP that is being targeted by a certain proxy IP unless we'd start reimplementing the round-robin algorithm employed by the Gateway. To "test" the transparency of the NAT, we'd like to avoid knowing about these implementation details in the test. ## Future work In this PR, we currently only deal with "Destination Unreachable" ICMP errors. There are other ICMP messages such as ICMPv6's `PacketTooBig` or `ParameterProblem`. We should eventually handle these as well. They are being deferred because translating those between the different IP versions is only partially implemented and would thus require more work. The most pressing need is to translate destination unreachable errors to enable happy-eyeballs algorithms to work correctly. Resolves: #5614. Resolves: #6371. [0]: https://www.rfc-editor.org/rfc/rfc792 [1]: https://www.rfc-editor.org/rfc/rfc4443#section-3.1