Files
firezone/.github/workflows
Thomas Eizinger 9073bddaef fix(gateway): translate ICMP destination unreachable errors (#7398)
## Context

The Gateway implements a stateful NAT that translates the destination IP
and source protocol of every packet that targets a DNS resource IP. This
is necessary because the IPs for DNS resources are generated on the
client without actually performing a DNS lookup, instead it always
generates 4 IPv4 and 4 IPv6 addresses. On the Gateway, these IPs are
then assigned in a round-robin fashion to the actual IPs that the domain
resolves to, necessitating a NAT64/46 translation in case a domain only
resolves to IPs of one family.

A domain may resolve to a set of IPs but not all of these IPs may be
routable. Whilst an arguably poor practise of the domain administrator,
routing problems can occur for all kinds of reasons and are well handled
on the wider Internet.

When an IP packet cannot be routed further, the current routing node
generates an ICMP error describing the routing failure and sends it back
to the original sender. ICMP is a layer 4 protocol itself, same as TCP
and UDP. As such, sending out a UDP packet may result in receiving an
ICMP response. In order to allow the sender to learn, which packet
failed to route, the ICMP error embeds parts of the original packet in
its payload [0] [1].

The Gateway's NAT table uses parts of the layer 4 protocol as part of
its key; the UDP and TCP source port and the ICMP echo request
identifier (further referred to as "source protocol"). An ICMP error
message doesn't have any of these, meaning the lookup in the NAT table
currently fails and the ICMP error is silently dropped.

A lot of software implements a happy-eyeballs approach and probs for
IPv6 and IPv4 connectivity simulataneously. The absence of the ICMP
errors confuses that algorithm as it detects the packet loss and starts
retransmits instead of giving up.

## Solution

Upon receiving an ICMP error on the Gateway, we now extract the
partially embedded packet in the ICMP error payload. We use the
destination IP and source protocol of _that_ packet for the lookup in
the NAT table. This returns us the original (client-assigned)
destination IP and source protocol. In order for the Gateway's NAT to be
transparent, we need to patch the packet embedded in the ICMP error to
use the original destination and source protocol. We also have to
account for the fact that the original packet may have been translated
with NAT64/46 and translate it back. Finally, we generate an ICMP error
with the appropriate code and embed the patched packet in its payload.

## Test implementation

To test that this works for all kind of combinations, we extend
`tunnel_test` to sample a list of unreachable IPs from all IPs sampled
for DNS resources. Upon receiving a packet for one of these IPs, the
Gateway will send an ICMP error back instead of invoking its regular
echo reply logic. On the client-side, upon receiving an ICMP error, we
extract the originally failed packet from the body and treat it as a
successful response.

This may seem a bit hacky at first but is actually how operating systems
would treat ICMP errors as well. For example, a `TcpSocket::connect`
call (triggering a TCP SYN packet) may fail with an IO error if we
receive an ICMP error packet. Thus, in a way, the original packet got
answered, just not with what we expected.

In addition, by treating these ICMP errors as responses to the original
packet, we automatically perform other assertions on them, like ensuring
that they come from the right IP address, that there are no unexpected
packets etc.

## Test alternatives

It is tricky to solve this in other ways in the test suite because at
the time of generating a packet for a DNS resource, we don't know the
actual IP that is being targeted by a certain proxy IP unless we'd start
reimplementing the round-robin algorithm employed by the Gateway. To
"test" the transparency of the NAT, we'd like to avoid knowing about
these implementation details in the test.

## Future work

In this PR, we currently only deal with "Destination Unreachable" ICMP
errors. There are other ICMP messages such as ICMPv6's `PacketTooBig` or
`ParameterProblem`. We should eventually handle these as well. They are
being deferred because translating those between the different IP
versions is only partially implemented and would thus require more work.
The most pressing need is to translate destination unreachable errors to
enable happy-eyeballs algorithms to work correctly.

Resolves: #5614.
Resolves: #6371.

[0]: https://www.rfc-editor.org/rfc/rfc792
[1]: https://www.rfc-editor.org/rfc/rfc4443#section-3.1
2024-12-02 23:07:41 +00:00
..