When the Gateway is handed an IP packet for a DNS resource that it
cannot route, it sends back an ICMP unreachable error. According to RFC
792 [0] (for ICMPv4) and RFC 4443 [1] (for ICMPv6), parts of the
original packet should be included in the ICMP error payload to allow
the sending party to correlate what could not be sent.
For ICMPv4, the RFC says:
```
Internet Header + 64 bits of Data Datagram
The internet header plus the first 64 bits of the original
datagram's data. This data is used by the host to match the
message to the appropriate process. If a higher level protocol
uses port numbers, they are assumed to be in the first 64 data
bits of the original datagram's data.
```
For ICMPv6, the RFC says:
```
As much of invoking packet as possible without the ICMPv6 packet exceeding the minimum IPv6 MTU
```
[0]: https://datatracker.ietf.org/doc/html/rfc792
[1]: https://datatracker.ietf.org/doc/html/rfc4443#section-3.1
ECN information is helpful because it allows congestion controllers to more
easily fine-tune their send and receive windows. When a Firezone Client
receives an IP packet where the ECN bits signal an ECN-capable
transport, we mirror these bits on the UDP datagram that carries the
encrypted IP packet.
When receiving a datagram with ECN bits set, the Gateway will then apply
these bits to the decrypted IP packet and pass it along towards its
destination.
This implementation is unfortunately a bit too naive. Not all devices on
the Internet support ECN and therefore, we may receive a datagram that
has its ECN bits cleared when the ECN bits on the inner IP packet still
signal an ECN capable transport. In this case, we should _not_ override
the ECN bits and instead pass the IP packet along as is. Network devices
along the path between Gateway and Resource may still use these ECN bits
to signal congestion.
We fix this by making the `with_ecn` function on `IpPacket` private. It
is not meant to be used outside of the module. We supersede it with a
`with_ecn_from_transport` function that implements the above logic.
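A minimal sketch of that logic (the `Ecn` enum and the function name are illustrative, not the actual `ip-packet` API):
```
/// Illustrative ECN codepoints (the two least-significant bits of the
/// IPv4 ToS byte / IPv6 traffic class).
#[derive(Clone, Copy, PartialEq)]
enum Ecn {
    NotEct, // 0b00
    Ect1,   // 0b01
    Ect0,   // 0b10
    Ce,     // 0b11
}

/// Decide which ECN marking the decrypted IP packet should carry,
/// given the marking of the UDP datagram it arrived in.
fn ecn_from_transport(outer: Ecn, inner: Ecn) -> Ecn {
    match outer {
        // The transport lost the ECN marking (e.g. a non-ECN-aware device
        // cleared it); keep the inner packet's bits so devices between
        // Gateway and Resource can still use them.
        Ecn::NotEct => inner,
        // Otherwise, mirror the transport's marking (e.g. Congestion Experienced).
        other => other,
    }
}
```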
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Validating checksums can be expensive, so this is off by default. The
intent is to turn it on in our staging environment so we can detect
bugs in our packet handling code during testing.
When setting ECN information on an IP packet, the header changes and
therefore, we need to update the IP checksum. macOS attempts to open TCP
connections with ECN information but will fall back to non-ECT if it
detects packet loss. Failing to update the checksums caused the packet
to get dropped at the remote TCP stack and therefore triggered a
retransmission on the macOS side.
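For reference, a raw-byte sketch of what such an update involves for IPv4 (illustrative only, not the actual connlib implementation):
```
/// Set the ECN bits on a raw IPv4 header and recompute the header checksum.
/// `header` must be the full header (IHL * 4 bytes).
fn set_ipv4_ecn(header: &mut [u8], ecn: u8) {
    // ECN lives in the two least-significant bits of the former ToS byte.
    header[1] = (header[1] & !0b11) | (ecn & 0b11);

    // Zero the checksum field, then recompute the one's-complement sum
    // over the entire header.
    header[10] = 0;
    header[11] = 0;

    let mut sum: u32 = 0;
    for chunk in header.chunks_exact(2) {
        sum += u32::from(u16::from_be_bytes([chunk[0], chunk[1]]));
    }
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }

    header[10..12].copy_from_slice(&(!(sum as u16)).to_be_bytes());
}
```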
Related: #8899
We have been using buffer pools for a while all over `connlib` as a way
to efficiently use heap-allocated memory. This PR harmonizes the usage
of buffer pools across the codebase by introducing a dedicated
`bufferpool` crate. This crate offers a convenient and easy-to-use API
for all the things we (currently) need from buffer pools. As a nice
bonus of having it all in one place, we can now also track metrics of
how many buffers we have currently allocated.
An example output from the local metrics exporter looks like this:
```
Name : system.buffer.count
Description : The number of buffers allocated in the pool.
Unit : {buffers}
Type : Sum
Sum DataPoints
Monotonic : false
Temporality : Cumulative
DataPoint #0
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 96
Attributes :
-> system.buffer.pool.name: udp-socket-v6
-> system.buffer.pool.buffer_size: 65535
DataPoint #1
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 7
Attributes :
-> system.buffer.pool.buffer_size: 131600
-> system.buffer.pool.name: gso-queue
DataPoint #2
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 128
Attributes :
-> system.buffer.pool.name: udp-socket-v4
-> system.buffer.pool.buffer_size: 65535
DataPoint #3
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 8
Attributes :
-> system.buffer.pool.buffer_size: 1336
-> system.buffer.pool.name: ip-packet
DataPoint #4
StartTime : 2025-04-29 12:41:25.278436
EndTime : 2025-04-29 12:42:25.278088
Value : 9
Attributes :
-> system.buffer.pool.buffer_size: 1336
-> system.buffer.pool.name: snownet
```
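Conceptually, the pool hands out buffers that return their allocation to the pool when dropped. A minimal sketch of that idea (names and API are illustrative, not the actual `bufferpool` crate):
```
use std::sync::{Arc, Mutex};

#[derive(Clone)]
pub struct BufferPool {
    buffer_size: usize,
    free: Arc<Mutex<Vec<Vec<u8>>>>,
}

impl BufferPool {
    pub fn new(buffer_size: usize) -> Self {
        Self {
            buffer_size,
            free: Arc::new(Mutex::new(Vec::new())),
        }
    }

    /// Hand out a buffer, reusing a previously returned one if possible.
    pub fn pull(&self) -> PooledBuffer {
        let inner = self
            .free
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0; self.buffer_size]);

        PooledBuffer {
            inner,
            free: Arc::clone(&self.free),
        }
    }

    /// How many buffers are currently idle in the pool (e.g. for metrics).
    pub fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

pub struct PooledBuffer {
    inner: Vec<u8>,
    free: Arc<Mutex<Vec<Vec<u8>>>>,
}

impl Drop for PooledBuffer {
    fn drop(&mut self) {
        // Return the allocation to the pool instead of freeing it.
        self.free.lock().unwrap().push(std::mem::take(&mut self.inner));
    }
}
```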
Resolves: #8385
Despite our efforts in #8912, the current implementation still does not
do enough to maintain packet ordering across GSO batches.
At present, we very aggressively batch packets of the same length
together. This however is too eager when we consider packet flows such
as the following:
```
9:03:49.585143 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 1:1229, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585151 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 1229:2063, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 834
09:03:49.585157 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 2063:3094, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1031
09:03:49.585187 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 3094:4322, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585188 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 4322:5156, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 834
09:03:49.585227 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 5156:6384, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585228 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 6384:7612, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585230 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 7612:8249, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 637
09:03:49.585846 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [.], seq 8249:9477, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
09:03:49.585851 IP 10.128.15.241.3000 > 100.69.109.138.53474: Flags [P.], seq 9477:10705, ack 524, win 249, options [nop,nop,TS val 3862031964 ecr 1928356896], length 1228
```
As we can see here, the remote sends us packet batches of varying
lengths:
- 1228, 834
- 1031
- 1228, 834
- 1228, 1228, 637
- 1228, 1228
1228 represents a "full" TCP packet, so any packet following a
full packet SHOULD be grouped together with it into a GSO batch.
Currently, we batch all the 1228-byte packets together and ignore
the fact that there were actually smaller-sized packets in between
that belong to the same batch.
To mitigate this, we refactor the `GsoQueue` to remove the
`segment_size` from the binning key of our map and instead only group
batches by their source, destination and ECN information. Within such a
connection, we then keep an ordered list of batches. A new batch is
started if the packet length differs from the current batch's segment length
or if we previously pushed a packet shorter than that length, which signals
the end of the batch.
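A sketch of that batching rule (types and names are illustrative, not the actual `GsoQueue`):
```
struct Batch {
    segment_size: usize,
    payload: Vec<u8>,
    closed: bool, // set once a shorter segment ended the batch
}

/// Append a packet to the per-connection list of batches, preserving order.
fn push(batches: &mut Vec<Batch>, packet: &[u8]) {
    if let Some(last) = batches.last_mut() {
        if !last.closed && packet.len() == last.segment_size {
            // Same length as the current batch: append as another GSO segment.
            last.payload.extend_from_slice(packet);
            return;
        }

        if !last.closed && packet.len() < last.segment_size {
            // A shorter packet still fits as the final segment but closes the batch.
            last.payload.extend_from_slice(packet);
            last.closed = true;
            return;
        }
    }

    // First packet, a longer packet, or the previous batch is closed:
    // start a new batch so the original packet order is preserved.
    batches.push(Batch {
        segment_size: packet.len(),
        payload: packet.to_vec(),
        closed: false,
    });
}
```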
The result here looks very promising (this is loading
`blog.firezone.dev` via the `lynx` browser from within the
headless-client docker container, so going through a Gateway running
this PR):
(Side-by-side screenshots comparing `main` with this PR are not included here.)
Related: #8899
When determining how to NAT a certain packet, we need to identify
whether it is a UDP, TCP or ICMP packet and extract the relevant port or
identifier from it. When parsing these packets, we may run into a
situation where the IP number says that the packet is TCP but it is
actually malformed and we cannot parse the port from it.
In such situations, we end up constructing an `UnsupportedProtocol` error
that then confusingly states that we don't support the TCP protocol (or
UDP / ICMP if those are malformed).
The parsing error here is currently silently discarded by the
`.ok()` combinator when constructing the relevant slice. To make these
logs easier to understand, we now add an `inspect_err` call prior to
it that prints why the packet could not be parsed.
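The shape of the fix, sketched (the surrounding function and the `etherparse` call are illustrative, not the exact call-site):
```
fn tcp_source_port(payload: &[u8]) -> Option<u16> {
    let tcp = etherparse::TcpHeaderSlice::from_slice(payload)
        .inspect_err(|e| tracing::debug!("Failed to parse TCP header: {e}"))
        .ok()?;

    Some(tcp.source_port())
}
```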
Long-term, I am planning to refactor our IP packet model to eagerly
parse the layer 3 + 4 headers. This will also be necessary to implement
segmentation offloading on the TUN device. Doing so will improve
situations like this one because we will either pass through the malformed
packet (if at least the header is intact) or drop it much earlier. In
either case, accessing things like port numbers will be infallible as
part of the processing code.
At present, the Gateway implements a NAT64 conversion that can convert
IPv4 packets to IPv6 and vice versa. Doing this efficiently creates a
fair amount of complexity within our `ip-packet` crate. In addition,
routing ICMP errors back through our NAT is also complicated by this
because we may have to translate the packet embedded in the ICMP error
as well.
The NAT64 module was originally conceived as a result of the new stub
resolver-based DNS architecture. When the Client resolves IPs for a
domain, it doesn't know whether the domain will actually resolve to IPv4
AND IPv6 addresses so it simply assigns 4 of each to every domain. Thus,
when receiving an IPv6 packet for such a DNS resource, the Gateway may
only have IPv4 addresses available and can therefore not route the
packet (unless it translates it).
This problem is not novel. In fact, an IP being unroutable or a
particular route disappearing happens all the time on the Internet. ICMP
was conceived to handle this problem and it is doing a pretty good job
at it. We can make use of that and simply return an ICMP unreachable
error back to the client whenever it picks an IP that we cannot map to
one that we resolved.
In this PR, we leave all of the NAT64 code intact and only add a
feature flag that, when active, sends the aforementioned ICMP error. While
offline (and thus also for our tests), the feature flag evaluates to
false. It is however set to `true` in the backend, meaning on staging
and later in production, we will send these ICMP errors.
Once this is rolled out and indeed proves to be working as intended, we
can simplify our codebase and rip out the NAT64 module. At that point,
we will also have to adapt the test-suite.
From the perspective of any application, Firezone is a layer-3 network
and will thus use the host's networking stack to form IP packets for
whichever application protocol is in use (UDP, TCP, etc). These packets
then get encapsulated into UDP packets by Firezone and sent to a
Gateway.
As a result of this design, the IP headers seen by the networking stacks
of the Client and the receiving service are not visible to any
intermediary along the network path between Client and Gateway.
In case this network path is congested and middleboxes such as routers
need to drop packets, they will look at the ECN bits in the IP header
(of the UDP packet generated by a Client or Gateway) and flip a bit in
case the previous value indicated support for ECN (`0b01` or `0b10`).
When received by a network stack that supports ECN, seeing `0b11` means
that the network path is congested and that it must reduce its
send/receive windows (or otherwise throttle the connection).
At present, this doesn't work with Firezone because of the
aforementioned encapsulation of IP packets. To support ECN, we therefore
need to:
- Copy the ECN bits from a received IP packet to the datagram that
encapsulates it: This ensures that if the Client's network stack supports
ECN, we mirror that support on the wire.
- Copy the ECN bits from a received datagram to the IP packet that is sent to
the TUN device: This ensures that if the "Congestion Experienced" bits
get set along the network path between Client and Gateway, we reflect
that accordingly on the IP packet emitted by the TUN device.
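For reference, a raw-byte sketch of where the ECN bits live in each IP version (illustrative only, not the actual connlib code):
```
fn ecn_bits(packet: &[u8]) -> Option<u8> {
    match *packet.first()? >> 4 {
        // IPv4: the two least-significant bits of the ToS byte.
        4 => Some(*packet.get(1)? & 0b11),
        // IPv6: the traffic class spans the low nibble of byte 0 and the high
        // nibble of byte 1; ECN is its two least-significant bits.
        6 => Some((*packet.get(1)? >> 4) & 0b11),
        _ => None,
    }
}
```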
Resolves: #3758
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
Within Firezone's Rust codebase, we use the `etherparse` crate
extensively to parse network packets. To provide a more ergonomic API,
this is all encapsulated in our `ip-packet` crate.
For #7518, we need to write an eBPF kernel that parses and manipulates
network packets. Etherparse itself doesn't provide any facilities to
manipulate network packets. That is an open feature request:
https://github.com/JulianSchmid/etherparse/issues/9. For the packet
manipulation that we are doing in `connlib`, we already wrote certain
extensions to the `etherparse` crate but today, those are all within the
`ip-packet` crate.
In order to reuse that within the eBPF kernel, we cannot just depend on
`ip-packet` directly because eBPF is a no-std and no-alloc environment,
thus no crate in the dependency tree is allowed to depend on Rust's
std-lib. `etherparse` itself actually has an `std` feature flag that we
can turn off. Introducing the same in `ip-packet` would require a lot of
conditional-compilation gates using `#[cfg]`. It is much easier to just
introduce a new crate that houses all our in-house extensions to
`etherparse`. Eventually, we can hopefully upstream those which is
another motivator to separate this out.
A sizeable chunk of Firezone's Rust components deal with parsing,
manipulating and emitting DNS queries and responses. The API surface of
DNS is quite large and to make handling of all corner-cases easier, we
depend on the `domain` library to do the heavy-lifting for us.
For better or worse, `domain` follows a lazy-parsing approach. Thus,
creating a new DNS message doesn't actually verify that it is in fact
valid. Within Firezone, we make several assumptions around DNS messages,
such as that they will only ever contain a single question.
Historically, DNS allows for multiple questions per query but in
practice, nobody uses that.
Due to how we handle DNS in Firezone, manipulating these messages
happens in multiple places. That combined with the lazy-parsing approach
from `domain` warrants having our own `dns-types` library that wraps
`domain` and provides us with types that offer the interface we need in
the rest of the codebase.
Resolves: #7019
Whilst we are establishing a connection, the host network stack may run
into timeouts and retransmit packets. Buffering these copies doesn't
make any sense because we are then just flooding the remote with e.g. 4
TCP SYNs for the same connection.
This check is O(N) with the number of buffered packets. Those are at
most a few dozen so there shouldn't be a need for anything more
efficient.
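A sketch of that check (buffered packets shown as plain byte vectors; the real code uses connlib's packet type):
```
/// Buffer a packet for later transmission, dropping exact retransmissions.
fn buffer_packet(buffered: &mut Vec<Vec<u8>>, packet: Vec<u8>) {
    // O(N) scan; N is at most a few dozen buffered packets.
    if buffered.iter().any(|existing| *existing == packet) {
        return;
    }

    buffered.push(packet);
}
```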
Within `connlib`, we read batches of IP packets and process them at
once. Each encrypted packet is appended to a buffer shared with other
packets of the same length. Once the batch is successfully processed,
all of these buffers are written out using GSO to the network. This
allows UDP operations to be much more efficient because not every packet
has to traverse the entire syscall hierarchy of the operating system.
Until now, these buffers got re-allocated on every batch. This is pretty
wasteful and leads to a lot of repeated allocations. Measurements show
that most of the time, we only have a handful of packets with different
segment lengths _per batch_. For example, just booting up the
headless-client and running a speedtest showed that only 5 of these
buffers were needed at any one time.
By introducing a buffer pool, we can reuse these buffers between batches
and avoid reallocating them.
Related: #7747.
IPv6 treats fragmentation and MTU errors differently than IPv4. Rather
than requiring fragmentation on each hop of a routing path,
fragmentation needs to happen at the packet source and failure to route
a packet triggers an ICMPv6 `PacketTooBig` error.
These need to be translated back through our NAT64 implementation of the
Gateway. Due to the size difference in the headers of IPv4 and IPv6, the
available MTU to the IPv4 packet is 20 bytes _less_ than the MTU
reported by the ICMP error. IPv6 headers are always 40 bytes, meaning if
the MTU is reported as e.g. 1200 on the IPv6 side, we need to only offer
1180 to the IPv4 end of the application. Once the new MTU is then
honored, the packets translated by our NAT64 implementation will still
conform to the required MTU of 1200, despite the overhead introduced by
the translation.
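The adjustment itself is a simple subtraction (a sketch; the function name is illustrative):
```
const IPV6_HEADER_LEN: u16 = 40;
const IPV4_HEADER_LEN: u16 = 20;

/// MTU to report in the ICMPv4 "fragmentation needed" error, derived from
/// the MTU reported in the ICMPv6 `PacketTooBig` error.
fn ipv4_mtu(ipv6_mtu: u16) -> u16 {
    ipv6_mtu - (IPV6_HEADER_LEN - IPV4_HEADER_LEN) // e.g. 1200 -> 1180
}
```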
Resolves: #7515.
In case an upstream DNS server responds with a payload that exceeds the
available buffer space of an IP packet, we need to truncate the
response. Currently, this truncation uses the **wrong** constant to
check for the maximum allowed length. Instead of the
`MAX_DATAGRAM_PAYLOAD`, we actually need to check against a limit that
is less than the MTU as the IP layer and the UDP layer both add an
overhead.
To fix this, we introduce such a constant and provide additional
documentation on the remaining ones to hopefully avoid future errors.
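A sketch of such a constant (the values are derived from the 1280-byte TUN MTU used elsewhere; the names and exact numbers in connlib may differ):
```
const MTU: usize = 1280;
const IPV6_HEADER_LEN: usize = 40; // the larger of the two IP headers
const UDP_HEADER_LEN: usize = 8;

/// The largest DNS payload that still fits into a single UDP packet on our TUN device.
const MAX_UDP_DNS_PAYLOAD: usize = MTU - IPV6_HEADER_LEN - UDP_HEADER_LEN; // 1232
```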
Certain packets cannot be translated as part of NAT64/46. The RFC says
to "Silently drop" those. Currently, we log all errors that happens
during the translation and don't follow this guideline.
Most of these "silently drop" errors are related to ICMP types that
cannot be represented in the other version such as ICMPv6 Neighbor
Solicitation.
To fix this, we introduce a new error type in the `ip_packet` module:
`ImpossibleTranslation`. For convenience reasons, we carry that one
through all layers as an `anyhow::Error` and test at the very top of the
event loop whether the root cause of the error is such a failed
translation. If so, we ignore the error and move on. This isn't as
type-safe as it could be but it is much easier to implement.
Additionally, the risk of a bug here (i.e. if we stop emitting this
error within the IP packet translation layer) is merely that the log
will pop up again.
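Sketched, the check at the top of the event loop looks roughly like this (the error type lives in `ip_packet`; names here are illustrative):
```
#[derive(Debug, thiserror::Error)]
#[error("packet cannot be translated between IP versions")]
struct ImpossibleTranslation;

fn log_or_ignore(error: anyhow::Error) {
    // "Silently drop" packets that simply cannot be translated, as the RFC mandates.
    if error.root_cause().downcast_ref::<ImpossibleTranslation>().is_some() {
        return;
    }

    tracing::warn!("Failed to translate packet: {error:#}");
}
```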
Resolves: #7516.
In order to achieve concurrency within `connlib`, we needed to create a
way for IP packets to own the piece of memory they are sitting in. This
allows us to concurrently read IP packets and them batch-process them
(as opposed to have a dedicated buffer and reference it). At the moment,
those IP packets are defined on the stack. With a size of ~1300 bytes
that isn't very large but still causes _some_ amount of copying.
We can avoid this copying by relying on a buffer pool:
1. When reading a new IP packet, we request a new buffer from the pool.
2. When the IP packet gets dropped, the buffer gets returned to the
pool.
This allows us to reuse an allocation for a packet once it finished
processing, resulting in less CPU time spent on copying around memory.
This causes us to make more _individual_ heap-allocations in the
beginning: Each packet being processed by `connlib` is allocated on
the heap somewhere. At some point during the lifetime of the tunnel,
this will settle in an ideal state where we have allocated enough slots
to cover new packets whilst also reusing memory from packets that
finished processing already.
The actual `IpPacket` data type is now just a pointer. As a result, the
channels to and from the TUN thread (where we were holding multiple of
these packets) are now significantly smaller, leading to roughly the
same memory usage overall.
In my local testing on Linux, the client still only uses about ~15MB of
RAM even with multiple concurrent speedtests running.
As per the RFC, the IPv6 traffic class should be 1-to-1 translated to
the IPv4 DSCP value. However, it appears that not all values here are
valid. In particular, when attempting to reach GitHub over IPv6, we
receive an IPv6 packet that has a traffic class value of 72 which is
out-of-range for the IPv4 DSCP value, resulting in the following error
on the Gateway:
```
Failed to translate packet: NAT64 failed: Error '72' is too big to be a 'IPv4 DSCP (Differentiated Services Code Point)' (maximum allowed value is '63')
```
The bigger scope of this issue is that this causes the ICMP packets
returned to the client to be dropped which means that `ssh` spawned by
`git` doesn't learn that the IPv6 address assigned by Firezone is not
actually routable.
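For reference, one way to carry the full traffic class over without overflowing the DSCP field is to split it into its DSCP and ECN parts (a sketch, not necessarily how the fix is implemented):
```
/// Split an IPv6 traffic class into the IPv4 DSCP (upper six bits) and
/// ECN (lower two bits) values.
fn dscp_and_ecn(traffic_class: u8) -> (u8, u8) {
    (traffic_class >> 2, traffic_class & 0b11) // e.g. 72 -> DSCP 18, ECN 0
}
```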
Related: #7476.
## Context
The Gateway implements a stateful NAT that translates the destination IP
and source protocol of every packet that targets a DNS resource IP. This
is necessary because the IPs for DNS resources are generated on the
client without actually performing a DNS lookup, instead it always
generates 4 IPv4 and 4 IPv6 addresses. On the Gateway, these IPs are
then assigned in a round-robin fashion to the actual IPs that the domain
resolves to, necessitating a NAT64/46 translation in case a domain only
resolves to IPs of one family.
A domain may resolve to a set of IPs but not all of these IPs may be
routable. Whilst this is arguably poor practice on the domain administrator's part,
routing problems can occur for all kinds of reasons and are well handled
on the wider Internet.
When an IP packet cannot be routed further, the current routing node
generates an ICMP error describing the routing failure and sends it back
to the original sender. ICMP is a layer 4 protocol itself, same as TCP
and UDP. As such, sending out a UDP packet may result in receiving an
ICMP response. In order to allow the sender to learn which packet
failed to route, the ICMP error embeds parts of the original packet in
its payload [0] [1].
The Gateway's NAT table uses parts of the layer 4 protocol as part of
its key; the UDP and TCP source port and the ICMP echo request
identifier (further referred to as "source protocol"). An ICMP error
message doesn't have any of these, meaning the lookup in the NAT table
currently fails and the ICMP error is silently dropped.
A lot of software implements a happy-eyeballs approach and probes for
IPv6 and IPv4 connectivity simultaneously. The absence of the ICMP
errors confuses that algorithm as it detects the packet loss and starts
retransmitting instead of giving up.
## Solution
Upon receiving an ICMP error on the Gateway, we now extract the
partially embedded packet in the ICMP error payload. We use the
destination IP and source protocol of _that_ packet for the lookup in
the NAT table. This returns us the original (client-assigned)
destination IP and source protocol. In order for the Gateway's NAT to be
transparent, we need to patch the packet embedded in the ICMP error to
use the original destination and source protocol. We also have to
account for the fact that the original packet may have been translated
with NAT64/46 and translate it back. Finally, we generate an ICMP error
with the appropriate code and embed the patched packet in its payload.
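For illustration, pulling the embedded packet's destination out of an ICMPv4 "destination unreachable" message with `etherparse` looks roughly like this (a sketch, not the actual Gateway code):
```
use etherparse::{Icmpv4Slice, Icmpv4Type, Ipv4HeaderSlice};

fn embedded_destination(icmp: &Icmpv4Slice<'_>) -> Option<std::net::Ipv4Addr> {
    if !matches!(icmp.icmp_type(), Icmpv4Type::DestinationUnreachable(_)) {
        return None;
    }

    // The ICMP payload carries the original IP header plus the first bytes
    // of the original datagram's data.
    let embedded = Ipv4HeaderSlice::from_slice(icmp.payload()).ok()?;

    Some(embedded.destination_addr())
}
```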
## Test implementation
To test that this works for all kinds of combinations, we extend
`tunnel_test` to sample a list of unreachable IPs from all IPs sampled
for DNS resources. Upon receiving a packet for one of these IPs, the
Gateway will send an ICMP error back instead of invoking its regular
echo reply logic. On the client-side, upon receiving an ICMP error, we
extract the originally failed packet from the body and treat it as a
successful response.
This may seem a bit hacky at first but is actually how operating systems
would treat ICMP errors as well. For example, a `TcpSocket::connect`
call (triggering a TCP SYN packet) may fail with an IO error if we
receive an ICMP error packet. Thus, in a way, the original packet got
answered, just not with what we expected.
In addition, by treating these ICMP errors as responses to the original
packet, we automatically perform other assertions on them, like ensuring
that they come from the right IP address, that there are no unexpected
packets etc.
## Test alternatives
It is tricky to solve this in other ways in the test suite because at
the time of generating a packet for a DNS resource, we don't know the
actual IP that is being targeted by a certain proxy IP unless we'd start
reimplementing the round-robin algorithm employed by the Gateway. To
"test" the transparency of the NAT, we'd like to avoid knowing about
these implementation details in the test.
## Future work
In this PR, we currently only deal with "Destination Unreachable" ICMP
errors. There are other ICMP messages such as ICMPv6's `PacketTooBig` or
`ParameterProblem`. We should eventually handle these as well. They are
being deferred because translating those between the different IP
versions is only partially implemented and would thus require more work.
The most pressing need is to translate destination unreachable errors to
enable happy-eyeballs algorithms to work correctly.
Resolves: #5614.
Resolves: #6371.
[0]: https://www.rfc-editor.org/rfc/rfc792
[1]: https://www.rfc-editor.org/rfc/rfc4443#section-3.1
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
Resolves: #7241.
Using the clippy lint `unwrap_used`, we can automatically lint against
all uses of `.unwrap()` on `Result` and `Option`. This turns up quite a
few results actually. In most cases, they are invariants that can't
actually be hit. For these, we change them to `Option`. In other cases,
they can actually be hit. For example, if the user supplies an invalid
log-filter.
Activating this lint ensures the compiler will yell at us every time we
use `.unwrap` to double-check whether we do indeed want to panic here.
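For reference, one way to activate the lint crate-wide (illustrative; it can also be configured through Cargo's `[lints]` table):
```
#![warn(clippy::unwrap_used)]
```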
Resolves: #7292.
Currently, we don't report very detailed errors when we fail to parse
certain IP packets. With this patch, we use `Result` in more places and
also extend the validation of IP packets to:
a) enforce a length of at most 1280 bytes. This should already be the
case due to our MTU but bad things may happen if that is off for some
reason.
b) validate the entire IP packet instead of just its header.
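A minimal sketch of both checks (assuming `etherparse` for the full parse; the 1280-byte limit matches our TUN MTU):
```
fn validate(packet: &[u8]) -> anyhow::Result<()> {
    // a) enforce the maximum length
    anyhow::ensure!(packet.len() <= 1280, "packet exceeds the 1280-byte MTU");

    // b) parse the entire packet (headers and payload), not just the IP header
    etherparse::SlicedPacket::from_ip(packet)?;

    Ok(())
}
```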
Bumps [test-strategy](https://github.com/frozenlib/test-strategy) from
0.3.1 to 0.4.0.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c683eb3cf6"><code>c683eb3</code></a>
Version 0.4.0.</li>
<li><a
href="17706bcd1c"><code>17706bc</code></a>
Update MSRV to 1.70.0.</li>
<li><a
href="90a5efbf00"><code>90a5efb</code></a>
Update dependencies.</li>
<li><a
href="cff2ede71f"><code>cff2ede</code></a>
Changed the strategy generated by <code>#[filter(...)]</code> to reduce
`Too many local ...</li>
<li><a
href="34cc6d2545"><code>34cc6d2</code></a>
Update expected compile error message.</li>
<li><a
href="a4427e2d98"><code>a4427e2</code></a>
Update CI settings.</li>
<li><a
href="ecb7dbae04"><code>ecb7dba</code></a>
Clippy.</li>
<li><a
href="637f29e9c8"><code>637f29e</code></a>
Made it so an error occurs when an unsupported attribute is specified
for enu...</li>
<li><a
href="6d66057bb0"><code>6d66057</code></a>
Use <code>test</code> instead of <code>check</code> with <code>cargo
hack --rust-version</code>.</li>
<li><a
href="cee2ebbfe6"><code>cee2ebb</code></a>
Fix CI settings.</li>
<li>Additional commits viewable in <a
href="https://github.com/frozenlib/test-strategy/compare/v0.3.1...v0.4.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Currently, the Gateway's state machine functions for processing packets
use type signatures that only return `Option`. Any errors encountered while
processing packets are logged internally. This makes it difficult to
consistently log these errors.
We refactor these functions to return `Result<Option<T>>` in most cases,
indicating that they may fail for various reasons and also sometimes
succeed without producing an output.
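The shape of the change, sketched (types are illustrative):
```
struct IpPacket;

trait GatewayState {
    // Before: errors were logged (or swallowed) inside the state machine.
    // fn handle_packet(&mut self, packet: IpPacket) -> Option<IpPacket>;

    // After: `Err` means processing failed, `Ok(None)` means "handled, nothing to send".
    fn handle_packet(&mut self, packet: IpPacket) -> anyhow::Result<Option<IpPacket>>;
}
```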
This allows us to consistently log these errors in the event-loop.
Logging them on WARN or ERROR would be too spammy though. In order to
still be alerted about some of these, we use the `telemetry_event!`
macro which samples them at a rate of 1%. This will alert us about cases
that happen often and allows us to handle them explicitly.
Once this is deployed to staging, I will monitor the alerts in Sentry to
ensure we won't get spammed with events from customers on the next
release.
When encrypting packets, we need to reserve a buffer within which
boringtun will encrypt the IP packet. Unfortunately, `boringtun` panics
if that buffer is not big enough which essentially brings all of
`connlib` down.
Really, we should never see a packet that is too large and ideally, we
enforce this at compile-time by creating different variants of
`IpPacket` that are sized accordingly. That is a large refactoring so
until then, we simply discard them instead of panicking.
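A sketch of the guard (the overhead figure is WireGuard's fixed data-message overhead of a 16-byte header plus a 16-byte AEAD tag; names are illustrative, not boringtun's API):
```
const WG_OVERHEAD: usize = 32;

/// Returns whether `packet` can be encrypted into `buf` without overflowing it.
fn fits_into_encryption_buffer(packet: &[u8], buf: &[u8]) -> bool {
    // The plaintext is padded to a multiple of 16 bytes before encryption.
    let padded_len = packet.len().next_multiple_of(16);

    padded_len + WG_OVERHEAD <= buf.len()
}
```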
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Bumps [etherparse](https://github.com/JulianSchmid/etherparse) from
0.15.0 to 0.16.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/JulianSchmid/etherparse/releases">etherparse's
releases</a>.</em></p>
<blockquote>
<h2>v0.16.0 Add IP Packet Defragmentation Support</h2>
<h2>What's Changed</h2>
<ul>
<li>typo by <a
href="https://github.com/ugur-a"><code>@ugur-a</code></a> in <a
href="https://redirect.github.com/JulianSchmid/etherparse/pull/106">JulianSchmid/etherparse#106</a></li>
<li>Add etherparse-defrag by <a
href="https://github.com/JulianSchmid"><code>@JulianSchmid</code></a>
in <a
href="https://redirect.github.com/JulianSchmid/etherparse/pull/92">JulianSchmid/etherparse#92</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/ugur-a"><code>@ugur-a</code></a> made
their first contribution in <a
href="https://redirect.github.com/JulianSchmid/etherparse/pull/106">JulianSchmid/etherparse#106</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/JulianSchmid/etherparse/compare/v0.15.0...v0.16.0">https://github.com/JulianSchmid/etherparse/compare/v0.15.0...v0.16.0</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/JulianSchmid/etherparse/blob/master/changelog.md">etherparse's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog:</h1>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="93c7f0bb13"><code>93c7f0b</code></a>
Resolved clippy warnings</li>
<li><a
href="447c592aab"><code>447c592</code></a>
Increment proptest crate version</li>
<li><a
href="00c04f7dbe"><code>00c04f7</code></a>
Resolved clippy warning</li>
<li><a
href="b6d98e5100"><code>b6d98e5</code></a>
Extended tests for frag pool</li>
<li><a
href="0a58fa5e64"><code>0a58fa5</code></a>
Corrected fragment reconstruction</li>
<li><a
href="74739e5a4f"><code>74739e5</code></a>
Correct ip defrag pool new return type</li>
<li><a
href="c0741f51f3"><code>c0741f5</code></a>
Applying rust fmt & add return_buf to ip defrag pool</li>
<li><a
href="31c8e84f4b"><code>31c8e84</code></a>
Update proptest and mark some tests as not relevant for miri</li>
<li><a
href="29894ab462"><code>29894ab</code></a>
Further work on defragmentation</li>
<li><a
href="9464a0f363"><code>9464a0f</code></a>
Adapt readme to defrag module</li>
<li>Additional commits viewable in <a
href="https://github.com/JulianSchmid/etherparse/compare/v0.15.0...v0.16.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
If EDNS0 doesn't work correctly, DNS servers might respond with messages
bigger than our maximum UDP size.
In that case, we need to truncate those messages when forwarding the
response back to the interface and expect the OS to retry over TCP.
Otherwise, we aren't able to allocate a packet big enough for this.
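A crude byte-level sketch of that truncation (illustrative; a proper implementation would also drop whole records and adjust the section counts):
```
/// Truncate a DNS response to `max_len` bytes and set the TC flag so the
/// OS stub resolver retries the query over TCP.
fn truncate_dns_response(message: &mut Vec<u8>, max_len: usize) {
    if message.len() <= max_len || message.len() < 12 {
        return; // fits already, or not even a full DNS header
    }

    message.truncate(max_len.max(12)); // never cut into the 12-byte header
    message[2] |= 0b0000_0010; // set the TC (truncated) flag
}
```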
Fixes #7121
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
My theory for this issue is that we receive a UDP DNS response from an
upstream server that is bigger than our MTU and thus forwarding it
fails.
This PR doesn't fix that issue by itself but only mitigates the actual
panic. To properly fix the underlying issue, we need to parse the DNS
message, truncate it, and set the TC bit.
Related: #7121.
Currently, tests only send ICMP packets back and forth. To expand our
coverage and to later permit us to cover filters and resource picking, this
PR implements sending UDP and TCP packets as part of that logic too.
To keep this PR simpler at this stage, TCP packets don't track an actual
TCP connection; we only verify that they are forwarded back and forth. This will
be fixed in a future PR by emulating TCP sockets.
We also unify how we handle CIDR/DNS/non-Resources to reduce the number
of transitions.
Fixes #7003
---------
Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
At present, `connlib` only supports DNS over UDP on port 53. Responses
over UDP are size-constrained by the IP MTU and thus, not all DNS
responses fit into a UDP packet. RFC 9210 therefore mandates that all DNS
resolvers must also support DNS over TCP to overcome this limitation
[0].
Handling UDP packets is easy, handling TCP streams is more difficult
because we need to effectively implement a valid TCP state machine.
Building on top of a lot of earlier work (linked in issue), this is
relatively easy because we can now simply import
`dns_over_tcp::{Client,Server}` which do the heavy lifting of sending
and receiving the correct packets for us.
The main aspects of the integration that are worth pointing out are:
- We can handle at most 10 concurrent DNS TCP connections _per defined
resolver_. The assumption here is that most applications will first
query for DNS records over UDP and only fall back to TCP if the response
is truncated. Additionally, we assume that clients will close the TCP
connections once they no longer need them.
- Errors on the TCP stream to an upstream resolver result in `SERVFAIL`
responses to the client.
- All TCP connections to upstream resolvers get reset when we roam, all
currently ongoing queries will be answered with `SERVFAIL`.
- Upon network reset (i.e. roaming), we also re-allocate new local ports
for all TCP sockets, similar to our UDP sockets.
Resolves: #6140.
[0]: https://www.ietf.org/rfc/rfc9210.html#section-3-5
The last released version of `smoltcp` is `0.11.0`. That version is
almost a year old. Since then, an important "bug" got fixed in the IPv6
handling code of `smoltcp`.
In order to route packets to our interface, we define a dummy IPv4 and
IPv6 address and create catch-all routes with our interface as the
gateway. Together with `set_any_ip(true)`, this makes `smoltcp` accept
any packet we pass to it. This is necessary because we don't directly
connect `smoltcp` to the TUN device but rather have an `InMemoryDevice`
into which we explicitly feed certain packets.
In the last released version, `smoltcp` only performs the above logic
for IPv4. For IPv6, the additional check for "do we have a route that
this packet matches" doesn't exist and thus no IPv6 traffic is accepted
by `smoltcp`.
Extracted out of #6944.
Currently, in our tests, traffic that is targeted at a resource is
handled "in-line" on the gateway. This doesn't really represent how the
real world works. In the real world, the gateway uses the IP forwarding
functionality of the Linux kernel and the corresponding NAT to send the
IP packet to the actual resource.
We don't want to implement this forwarding and NAT in the tests.
However, our testing harness is about to get more sophisticated. We will
be sending TCP DNS queries with #6944 and we want to test TCP and its
traffic filters with #7003.
The state of those TCP sockets needs to live _somewhere_.
If we "correctly" model this and introduce some kind of `HashMap` with
`dyn Resource` in `TunnelTest`, then we will have to actually implement
NAT for those packets to ensure that e.g. the SYN-ACK of a TCP handshake
makes it back to the correct(!) gateway.
That is rather cumbersome.
This PR suggests taking a shortcut there by associating the resources
with each gateway individually. At present, all we have are UDP DNS
servers. Those don't actually have any connection state themselves but
putting them in place gives us a framework for where we can put
connection-specific state. Most importantly, these resources MUST NOT
hold application-specific state. Instead, that state needs to be kept in
`ReferenceState` or `TunnelState` and passed in by reference, as we do
here for the DNS records.
This has effectively the same behaviour as correctly translating IP
packets back and forth between resources and gateways. The packets
"emitted" by a particular `UdpDnsServerResource` will always go back to
the correct gateway.
This splits out the actual DNS server from #6944 into a separate crate.
At present, it only contains a DNS server. Later, we will likely add a
DNS client to it as well because the proptests and connlib itself will
need a user-space DNS TCP client.
The implementation uses `smoltcp` but that is entirely encapsulated. The
`Server` struct exposes only a high-level interface for
- feeding inbound packets as well as retrieving outbound packets
- retrieving parsed DNS queries and sending DNS responses
Related: #6140.
This incorporates the feedback from #6939 after a discussion with
@conectado. We agreed that the protocol should be more event-based,
where each message has its own event type. Events MAY appear in pairs or
other cardinality combinations, meaning semantically they could be seen
as requests and responses. In general though, due to the unreliable
nature of IP, it is better to view them as events. Events are typically
designed to be idempotent which is important to make this protocol work.
Using events also means it is not as easy to fall into the "trap" of
modelling requests / responses on the control protocol level.
At present, `connlib` utilises the portal as a signalling layer for any
kind of control message that needs to be exchanged between clients and
gateways. For anything related to connectivity, this is crucial: Before
we have a direct connection to the gateway, we don't really have a
choice other than using the portal as a "relay" to e.g. exchange address
candidates for ICE.
However, once a direct connection has been established, exchanging
information directly with the gateway is faster and removes the portal
as a potential point of failure for the data plane.
For DNS resources, `connlib` intercepts all DNS requests on the client
and assigns its own IPs within the CG-NAT range to all domains that are
configured as resources. Thus, all packets targeting DNS resources will
have one of these IPs set as their destination. The gateway needs to
learn about all the IPs that have been assigned to a certain domain by
the client and perform NAT. We call this concept "DNS resource NAT".
Currently, the domain + the assigned IPs are sent together with the
`allow_access` or `request_connection` message via the portal. The new
control protocol defined in #6732 purposely excludes this information
and only authorises traffic to the entire resource which could also be a
wildcard-DNS resource.
To exchange the assigned IPs for a certain domain with the gateway, we
introduce our own p2p control protocol built on top of IP. All control
protocol messages are sent through the tunnel and thus encrypted at all
times. They are differentiated from regular application traffic as
follows:
- IP src is set to the unspecified IPv6 address (`::`)
- IP dst is set to the unspecified IPv6 address (`::`)
- IP protocol is set to reserved (`0xFF`)
The combination of all three should never appear as regular traffic.
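A raw-byte sketch of that check (illustrative, not the actual connlib implementation):
```
/// Returns whether `packet` is a p2p control-protocol message.
fn is_p2p_control(packet: &[u8]) -> bool {
    if packet.len() < 40 || packet[0] >> 4 != 6 {
        return false; // not an IPv6 packet
    }

    let protocol_is_reserved = packet[6] == 0xFF; // "next header" field
    let src_is_unspecified = packet[8..24].iter().all(|b| *b == 0);
    let dst_is_unspecified = packet[24..40].iter().all(|b| *b == 0);

    protocol_is_reserved && src_is_unspecified && dst_is_unspecified
}
```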
To ensure forwards-compatibility, the control protocol utilises a fixed
8-byte header where the first byte denotes the message kind. In this
current design, there is no concept of a request or response in the
wire-format. Each message is unidirectional and the fact that the two
messages we define in here appear in tandem is purely by convention. We
use the IPv6 payload length to determine the total length of the packet.
The payloads are JSON-encoded. Message types are free to choose whichever
encoding they'd like.
This protocol is sent through the WireGuard tunnel, meaning we are
effectively limited by our device MTU of 1280, otherwise we'd have to
implement fragmentation. For the messages of setting up the DNS resource
NAT, we are below this limit:
- UUIDs are 16 bytes
- Domain names are at most 255 bytes
- IPv6 addresses are 16 bytes * 4
- IPv4 addresses are 4 bytes * 4
Including the JSON serialisation overhead, this results in a total
maximum payload size of 402 bytes, which is well below our MTU.
Finally, another thing to consider here is that IP is unreliable,
meaning each use of this protocol needs to make sure that:
- It is resilient against message re-ordering
- It is resilient against packet loss
The details of how this is ensured for setting up the DNS resource NAT
are left to #6732.
The `len` specified in the constructor of `IpPacket` is user-provided.
Technically, that one can be longer than the actual packet. To make sure
we only ever pass out the precise payload of the IP packet, we read the
length from the IP header and cut the slice at the specified length.
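A sketch of reading that length from the header (illustrative, raw-byte version):
```
/// Total length of the IP packet at the start of `buf`, as declared by its header.
fn declared_packet_len(buf: &[u8]) -> Option<usize> {
    match *buf.first()? >> 4 {
        // IPv4: the "total length" field covers header and payload.
        4 => Some(u16::from_be_bytes([*buf.get(2)?, *buf.get(3)?]) as usize),
        // IPv6: the "payload length" field excludes the fixed 40-byte header.
        6 => Some(40 + u16::from_be_bytes([*buf.get(4)?, *buf.get(5)?]) as usize),
        _ => None,
    }
}
```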
For #6461, we will build a control protocol on top of IP that runs
through the WireGuard tunnel. Reading the exact length of the payload is
important for that.
When debugging DNS-related issues, it is useful to see all DNS queries
that go into `connlib` and the responses that we generate. Analogous to
the `wire::net` and `wire::dev` TRACE logs, we introduce `wire::dns`
which logs incoming queries and the responses on TRACE. The output looks
like this:
```
2024-10-02T00:16:47.522847Z TRACE wire::dns::qry: A caldav.fastmail.com qid=55845
2024-10-02T00:16:47.522926Z TRACE wire::dns::qry: AAAA caldav.fastmail.com qid=56277
2024-10-02T00:16:47.531347Z TRACE wire::dns::res: AAAA caldav.fastmail.com => [] qid=56277 rcode=NOERROR
2024-10-02T00:16:47.538984Z TRACE wire::dns::res: A caldav.fastmail.com => [103.168.172.46 | 103.168.172.61] qid=55845 rcode=NOERROR
2024-10-02T00:16:47.580237Z TRACE wire::dns::qry: HTTPS cloudflare-dns.com qid=21518
2024-10-02T00:16:47.580338Z TRACE wire::dns::qry: A example.org qid=35459
2024-10-02T00:16:47.580364Z TRACE wire::dns::qry: AAAA example.org qid=60073
2024-10-02T00:16:47.580699Z TRACE wire::dns::qry: AAAA ipv4only.arpa qid=17280
2024-10-02T00:16:47.580782Z TRACE wire::dns::qry: A ipv4only.arpa qid=47215
2024-10-02T00:16:47.581134Z TRACE wire::dns::qry: A detectportal.firefox.com qid=34970
2024-10-02T00:16:47.581261Z TRACE wire::dns::qry: AAAA detectportal.firefox.com qid=39505
2024-10-02T00:16:47.609502Z TRACE wire::dns::res: AAAA example.org => [2606:2800:21f:cb07:6820:80da:af6b:8b2c] qid=60073 rcode=NOERROR
2024-10-02T00:16:47.609640Z TRACE wire::dns::res: AAAA ipv4only.arpa => [] qid=17280 rcode=NOERROR
2024-10-02T00:16:47.610407Z TRACE wire::dns::res: A ipv4only.arpa => [192.0.0.170 | 192.0.0.171] qid=47215 rcode=NOERROR
2024-10-02T00:16:47.617952Z TRACE wire::dns::res: HTTPS cloudflare-dns.com => [1 alpn=h3,h2 ipv4hint=104.16.248.249,104.16.249.249 ipv6hint=2606:4700::6810:f8f9,2606:4700::6810:f9f9] qid=21518 rcode=NOERROR
2024-10-02T00:16:47.631124Z TRACE wire::dns::res: A example.org => [93.184.215.14] qid=35459 rcode=NOERROR
2024-10-02T00:16:47.640286Z TRACE wire::dns::res: AAAA detectportal.firefox.com => [detectportal.prod.mozaws.net. | prod.detectportal.prod.cloudops.mozgcp.net. | 2600:1901:0:38d7::] qid=39505 rcode=NOERROR
2024-10-02T00:16:47.641332Z TRACE wire::dns::res: A detectportal.firefox.com => [detectportal.prod.mozaws.net. | prod.detectportal.prod.cloudops.mozgcp.net. | 34.107.221.82] qid=34970 rcode=NOERROR
2024-10-02T00:16:48.737608Z TRACE wire::dns::qry: AAAA myfiles.fastmail.com qid=52965
2024-10-02T00:16:48.737710Z TRACE wire::dns::qry: A myfiles.fastmail.com qid=5114
2024-10-02T00:16:48.745282Z TRACE wire::dns::res: AAAA myfiles.fastmail.com => [] qid=52965 rcode=NOERROR
2024-10-02T00:16:49.027932Z TRACE wire::dns::res: A myfiles.fastmail.com => [103.168.172.46 | 103.168.172.61] qid=5114 rcode=NOERROR
2024-10-02T00:16:49.190054Z TRACE wire::dns::qry: HTTPS github.com qid=64696
2024-10-02T00:16:49.190171Z TRACE wire::dns::qry: A github.com qid=11912
2024-10-02T00:16:49.190502Z TRACE wire::dns::res: A github.com => [100.96.0.1 | 100.96.0.2 | 100.96.0.3 | 100.96.0.4] qid=11912 rcode=NOERROR
2024-10-02T00:16:49.190619Z TRACE wire::dns::qry: A github.com qid=63366
2024-10-02T00:16:49.190730Z TRACE wire::dns::res: A github.com => [100.96.0.1 | 100.96.0.2 | 100.96.0.3 | 100.96.0.4] qid=63366 rcode=NOERROR
```
As with the other filters, seeing both queries and responses can be
achieved with `RUST_LOG=wire::dns=trace`. If you are only interested in
the responses, you can activate a more specific log filter using
`RUST_LOG=wire::dns::res=trace`. All responses also print the original
query that they are answering.
Resolves: #6862.
Currently, `connlib` is entirely single-threaded. This allows us to
reuse a single buffer for processing IP packets and makes reasoning about
the packet processing code very simple. Being single-threaded also means
we can only make use of a single CPU core and all operations have to be
sequential.
Analyzing `connlib` using `perf` shows that we spend 26% of our CPU time
writing packets to the TUN interface [0]. Because we are
single-threaded, `connlib` cannot do anything else during this time. If
we could offload the writing of these packets to a different thread,
`connlib` could already process the next packet while the current one is
writing.
Packets that we send to the TUN interface arrive as encrypted WG
packets over UDP and get decrypted into a (currently) shared buffer.
Moving the writing to a different thread implies that we have to have
more of these buffers that the next packet(s) can be decrypted into.
To avoid IP fragmentation, we set the maximum IP MTU to 1280 bytes on
the TUN interface. That actually isn't very big and easily fits into a
stackframe. The default stack size for threads is 2MB [1].
Instead of creating more buffers and cycling through them, we can also
simply stack-allocate our IP packets. This incurs some overhead from
copying packets but it is only ~3.5% [2] (This was measured without a
separate thread). With stack-allocated packets, almost all
lifetime-annotations go away which in itself is already a welcome
ergonomics boost. Stack-allocated packets also mean we can simply spawn
a new thread for the packet processing. This thread is connected with
two channels to connlib's main thread. The capacity of 1000 packets will
at most consume an additional 3.5 MB of memory which is fine even on our
most-constrained devices such as iOS.
[0]: https://share.firefox.dev/3z78CzD
[1]: https://doc.rust-lang.org/std/thread/#stack-size
[2]: https://share.firefox.dev/3Bf4zla
Resolves: #6653.
Resolves: #5541.
When encountering a PTR query, `connlib` checks if the query is for a
Firezone-managed resource and resolves it to the correct IP. If it isn't
for a DNS resource, we should forward the query to the upstream
resolver.
This isn't what is currently happening though. Instead of forwarding the
query, we bail early from `StubResolver::handle` and thus attempt to
route the packet through the tunnel. This however fails because the DNS
query was targeted at `connlib`'s stub resolver address which never
corresponds to a resource IP.
When TRACE logs were activated, this resulted in several entries such
as
> Unknown resource dst=100.100.111.1
To ensure this doesn't regress, we now generate PTR and MX record
queries in `tunnel_test`. We don't assert the response of those but we
do assert that we always get a response. The inclusion of MX records
asserts that unknown query types get correctly forwarded.
Resolves: #6749.
Currently, checking whether a packet is a DNS query has multiple silent
exit paths. This makes DNS problems difficult to debug because the
packets will be treated as if they have to get routed through the
tunnel.
This is also something we should fix but that isn't done in this PR: If
we know that a packet is for connlib's DNS stub resolver, we should
never route it through the tunnel. Currently, this isn't possible to
express with the type signature of our DNS module and requires more
refactoring.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
The `expect` attribute is similar to `allow` in that it will silence a
particular lint. In addition to `allow` however, `expect` will fail as
soon as the lint is no longer emitted. This ensures we don't end up with
stale `allow` attributes in our codebase. Additionally, it provides a
way of adding a `reason` to document why the lint is being suppressed.
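For example (illustrative):
```
#[expect(dead_code, reason = "used by a follow-up commit")]
fn helper() {}
```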
As the final step in removing `pnet_packet`, we need to introduce `-Mut`
equivalent slices for UDP, TCP and ICMP packets. As a starting point,
introducing `UpdHeaderSliceMut` and `TcpHeaderSliceMut` is fairly
trivial. The ICMP variants are a bit trickier because those are
different for IPv4 and IPv6. Additionally, ICMP for IPv4 is quite
complex because it can have a variable header length. Additionally. for
both variants, the values in byte range 5-8 are semantically different
depending on the ICMP code.
This requires us to design an API that balances ergonomics and
correctness. Technically, an ICMP identifier and sequence can only be
set if the ICMP code is "echo request" or "echo reply". However, adding
an additional parsing step to guarantee this in the type system is quite
verbose.
The trade-off implemented in this PR allows us to directly write to
bytes 5-8 using the `set_identifier` and `set_sequence` functions. To
catch errors early, these functions have debug-assertions built in that
ensure that the packet is indeed an ICMP echo packet.
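A sketch of that trade-off (illustrative, not the actual `-Mut` slice implementation):
```
struct IcmpV4EchoSliceMut<'a> {
    slice: &'a mut [u8],
}

impl<'a> IcmpV4EchoSliceMut<'a> {
    fn set_identifier(&mut self, id: u16) {
        // Bytes 5-8 only carry identifier + sequence for echo messages.
        debug_assert!(matches!(self.slice[0], 0 | 8), "not an ICMP echo packet");

        self.slice[4..6].copy_from_slice(&id.to_be_bytes());
    }

    fn set_sequence(&mut self, seq: u16) {
        debug_assert!(matches!(self.slice[0], 0 | 8), "not an ICMP echo packet");

        self.slice[6..8].copy_from_slice(&seq.to_be_bytes());
    }
}
```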
Resolves: #6366.
Currently, we have two structs for representing IP packets: `IpPacket`
and `MutableIpPacket`. As the name suggests, they mostly differ in
mutability. This design was originally inspired by the `pnet_packet`
crate which we based our `IpPacket` on. With subsequent iterations, we
added more and more functionality onto our `IpPacket`, like NAT64 &
NAT46 translation. As a result of that, the `MutableIpPacket` is no
longer directly based on `pnet_packet` but instead just keeps an
internal buffer.
This duplication can be resolved by merging the two structs into a
single `IpPacket`. We do this by first replacing all usages of
`IpPacket` with `MutableIpPacket`, deleting `IpPacket` and renaming
`MutableIpPacket` to `IpPacket`. The final design now has different
`self`-receivers: Some functions take `&self`, some `&mut self` and some
consume the packet using `self`.
This results in a more ergonomic usage of `IpPacket` across the codebase
and deletes a fair bit of code. It also takes us one step closer towards
using `etherparse` for all our IP packet interaction-needs. Lastly, I am
currently exploring a performance-optimisation idea that stack-allocates
all IP packets and for that, the current split between `IpPacket` and
`MutableIpPacket` does not really work.
Related: #6366.
This PR introduces the `etherparse` dependency for parsing and
generating IP packets.
Using `etherparse`, we can implement the NAT46 & NAT64 implementations
for the gateway more elegantly because it allows us to parse the IP and
protocol headers into a static and much richer representation. The
conversion to the IPv4/IPv6 equivalent is then just a question of
transforming one data structure into another and writing it to the
correct place in the buffer.
We extract this functionality into dedicated `nat64` and `nat46`
modules.
Furthermore, we implement the various functions in `ip_packet::make`
using `etherparse` too. Following that, we also overhaul the NAT
translation tests that we have in `ip_packet::proptests`. Those now use
the more low-level `consume_to_ipX` APIs which makes the tests more
ergonomic to write.
In the future, we should upstream `Ipv4HeaderSliceMut` and
`Ipv6HeaderSliceMut` to `etherparse`.
Moving all of this functionality to `etherparse` will make it easier to
write tests that involve more IP packets as well as customise the
behaviour of our NAT.
Related: #5614.
Related: #6371.
Related: #6353.
In the `tunnel_test` test suite, we send ICMP requests with arbitrary
sequence numbers and identifiers. Due to the NAT implementation of the
gateway, the sequence number and identifier chosen by the client are not
necessarily the same as the ones sent to the resource. Thus, it is
impossible to correlate the ICMP packets sent by the client with the
ones arriving at the gateway.
Currently, our test suite thus relies on the ordering of packets to
match them up and assert properties on them, like whether they target
the correct resource. As soon as we want to send multiple packets
concurrently, this order is not necessarily stable.
ICMP echo requests can contain an arbitrary payload. We utilise this
payload to embed a random u64 that acts as the unique identifier of an
ICMP request. This allows us to correlate the packets arriving at the
gateway with the ones sent by the client, making the test suite more
robust and ready for handling concurrent ICMP packets.
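A sketch of that correlation (illustrative helpers):
```
/// Embed the unique identifier as the ICMP echo payload.
fn make_payload(unique_id: u64) -> [u8; 8] {
    unique_id.to_be_bytes()
}

/// Recover the identifier from a received echo request/reply payload.
fn extract_id(payload: &[u8]) -> Option<u64> {
    Some(u64::from_be_bytes(payload.get(..8)?.try_into().ok()?))
}
```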