Currently, the relay's eBPF module only supports routing from IPv4 to
IPv4 as well as IPv6 to IPv6. In general, TURN servers can also route
from IPv4 to IPv6 and vice versa. Our userspace routing supports that
but doing the same in the eBPF code is a bit more involved. We'd need to
move around the headers a bit more (IPv4 and IPv6 headers are different
in size), as well as configure the respective "source" address for each
interface. Currently, we simply take the destination address of the
incoming packet as the new source address. When routing across IP
versions, that doesn't work.
To gain some more insight into how often this happens, we add these
additional maps and populate them. This allows us to emit a dedicated
log message whenever we encounter a packet for such a mapping.
First, we always check for an entry in the maps that we can handle.
If there is none, we check the other map and special-case the error.
Otherwise, we fall back to the previous "no entry" error. We shouldn't
really see these "no entry" errors anymore now, unless someone starts
probing our relays for active channels.
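The lookup order described above can be sketched roughly as follows. This is a plain user-space illustration with `HashMap`s standing in for the eBPF maps; the names `Key`, `LookupError` and `lookup_v4` are made up for this example and are not taken from the actual code.

```rust
use std::collections::HashMap;
use std::net::{SocketAddrV4, SocketAddrV6};

/// Hypothetical key of a channel binding: client socket address plus channel number.
type Key = (SocketAddrV4, u16);

#[derive(Debug, PartialEq, Eq)]
enum LookupError {
    /// A binding exists but routes across IP versions, which the eBPF path cannot handle yet.
    UnsupportedCrossVersion,
    /// There is no binding at all.
    NoEntry,
}

/// Checks the map we can handle first, then the "other" map, then gives up.
fn lookup_v4(
    same_version: &HashMap<Key, SocketAddrV4>,
    cross_version: &HashMap<Key, SocketAddrV6>,
    key: &Key,
) -> Result<SocketAddrV4, LookupError> {
    if let Some(peer) = same_version.get(key) {
        return Ok(*peer);
    }
    if cross_version.contains_key(key) {
        // Special-cased so that a dedicated log message can be emitted.
        return Err(LookupError::UnsupportedCrossVersion);
    }
    Err(LookupError::NoEntry)
}
```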
Our idle connection detection works based on incoming and outgoing
packets, whichever one happened later. If we have not received or sent
packets for longer than `MAX_IDLE`, we transition into idle mode where
we configure our ICE agent to only send binding requests every 60
seconds.
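As a minimal sketch of the idle check described above: the value of `MAX_IDLE` below is a placeholder (the text does not state it), and the `Connection` type is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Placeholder value; the real `MAX_IDLE` is not given in the text.
const MAX_IDLE: Duration = Duration::from_secs(20);

/// Hypothetical per-connection timestamps.
struct Connection {
    last_incoming: Instant,
    last_outgoing: Instant,
}

impl Connection {
    /// Whichever of the two timestamps happened later.
    fn last_activity(&self) -> Instant {
        self.last_incoming.max(self.last_outgoing)
    }

    /// Idle once nothing was sent or received for longer than `MAX_IDLE`.
    fn is_idle(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity()) > MAX_IDLE
    }
}
```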
Our ICE timeout in non-idle mode is just north of 10 seconds (the
formula is a bit tricky, so I don't have the exact number). This can
cause a problem whenever a Gateway disappears. We leave the idle mode as
soon as we send a packet through the Gateway. Thus, what we intended to
happen is that, as long as you keep trying to connect to the Gateway, we
will leave the idle mode, increase our rate of STUN bindings through the
ICE agent and detect within ~10s that the Gateway is gone.
What actually happens is that, if whatever resource you are trying to
talk to is a DNS resource (which is very likely) and the application
starts off with a DNS query, then we will reset the local DNS resource
NAT state and ping the Gateway to set up the NAT again (we do this to
ensure we don't have stale DNS entries on the Gateway). This message is
only sent once and all other packets are buffered. Thus, the connection
will go back to idle before the newly sent STUN binding requests can
determine that the connection is actually broken.
Resolves: #8551
At present, the eBPF code assumes that the incoming packet needs to be
sent back to the same MAC address that it came from. This is only true
if there is at least one IP layer hop in-between the relay and the
Client / Gateway. When setting up Firezone in my local LAN to debug the
eBPF code, all components are within the same subnet and thus can send
packets directly to each other, without having to go through the router.
In such a scenario, simply swapping the Ethernet addresses is not
correct.
As part of witnessing traffic coming in via the network, we can build up
a mapping of IP to MAC address. This mapping can then later be used to
set the correct MAC address for a given destination IP. All of this
functions entirely without interaction from userspace.
Unless you are running in a LAN environment, most if not all IPs will
point to the same MAC address (the one of the next IP layer hop, i.e.
the router). For the very first packet that we want to relay, we will
not have a MAC address for the destination IP. This doesn't matter
though, we simply pass that packet up to userspace and handle it there.
Pretty much all communication on the Internet is bi-directional because
you need some kind of ACK. As soon as we receive the first ACK, e.g. the
response to a binding request, we will learn the MAC address for the
given target IP and the eBPF router can kick in for all packets going
forward.
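The learning scheme can be illustrated with a small user-space sketch. A `HashMap` stands in for the eBPF map here, and the `Neighbours` type and its method names are invented for this example.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

type Mac = [u8; 6];

/// Hypothetical IP-to-MAC table, filled purely from observed traffic.
#[derive(Default)]
struct Neighbours {
    by_ip: HashMap<IpAddr, Mac>,
}

impl Neighbours {
    /// Called for every incoming packet: remember which MAC the source IP arrived from.
    fn learn(&mut self, src_ip: IpAddr, src_mac: Mac) {
        self.by_ip.insert(src_ip, src_mac);
    }

    /// Called when relaying: `None` means "pass the packet up to userspace".
    fn mac_for(&self, dst_ip: &IpAddr) -> Option<Mac> {
        self.by_ip.get(dst_ip).copied()
    }
}
```

Until the first packet from a given peer has been seen, `mac_for` returns `None` and the packet takes the userspace path.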
Related: #7518
When reading through these modules, it's helpful to know that the actual
sync data update doesn't occur more often than every 10 minutes due to a
database check.
The adapter itself isn't enabled in the UI on prod, but the background
job to sync mock data was. This prevents the job from being started and
emitting log noise into production logs.
The UDP checksum also includes the entire payload. Removing and adding
bytes to the payload therefore needs to be reflected in the checksum
update that we perform. When we add the channel data header, we need to
add the bytes to the checksum and when we remove them, they need to be
removed.
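A minimal sketch of the underlying arithmetic, following the one's-complement incremental update from RFC 1624. The function names are made up for this example, and it only covers adding or removing individual 16-bit words; a real UDP checksum update would also have to account for the changed UDP length in the header and pseudo-header.

```rust
/// Folds the carries of a 32-bit sum back into 16 bits (one's-complement addition).
fn fold(mut sum: u32) -> u16 {
    while sum >> 16 != 0 {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    sum as u16
}

/// Full checksum over a sequence of 16-bit words, for comparison.
fn checksum(words: &[u16]) -> u16 {
    !fold(words.iter().fold(0u32, |acc, w| acc + u32::from(*w)))
}

/// Updates an existing checksum after `word` was added to the covered data.
fn add_word(checksum: u16, word: u16) -> u16 {
    !fold(u32::from(!checksum) + u32::from(word))
}

/// Updates an existing checksum after `word` was removed from the covered data.
fn remove_word(checksum: u16, word: u16) -> u16 {
    !fold(u32::from(!checksum) + u32::from(!word))
}
```

For a 4-byte channel data header, this means two `add_word` calls when wrapping and two `remove_word` calls when unwrapping.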
Related: #7518
Currently, the eBPF code isn't consistent in how it handles XDP actions.
For some cases, we return errors and then map them to `XDP_PASS` or
`XDP_DROP`. For others, we return `Ok(XDP_PASS)`. This is unnecessarily
hard to understand.
We refactor the eBPF kernel to ALWAYS use `Error`s for all code-paths
that don't end in `XDP_TX`, i.e. when we successfully modified the
packet and want to send it back out.
In addition, we also change the way we log these errors. Not all errors
are equal and most `XDP_PASS` actions don't need to be logged. Those
packets are simply passing through.
Finally, we also introduce new checks in case any calls to the eBPF
helper functions fail.
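The resulting shape can be sketched like this. The enum variants and which error maps to which action are assumptions for illustration only; the real `Error` type will differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum XdpAction {
    Pass,
    Drop,
    Tx,
}

/// Hypothetical error variants; every non-`XDP_TX` outcome is an error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Error {
    NotUdp,
    NoChannelBinding,
    PacketTooShort,
    HelperFailed,
}

impl Error {
    /// Each error decides for itself what happens to the packet.
    fn action(self) -> XdpAction {
        match self {
            Error::NotUdp | Error::NoChannelBinding => XdpAction::Pass,
            Error::PacketTooShort | Error::HelperFailed => XdpAction::Drop,
        }
    }

    /// Packets that are simply passing through don't need a log line.
    fn should_log(self) -> bool {
        !matches!(self, Error::NotUdp)
    }
}

/// Single place where a handler result is turned into an XDP action.
fn to_action(result: Result<(), Error>) -> XdpAction {
    match result {
        Ok(()) => XdpAction::Tx,
        Err(e) => e.action(),
    }
}
```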
Related: #7518
The lychee action now has a `workingDirectory` argument that makes it
search for a `.lycheeignore` file in that directory. We can use this to
remove the `.lycheeignore` file from our top-level repository tree,
uncluttering that a bit.
At present, the eBPF program would try to pre-allocate around 800MB of
memory for all entries in the maps. This would allow for 1 million
channel mappings. We don't need that many to begin with. Reducing the
max number of channels down to 65536 reduces our memory usage to less
than 100MB.
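As a rough back-of-the-envelope check, assuming the pre-allocated memory scales linearly with the maximum number of channels (an assumption, not something measured here):

```rust
/// Approximate per-channel cost derived from the figures above:
/// ~800 MB for 1,000,000 channels. Assumes linear scaling.
const BYTES_PER_CHANNEL: u64 = 800_000_000 / 1_000_000;

/// Estimated pre-allocated map memory for a given channel limit.
fn estimated_map_bytes(max_channels: u64) -> u64 {
    max_channels * BYTES_PER_CHANNEL
}
```

With a limit of 65536 channels this estimate comes out at roughly 52 MB, which is consistent with the "less than 100MB" figure.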
Related: #7518
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Bumps [autoprefixer](https://github.com/postcss/autoprefixer) from
10.4.20 to 10.4.21.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/postcss/autoprefixer/releases">autoprefixer's
releases</a>.</em></p>
<blockquote>
<h2>10.4.21</h2>
<ul>
<li>Fixed old <code>-moz-</code> prefix for
<code>:placeholder-shown</code> (by <a
href="https://github.com/Marukome0743"><code>@Marukome0743</code></a>).</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/postcss/autoprefixer/blob/main/CHANGELOG.md">autoprefixer's
changelog</a>.</em></p>
<blockquote>
<h2>10.4.21</h2>
<ul>
<li>Fixed old <code>-moz-</code> prefix for
<code>:placeholder-shown</code> (by <a
href="https://github.com/Marukome0743"><code>@Marukome0743</code></a>).</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="541295c0e6"><code>541295c</code></a>
Release 10.4.21 version</li>
<li><a
href="8d555f7e5e"><code>8d555f7</code></a>
Update dependencies and sort imports</li>
<li><a
href="5c2421e82a"><code>5c2421e</code></a>
Update Node.js and pnpm on CI</li>
<li><a
href="af9cb5f365"><code>af9cb5f</code></a>
fix: replace <code>:-moz-placeholder-shown</code> with
<code>:-moz-placeholder</code> (<a
href="https://redirect.github.com/postcss/autoprefixer/issues/1532">#1532</a>)</li>
<li>See full diff in <a
href="https://github.com/postcss/autoprefixer/compare/10.4.20...10.4.21">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Perf events are designed to be an extremely efficient way of
transferring data from an eBPF kernel to the user-space program. In
order to monitor how much traffic we are actually relaying via eBPF, we
introduce a dedicated `STATS` map that is a `PerfEventArray`.
The events from that array are read asynchronously in user-space and fed
into our OTEL metrics. They will show up in our Google Cloud metrics as
`data_relayed_ebpf_bytes`. We already have a metric for the total
relayed bytes. That counter is renamed to `data_relayed_userspace_bytes`
so we can clearly differentiate the two.
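On the user-space side, consuming such events boils down to decoding a fixed-size record and adding it to a counter. The sketch below deliberately leaves out the actual `PerfEventArray` plumbing; the `StatsEvent` layout and the `Counter` type are hypothetical.

```rust
/// Hypothetical event layout shared between the eBPF program and user-space.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StatsEvent {
    relayed_bytes: u64,
}

impl StatsEvent {
    /// Decodes an event from the raw bytes of a perf buffer record.
    /// Native endianness, because producer and consumer run on the same machine.
    fn from_bytes(buf: &[u8]) -> Option<Self> {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(buf.get(..8)?);
        Some(Self {
            relayed_bytes: u64::from_ne_bytes(raw),
        })
    }
}

/// Stand-in for the metric that would be exported as `data_relayed_ebpf_bytes`.
#[derive(Default)]
struct Counter {
    total: u64,
}

impl Counter {
    fn record(&mut self, event: StatsEvent) {
        self.total = self.total.saturating_add(event.relayed_bytes);
    }
}
```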
This fills in the boilerplate for handling IPv6 packets in the eBPF
code. Unfortunately, we cannot add an integration test for this because
the IPv6 header doesn't have a checksum and IPv6 thus doesn't allow the
UDP checksum to be set to 0. Linux (and, I'd assume, other OSs too)
offloads UDP checksumming to the NIC, yet packets on the loopback
interface never reach the NIC. As a result, our eBPF code only sees a
partial checksum and thus updates the checksum incorrectly.
Related: #7518
Related: #8502
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Bumps [resolv-conf](https://github.com/hickory-dns/resolv-conf) from
0.7.0 to 0.7.1.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/hickory-dns/resolv-conf/commits">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.28 to 4.5.34.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.34</h2>
<h2>[4.5.34] - 2025-03-27</h2>
<h3>Fixes</h3>
<ul>
<li><em>(help)</em> Don't add extra blank lines with
<code>flatten_help(true)</code> and subcommands without arguments</li>
</ul>
<h2>v4.5.33</h2>
<h2>[4.5.33] - 2025-03-26</h2>
<h3>Fixes</h3>
<ul>
<li><em>(error)</em> When showing the usage of a suggestion for an
unknown argument, don't show the group</li>
</ul>
<h2>v4.5.32</h2>
<h2>[4.5.32] - 2025-03-10</h2>
<h3>Features</h3>
<ul>
<li>Add <code>Error::remove</code></li>
</ul>
<h3>Documentation</h3>
<ul>
<li><em>(cookbook)</em> Switch from <code>humantime</code> to
<code>jiff</code></li>
<li><em>(tutorial)</em> Better cover required vs optional</li>
</ul>
<h3>Internal</h3>
<ul>
<li>Update <code>pulldown-cmark</code></li>
</ul>
<h2>v4.5.31</h2>
<h2>[4.5.31] - 2025-02-24</h2>
<h3>Features</h3>
<ul>
<li>Add <code>ValueParserFactory</code> for
<code>Saturating&lt;T&gt;</code></li>
</ul>
<h2>v4.5.30</h2>
<h2>[4.5.30] - 2025-02-17</h2>
<h3>Fixes</h3>
<ul>
<li><em>(assert)</em> Allow <code>num_args(0..=1)</code> to be used with
<code>SetTrue</code></li>
<li><em>(assert)</em> Clean up rendering of <code>takes_values</code>
assertions</li>
</ul>
<h2>v4.5.29</h2>
<h2>[4.5.29] - 2025-02-11</h2>
<h3>Fixes</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.34] - 2025-03-27</h2>
<h3>Fixes</h3>
<ul>
<li><em>(help)</em> Don't add extra blank lines with
<code>flatten_help(true)</code> and subcommands without arguments</li>
</ul>
<h2>[4.5.33] - 2025-03-26</h2>
<h3>Fixes</h3>
<ul>
<li><em>(error)</em> When showing the usage of a suggestion for an
unknown argument, don't show the group</li>
</ul>
<h2>[4.5.32] - 2025-03-10</h2>
<h3>Features</h3>
<ul>
<li>Add <code>Error::remove</code></li>
</ul>
<h3>Documentation</h3>
<ul>
<li><em>(cookbook)</em> Switch from <code>humantime</code> to
<code>jiff</code></li>
<li><em>(tutorial)</em> Better cover required vs optional</li>
</ul>
<h3>Internal</h3>
<ul>
<li>Update <code>pulldown-cmark</code></li>
</ul>
<h2>[4.5.31] - 2025-02-24</h2>
<h3>Features</h3>
<ul>
<li>Add <code>ValueParserFactory</code> for
<code>Saturating&lt;T&gt;</code></li>
</ul>
<h2>[4.5.30] - 2025-02-17</h2>
<h3>Fixes</h3>
<ul>
<li><em>(assert)</em> Allow <code>num_args(0..=1)</code> to be used with
<code>SetTrue</code></li>
<li><em>(assert)</em> Clean up rendering of <code>takes_values</code>
assertions</li>
</ul>
<h2>[4.5.29] - 2025-02-11</h2>
<h3>Fixes</h3>
<ul>
<li>Change <code>ArgMatches::args_present</code> so not-present flags
are considered not-present (matching the documentation)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5d2cdac3e6"><code>5d2cdac</code></a>
chore: Release</li>
<li><a
href="f1c10ebe58"><code>f1c10eb</code></a>
docs: Update changelog</li>
<li><a
href="a4d1a7fe2b"><code>a4d1a7f</code></a>
chore(ci): Take a break from template updates</li>
<li><a
href="e95ed396c4"><code>e95ed39</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5775">#5775</a>
from vivienm/master</li>
<li><a
href="18f8d4c3f5"><code>18f8d4c</code></a>
chore(deps): Update Rust Stable to v1.82 (<a
href="https://redirect.github.com/clap-rs/clap/issues/5788">#5788</a>)</li>
<li><a
href="f35d8e09fb"><code>f35d8e0</code></a>
Merge pull request <a
href="https://redirect.github.com/clap-rs/clap/issues/5787">#5787</a>
from epage/template</li>
<li><a
href="1389d7d689"><code>1389d7d</code></a>
chore: Update from '_rust/main' template</li>
<li><a
href="dbc9faa79d"><code>dbc9faa</code></a>
chore(ci): Initialize git for template update</li>
<li><a
href="3dac2f3683"><code>3dac2f3</code></a>
chore(ci): Get history for template update</li>
<li><a
href="e1f77dacf1"><code>e1f77da</code></a>
chore(ci): Fix branch for template update</li>
<li>Additional commits viewable in <a
href="https://github.com/clap-rs/clap/compare/clap_complete-v4.5.28...clap_complete-v4.5.34">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [windows-service](https://github.com/mullvad/windows-service-rs)
from 0.7.0 to 0.8.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/mullvad/windows-service-rs/blob/main/CHANGELOG.md">windows-service's
changelog</a>.</em></p>
<blockquote>
<h2>[0.8.0] - 2025-02-19</h2>
<h3>Added</h3>
<ul>
<li>Add missing ServiceAccess flags <code>READ_CONTROL</code>,
<code>WRITE_DAC</code> and <code>WRITE_OWNER</code>.</li>
</ul>
<h3>Changed</h3>
<ul>
<li>Upgrade <code>windows-sys</code> dependency to 0.59 and bump the
MSRV to 1.60.0.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ffaaf80ae3"><code>ffaaf80</code></a>
Bump version to 0.8.0 and add changelog</li>
<li><a
href="c6afc56e86"><code>c6afc56</code></a>
Bump windows-sys version to 0.59</li>
<li><a
href="96efa4ee71"><code>96efa4e</code></a>
Merge commit '9dc8af8'</li>
<li><a
href="9dc8af8513"><code>9dc8af8</code></a>
Add missing standard access rights</li>
<li>See full diff in <a
href="https://github.com/mullvad/windows-service-rs/compare/v0.7.0...v0.8.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
When a customer signs up for Starter or Team, we don't enable tax
calculation by default. This means customers can upgrade to Team, start
paying invoices, and we won't collect taxes.
This creates a management issue and possible tax liability since I need
to manually reconcile these.
Instead, since we have Stripe Tax configured on our account, we can
enable automatic tax calculation when the subscription is created. Tax
will therefore be collected automatically for any products
(Starter/Team/Enterprise) in the subscription.
In most cases in the US, the tax rate is 0. In EU transactions, for B2B
sales, the tax rate for us is also 0 (reverse charge basis). If we sell
a Team subscription to an individual, however, we need to collect VAT.
There doesn't seem to be a way to block consumer EU transactions in
Stripe, so we'll likely need to register for VAT in the EU if we cross
the reporting threshold.
Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact)
from 4.6.1 to 4.6.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v4.6.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Update to use artifact 2.3.2 package & prepare for new
upload-artifact release by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/685">actions/upload-artifact#685</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/upload-artifact/pull/685">actions/upload-artifact#685</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/upload-artifact/compare/v4...v4.6.2">https://github.com/actions/upload-artifact/compare/v4...v4.6.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="ea165f8d65"><code>ea165f8</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/685">#685</a>
from salmanmkc/salmanmkc/3-new-upload-artifacts-release</li>
<li><a
href="08396203c1"><code>0839620</code></a>
Prepare for new release of actions/upload-artifact with new toolkit
cache ver...</li>
<li>See full diff in <a
href="4cec3d8aa0...ea165f8d65">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR implements the "reverse path" of handling TURN traffic, i.e. UDP
datagrams that arrive on an allocation port and need to be wrapped in a
channel-data message to be sent to the TURN client.
In order to achieve that, I had to rewrite most of the TURN code to not
use the `etherparse` crate. I couldn't quite figure out the details but
the eBPF verifier rejected my code in mysterious ways that I didn't
understand. Commenting out random code-paths seemed to make it happy but
all code-paths combined caused an error. Eventually, I decided that we
simply have to use fewer abstractions to implement the same logic.
All the "parsing" code is now using types inspired by `network-types`.
The only modification here is that we use byte-arrays within our structs
in order to directly receive them in big-endian ordering.
`network-types` uses `u16`s and `u32`s which get interpreted as
little-endian on x86. Instead of converting around between the
endianness, constructing those values where we want them using the right
endianness is deemed much simpler. I opened an issue with upstream which
- if accepted - will allow us to remove our own structs and instead
depend on upstream again.
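To illustrate the byte-array approach, here is a sketch of what such a header type might look like. The struct mirrors the UDP header layout; the exact field and method names in the real code may differ.

```rust
/// UDP header with every field stored as raw bytes in network (big-endian) order.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct UdpHdr {
    source: [u8; 2],
    dest: [u8; 2],
    len: [u8; 2],
    check: [u8; 2],
}

impl UdpHdr {
    /// Reads the source port; the conversion happens exactly once, here.
    fn source(&self) -> u16 {
        u16::from_be_bytes(self.source)
    }

    /// Reads the destination port.
    fn dest(&self) -> u16 {
        u16::from_be_bytes(self.dest)
    }

    /// Writes the destination port directly in network byte order.
    fn set_dest(&mut self, port: u16) {
        self.dest = port.to_be_bytes();
    }
}
```

Because the fields are plain byte arrays, the struct has no alignment requirement beyond 1 and the host's endianness never leaks into the stored representation.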
I also had to aggressively add `#[inline(always)]` to several functions,
otherwise the compiler would not optimise away our function calls,
causing the linker and / or eBPF verifier to fail.
This PR also fixes numerous bugs that I've found in the already existing
eBPF code. The number of bugs makes me question how this has been
working so far at all!
- We did not swap the Ethernet source and destination MAC address when
re-routing the packet. The integration-test didn't catch this because it
only operates on the loopback interface. Further testing on staging
should allow us to confirm that this is indeed working now.
- The UDP checksum update did not incorporate the new src and dst port.
The integration-test didn't catch that because it has UDP checksumming
disabled. We need to have that disabled in the test because UDP
checksumming is typically offloaded to the NIC and packets on the
loopback interface never leave the device.
Related: https://github.com/vadorovsky/network-types/issues/32.
Related: #7518
As part of iterating on #8496, the API of `relay::Server` had changed
and I had commented out the regression tests to move quicker. In later
iterations, those API changes were reverted but I forgot to uncomment
them.
A regression was introduced in d0f0de0f8d
whereupon we started using the updated policy record for broadcasting
the `delete_policy` and `expire_flows` events. This caused a security
issue because if the actor group changed from `Everyone` to `thomas`,
for example, we'd only expire flows and broadcast policy removal (i.e.
resource removal) events for `thomas`, and `Everyone` would still have
access granted by the old policy.
To fix this, we broadcast the destructive events to the old policy, so
that its `actor_group_id` and `resource_id` are used, and not the new
policy's.
Fixes #8549
As part of working on https://github.com/aya-rs/aya/pull/1228, which I
depend on here, I had to force-push, which will break CI. Opening
this to fix it.
It seems that this cannot be higher than the number of vCPUs in the
instance.
```
Instance 'relay-7h8s' creation failed: Invalid value for field 'resource.networkInterfaces[0].queueCount': '4'. Networking queue number is invalid: '4'. (when acting as '85623168602@cloudservices.gserviceaccount.com')
```
The `gve` driver defaults to setting the active queue count equal to the
max queue count.
We need this to be half or lower for XDP eBPF programs to load.
Related: #8538
By default, GCP VMs have a max RX/TX queue count of `1`. While this is a
fine default, it causes XDP programs to fail to load onto the virtual
NIC with the following error:
```
gve 0000:00:04.0 eth0: XDP load failed: The number of configured RX queues 1 should be equal to the number of configured TX queues 1 and the number of configured RX/TX queues should be less than or equal to half the maximum number of RX/TX queues 1
```
To fix this, we can bump the maximum queue count to `2` (the max
supported by gVNIC is 16), allowing the current queue count of `1` to
satisfy the condition.
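The condition from the driver's error message can be written down as a small predicate. This is a restatement of the error text, not code taken from the `gve` driver, and the function name is made up.

```rust
/// True if an XDP program can be loaded according to the rule quoted in the error:
/// RX and TX queue counts must match and be at most half of the maximum.
fn xdp_queue_config_ok(rx_queues: u32, tx_queues: u32, max_queues: u32) -> bool {
    rx_queues == tx_queues && rx_queues * 2 <= max_queues
}
```

With the default maximum of `1`, a configured count of `1` fails the check; with a maximum of `2`, it passes.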
## Abstract
This pull-request implements the first stage of off-loading routing of
TURN channel data messages to the kernel via an eBPF XDP program. In
particular, the eBPF kernel implemented here **only** handles the
decapsulation of IPv4 channel data messages into their embedded UDP
payload. Implementation of other data paths, such as the receiving of
UDP traffic on an allocation and wrapping it in a TURN channel data
message is deferred to a later point for reasons explained further down.
As it stands, this PR implements the bare minimum for us to start
experimenting and benefiting from eBPF. It is already massive as it is
due to the infrastructure required for actually doing this. Let's dive
into it!
## A refresher on TURN channel-data messages
TURN specifies a channel-data message for relaying data between two
peers. A channel data message has a fixed 4-byte header:
- The first two bytes specify the channel number
- The next two bytes specify the length of the encapsulated payload
Like all TURN traffic, channel data messages run over UDP by default,
meaning this header sits at the very front of the UDP payload. This will
be important later.
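Reading that header from the front of a UDP payload can be sketched as follows. The type and function names are illustrative only; both fields are transmitted in network byte order.

```rust
/// Length of the channel data header in bytes.
const HEADER_LEN: usize = 4;

#[derive(Debug, PartialEq, Eq)]
struct ChannelDataHeader {
    /// Channel number (first two bytes).
    channel: u16,
    /// Length of the encapsulated payload (next two bytes).
    length: u16,
}

/// Parses the header from the start of a UDP payload, if there are enough bytes.
fn parse_header(udp_payload: &[u8]) -> Option<ChannelDataHeader> {
    let header = udp_payload.get(..HEADER_LEN)?;
    Some(ChannelDataHeader {
        channel: u16::from_be_bytes([header[0], header[1]]),
        length: u16::from_be_bytes([header[2], header[3]]),
    })
}
```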
After making an allocation with a TURN server (i.e. reserving a port on
the TURN server's interfaces), a TURN client can bind channels on that
allocation. As such, channel numbers are scoped to a client's
allocation. Channel numbers are allocated by the client within a given
range (0x4000 - 0x4FFF). When binding a channel, the client specifies
the address of the remote peer that data sent on the channel should be
forwarded to.
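As a rough illustration of the header layout and channel range described above, parsing could look like the sketch below. The function and constant names are made up for this example and are not taken from the relay's code. Both fields are in network byte order.

```rust
/// Length of the fixed channel-data header in bytes.
const HEADER_LEN: usize = 4;

/// Splits a UDP payload into (channel number, encapsulated payload).
///
/// Returns `None` if the buffer is too short, the channel number is outside
/// 0x4000..=0x4FFF, or the length field exceeds the remaining bytes.
fn parse_channel_data(udp_payload: &[u8]) -> Option<(u16, &[u8])> {
    let header = udp_payload.get(..HEADER_LEN)?;

    // First two bytes: channel number. Last two bytes: payload length.
    let channel = u16::from_be_bytes([header[0], header[1]]);
    let length = u16::from_be_bytes([header[2], header[3]]) as usize;

    if !(0x4000..=0x4FFF).contains(&channel) {
        return None;
    }

    let payload = udp_payload.get(HEADER_LEN..HEADER_LEN + length)?;

    Some((channel, payload))
}
```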
Given this setup, when a TURN server receives a channel data message, it
first looks at the sender's IP + port to infer the allocation (a client
can only ever have 1 allocation at a time). Within that allocation, the
server then looks for the channel number and retrieves the target socket
address from that. The allocation itself is a port on the relay's
interface. With that, we can now "unpack" the payload of the channel
data message and rewrite it to the new receiver:
- The new source IP can be set from the old dst IP (when operating in
user-space mode this is irrelevant because we are working with the
socket API).
- The new source port is the client's allocation.
- The new destination IP and the new destination port are both taken
from the mapping found via the channel number.
Last but not least, all that is left is removing the channel data header
from the UDP payload and we can send out the packet. In other words, we
need to cut off the first 4 bytes of the UDP payload.
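The rewrite rules above can be sketched as follows for the IPv4 case. `Binding`, `Rewritten` and `rewrite` are hypothetical names for illustration; the lookup that produces the `Binding` is assumed to have happened already.

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

/// Hypothetical result of looking up (client, channel number).
struct Binding {
    /// The port of the client's allocation on the relay.
    allocation_port: u16,
    /// The peer address the channel was bound to.
    peer: SocketAddrV4,
}

/// Addressing and payload of the outgoing packet.
#[derive(Debug, PartialEq)]
struct Rewritten<'a> {
    src: SocketAddrV4,
    dst: SocketAddrV4,
    payload: &'a [u8],
}

/// Applies the rewrite rules to an incoming channel data message.
fn rewrite<'a>(
    old_dst_ip: Ipv4Addr,
    binding: &Binding,
    udp_payload: &'a [u8],
) -> Option<Rewritten<'a>> {
    // Cut off the 4-byte channel-data header.
    let payload = udp_payload.get(4..)?;

    Some(Rewritten {
        // New source: old destination IP + the client's allocation port.
        src: SocketAddrV4::new(old_dst_ip, binding.allocation_port),
        // New destination: the peer the channel is bound to.
        dst: binding.peer,
        payload,
    })
}
```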
## User-space relaying
At present, we implement the above flow in user-space. This is tricky to
do because we need to bind _many_ sockets, one for each possible
allocation port (of which there can be 16383). The actual work to be
done on these packets is also extremely minimal. All we do is cut off
(or add on) the channel-data header. Benchmarks show that we spend
pretty much all of our time copying data between user-space and
kernel-space. Cutting this out should give us a massive increase in
performance.
## Implementing an eBPF XDP TURN router
eBPF has been shown to be a very efficient way of speeding up a TURN
server [0]. After many failed experiments (e.g. using TC instead of XDP)
and countless rabbit-holes, we have also arrived at the design
documented within the paper. Most notably:
- The eBPF program is entirely optional. We try to load it on startup,
but if that fails, we will simply use the user-space mode.
- Retaining the user-space mode is also important because under certain
circumstances, the eBPF kernel needs to pass on the packet, for example,
when receiving IPv4 packets with options. Those make the header
dynamically-sized which makes further processing difficult because the
eBPF verifier disallows indexing into the packet with data derived from
the packet itself.
- In order to add/remove the channel-data header, we shift the packet
headers backwards / forwards and leave the payload in place as the
packet headers are constant in size and can thus easily and cheaply be
copied out.
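To illustrate the header-shifting idea on a plain byte buffer (this is user-space Rust, not the actual eBPF code), the sketch below removes the channel-data header by moving the fixed-size headers 4 bytes towards the payload. It assumes an Ethernet + IPv4 (no options) + UDP frame. All names are hypothetical, and updating the IPv4/UDP length fields and checksums is deliberately left out.

```rust
const ETH_LEN: usize = 14;
const IPV4_LEN: usize = 20; // Assumes no IPv4 options.
const UDP_LEN: usize = 8;
const HEADERS_LEN: usize = ETH_LEN + IPV4_LEN + UDP_LEN;
const CD_HEADER_LEN: usize = 4;

/// Overwrites the channel-data header with the packet headers.
///
/// The payload stays in place. Returns the offset at which the
/// shortened frame now starts, or `None` if the frame is too short.
fn strip_channel_data_header(frame: &mut [u8]) -> Option<usize> {
    if frame.len() < HEADERS_LEN + CD_HEADER_LEN {
        return None;
    }

    // Move the headers so they end right where the payload begins.
    // `copy_within` handles the overlapping ranges correctly.
    frame.copy_within(0..HEADERS_LEN, CD_HEADER_LEN);

    Some(CD_HEADER_LEN)
}
```

Only the 42 header bytes are copied here, regardless of how large the payload is.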
In order to perform the relaying flow explained above, we introduce maps
that are shared with user-space. These maps go from a tuple of
(client-socket, channel-number) to a tuple of (allocation-port,
peer-socket) and thus give us all the data necessary to rewrite the
packet.
## Integration with our relay
Last but not least, to actually integrate the eBPF kernel with our
relay, we need to extend the `Server` with two more events so we can
learn when channel bindings are created and when they expire. Using
these events, we can then update the eBPF maps accordingly and therefore
influence the routing behaviour in the kernel.
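A minimal sketch of this wiring, with a `HashMap` standing in for the eBPF map. The event and field names below are assumptions for illustration and may not match the actual `Server` events.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// (client socket, channel number)
type Key = (SocketAddr, u16);
/// (allocation port, peer socket)
type Value = (u16, SocketAddr);

/// Hypothetical events emitted by the server.
enum Event {
    ChannelBound {
        client: SocketAddr,
        channel: u16,
        allocation_port: u16,
        peer: SocketAddr,
    },
    ChannelExpired {
        client: SocketAddr,
        channel: u16,
    },
}

/// Keeps the routing map in sync with the server's channel bindings.
fn apply(map: &mut HashMap<Key, Value>, event: Event) {
    match event {
        Event::ChannelBound {
            client,
            channel,
            allocation_port,
            peer,
        } => {
            map.insert((client, channel), (allocation_port, peer));
        }
        Event::ChannelExpired { client, channel } => {
            map.remove(&(client, channel));
        }
    }
}
```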
## Scope
What is implemented here is only one of several possible data paths.
Implementing the others isn't conceptually difficult but it does
increase the scope. Landing something that already works allows us to
gain experience running it in staging (and possibly production).
Additionally, I've hit some issues with the eBPF verifier when adding
more codepaths to the kernel. I expect those to be possible to resolve
given sufficient debugging but I'd like to do so after merging this.
---
Depends-On: #8506
Depends-On: #8507
Depends-On: #8500
Resolves: #8501
[0]: https://dl.acm.org/doi/pdf/10.1145/3609021.3609296
Unfortunately, the cwd I set for the action didn't seem to apply, so it
checked the links for the entire repo instead which, together with the
`--base` setting, produces a lot of errors for relative links.
In addition, lychee doesn't currently support having the `.lycheeignore`
file in a subdirectory (see related link), meaning we unfortunately have
to put yet another dot file in the root of our repository.
Related: https://github.com/lycheeverse/lychee-action/issues/205
EdgeShark is extremely useful if you want to attach Wireshark to a TUN
device within a container. So far, I've just run this ad-hoc next to our
setup whenever I needed to debug something but I think it is actually
worthwhile adding permanently so it is just there when you need it.
At present, the Gateway implements a NAT64 conversion that can convert
IPv4 packets to IPv6 and vice versa. Doing this efficiently creates a
fair amount of complexity within our `ip-packet` crate. In addition,
routing ICMP errors back through our NAT is also complicated by this
because we may have to translate the packet embedded in the ICMP error
as well.
The NAT64 module was originally conceived as a result of the new stub
resolver-based DNS architecture. When the Client resolves IPs for a
domain, it doesn't know whether the domain will actually resolve to IPv4
AND IPv6 addresses so it simply assigns 4 of each to every domain. Thus,
when receiving an IPv6 packet for such a DNS resource, the Gateway may
only have IPv4 addresses available and can therefore not route the
packet (unless it translates it).
This problem is not novel. In fact, an IP being unroutable or a
particular route disappearing happens all the time on the Internet. ICMP
was conceived to handle this problem and it is doing a pretty good job
at it. We can make use of that and simply return an ICMP unreachable
error back to the client whenever it picks an IP that we cannot map to
one that we resolved.
In this PR, we leave all of the NAT64 code intact and only add a
feature-flag that, when active, sends the aforementioned ICMP error. While
offline (and thus also for our tests), the feature-flag evaluates to
false. It is however set to `true` in the backend, meaning on staging
and later in production, we will send these ICMP errors.
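The decision this feature-flag introduces can be modelled roughly as below. This is a simplified sketch and not the Gateway's actual code: the names are hypothetical, and it collapses the mapping from assigned IPs to resolved IPs into a simple "same address family" lookup.

```rust
use std::net::IpAddr;

/// What to do with a packet for a DNS resource (illustrative only).
#[derive(Debug, PartialEq)]
enum Action {
    /// A resolved address of the same IP version exists.
    Forward(IpAddr),
    /// Translate the packet to the other IP version (existing behaviour).
    Nat64(IpAddr),
    /// Tell the client that the IP it picked is unreachable.
    IcmpUnreachable,
}

fn decide(packet_is_v6: bool, resolved: &[IpAddr], icmp_flag: bool) -> Option<Action> {
    // Prefer a resolved address of the same IP version as the packet.
    if let Some(ip) = resolved.iter().find(|ip| ip.is_ipv6() == packet_is_v6) {
        return Some(Action::Forward(*ip));
    }

    // No match: with the flag active, answer with an ICMP error.
    if icmp_flag {
        return Some(Action::IcmpUnreachable);
    }

    // Otherwise fall back to NAT64, if there is anything to translate to.
    // The case of no resolved addresses at all is not modelled here.
    resolved.first().map(|ip| Action::Nat64(*ip))
}
```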
Once this is rolled out and has proven to work as intended, we can
simplify our codebase and rip out the NAT64 module. At that point, we
will also have to adapt the test-suite.