- Attaching the standalone client needs to happen on `main` runs, like
the other clients
- GitHub can't seem to find the release. I suspect the
`GITHUB_REPOSITORY` var is unneeded.
Developer ID certificates are precious. Apple only allows a limited
number of them per account, and once generated, they cannot be revoked.
They are also not compatible with automatic signing and provisioning in
Xcode due in part to the above reasons.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
The current way this is implemented is a bit tricky to read. By
splitting out a dedicated function and adding some logging, it becomes
more apparent what we do here.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Bumps [time](https://github.com/time-rs/time) from 0.3.36 to 0.3.37.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/time-rs/time/releases">time's
releases</a>.</em></p>
<blockquote>
<h2>v0.3.37</h2>
<p>See the <a
href="https://github.com/time-rs/time/blob/main/CHANGELOG.md">changelog</a>
for details.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/time-rs/time/blob/main/CHANGELOG.md">time's
changelog</a>.</em></p>
<blockquote>
<h2>0.3.37 [2024-12-03]</h2>
<h3>Added</h3>
<ul>
<li><code>Time::MAX</code>, equivalent to
<code>time!(23:59:59.999999999)</code></li>
<li><code>[year repr:century]</code> is now supported in format
descriptions. When used in conjunction with
<code>[year repr:last_two]</code>, there is sufficient information to
parse a date. Note that with the
<code>large-date</code> feature enabled, there is an ambiguity when
parsing the two back-to-back.</li>
<li>Parsing of <code>strftime</code>-style format descriptions, located
at
<code>time::format_description::parse_strftime_borrowed</code> and
<code>time::format_description::parse_strftime_owned</code></li>
<li><code>time::util::refresh_tz</code> and
<code>time::util::refresh_tz_unchecked</code>, which updates information
obtained via the <code>TZ</code> environment variable. This is
equivalent to the <code>tzset</code> syscall on Unix-like
systems, with and without built-in soundness checks, respectively.</li>
<li><code>Month::length</code> and <code>util::days_in_month</code>,
replacing <code>util::days_in_year_month</code>.</li>
<li>Expressions are permitted in
<code>time::serde::format_description!</code> rather than only paths.
This also
drastically improves diagnostics when an invalid value is provided.</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Obtaining the system UTC offset on Unix-like systems should now
succeed when multi-threaded.
However, if the <code>TZ</code> environment variable is altered, the
program will not be aware of this until
<code>time::util::refresh_tz</code> or
<code>time::util::refresh_tz_unchecked</code> is called.
<code>refresh_tz</code> has the
same soundness requirements as obtaining the system UTC offset
previously did, with the
requirements still being automatically enforced.
<code>refresh_tz_unchecked</code> does not enforce these
requirements at the expense of being <code>unsafe</code>. Most programs
should not need to call either
function.</p>
<p>Due to this change, the <code>time::util::local_offset</code> module
has been deprecated in its entirety. The
<code>get_soundness</code> and <code>set_soundness</code> functions are
now no-ops.</p>
<p>Note that while calls <em>should</em> succeed, success is not
guaranteed in any situation. Downstream
users should always be prepared to handle the error case.</p>
</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>Floating point values are truncated, not rounded, when
formatting.</li>
<li>RFC3339 allows arbitrary separators between the date and time
components.</li>
<li>Serialization of negative <code>Duration</code>s less than one
second is now correct. It previously omitted
the negative sign.</li>
<li><code>From&lt;js_sys::Date&gt; for OffsetDateTime</code> now ensures
sub-millisecond values are not erroneously
returned.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="d4e39b306d"><code>d4e39b3</code></a>
v0.3.37 release</li>
<li><a
href="09439970e5"><code>0943997</code></a>
Fix CI failure</li>
<li><a
href="8b50f04ee0"><code>8b50f04</code></a>
Update lints</li>
<li><a
href="56f1db6dfa"><code>56f1db6</code></a>
Add <code>Month::length</code>, <code>days_in_month</code></li>
<li><a
href="03bcfe9f28"><code>03bcfe9</code></a>
Skip formatting some macros, update UI tests</li>
<li><a
href="4404638fe2"><code>4404638</code></a>
Permit exprs in <code>serde::format_description!</code></li>
<li><a
href="6b43b44060"><code>6b43b44</code></a>
strftime implementation</li>
<li><a
href="98569ffe5b"><code>98569ff</code></a>
Hide deprecations from docs</li>
<li><a
href="febf3a10de"><code>febf3a1</code></a>
Obtain local offset in multi-threaded situations</li>
<li><a
href="1e19827c5a"><code>1e19827</code></a>
Update rstest and rstest_reuse; bump MSRV to 1.67.1 (<a
href="https://redirect.github.com/time-rs/time/issues/716">#716</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/time-rs/time/compare/v0.3.36...v0.3.37">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Bumps [divan](https://github.com/nvzqz/divan) from 0.1.14 to 0.1.17.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/nvzqz/divan/blob/main/CHANGELOG.md">divan's
changelog</a>.</em></p>
<blockquote>
<h2>[0.1.17] - 2024-12-04</h2>
<h3>Changed</h3>
<ul>
<li>
<p>Set [MSRV] to 1.80 for [<code>LazyLock</code>] and new
<code>size_of</code> prelude import.</p>
</li>
<li>
<p>Reduced thread pool memory usage by many kilobytes by using
rendezvous
channels instead of array-based channels.</p>
</li>
</ul>
<h2>[0.1.16] - 2024-11-25</h2>
<h3>Added</h3>
<ul>
<li>
<p>Thread pool for reusing threads across multi-threaded benchmarks. The
result
is that when running Divan benchmarks under a sampling profiler, the
profiler's output will be cleaner and easier to understand. (<a
href="https://redirect.github.com/nvzqz/divan/issues/37">#37</a>)</p>
</li>
<li>
<p>Track the maximum number of allocations during a benchmark.</p>
</li>
</ul>
<h3>Changed</h3>
<ul>
<li>
<p>Make private <code>Arg::get</code> trait method not take
<code>self</code>, so that text editors
don't recommend using it. (<a
href="https://redirect.github.com/nvzqz/divan/issues/59">#59</a>)</p>
</li>
<li>
<p>Cache <code>BenchOptions</code> using <code>LazyLock</code> instead
of <code>OnceLock</code>, saving space and
simplifying the implementation.</p>
</li>
</ul>
<h2>[0.1.15] - 2024-10-31</h2>
<h3>Added</h3>
<ul>
<li>
<p>[<code>CyclesCount</code>] counter to display cycle throughput as
Hertz.</p>
</li>
<li>
<p>Track the maximum number of bytes allocated during a benchmark.</p>
</li>
</ul>
<h3>Removed</h3>
<ul>
<li>Remove <code>has_cpuid</code> polyfill due to it no longer being
planned for Rust, since
CPUID is assumed to be available on all old x86 Rust targets.</li>
</ul>
<h3>Fixed</h3>
<ul>
<li>
<p>List generic benchmark type parameter <code>A&lt;4&gt;</code> before
<code>A&lt;32&gt;</code>. (<a
href="https://redirect.github.com/nvzqz/divan/issues/64">#64</a>)</p>
</li>
<li>
<p>Improve precision by using <code>f64</code> when calculating
allocation count and sizes
for the median samples.</p>
</li>
<li>
<p>Multi-thread allocation counting in <code>sum_alloc_tallies</code> on
macOS was loading a
null pointer instead of the pointer initialized by
<code>sync_threads</code>.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="899cd5ec69"><code>899cd5e</code></a>
Release v0.1.17</li>
<li><a
href="d450e6de4f"><code>d450e6d</code></a>
Set MSRV to 1.80</li>
<li><a
href="f5ff95b031"><code>f5ff95b</code></a>
Use <code>size_of</code> from the prelude</li>
<li><a
href="d214f882e2"><code>d214f88</code></a>
Allow <code>needless_lifetimes</code> Clippy lint</li>
<li><a
href="5499bc3058"><code>5499bc3</code></a>
Reduce thread pool memory usage</li>
<li><a
href="823b16001f"><code>823b160</code></a>
Add internal benchmark for <code>ThreadPool::broadcast</code></li>
<li><a
href="414ace96ad"><code>414ace9</code></a>
Use more consistent wording in thread pool docs</li>
<li><a
href="99a329b1d4"><code>99a329b</code></a>
add default for BenchArgs</li>
<li><a
href="9800477791"><code>9800477</code></a>
Release v0.1.16</li>
<li><a
href="11a44b8c98"><code>11a44b8</code></a>
Use <code>mach2</code> crate for <code>mach_thread_self</code>
example</li>
<li>Additional commits viewable in <a
href="https://github.com/nvzqz/divan/compare/v0.1.14...v0.1.17">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
The CI swift workflow needs to be updated to accommodate the macOS
standalone build. This required a decent amount of refactoring to make
the Apple build process more maintainable.
Unfortunately this PR ended up being a giant ball of yarn where pulling
on one thread tended to unravel things elsewhere, since building the
Apple artifacts involves multiple interconnected systems. Combined with
the slow iteration of running in CI, I wasn't able to split this PR into
easier-to-digest commits, so I've annotated the PR as much as I can to
explain what's changed.
The good news is that Apple release artifacts can now be easily built
from a developer's machine by simply running
`scripts/build/macos-standalone.sh`. The only prerequisite is having the
proper provisioning profiles and signing certs installed.
Since this PR is so big already, I'll save the swift/apple/README.md
updates for another PR.
Firezone always attempts to handle IPv4 and IPv6. On Linux systems
without an IPv6 stack, attempts to add an IPv6 route may fail with "Not
supported (os error 95)". We don't need the IPv6 routes on those systems
as we will never receive IPv6 traffic. Therefore, we can safely ignore
these errors and not log them.
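A minimal sketch of the filtering described above, assuming the failure surfaces as a `std::io::Error` carrying the raw OS error code. The helper names are hypothetical and the actual route-handling code may represent errors differently.

```rust
use std::io;

/// `EOPNOTSUPP` on Linux, i.e. the "Not supported (os error 95)" seen on
/// systems without an IPv6 stack.
const EOPNOTSUPP: i32 = 95;

/// Returns `true` if the error only says that the kernel does not support
/// the requested operation.
fn is_not_supported(error: &io::Error) -> bool {
    error.raw_os_error() == Some(EOPNOTSUPP)
}

/// Hypothetical helper: swallows "not supported" errors and passes every
/// other result through unchanged.
fn ignore_not_supported(result: io::Result<()>) -> io::Result<()> {
    match result {
        Err(e) if is_not_supported(&e) => Ok(()),
        other => other,
    }
}
```

Wrapping the result of adding an IPv6 route in `ignore_not_supported` keeps genuine failures (for example permission errors) visible while dropping the expected one.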
Because the macOS standalone app doesn't go through the same vetting
process as the App Store build, it's a good idea to smoke test it
occasionally. This PR adds instructions for doing so.
Standalone distribution requires using a different signing identity
(certificate), set of provisioning profiles, and (annoyingly) requires
the `-systemextension` suffix for our network extension capabilities.
This PR prepares the Xcode environment for building a Standalone app in
CI that will be notarized by matching certificates and provisioning
profiles in our Apple Developer account.
We are receiving multiple reports of messages, especially error messages
from relays, failing the message integrity check. To get more
information as to why, this patch extends the error message with the
attributes of the request and response messages.
Firezone's authentication scheme uses deep-links to transfer the secret
token from the browser-based login flow to the application. Such a
deep-link can be opened multiple times, even if we are already signed
in. In such a case, and in any other where we don't have a pending
sign-in request, we currently generate an error.
This is unnecessary as we can simply discard the token received from the
deep-link.
Since we no longer have access to the tunnel's group container from the
app process, we need to move reading the last VPN stop reason to an IPC
call. This call is used to show an alert to the user when their connlib
session has been disconnected due to 401: Unauthorized.
Fixes #7468
IPv6 treats fragmentation and MTU errors differently than IPv4. Rather
than allowing routers to fragment a packet on each hop of a routing path,
fragmentation needs to happen at the packet source, and a packet that is
too big to be forwarded triggers an ICMPv6 `PacketTooBig` error.
These need to be translated back through our NAT64 implementation of the
Gateway. Due to the size difference in the headers of IPv4 and IPv6, the
available MTU to the IPv4 packet is 20 bytes _less_ than the MTU
reported by the ICMP error. IPv6 headers are always 40 bytes whereas IPv4
headers without options are 20 bytes, meaning if the MTU is reported as
e.g. 1200 on the IPv6 side, we need to only offer 1180 to the IPv4 end of
the application. Once the new MTU is then honored, the packets translated
by our NAT64 implementation will still conform to the required MTU of
1200, despite the overhead introduced by the translation.
Resolves: #7515.
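The MTU arithmetic above can be sketched as follows. The function names are illustrative and not taken from the Gateway code, and the IPv4 header is assumed to carry no options.

```rust
/// Size of the fixed IPv6 header in bytes.
const IPV6_HEADER_LEN: u32 = 40;
/// Size of an IPv4 header without options in bytes (assumption: no options).
const IPV4_HEADER_LEN: u32 = 20;

/// Converts the MTU reported by an ICMPv6 `PacketTooBig` error into the MTU
/// we can offer on the IPv4 side of the translation.
fn ipv4_mtu_from_ipv6(ipv6_mtu: u32) -> u32 {
    ipv6_mtu.saturating_sub(IPV6_HEADER_LEN - IPV4_HEADER_LEN)
}

/// Length of an IPv4 packet after its header has been swapped for an IPv6 one.
fn len_after_translation(ipv4_packet_len: u32) -> u32 {
    ipv4_packet_len - IPV4_HEADER_LEN + IPV6_HEADER_LEN
}
```

With the example from the text, `ipv4_mtu_from_ipv6(1200)` yields 1180, and an IPv4 packet of exactly 1180 bytes grows to 1200 bytes once translated, which still fits the reported MTU.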
Bumps [tokio-stream](https://github.com/tokio-rs/tokio) from 0.1.16 to
0.1.17.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="67355c6d23"><code>67355c6</code></a>
chore: prepare tokio-stream v0.1.17 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7020">#7020</a>)</li>
<li><a
href="405d746d38"><code>405d746</code></a>
signal: remove oneshot channels from tests (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7015">#7015</a>)</li>
<li><a
href="e0d1293fac"><code>e0d1293</code></a>
ci: add instructions that explain how to fix spellcheck errors (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7016">#7016</a>)</li>
<li><a
href="480c010b01"><code>480c010</code></a>
signal: add <code>SignalKind::info</code> on illumos (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/6995">#6995</a>)</li>
<li><a
href="c032ea0203"><code>c032ea0</code></a>
ci: detect trailing whitespace (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7013">#7013</a>)</li>
<li><a
href="0b31c2f73d"><code>0b31c2f</code></a>
chore: prepare tokio-util v0.7.13 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7012">#7012</a>)</li>
<li><a
href="129f9fc0c8"><code>129f9fc</code></a>
codec: fix incorrect handling of invalid utf-8 in
<code>LinesCodec::decode_eof</code> (#...</li>
<li><a
href="b5c227d51f"><code>b5c227d</code></a>
tracing: move tracing instrumentation tests into tokio tests (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7007">#7007</a>)</li>
<li><a
href="dcae2b9eb8"><code>dcae2b9</code></a>
ci: unfreeze FreeBSD from rustc 1.81 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7009">#7009</a>)</li>
<li><a
href="bb9d57017e"><code>bb9d570</code></a>
chore: prepare Tokio v1.42.0 (<a
href="https://redirect.github.com/tokio-rs/tokio/issues/7005">#7005</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tokio-rs/tokio/compare/tokio-stream-0.1.16...tokio-stream-0.1.17">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
In case we fail to update the TUN device with its IPs or set the routes,
we currently just log a warning and continue operation. That isn't
ideal: setting the correct IPs and routes is crucial for the Gateway to
operate correctly.
Additionally, the design of sending the `Interface` through a separately
spawned task is kind of clunky. Instead of having a channel, we
introduce another `FuturesSet` that allows us to process these `async`
tasks inline within the event loop.
Exporting logs on macOS in >= 1.4.0 presents a challenge because the app
process and tunnel process now write to different directories. They do
this because the network extension is now packaged as a system
extension, which means it runs as root.
Since the Apple App Store requires apps to be sandboxed, and we don't
wish to break App Store compatibility just yet, the tunnel process and
app process can't read or write to a common shared directory anymore.
This leaves IPC as the only reliable means for app <-> tunnel
communication. We already use the provided `sendProviderMessage`
function exposed by Network Extension for other operations, so we add
another message type to handle log export.
The log archive from the tunnel needs to be sent in chunks to reduce the
amount of total memory the export process consumes. To facilitate this,
a custom TunnelLogArchive class is used which handles the business logic
of keeping a pointer into the file we're chunking over.
As each chunk is read from the tunnel log archive, we send it using the
`sendProviderMessage`'s completionHandler callback. If the number of
sent bytes is less than our maximum chunk size, we know we're done.
On the app side, we read each chunk of data from the tunnel process
until a `done` boolean is signaled, at which point we close the archive
file we're writing to and complete the export.
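The chunking protocol described above can be sketched independently of the Swift implementation. The following is an illustration in Rust, not the actual `TunnelLogArchive` class: `ChunkedArchive`, `Chunk` and `export` are hypothetical names, and a plain function call stands in for the `sendProviderMessage` round trip.

```rust
use std::io::{self, Read, Write};

/// One IPC reply: a piece of the archive plus a flag marking the last piece.
struct Chunk {
    data: Vec<u8>,
    done: bool,
}

/// Tunnel side: wraps the archive and hands out one chunk per request.
/// The reader itself keeps track of how far into the file we are.
struct ChunkedArchive<R> {
    reader: R,
    max_chunk_size: usize,
}

impl<R: Read> ChunkedArchive<R> {
    fn new(reader: R, max_chunk_size: usize) -> Self {
        Self { reader, max_chunk_size }
    }

    fn next_chunk(&mut self) -> io::Result<Chunk> {
        let mut data = Vec::with_capacity(self.max_chunk_size);
        // `take` caps this read at one chunk.
        self.reader
            .by_ref()
            .take(self.max_chunk_size as u64)
            .read_to_end(&mut data)?;
        // Fewer bytes than a full chunk means we reached the end.
        let done = data.len() < self.max_chunk_size;
        Ok(Chunk { data, done })
    }
}

/// App side: requests chunks until `done` is signalled and appends them to
/// the output. Returns the number of requests that were needed.
fn export<R: Read, W: Write>(archive: &mut ChunkedArchive<R>, out: &mut W) -> io::Result<usize> {
    let mut requests = 0;
    loop {
        let chunk = archive.next_chunk()?;
        requests += 1;
        out.write_all(&chunk.data)?;
        if chunk.done {
            return Ok(requests);
        }
    }
}
```

One consequence of signalling the end via a short chunk: an archive whose size is an exact multiple of the chunk size needs one extra, empty chunk to finish.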
Lastly, we need to compress the app and tunnel archives separately, then
compress the final result into a final archive path set by the user.
This is because it's impossible to compress two decoupled directories at
once.
Little to no noticeable delay is added compared to
the previous approach. Exporting multiple GB of log data from the tunnel
took only a second or two on my M1. Presumably, having to read
uncompressed chunks and doing the compression on the app-side would
result in a couple orders of magnitude more IPC calls.
Refs #7071
Since we need to react to state on this instance changing, we need an
instance of TunnelManager to last the lifetime of the app process.
Thus, it makes more sense to use the `.shared` public static pattern to
maintain an instance of it we can use from anywhere instead of having to
store it in various other classes.
In #7477, we introduced a regression in our test suite for DNS queries
that are forwarded through the tunnel.
In order to be deterministic when users configure overlapping CIDR
resources, we use the sort order of all CIDR resource IDs to pick which
one "wins". To make sure existing connections are not interrupted, this
rule does not apply when we already have a connection to a gateway for a
resource. In other words, if a new CIDR resource (e.g. resource `A`) is
added to connlib that has an overlapping route with another resource
(e.g. resource `B`) but we already have a connection to resource `B`, we
will continue routing traffic for this CIDR range to resource `B`,
despite `A` sorting "before" `B`.
The regression that we introduced was that we did not account for
resources being "connected" after forwarding a query through the tunnel
to it. As a result, in the found failure case, the test suite was
expecting to route the packet to resource `A` because it did not know
that we are connected to resource `B` at the time of processing the ICMP
packet.
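The tie-breaking rule for overlapping CIDR resources can be sketched as below. This is a simplified illustration: `ResourceId` is reduced to an integer and `pick_resource` is a hypothetical name, not the function used in connlib or the test suite.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the real resource ID type.
type ResourceId = u32;

/// Picks the resource that traffic for an overlapping CIDR range is routed to.
///
/// `candidates` are all resources whose routes cover the destination;
/// `connected` are the resources we already have a gateway connection for.
fn pick_resource(
    candidates: &BTreeSet<ResourceId>,
    connected: &BTreeSet<ResourceId>,
) -> Option<ResourceId> {
    candidates
        .iter()
        // An existing connection always wins so it is not interrupted.
        .find(|id| connected.contains(*id))
        // Otherwise, the candidate that sorts first wins.
        .or_else(|| candidates.iter().next())
        .copied()
}
```

The regression corresponds to calling this with a `connected` set that is missing resource `B`, even though forwarding a DNS query through the tunnel had already connected it.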
Previously, we needed to track our own user state in order to set the
whole thing in Sentry. That was necessary because Sentry didn't allow us
to _retrieve_ the current user of the scope but always required the full
user to be set. This was changed in
https://github.com/getsentry/sentry-rust/pull/715, which allows us to
remove some of that code and hopefully mitigate any sort of lingering
state when it comes to telemetry sessions.
Due to how we currently initialise telemetry in the IPC service, I think
we are missing out on events when it _exits_ due to an error because we
don't explicitly stop the telemetry session. We have alerts from a fair
few users in Sentry where the IPC service appears to stop / disappear
but there are no corresponding events for the IPC service.
In case an upstream DNS server responds with a payload that exceeds the
available buffer space of an IP packet, we need to truncate the
response. Currently, this truncation uses the **wrong** constant to
check for the maximum allowed length. Instead of the
`MAX_DATAGRAM_PAYLOAD`, we actually need to check against a limit that
is less than the MTU as the IP layer and the UDP layer both add an
overhead.
To fix this, we introduce such a constant and provide additional
documentation on the remaining ones to hopefully avoid future errors.
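To illustrate why the limit has to be smaller than the MTU, here is a sketch of how such a constant could be derived. The MTU of 1280 and the constant names are assumptions made for this example, not the values introduced by the patch; the IPv6 header is used because it is the larger of the two IP headers.

```rust
/// Assumed MTU of the tunnel interface (hypothetical value for this sketch).
const MTU: usize = 1280;
/// Fixed IPv6 header length, the larger of the two IP header sizes.
const IPV6_HEADER_LEN: usize = 40;
/// UDP header length.
const UDP_HEADER_LEN: usize = 8;

/// Largest DNS message that still fits into a single IP packet.
const MAX_UDP_PAYLOAD: usize = MTU - IPV6_HEADER_LEN - UDP_HEADER_LEN;

/// Returns `true` if a DNS response of this size must be truncated.
fn needs_truncation(dns_response_len: usize) -> bool {
    dns_response_len > MAX_UDP_PAYLOAD
}
```

Under these assumptions the limit comes out at 1232 bytes, 48 bytes below the MTU.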
The macOS client starting in 1.4.0 uses a system extension for its
network extension package type. This process runs as root and does not
have access to the app's Group Container folder for reading / writing
log files directly, and vice-versa. This means the tunnel now writes its
logs to a separate directory from the GUI app process.
Since the logging functions of clearing logs, calculating their size,
and exporting them assume all the logs are in the same directory, we
need to introduce IPC handlers to ensure the GUI app can conveniently
still perform these functions when initiated by the user.
We already use the Network Extension API `sendProviderMessage` as our
IPC mechanism for ad hoc, bi-directional communication through the
tunnel, so we add more handlers to this mechanism to support the logging
functions summarized above.
In this PR we only fix the log size calculation and clear log
functionality. Exporting logs is more involved and will be implemented
in another dedicated PR.
When a Firezone Client roams, we reset all network connections and
rebind our local sockets. Doing that enables us to start from a clean
state and establish new connections to Gateways. What we are currently
not clearing are in-flight DNS queries. Those are all very likely to
fail because our network connection is changing. There is no point in us
keeping those around. Additionally, as part of roaming, it may also be
that our upstream DNS server changes and thus, we may suddenly receive a
response from a DNS server that we no longer know about.
Clearing all in-flight DNS queries on reset solves this.
Certain packets cannot be translated as part of NAT64/46. The RFC says
to "Silently drop" those. Currently, we log all errors that happen
during the translation and don't follow this guideline.
Most of these "silently drop" errors are related to ICMP types that
cannot be represented in the other version such as ICMPv6 Neighbor
Solicitation.
To fix this, we introduce a new error type in the `ip_packet` module:
`ImpossibleTranslation`. For convenience reasons, we carry that one
through all layers as an `anyhow::Error` and test at the very top of the
event-loop, whether the root-cause of the error is such a failed
translation. If so, we ignore the error and move on. This isn't as
type-safe as it could be but it is much easier to implement.
Additionally, the risk of a bug here (i.e. if we stop emitting this
error within the IP packet translation layer) is merely that the log
will pop up again.
Resolves: #7516.
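The root-cause check can be sketched with the standard library alone. The patch itself carries the error as an `anyhow::Error`; this sketch walks `std::error::Error::source` by hand to stay dependency-free, and the wrapper type `TranslateFailed` is a hypothetical stand-in for the intermediate layers.

```rust
use std::error::Error;
use std::fmt;

/// Marker error for packets that have no representation in the other IP version.
#[derive(Debug)]
struct ImpossibleTranslation;

impl fmt::Display for ImpossibleTranslation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "packet cannot be translated")
    }
}

impl Error for ImpossibleTranslation {}

/// Hypothetical intermediate layer that adds context to a lower-level error.
#[derive(Debug)]
struct TranslateFailed {
    cause: Box<dyn Error + 'static>,
}

impl fmt::Display for TranslateFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to translate packet")
    }
}

impl Error for TranslateFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.cause)
    }
}

/// Follows the `source` chain down to the innermost error.
fn root_cause<'a>(error: &'a (dyn Error + 'static)) -> &'a (dyn Error + 'static) {
    let mut current = error;
    while let Some(next) = current.source() {
        current = next;
    }
    current
}

/// The check performed at the top of the event-loop: should this error be
/// dropped silently?
fn is_impossible_translation(error: &(dyn Error + 'static)) -> bool {
    root_cause(error).is::<ImpossibleTranslation>()
}
```

As noted above, this trades type-safety for convenience: if the translation layer stops emitting `ImpossibleTranslation`, the check simply stops matching and the errors get logged again.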
When the portal connection in a relay fails, we currently stringify the
error early. This is unnecessary and we should instead retain the full
error chain for as long as possible.
Initially, when we receive a new candidate from a remote peer, we bind a
channel for each remote address on the relay that we sampled. This
ensures that every possible communication path is actually functioning.
In ICE, all candidates are tried against each other, meaning the remote
will attempt to send from each of their candidates to every one of ours,
including our relay candidates. To allow this traffic, a channel needs
to be bound first.
For various reasons, an allocation might become stale or otherwise need
to be invalidated. In that case, all the channel bindings are lost
but there might still be an active connection that wants to utilise
them. In that case, we will see "No channel" warnings like
https://firezone-inc.sentry.io/issues/6036662614/events/f8375883fd3243a4afbb27c36f253e23/.
To fix this, we use the attempt to encode a message for a channel as an
intent to bind a new one. This is deemed safe because wanting to encode
a message to a peer as a channel data message means we want such a
channel to exist. The first message here is still dropped but that is
better than not establishing the channel at all.
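A rough sketch of treating a failed encode as an intent to bind is shown below. The `Allocation` struct and its fields are simplified, hypothetical stand-ins for the real state machine; only the 4-byte channel data header (channel number followed by payload length) follows the TURN specification.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::SocketAddr;

#[derive(Default)]
struct Allocation {
    /// Bound channels: peer address to channel number.
    channels: HashMap<SocketAddr, u16>,
    /// Peers for which a new channel binding should be requested.
    wanted_bindings: VecDeque<SocketAddr>,
}

impl Allocation {
    /// Tries to wrap `payload` in a channel data message for `peer`.
    ///
    /// Without a bound channel, the payload is dropped and the peer is
    /// queued for a new binding.
    fn encode_channel_data(&mut self, peer: SocketAddr, payload: &[u8]) -> Option<Vec<u8>> {
        let Some(&channel) = self.channels.get(&peer) else {
            if !self.wanted_bindings.contains(&peer) {
                self.wanted_bindings.push_back(peer);
            }
            return None;
        };

        // Header: 2 bytes channel number, 2 bytes payload length.
        let mut message = Vec::with_capacity(4 + payload.len());
        message.extend_from_slice(&channel.to_be_bytes());
        message.extend_from_slice(&(payload.len() as u16).to_be_bytes());
        message.extend_from_slice(payload);
        Some(message)
    }
}
```

The first call for an unbound peer returns `None`, matching the dropped first message mentioned above; later calls succeed once the binding has been established.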
At present, the relay's event-loop simply drops a UDP packet in case the
socket is not ready for writing. This is terrible for throughput because
it means the encapsulated packet within the WG payload needs to be
retransmitted by the source after a timeout. To avoid this, we instead
buffer the packet and suspend the event loop until it has been correctly
flushed out. This may still cause packet loss because the receive buffer
may overflow in the meantime. However, there is nothing we can do about
that because UDP itself doesn't have any backpressure.
The relay listens on many sockets at once via a separate worker thread
and an `mio` event-loop. In addition to the current subscription to
readable events, we now also subscribe to writable events.
At the very top of the relay's event-loop, we insert a `flush` function
that ensures all buffered packets have been written out and - in case
writing a packet fails - suspends the event-loop with a waker. If we
receive a new event for write-readiness, we wake the waker which will
trigger a new call to `Eventloop::poll` where we again try to flush the
pending packet. We don't bother tracking exactly which socket signalled
write-readiness and which sockets still have pending packets.
Instead, we suspend the entire event-loop until all pending packets have
been flushed.
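The buffering and flushing logic can be sketched as follows. This is a simplified model rather than the relay's code: the `TrySend` trait and `MockSocket` stand in for a non-blocking UDP socket, and returning `Flush::Suspended` stands in for parking the event-loop on a waker.

```rust
use std::collections::VecDeque;
use std::io;

/// Hypothetical stand-in for a non-blocking UDP socket.
trait TrySend {
    fn try_send(&mut self, packet: &[u8]) -> io::Result<()>;
}

#[derive(Debug, PartialEq)]
enum Flush {
    /// All buffered packets have been written.
    Done,
    /// The socket is not writable; wait for a write-readiness event.
    Suspended,
}

struct Eventloop<S> {
    socket: S,
    pending: VecDeque<Vec<u8>>,
}

impl<S: TrySend> Eventloop<S> {
    fn new(socket: S) -> Self {
        Self { socket, pending: VecDeque::new() }
    }

    /// Writes buffered packets in order and stops at the first one the
    /// socket cannot take yet, keeping it buffered instead of dropping it.
    fn flush(&mut self) -> io::Result<Flush> {
        while let Some(packet) = self.pending.front() {
            match self.socket.try_send(packet) {
                Ok(()) => {
                    self.pending.pop_front();
                }
                Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(Flush::Suspended),
                Err(e) => return Err(e),
            }
        }
        Ok(Flush::Done)
    }

    /// Buffers the packet and immediately tries to write it out.
    fn send(&mut self, packet: Vec<u8>) -> io::Result<Flush> {
        self.pending.push_back(packet);
        self.flush()
    }
}

/// Test double: accepts packets only while `writable` is set.
struct MockSocket {
    writable: bool,
    sent: Vec<Vec<u8>>,
}

impl TrySend for MockSocket {
    fn try_send(&mut self, packet: &[u8]) -> io::Result<()> {
        if !self.writable {
            return Err(io::ErrorKind::WouldBlock.into());
        }
        self.sent.push(packet.to_vec());
        Ok(())
    }
}
```

Calling `flush` again after a write-readiness event retries the packet at the front of the queue, so packets leave in the order they were buffered.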
Resolves: #7519.
When deciding what to do with a certain DNS query, we check whether the
domain name in question corresponds to any of the (wildcard) DNS
resource addresses. If yes, we resolve it to the resource ID of that
resource. The source of those resource IDs is the `dns_resources` map.
If we have looked up a `ResourceId` in that map, it is impossible for it
to not be "known" which means the branch deleted in this PR is
completely redundant and already covered by the catch-all branch where
`maybe_resource` is `None`.
When a Gateway or Client is running in an environment without IPv4 or
IPv6 connectivity, our initial probes for sending packets to the relays
will fail with network unreachable. That isn't a very big concern and
happens a lot in the wild. There is no need to report these as telemetry
events.
Resolves: #7514.
For persistent applications like the IPC service, it is possible that
telemetry gets initialised with different parameters depending on what
the user logs in with. Currently, only the first one is persisted and
all consecutive ones are ignored, leading to events that may be wrongly
tagged for a certain user / environment.
To fix this, we only skip the init if we are still in the same
environment. Otherwise, we close the previous session and initialise a
new one.
Fixes: #7525.
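The re-initialisation rule boils down to a small state check. The sketch below uses a hypothetical `Telemetry` struct with a session counter in place of the real Sentry client; none of these names come from the actual code.

```rust
/// Hypothetical stand-in for the telemetry handle of a long-running process.
#[derive(Default)]
struct Telemetry {
    /// Environment of the currently running session, if any.
    active_env: Option<String>,
    /// Number of sessions started so far (for illustration only).
    sessions_started: u32,
}

impl Telemetry {
    fn start(&mut self, env: &str) {
        // Same environment: keep the running session.
        if self.active_env.as_deref() == Some(env) {
            return;
        }
        // Different environment: close the previous session first.
        if self.active_env.is_some() {
            self.stop();
        }
        self.active_env = Some(env.to_owned());
        self.sessions_started += 1;
    }

    fn stop(&mut self) {
        self.active_env = None;
    }
}
```

Starting twice with the same environment leaves a single session running, while switching environments replaces it.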
In order to achieve concurrency within `connlib`, we needed to create a
way for IP packets to own the piece of memory they are sitting in. This
allows us to concurrently read IP packets and then batch-process them
(as opposed to having a dedicated buffer and referencing it). At the moment,
those IP packets are defined on the stack. With a size of ~1300 bytes
that isn't very large but still causes _some_ amount of copying.
We can avoid this copying by relying on a buffer pool:
1. When reading a new IP packet, we request a new buffer from the pool.
2. When the IP packet gets dropped, the buffer gets returned to the
pool.
This allows us to reuse an allocation for a packet once it finished
processing, resulting in less CPU time spent on copying around memory.
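The two steps above can be sketched with a small pool whose buffers return themselves on drop. This is an illustration of the idea rather than the pool used in `connlib`; `BufferPool`, `Buffer` and the buffer size passed to `new` are assumptions.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Arc, Mutex};

/// Shared pool of reusable packet buffers.
#[derive(Clone)]
struct BufferPool {
    free: Arc<Mutex<Vec<Vec<u8>>>>,
    buffer_size: usize,
}

impl BufferPool {
    fn new(buffer_size: usize) -> Self {
        Self { free: Arc::new(Mutex::new(Vec::new())), buffer_size }
    }

    /// Hands out a recycled buffer if one is available, otherwise allocates.
    /// Recycled buffers still contain the bytes of the previous packet.
    fn pull(&self) -> Buffer {
        let recycled = self.free.lock().unwrap().pop();
        let inner = recycled.unwrap_or_else(|| vec![0; self.buffer_size]);
        Buffer { inner, pool: self.clone() }
    }

    /// Number of buffers currently waiting to be reused.
    fn idle(&self) -> usize {
        self.free.lock().unwrap().len()
    }
}

/// A buffer that owns its memory and gives it back to the pool when dropped.
struct Buffer {
    inner: Vec<u8>,
    pool: BufferPool,
}

impl Deref for Buffer {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.inner
    }
}

impl DerefMut for Buffer {
    fn deref_mut(&mut self) -> &mut [u8] {
        &mut self.inner
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Move the allocation back into the pool instead of freeing it.
        let inner = std::mem::take(&mut self.inner);
        self.pool.free.lock().unwrap().push(inner);
    }
}
```

Moving a `Buffer` through a channel only moves the `Vec` handle and the pool reference, not the packet bytes, which is what keeps the channels small.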
This causes us to make more _individual_ heap-allocations in the
beginning: Each packet being processed by `connlib` is allocated on
the heap somewhere. At some point during the lifetime of the tunnel,
this will settle in an ideal state where we have allocated enough slots
to cover new packets whilst also reusing memory from packets that
finished processing already.
The actual `IpPacket` data type is now just a pointer. As a result, the
channels to and from the TUN thread (where we were holding multiple of
these packets) are now significantly smaller, leading to roughly the
same memory usage overall.
In my local testing on Linux, the client still only uses about 15 MB of
RAM even with multiple concurrent speedtests running.
Similar to #7497, when we receive a `ConnectResult`, we can simply
silently bail out of the function and not change our state instead of
printing a loud warning.