Previously, we would lose one message to the portal upon failing to send
it. We now mitigate this in two ways:
1. We also check the error from `poll_ready` and don't even pop a
message off from our buffer.
2. If sending still fails, we re-queue it to the front of the buffer.
In certain scenarios as discovered in logs from #4058, this might have
caused a loss of the "answer" message from a gateway to the client,
resulting in a state mismatch where the gateway thinks the connection is
established and the client times out on waiting for the answer.
This updates connlib to follow the new guidelines described in #4262. I
only made the bare-minimum changes to the clients. With these changes
`reconnect` should only be called when the network interface actually
changed, meaning clients have to be updated to reflect that.
Currently, a failure during DNS resolution results in the client hanging
during the connection setup. Instead, we fall back to an empty list
which results in an empty DNS query result for the client.
That in turn will make most application consider the DNS request failed.
As far as I know, we don't currently retry these DNS requests, meaning a
user would have to sign-in and out again to fix this state.
Whilst not ideal, I think this is a better behaviour and what we
currently have where the initial connection just hangs.
Currently, we need to wait for the timeout of the current candidate pair
during `reconnect` before we nominate a new one. To speed this up, we
can preemptively invalidate candidates we have previously discovered via
our `Allocation`s, i.e. relay candidates and srflx candidates.
This is done to fix a bug where the file descriptor is unregistered from
the reactor after the new `Tun` struct is created if the old one is
dropped after.
Running as sudo / root causes a lot of problems for GUI programs, so
we're unwinding that. In this case we can go back to using Tauri's "open
URL" function, which is great.
Closes#4103
Refs #3713
Affects #3972 - I was finally able to debug it because it came up
constantly during this PR
Refs #3713
```[tasklist]
### Before merging
- [ ] Is 'firezone-client-tunnel' okay for the binary name?
- [ ] Using a library and building it as two binaries is correct, right? `cargo run -p firezone-client-tunnel` takes 1 second. `cargo run -p firezone-gui-client --bin firezone-client-tunnel` takes 1m42s because it builds all the GUI deps.
```
This was discovered as part of
https://github.com/firezone/firezone/pull/4213. When we reconnect to the
portal, we first need to join the correct room before sending any
messages to it. For example, as a client, we need to join the `client`
room before sending messages in it.
This implementation is meant to be a quick fix. The "proper" solution
would be to keep track of which rooms we have joined and reset that upon
reconnect. Introducing such a state machine is a much larger refactoring
that is likely not going to make much of a difference for now because we
only join a fixed number of rooms and that will usually succeed.
I thought this was going to use `cargo-deb` but it was actually easy
with the Tauri deb bundling we already use.
```[tasklist]
### Before merging
- [x] Make sure every file in the Tauri deb is also in our deb (e.g. icons)
```
Currently, an error returned by `Tunnel::poll_next_event` is only
logged. In other words, they are never fatal. This creates a tricky to
understand relationship on what kind of errors should be returned from
callbacks. Because connlib is used on multiple operating systems, it has
no idea how fatal a particular error is.
This PR removes all of these `Result` return values with the following
consequences:
- For Android, we now panic when a callback fails. This is a slight
change in behaviour. I believe that previously, any exception thrown by
a callback into Android was caught and returned as an error. Now, we
panic because in the FFI layer, we don't have any information on how
fatal the error is. For non-fatal errors, the Android app should simply
not throw an exception. The panics will cause the connlib task to be
shut down which triggers an `on_disconnect`.
- For Swift, there is no behaviour change. The FFI layer already did not
support `Result`s for those callbacks. I don't know how exceptions from
Swift are translated across the FFI layer but there is no change to what
we had before.
- For the Tauri client:
- I chose to log errors on ERROR level and continue gracefully for the
DNS resolvers.
- We panic in case the controller channel is full / closed. That should
really never happen in practice though unless we are currently shutting
down the app.
Resolves: #4064.
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.2 to 4.5.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/releases">clap's
releases</a>.</em></p>
<blockquote>
<h2>v4.5.3</h2>
<h2>[4.5.3] - 2024-03-15</h2>
<h3>Internal</h3>
<ul>
<li><em>(derive)</em> Update <code>heck</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/clap-rs/clap/blob/master/CHANGELOG.md">clap's
changelog</a>.</em></p>
<blockquote>
<h2>[4.5.3] - 2024-03-15</h2>
<h3>Internal</h3>
<ul>
<li><em>(derive)</em> Update <code>heck</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4e07b43858"><code>4e07b43</code></a>
chore: Release</li>
<li><a
href="8247c7ddf0"><code>8247c7d</code></a>
docs: Update changelog</li>
<li><a
href="677c52ce08"><code>677c52c</code></a>
chore: Update <code>heck</code> requirement (<a
href="https://redirect.github.com/clap-rs/clap/issues/5396">#5396</a>)</li>
<li>See full diff in <a
href="https://github.com/clap-rs/clap/compare/v4.5.2...v4.5.3">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
AppImages won't work with process splitting. (#3713)
As far as I can tell, they just produce one binary. Internally they use
FUSE or something to mount a squashfs image, but that image won't be
able to hook into systemd and run with root permissions and everything.
I don't think it's practical, and Tauri's AppImage bundling doesn't have
the features for it.
Even their deb bundler doesn't have any way to specify a path for a
daemon to be installed. The sidecar feature only seems intended for the
GUI app to call, not anything else on the system.
(There is such a thing as installing AppImages, but I don't think it's
worth pursuing - We should just do debs)
Closes#3699 if successful
Ref #3972
I don't understand why it started working. There's at least 3
possibilities:
- Some unrelated change in the last few weeks fixed it (Maybe bumping
Tauri to 1.6.1? https://github.com/firezone/firezone/pull/3881)
- It was a bug in the Github CI runner image that they fixed
- It's an awful race condition and adding `tracing::debug!` fixed it
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Previously, we used `SocketState::send` without wrapping it in
`UdpSocket::try_io`. This meant that tokio had no chance of clearing the
readiness flag on the socket when we actually failed to send a packet,
resulting in many log messages like this:
```
Tunnel error: Resource temporarily unavailable (os error 11)
```
This PR refactors how we send UDP packets and when we read IP packet
from the device. Instead of just polling for send-readiness, we flush
all buffered packets and _then_ check for send-readiness. That will only
succeed if we managed to send all buffered packets and the socket still
has space for more packets.
Typically, this buffer only has 1-2 packets. That is because we
currently only ever read a single packet from the device. See #4139 for
how this might change. It may have more packets when our `Allocation`s
emit some (like multiple channel bindings in a row). Because we enforce
further send-readiness before continuing, this buffer cannot grow
unbounded.
Resolves: #3931.
In the future we will want to refactor this to a builder pattern to
prevent the number of parameters from growing and have them clearer but
this works simply for now.
Found while discussing #4174
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
With the move to SANS-IO, we will be able to write deterministic unit
tests for the tunnel logic. To actually do that, `ClientState` and
`GatewayState` need to encapsulate all the logic that we want to test.
This PR does some minor refactoring on the functions on `ClientTunnel`
and moves several of them onto `ClientState`. It doesn't touch
`add_resources` and `remove_resource` because those depend on #4156.
Bumps [anyhow](https://github.com/dtolnay/anyhow) from 1.0.80 to 1.0.81.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/anyhow/releases">anyhow's
releases</a>.</em></p>
<blockquote>
<h2>1.0.81</h2>
<ul>
<li>Make backtrace support available when using -Dwarnings (<a
href="https://redirect.github.com/dtolnay/anyhow/issues/354">#354</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4aad4edebd"><code>4aad4ed</code></a>
Release 1.0.81</li>
<li><a
href="8be90917c6"><code>8be9091</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/anyhow/issues/354">#354</a>
from dtolnay/deadcode</li>
<li><a
href="a2eb7dd5e1"><code>a2eb7dd</code></a>
Make compatible with -Dwarnings</li>
<li>See full diff in <a
href="https://github.com/dtolnay/anyhow/compare/1.0.80...1.0.81">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This isn't really user-facing, so I marked it down from `feat` to
`chore`. Closes#3817
- If we exit gracefully, `/etc/resolv.conf` is reverted
- We always keep the `.before-firezone` backup in case we lose power and
the revert transaction is corrupted or rolled back
- We use a magic header to detect whether the last run was a crash or
not. If Firezone crashes and the user wants to modify their default DNS,
they need to delete that header so that Firezone won't accidentally
revert its backup and trash their change.
- All error variants for this module replaced with `anyhow::Error` since
they were never matched by callers.
I ran `cargo mutants` locally and it helped me validate the unit tests
and it picked up a `match` branch that I forgot to delete.
```[tasklist]
- [x] (Failed: Integration tests didn't like it) ~~Add the system default resolvers below Firezone's sentinels~~
- [x] `tracing::info` "Last run crashed" if we have to revert the file at startup
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Internal (Not private, just un-interesting to most users) docs and
research to explain the DNS control methods.
I think Jamil was right, we should revert `/etc/resolv.conf` on exit in
case it's used on some minimal Debian kitten. We can keep that and the
`systemd-resolved` method around to support desktop Ubuntu. Everything
else is going to be "When someone needs it".
This opens the door for unit testing `ClientState` with a
`GatewayState`, similarly as we have a test for a `ClientNode` and
`ServerNode` in `snownet`.
Before we can do that though, we need to move several functions from
`ClientTunnel` onto `ClientState`, i.e. essentially encapsulate
`ClientState` better. This is left to a future PR though to keep the
steps small.
Resolves: #3928.
Bumps
[follow-redirects](https://github.com/follow-redirects/follow-redirects)
from 1.15.5 to 1.15.6.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="35a517c586"><code>35a517c</code></a>
Release version 1.15.6 of the npm package.</li>
<li><a
href="c4f847f851"><code>c4f847f</code></a>
Drop Proxy-Authorization across hosts.</li>
<li><a
href="8526b4a1b2"><code>8526b4a</code></a>
Use GitHub for disclosure.</li>
<li>See full diff in <a
href="https://github.com/follow-redirects/follow-redirects/compare/v1.15.5...v1.15.6">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/firezone/firezone/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
I ended up calling it `reconnect` because that is really what we are
doing:
- We reconnect to the portal.
- We "reconnect" to all relays, i.e. refresh the allocations.
I decided **not** to use an ICE restart. An ICE restart clears the local
as well as the remote credentials, meaning we would need to run another
instance of the signalling protocol. The current control plane does not
support this and it is also unnecessary in our situation. In the case of
an actual network change (e.g. WiFI to cellular), refreshing of the
allocations will turn up new candidates as that is how we discovered our
original ones in the first place. Because we constantly operate in ICE
trickle mode, those will be sent to the remote via the control plane and
we start testing them.
As those new paths become available, str0m will automatically nominate
them in case the current one runs into an ICE timeout. Here is a
screen-recording of the Linux CLI client where `Session::refresh` is
triggered via the SIGHUP signal:
[Screencast from 2024-03-14
11-16-47.webm](https://github.com/firezone/firezone/assets/5486389/7171d199-f2a2-4b22-92c8-243494d5d6d8)
Provides the infrastructure for: #4028.
Fixes the compile warning in macOS for the `version-check` CI job.
Removes some error variants that were never matched on, folding them
into `anyhow::Error`s
Using the current thread in apple was causing a crashloop, since
connlib's thread was taken down by the network extension after
`WrappedSession::connect` returned.
Now we force the runtime to create the thread to prevent it from being
taken down.