The iperf3 server sometimes hangs, or takes a while to startup.
Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so they each run in their matrix job. This
ensures each performance test will run on a separate VM, unaffected by
previous test runs to eliminate the effect any residual network buffer
state can have on a particular test.
It also makes sure the server is listening with a `healthcheck`.
Currently, after the refresh timeout the gateway or client starts
looping forever, since after the refresh timeout is reached the
`poll_timeout` will always return `ALLOCATION_LIFETIME/2`(which would be
"now"), since `allocation_lifetime` is never updated.
To fix this, we need to check whether we currently have a refresh
request in flight before queuing a new one. Additionally, we need to
abort refreshing as soon as the lifetime of the allocation expires.
Related: #3617.
Related: #3631.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Bumps [semver](https://github.com/dtolnay/semver) from 1.0.21 to 1.0.22.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/semver/releases">semver's
releases</a>.</em></p>
<blockquote>
<h2>1.0.22</h2>
<ul>
<li>Fix unused_imports warnings when compiled by rustc 1.78</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c8ad1bf6db"><code>c8ad1bf</code></a>
Release 1.0.22</li>
<li><a
href="f76db8d7f2"><code>f76db8d</code></a>
Resolve redundant import warning</li>
<li><a
href="f32b420f75"><code>f32b420</code></a>
Ignore incompatible_msrv clippy lint for conditionally compiled
code</li>
<li>See full diff in <a
href="https://github.com/dtolnay/semver/compare/1.0.21...1.0.22">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Why:
* On some clients, the web view that is opened to sign-in to Firezone is
left open and ends up getting stuck on the Sign In page with the
liveview loader on the top of the page also stuck and appearing as
though it is waiting for another response. This commit adds a sign-in
success page that is displayed upon successful sign-in and shows a
message to the user that lets them know they can close the window if
needed. If the client device is able to close the web view that was
opened, then the page will either very briefly be shown or will not be
visible at all due to how quickly the redirect happens.
Closes#3608
<img width="625" alt="Screenshot 2024-02-15 at 4 30 57 PM"
src="https://github.com/firezone/firezone/assets/2646332/eb6a5df6-4a4c-4e54-b57c-5da239069ea9">
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
So the cause of the flaky tests is that they aren't waiting long enough
for a connection to be established. Both the test in #3666 and the
`iperf` tests have a timeout of 10 seconds.
Connections _should_ be established **very quickly** in CI. However, I
have a few guesses as to why they might not be, essentially causing us
to have to wait for a timeout to re-initiate a connection request:
- Packets arrive out of order or too quickly for the WireGuard state
machine to establish a handshake.
- Too many ICE candidates gathered (the gateway has 3 interfaces)
This PR:
- Refactors the iperf tests to be a little easier to maintain
- Ensures `integration-tests` run for at least 30 seconds before timing
out
In any case, we can debug / optimize this further after snownet is
merged, which might just solve the problem completely.
Fixes#3330
The when the log zip file is created, it's created as an empty file in
the `logs` directory. The `logs` directory is then zipped, and written
to the zip file. The empty file remains as an artifact in the final zip
file.
This moves the location of the zip file outside of the `logs` directory,
where it will be cleaned up later.
This will prevent services from restarting out from under us during
tests.
Service restarts should be explicitly tested as integration tests.
Should fix#3666
If `FIREZONE_DNS_CONTROL` is set to `systemd-resolved`, then shell out
to `resolvectl` to request all system DNS queries to go to Firezone's
sentinel DNS server(s).
```[tasklist]
- [ ] Figure out how to stop the runner from using the Docker bridge iface
```
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Whilst debugging the performance tests in #3391, I found that we are
using a 4 year old version of `iperf` for the server. This, plus
restarting the server inbetween the performance runs resulted in flaky
tests. I am not sure how we arrived at #3303 but
[this](https://github.com/firezone/firezone/actions/runs/7926579022?pr=3391)
CI run succeeded with a big matrix using the newer iperf server and
without the restarts.
This is done to delay the candidate generation after the gateway has
already received the request.
Since we already know the candidates in most cases, an optimization in
the future to reduce the number of round-trips to the gateway we can add
the candidates to the request connection message.
Previously, this mapping was not stored within the tunnel so we had to
perform the resolution further up. This has changed and the tunnel
itself now knows about this mapping. Thus, we can easily move the actual
DNS resolution also into the tunnel, thereby reducing the API surface of
`Tunnel` because we don't need the `write_dns_lookup_response` function.
This is crucial because it is the last place where `Tunnel` is being
cloned in #3391. With this sorted out the way, we can remove all `Arc`s
and locks from `Tunnel` as part of #3391.
Trying to make sure I don't overlook anything. The possible combinations
of setups is like 100+, but these 6 will at least exercise everything
one time, and they're probably going to be the most common, right?
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Attempt at cleaning a couple things I missed in code review.
The old httpbin resource wasn't being used anyhow, so I just deduped
them and updated things in a couple other places that had drifted.
Hopefully this fixes the [flaky
CI](https://github.com/firezone/firezone/actions/runs/7918422653/job/21616835910)
Regardless of `FIREZONE_DNS_CONTROL`, always try to notify systemd that
we've started.
I had accidentally conflated the idea of running as a systemd service
with the idea of using systemd to control DNS. They're separate, but
I'll keep the service unit in here and always use `sd-notify` since it
should be harmless to use even in Alpine.
~~If `FIREZONE_DNS_CONTROL` is `systemd-resolved`, try to notify systemd
that we've finished startup and the tunnel is ready.~~
Also adds a CI test, including a systemd service file that is **not**
ready for general use.
Ready for review once it's green.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Reverts #3622 I don't know why, but that change seemed to cause the
`/etc/resolv.conf` test to fail in CI and I was thinking of the "roll
back first" principle
https://cloud.google.com/blog/products/gcp/reliable-releases-and-rollbacks-cre-life-lessons
~~I also change one `ping` in CI to `until ping`. This was an earlier
attempt before I did the revert, and it seems safe to leave it in.~~
Running perf byitself should be enough to establish a connection, we
don't need to explicitly do that before.
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Initially, we thought that we need to replace the entire `Allocation` if
the credentials to the relay change. However, during testing it turned
out that the credentials will change every time the portal sends us new
credentials. Likely, the portal hashes some kind of nonce into the
password as well.
Consequently, throwing away the entire state of the `Allocation` is
wrong. Instead, we will simply try to refresh the allocation using the
new credentials. If the refresh fails, we will try to make a new
allocation. If that also fails unrecoverably, then we "suspend" the
allocation, i.e. the `Allocation` will not perform any further action by
itself.
In case we get a new `refresh` call (which happens every time we want to
use the `Allocation` for a connection), we restart things and try to
make a new one.
With #3391, constructing a new tunnel will no longer be `async` which
makes DNS resolution the only `async` component of
`set_peer_connection_request`. In general, adding resources as part of
setting up a connection is a duplicated of the logic within
`allow_access`.
We solve both of these problems at once by moving the DNS resolution out
of `connlib` into the `gateway` binary and perform it as part of the
eventloop during a connection setup.
Only user-facing if users are using the Docker image for the Linux
client.
I split off a module for `/etc/resolv.conf` since the code and unit
tests are about 300 lines and aren't related to the rest of the
`tun_linux.rs` code.
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Why:
* After reviewing the Okta docs closer, in order for an OAuth token to
have Okta API scopes attached to it, the Okta org authorization server
must be used, not a custom authorization server (which includes the
'default' authorization server). This means that the OAuth Authorization
URI that was previously being asked for in the Okta Adapter form won't
work for IDP sync to Firezone. This commit updates the form to accept
the Okta Account Domain (i.e. `<company>.okta.com`)
This improves maintenance because we can now use a regular matrix for
the integration tests and one can locally use tools like shellcheck or a
`bash-lsp` during development.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Without this alignment, accessing the `name` field reliably produces a
segfault:
```
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x1d in tid 13835 (Thread-7), pid 13757 (irezone.android)
```
Interestingly, this only happens in release builds on 32-bit platforms.
Logging the returned name fixes it too which hints at some kind of
optimisation issue. Adding a padding is the most reliable fix.
Fixes: #3637.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
The Docker image for the client is opted in to this new feature. The
bare `linux-client-x64` exe is not. I don't know if users are using the
Docker images?
I wanted to use CLI args, but the DNS control code ("config" or
"control"? Or "SplitDNS"?) has to run at the end of `set_iface_config`,
which on Linux runs in a worker, so I couldn't figure out how to move it
into `on_set_interface_config` in the callbacks. Maybe there is a way,
but the env var results in a small diff.
`str0m` sends its own STUN keep-alives and @conectado has already
removed the logic that uses the wireguard keep-alives to detect stale
connections in
8234529cdf
as part of the integration of `snownet`.
We don't need two keep-alive mechanisms at once.
Fixes#3578Fixes#3551
The issue turned out to be a bunk Repository. Upon unraveling that ball
of yarn, I decided to clean up the Tunnel implementation altogether. It
uses the existing tunnel in-memory store for pushing updates to a
connected SessionActivity.
This PR includes many bug fixes as well.