Also refactored to extract an auth state machine. The auth logic
previously was scattered throughout the GUI module, which would make it
hard to audit. Because of the refactoring I was able to add some simple
unit tests.
... and move its methods into ResourceDescription.
This was a TODO from some pull request in the last few days. I assume
the goal is to share this function between all clients if needed. It
doesn't reduce the number of lines of code, since I could have removed
ResourceDisplay and done this on-the-fly when building the systray menu,
as an alternative.
Fixes#2470, now for linux it looks like:
```
Alpine Linux/3.19.0 (x86_64;5.15.133.1-microsoft-standard-WSL2;) connlib/1.0.0
```
For macos it looks like:
```
Mac OS/13.4.1 (arm64;22.5.0;) connlib/1.0.0
```
and this is how it looks on android:
```
Android/Unknown 6.1.23-android14-4-00257-g7e35917775b8-ab9964412 connlib/1.0.0
```
note: seems like in android emulator at least we can't get the
architecture so easily
Should fix#2880
The way I do it is after ~10 seconds dropping the
`gateway_awaiting_connection` and let the client try the connection
again, depending on upper layer, I think this is fine since the cases
where this happens is unlikely.
It's hard to test thoroughly but I'll test with bad-condition
simulators, [pumba](https://github.com/alexei-led/pumba) seems
promising. In the meantime I'm still creating the PR so that I can have
it reviewed.
Edit: Using Pumba with different % of packet loss things seems to go
well, and connections are actually established even if the packets are
loss. (Making a note that we should integrate pumba with our CI)
Fixes#2948
So it seems that it's easiest just to use an old-fashioned semver
string. This means we'll need to keep a version matrix in the docs of
which components are supported and for how long, but it's better than
having different version schemes for different Firezone components
altogether.
<img width="1552" alt="Screenshot 2023-12-12 at 11 29 43 PM"
src="https://github.com/firezone/firezone/assets/167144/d517c830-64a8-462d-8cb5-c41835fa2059">
Found a reliable way to return default system DNS resolvers on iOS and
macOS. Even if this method is not perfect, I think it's still worth
pursuing because:
* Many administrators will set an upstream resolver in the portal anyway
(bypassing client system resolvers)
* It unifies our Split DNS approach across platforms (assuming we can
query the default system resolvers on Windows), allowing connlib to
intercept all DNS queries on all platforms. This opens the door for some
interesting feature possibilities in the area of malicious query
blocking. This also makes DNS bugs easier to investigate because there's
only one codepath for packets to take. See
https://github.com/firezone/firezone/issues/2859
Draft because it needs more testing and I need to figure out the
`RustVec<RustString>` type for the Swift -> Rust FFI.
Refs #2713
Previously, we just expected the portal to disconnects us and 401 on the
retry, right now we harden that behaviour by also just disconnecting
when token expiration.
This seems to work, there's another part to this which is not only
handling the replies but also handling the message generated by the
portal, I'll implement that when I can easily test expirying tokens, for
now this makes the client much more stable.
Fixes: #2854.
Note: this is ready for review but reproducing the bug that triggered
the fix takes ~1 hour or so, so I would like to wait to check that's
fixed.
Can be reviewed meanwhile.
This PR changes the protocol and adds support for DNS subdomains, now
when a DNS resource is added all its subdomains are automatically
tunneled too. Later we will add support for `*.domain` or `?.domain` but
currently there is an Apple split tunnel implementation limitation which
is too labor-intensive to fix right away.
Fixes#2661
Co-authored-by: Andrew Dryga <andrew@dryga.com>
This PR started as part of a degradation in performance for the
gateways.
The way to test performance in a realistic enviroment is using a GCP vm
as a client and an AWS vm as a gateway with a single iperf server behind
the gateway.
Then the `iperf` results with current main:
```
Connecting to host 172.31.92.238, port 5201
Reverse mode, remote host 172.31.92.238 is sending
[ 5] local 100.83.194.77 port 58426 connected to 172.31.92.238 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.01 MBytes 8.50 Mbits/sec
[ 5] 1.00-2.00 sec 1.14 MBytes 9.59 Mbits/sec
[ 5] 2.00-3.00 sec 699 KBytes 5.73 Mbits/sec
[ 5] 3.00-4.00 sec 1.11 MBytes 9.31 Mbits/sec
[ 5] 4.00-5.00 sec 664 KBytes 5.44 Mbits/sec
[ 5] 5.00-6.00 sec 591 KBytes 4.84 Mbits/sec
[ 5] 6.00-7.00 sec 722 KBytes 5.91 Mbits/sec
[ 5] 7.00-8.00 sec 833 KBytes 6.83 Mbits/sec
[ 5] 8.00-9.00 sec 738 KBytes 6.04 Mbits/sec
[ 5] 9.00-10.00 sec 836 KBytes 6.85 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.06 sec 8.78 MBytes 7.32 Mbits/sec 3 sender
[ 5] 0.00-10.00 sec 8.23 MBytes 6.90 Mbits/sec receiver
iperf Done.
```
Most of the performance problems were due to using SCTP and DTLS.
So I created a
[fork](https://github.com/firezone/webrtc/tree/expose-new-endpoint) of
webrtc that let us circumvent those, since we don't need them because we
are depending on wireguard for encryption.
With those changes much better throughput is achieved:
```
gabriel@cloudshell:~ (firezone-personal-instances)$ iperf3 -R -c 172.31.92.238
Connecting to host 172.31.92.238, port 5201
Reverse mode, remote host 172.31.92.238 is sending
[ 5] local 100.83.194.77 port 51206 connected to 172.31.92.238 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 5.60 MBytes 47.0 Mbits/sec
[ 5] 1.00-2.00 sec 17.2 MBytes 144 Mbits/sec
[ 5] 2.00-3.00 sec 15.8 MBytes 132 Mbits/sec
[ 5] 3.00-4.00 sec 14.8 MBytes 125 Mbits/sec
[ 5] 4.00-5.00 sec 15.9 MBytes 133 Mbits/sec
[ 5] 5.00-6.00 sec 15.8 MBytes 133 Mbits/sec
[ 5] 6.00-7.00 sec 15.3 MBytes 128 Mbits/sec
[ 5] 7.00-8.00 sec 15.6 MBytes 131 Mbits/sec
[ 5] 8.00-9.00 sec 15.6 MBytes 131 Mbits/sec
[ 5] 9.00-10.00 sec 16.0 MBytes 134 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.05 sec 151 MBytes 126 Mbits/sec 74 sender
[ 5] 0.00-10.00 sec 148 MBytes 124 Mbits/sec receiver
iperf Done
```
However, this is still worse than it was achieved with a previous
commit(`21afdf0a9a113c996d60a63b2e8c8f32d3aeb87`):
```
gabriel@cloudshell:~ (firezone-personal-instances)$ iperf3 -R -c 172.31.92.238
Connecting to host 172.31.92.238, port 5201
Reverse mode, remote host 172.31.92.238 is sending
[ 5] local 100.100.68.41 port 49762 connected to 172.31.92.238 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 6.14 MBytes 51.5 Mbits/sec
[ 5] 1.00-2.00 sec 17.1 MBytes 144 Mbits/sec
[ 5] 2.00-3.00 sec 22.8 MBytes 191 Mbits/sec
[ 5] 3.00-4.00 sec 23.5 MBytes 197 Mbits/sec
[ 5] 4.00-5.00 sec 23.0 MBytes 193 Mbits/sec
[ 5] 5.00-6.00 sec 22.1 MBytes 185 Mbits/sec
[ 5] 6.00-7.00 sec 23.0 MBytes 193 Mbits/sec
[ 5] 7.00-8.00 sec 22.7 MBytes 190 Mbits/sec
[ 5] 8.00-9.00 sec 21.0 MBytes 176 Mbits/sec
[ 5] 9.00-10.00 sec 19.9 MBytes 167 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.05 sec 204 MBytes 170 Mbits/sec 127 sender
[ 5] 0.00-10.00 sec 201 MBytes 169 Mbits/sec receiver
```
My profiling suggested that this is due to reading/writing packets
happening in its own dedicated tasks. So much so that maybe in the
future we should even consider spawning their own dedicated runtime so
that those loops have a dedicated OS thread.
Also, probably using a multi-queue interface will give us huge gains if
we have a dedicated task for each queue(currently the interface is
started as a multi-queue but a single file descriptor is used) for
handling multiple concurrent clients.
However, the changes proposed in this PR are good enough for now as long
as performance don't degrade.
In that line I will create a CI that reports the throughput using the
local `docker-compose.yml` file that we should always check before
merging, that is not the be all end all of the performance story but for
smaller PRs the correlation to real world throughput should be enough.
For bigger PRs we should manually test before merging for now, until we
have a way in CI to spin up some realistic tests(note that vms should be
in separate cloud enviroments, the same-cloud links are so reliable that
we miss actual performance degradation due to dropped packets). On this
note I'll write a small manual on how to conduct those tests with full
current results that we should use always before merging new PRs that
affect the hot-path. cc @thomaseizinger
Finally, when testing these changes I found some flakiness regarding the
re-connection path. So I changed things so that we cleanup connections
only using wireguard's error(connection expiration). This is quite slow
for now (~120 seconds) but in the future we can issue an ice restart
each time wireguard keepalive expires(rekey timeout) so that we can
restart connection each ~30 seconds and we can reduce the keepalive time
out from the portal to accelerate it even more. And in the future we can
get smarter about it.
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
## Changelog
- Updates connlib parameter API_URL (formerly known under different
names as `CONTROL_PLANE_URL`, `PORTAL_URL`, `PORTAL_WS_URL`, and
friends) to be configured as an "advanced" or "hidden" feature at
runtime so that we can test production builds on both staging and
production.
- Makes `AUTH_BASE_URL` configurable at runtime too
- Moves `CONNLIB_LOG_FILTER_STRING` to be configured like this as well
and simplifies its naming
- Fixes a timing attack bug on Android when comparing the `csrf` token
- Adds proper account ID validation to Android to prevent invalid URL
parameter strings from being saved and used
- Cleans up a number of UI / view issues on Android regarding typos,
consistency, etc
- Hides vars from from the `relay` CLI we may not want to expose just
yet
- `get_device_id()` is flawed for connlib components -- SMBios is rarely
available. Data plane components now require a `FIREZONE_ID` now instead
to use for upserting.
Fixes#2482Fixes#2471
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Gabi <gabrielalejandro7@gmail.com>
There were 2 bugs:
* `gateway_awaiting_connections` should represent gateways were there is
an on-going connection intent but are not connected yet but currently we
were creating an empty entry when there was no entry, even if there is a
connection established, this would cause the next resource connection
intent to stop early without adding the allowed ip, thus never using the
connection.
* There was a race condition, where if the `ReuseConnection` was sent to
the gateway when the connection wasn't established, the gateway would
just ignore the message, but this connection intent would never be sent
again.
Now that I'm writing this maybe the best solution is, if there is a
pending connection to a gateway, we just do nothing. That way upper
layers would just retry the message and we send `ReuseConnection` once
the connection is established instead of buffering the requests...
Edit: that's exactly the fix I made.
With this we implement DNS forwarding that's specified in #2043
This also solve the DNS story in Android.
For the headless client in Linux we still need to implement split dns,
but we can make do with this, specially, we can read from resolvconf and
use the forward DNS (not ideal but can work if we want a beta headless
client).
For the resolver I used `trusted-proto-resolver`.
The other options were:
* Using `domain`'s resolver but while it could work for now, it's no
ideal for this since it doesn't support DoH or DoT and doesn't provide
us with a DNS cache.
* Using `trusted-proto-client`, it doesn't provide us with a DNS cache,
though we could eventually replace it since it provides a way to access
the underlying buffer which could make our code a bit simpler.
* Writing our own. While we could make the API ideal, this is too much
work for beta.
@pratikvelani I did some refactor in the kotlin side so we can return an
array of bytearrays so that we don't require parsing on connlib side, I
also tried to make the dns server detector a bit simpler please take a
look it's my first time doing kotlin
@thomaseizinger please take a look specially at the first commit, I
tried to integrate with the `poll_events` and the `ClientState`.
Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.4.1 to 1.5.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>1.5.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add impl From<!-- raw HTML omitted --> for String under the std
feature flag by <a
href="https://github.com/brahms116"><code>@brahms116</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/700">uuid-rs/uuid#700</a></li>
<li>Remove dead link to templates by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/704">uuid-rs/uuid#704</a></li>
<li>make ClockSequence wrap correctly by <a
href="https://github.com/fef1312"><code>@fef1312</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/705">uuid-rs/uuid#705</a></li>
<li>Track MSRV in Cargo.toml by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/706">uuid-rs/uuid#706</a></li>
<li>Support converting between Uuid and vec by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/703">uuid-rs/uuid#703</a></li>
<li>Replace MIPS with Miri and add clippy to CI by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/712">uuid-rs/uuid#712</a></li>
<li>Added <code>bytemuck</code> support by <a
href="https://github.com/John-Toohey"><code>@John-Toohey</code></a> in
<a
href="https://redirect.github.com/uuid-rs/uuid/pull/711">uuid-rs/uuid#711</a></li>
<li>Prepare for 1.5.0 release by <a
href="https://github.com/KodrAus"><code>@KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/713">uuid-rs/uuid#713</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/brahms116"><code>@brahms116</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/700">uuid-rs/uuid#700</a></li>
<li><a href="https://github.com/fef1312"><code>@fef1312</code></a> made
their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/705">uuid-rs/uuid#705</a></li>
<li><a
href="https://github.com/John-Toohey"><code>@John-Toohey</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/711">uuid-rs/uuid#711</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/1.4.1...1.5.0">https://github.com/uuid-rs/uuid/compare/1.4.1...1.5.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="e68b0108fa"><code>e68b010</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/713">#713</a> from
uuid-rs/cargo/1.5.0</li>
<li><a
href="b1cc27a118"><code>b1cc27a</code></a>
prepare for 1.5.0 release</li>
<li><a
href="b8ebdee9b0"><code>b8ebdee</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/711">#711</a> from
John-Toohey/bytemuck</li>
<li><a
href="2dad70d3c7"><code>2dad70d</code></a>
Added the <code>bytemuck</code> optional dependency to
<code>lib.rs</code> documentation</li>
<li><a
href="bcf2b58997"><code>bcf2b58</code></a>
Added Bytemuck to .github/workflows/ci.yml::env::DEP_FEATURES</li>
<li><a
href="a8d2e1d4bf"><code>a8d2e1d</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/712">#712</a> from
uuid-rs/ci/miri-clippy</li>
<li><a
href="0c5b2dfebd"><code>0c5b2df</code></a>
fix up a clippy warning</li>
<li><a
href="1d4bd6e5b2"><code>1d4bd6e</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/703">#703</a> from
uuid-rs/feat/convert-to-vec</li>
<li><a
href="52b3fbc04a"><code>52b3fbc</code></a>
replace MIPS with Miri and add clippy to CI</li>
<li><a
href="3833d095c1"><code>3833d09</code></a>
Make the bytemuck dependency look more like the other dependencies</li>
<li>Additional commits viewable in <a
href="https://github.com/uuid-rs/uuid/compare/1.4.1...1.5.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Should be easier to review commit by commit.
The gist of this commit is:
* `onAddRoute` on Android now takes an address+prefix as to minimize
parsing
* `onAddRoute` recreates the vpn service each time(TODO: is this too bad
for performance?)
* `on_add_route` and `onAddRoute` returns the new fd
* on android after `on_add_route` we recreate `IfaceConfig` and
`DeviceIo` and we store the new values
* `peer_handler` now runs on a loop, where each time we fail a write
with an error code 9(bad descriptor) we try to take the new `DeviceIo`
* we keep an
[`AbortHandle`](https://docs.rs/tokio/latest/tokio/task/struct.AbortHandle.html)
from the `iface_handler` task, since closing the fd doesn't awake the
`read` task for `AsyncFd`(I tried it, right now `close` is only called
after dropping the fd) so we explicitly abort the task and start a new
one with the new `device_io`.
* in android `DeviceIo` has an atomic which tells if it's closed or open
and we change it to closed after `on_add_route`, we use this as to never
double-close the fd, instead we wait until it's dropped. This *might*
affect performance on android since we use non-`Ordering::Relaxed`
atomic operation each read/write but it won't affect perfromance in
other platforms, furthermore I believe the performance gains if we
remove this will be minimal.
Fixes#2227
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
fixes#2013
Stops the reconnect loop on a 4xx error.
Right now it seems like android doesn't handle `on_disconnect` properly,
just logging instead of going back to sign-in screen.