With the upgrade to 0.23, `tokio-tungstenite` pulls in `rustls` 0.27
which supports multiple crypto providers. By default, this uses the
`aws-lc-crypto` provider. The previous default was `ring`.
This PR bumps the necessary versions and installs the `ring` crypto
provider at the beginning of each application, before connlib starts. We
try and do this as early as possible to make it obvious that it only
needs to happen once per process.
Resolves: #5380.
For most cases, TURN identifies clients by their 3-tuple. This can make
it hard to correlate logs in case the client roams or its NAT session
gets reset, both of which cause the port to change.
To make problem analysis easier, we include the RFC-recommended
`SOFTWARE` attribute in all STUN requests created by `snownet`.
Typically, this includes a textual description of who sent the request
and a version number. See [0] for details. We don't track the version of
`snownet` individually and passing the actual client-version across this
many layers is deemed too complicated for now.
What we can add though is a parameter that includes a sticky session ID.
This session ID is computed based on the `Node`'s public key, meaning it
doesn't change until the user logs-out and in again.
On the relay, we now look for a `SOFTWARE` attribute in all STUN
requests and optionally include it in all spans if it is present.
[0]: https://datatracker.ietf.org/doc/html/rfc5389#section-15.10
Currently, `connlib` can only handle "simple" DNS wildcards where `*`
matches any number of subdomains, including zero and `?` matches a
single subdomain.
With this PR, we expand `connlib'`s capabilities to allow for a much
more complex matching of domains that more closely resembles glob
patterns:
- `**` matches any number of subdomains. This supersedes the previous
`*` operator.
- `*` matches a single subdomain. This supersedes the previous `?`
operator.
- `?` matches a single character. This wasn't possible before.
- Additionally, any of these can be combined. Previously, only `*` or
`?` was allowed and they were only accepted at the front of the domain
name pattern.
Resolves: #5056.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Fixes#6155
One question we get with almost each new customer is "if I enable sync,
won't that count towards my bill?". This PR aims to answer that question
right when they create the provider.
I will also make sure to update Enterprise accounts in Stripe with
`monthly_active_users_acount` so that they can view this metric on the
Billing page.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Closes#5453
Tested once on the Windows aarch64 VM. Should always leave 4 files
behind, a `.log` and a `.jsonl` for the GUI and for the IPC service. The
"log directory" is a bit of a lie since it's consistently 2 directories
on both platforms now.
```[tasklist]
- [x] Update changelog
- [x] Make a note to remove the known issue from the website when the next release is cut after this PR merges
```
This will allow us to write queries and thus alerts for increased number
of error responses such as `Allocation Mismatch`.
When attaching labels to metrics, it is important to avoid cardinality
explosions. Thus, the possible label values should always be a fixed,
bounded set of values. The possible error codes could be quite a few but
in practise, we only use a handful and clients cannot influence, which
error codes we send. Thus, it is safe to create labels for these codes.
The same would not be true for IP addresses or ports for example.
Viewing a Resource created by an API client was crashing the view due to
the function creating the link to the actor not accounting for the API
client case.
Closes#6267
In #6259, we added a regression test for concurrent DNS queries. A case
that we overlooked is that when DNS servers are defined as CIDR
resources, the queries themselves will act as connection intents and
thus dropped until we have a connection.
In the tests, the connection is only established as part of `advance`.
Thus, if we get multiple concurrent DNS queries to the same server that
is defined as a CIDR resource, we need to drop all future queries.
Fixes: #6283.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Bumps [tauri-build](https://github.com/tauri-apps/tauri) from 1.5.1 to
1.5.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tauri-apps/tauri/releases">tauri-build's
releases</a>.</em></p>
<blockquote>
<h2>tauri-build v1.5.3</h2>
<!-- raw HTML omitted -->
<pre><code>Fetching advisory database from
`https://github.com/RustSec/advisory-db.git`
Loaded 630 security advisories (from /home/runner/.cargo/advisory-db)
Updating crates.io index
Scanning Cargo.lock for vulnerabilities (590 crate dependencies)
Crate: atty
Version: 0.2.14
Warning: unsound
Title: Potential unaligned read
Date: 2021-07-04
ID: RUSTSEC-2021-0145
URL: https://rustsec.org/advisories/RUSTSEC-2021-0145
Dependency tree:
atty 0.2.14
└── clap 3.2.25
└── tauri 1.7.0
├── tauri 1.7.0
├── restart 0.1.0
└── app-updater 0.1.0
<p>warning: 1 allowed warning found
</code></pre></p>
<!-- raw HTML omitted -->
<h2>[1.5.3]</h2>
<h3>Dependencies</h3>
<ul>
<li>Upgraded to <code>tauri-utils@1.6.0</code></li>
<li>Upgraded to <code>tauri-codegen@1.4.4</code></li>
</ul>
<!-- raw HTML omitted -->
<pre><code>Updating crates.io index
Packaging tauri-build v1.5.3
(/home/runner/work/tauri/tauri/core/tauri-build)
Verifying tauri-build v1.5.3
(/home/runner/work/tauri/tauri/core/tauri-build)
Updating crates.io index
Downloading crates ...
Downloaded tauri-winres v0.1.1
Downloaded embed-resource v2.4.2
Downloaded cargo_toml v0.15.3
Compiling proc-macro2 v1.0.86
</tr></table>
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="cf331cdc3e"><code>cf331cd</code></a>
fix(core): lint</li>
<li><a
href="574076541a"><code>5740765</code></a>
fix(ci): downgrade crates for MSRV check</li>
<li><a
href="89f3048f52"><code>89f3048</code></a>
apply version updates (<a
href="https://redirect.github.com/tauri-apps/tauri/issues/9871">#9871</a>)</li>
<li><a
href="08f57efefd"><code>08f57ef</code></a>
fix(cli): parse <code>--profile=\<profile></code> syntax (<a
href="https://redirect.github.com/tauri-apps/tauri/issues/10136">#10136</a>)</li>
<li><a
href="63da834ce4"><code>63da834</code></a>
ci: Fix msrv check (<a
href="https://redirect.github.com/tauri-apps/tauri/issues/10118">#10118</a>)</li>
<li><a
href="c2d3afa4fb"><code>c2d3afa</code></a>
prevent uncomment collision in 1.x invoke_key templating (fix <a
href="https://redirect.github.com/tauri-apps/tauri/issues/10084">#10084</a>)
(<a
href="https://redirect.github.com/tauri-apps/tauri/issues/10087">#10087</a>)</li>
<li><a
href="924387092e"><code>9243870</code></a>
feat: add dmg settings, cherry picked from <a
href="https://redirect.github.com/tauri-apps/tauri/issues/7964">#7964</a>
(<a
href="https://redirect.github.com/tauri-apps/tauri/issues/8334">#8334</a>)</li>
<li><a
href="d2786bf699"><code>d2786bf</code></a>
chore(template): template format error (<a
href="https://redirect.github.com/tauri-apps/tauri/issues/10018">#10018</a>)</li>
<li><a
href="674accad75"><code>674acca</code></a>
fix: missing depends for rpm package (<a
href="https://redirect.github.com/tauri-apps/tauri/issues/10015">#10015</a>)</li>
<li><a
href="09152d83e1"><code>09152d8</code></a>
ci(msrv-list): Downgrade os_pipe (<a
href="https://redirect.github.com/tauri-apps/tauri/issues/10014">#10014</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tauri-apps/tauri/compare/tauri-build-v1.5.1...tauri-build-v1.5.3">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
For efficiency reasons, TURN's data channels don't have any
authentication or integrity metadata. Instead, the operate using a short
2-byte channel number to identify the target peer of the data.
To avoid abuse, channel bindings are at most valid for 10 minutes before
they need to be refreshed. In case they expire, there is a 5 minute
cooldown period, before the same channel number can be bound to a
different peer and before the same peer can be bound to a different
channel.
We had a similar issue in the past (#5613) where channels got rebound
early. Whilst that was fixed and is no longer happening, a case that we
didn't consider is what happens if we want to bind a channel to a peer
that still has a channel bound but is currently cooling down (i.e. in
the 5 minute period after its expiry).
In that case, `snownet` would wrongly assume that there is no channel to
this peer and try to bind a new one. That would get rejected by the
relay with a bad request.
To fix this, we simply need to check whether we still have a channel to
this peer and if yes, return the same channel number. On the relay, we
need to ensure that we consider a channel as `bound` again when it is
being refreshed.
We ensure that this doesn't regress in two ways:
- We add a unit-test for the `ChannelBindings` struct
- We modify the `Idle` transition to idle for 6 instead of 5 minutes.
This ensures that a combination of 2 idle transitions puts the channel
bindings into the 10-15 minute time window where rebinding the peer to a
different channel fails.
Related: #6265.
Most of `connlib-shared` exists only for historical reasons. The
`Tunnel` has since been decoupled from the `Callbacks` and most error
variants on `ConnlibError` are not actually used.
This allows us to move a few things around and trim down `ConnlibError`
to just the variants that actually cause a call to `on_disconnect`.
Moving everything related to `proptest`s to `firezone-tunnel` also
requires us to delete the specialisation for printing IDs in a shorter
format during the tests. That is a bit unfortunate but was always kind
of a hack. I'd rather make progress on getting rid of `connlib-shared`
though and perhaps re-introduce that feature once the messages are fully
moved into the tunnel.
Related: #4470.
When forwarding DNS queries, we need to temporarily store information
about them to correctly identify the response and mangle them back. This
algorithm needs to support concurrent queries to the same and different
DNS servers.
A critical bug was fixed in #6233 where we had wrongly assumed that DNS
query IDs are globally unique.
To ensure this behaviour doesn't regress, this PR modifies our test
suite to send up to 5 concurrent DNS queries into connlib. Currently,
the nature of `tunnel_test` is such that each `Transition` is a single
action that gets fully completed before we execute the next one. Thus,
the concurrency of DNS queries would never get tested because connlib's
internal data structure would at most ever contain 1 DNS query.
By sampling a set of up to 5 unique combinations of DNS server and query
ID, we make sure that concurrent DNS queries work.
Now that we have the `bin-shared` crate, it is easy to move the
health-check functionality into there. That allows us to get rid of a
crate which makes navigating the workspace a bit easier.
Currently, the logging for which resources get activated and
de-activated is spread between the `dns` and `client` module. It also
doesn't include the sites that the resource is defined in.
The name of a resource alone is not enough to unique identify it. To fix
both of these papercuts, we move the logging to the `client` module and
include the sites in the log message.
The log messages now read like this:
```
2024-08-12T02:26:01.477844Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`)
2024-08-12T02:26:01.477904Z INFO firezone_tunnel::client: Activating resource name=*.slack.com address=*.slack.com sites=Vultr Stable (Latest Release Gateways)
2024-08-12T02:26:01.477942Z INFO firezone_tunnel::client: Activating resource name=*.slack-edge.com address=*.slack-edge.com sites=Vultr Stable (Latest Release Gateways)
2024-08-12T02:26:01.477984Z INFO firezone_tunnel::client: Activating resource name=*.spotify.com address=*.spotify.com sites=AWS Dev (Gateways track `main`)
```
Fixes the flaky time condition unit test by always using midnight as the
end time range so that the `flow.expires_at` is never calculated across
a day boundary into the future.
Supersedes #6244
If a new resource is created that will use format not supported by
previous client versions we temporarily show a warning:
<img width="683" alt="Screenshot 2024-08-07 at 2 28 57 PM"
src="https://github.com/user-attachments/assets/bbfdfc96-0c4b-4226-93c5-bc2b5fdb9d30">
It will also be excluded from `resources` list for older clients (below
1.2).
---------
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
I recently discovered that the metrics reporting to Google Cloud Metrics
for the relays is actually working. Unfortunately, they are all bucketed
together because we don't set the metadata correctly.
This PR aims to fix that be setting some useful default metadata for
traces and metrics and additionally, discoveres instance ID and name
from GCE metadata.
Related: #2033.
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that allows us to bundle all the things together, how we want to
do logging.
This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.
Log filters can be overridden, i.e. a 2nd filter that matches the exact
same scope overrides a previous one. Thus, with this design it is still
possible to activate certain logs at runtime, even if they have silenced
by default.
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Work for #6074 equivalent to #6166 for MacOS
MacOs view:
<img width="547" alt="image"
src="https://github.com/user-attachments/assets/f465183e-247b-49b5-a916-3ecc5f0a02f4">
iOS(ipad) view:

Other than implementing the resource disabling, this PR also refactor
the IPC between the network extension and the app so that it's some form
of structured IPC instead of relying on it being deserializable to
string to match the message.
One big difference with Android is that we don't introduce the concept
of a `ResourceView` for swift, the main reason for this is that on iOS
the resources are bound to the view instead of just being a parameter
for creating the view. So if we modify the `disabled` property it'd
update the UI unnecessarily, also it'd update the `Store` value for the
resource and then we need to copy that over again to the view. Making it
easier to go out of sync.
This will always be elevated in CI, so just check that it doesn't crash.
This came up during debugging while I was offline, and I just want to
make CI check for regressions, since there's a lot of `unsafe` code in
the Windows impl
The `firezone-bin-shared` crate is meant to house non-tunnel related
things. That allows it to compile in parallel to everything else. It
currently only depends on `connlib-shared` to access the `DEFAULT_MTU`
constant. We can remove that by requiring the MTU as a ctor parameter of
`TunDeviceManager`.
A longer write-up of the intended dependency structure is in #4470.
When forwarding DNS queries, we need to remember the original source
socket in order to send the response back. Previously, this mapping was
indexed by the DNS query ID. As it turns out, at least Windows doesn't
have a global DNS query ID counter and may reuse them across different
DNS servers. If that happens and two of these queries overlap, then we
match the wrong responses together.
In the best case, this produces bad DNS results on the client. In the
worst case, those queries were for DNS servers with different IP
versions in which case we triggered a panic in connlib further down the
stack where we created the IP packet for the response.
To fix this, we first and foremost remove the explicit `panic!` from the
`make::` functions in `ip-packet`. Originally, these functions were only
used in tests but we started to use them in production code too and
unfortunately forgot about this panic. By introducing a `Result`, all
call-sites are made aware that this can fail.
Second, we fix the actual indexing into the data structure for forwarded
DNS queries to also include the DNS server's socket. This ensures we
don't treat the DNS query IDs as globally unique.
Third, we replace the panicking path in
`try_handle_forwarded_dns_response` with a log statement, meaning if the
above assumption turns out wrong for some reason, we still don't panic
and simply don't handle the packet.
I noticed we sometimes have a flaky integration test with an ICE timeout
in its logs. For example:
https://github.com/firezone/firezone/actions/runs/10278933741/job/28443578376
Analyzing this one more closely turned out to be caused by a race
condition between client and gateway, when they exchange their ICE
candidates.
We send ICE candidates in batches but because they are serialized to
strings early, their ordering actually depends on the so-called
"foundation" of the ICE candidates. that one is simply a hash of several
components. As a result, the ordering of these candidates can vary
between test runs.
We should try ICE candidates in order of their reverse-priority (i.e.
best first). By introducing a helper-collection, we can enforce this
ordering before sending ICE candidates across.