Now that we have the `bin-shared` crate, it is easy to move the health-check functionality into it. That also lets us get rid of a crate, which makes navigating the workspace a bit easier.
Currently, the logging for which resources get activated and de-activated is spread between the `dns` and `client` modules. It also doesn't include the sites that the resource is defined in; the name of a resource alone is not enough to uniquely identify it. To fix both of these papercuts, we move the logging to the `client` module and include the sites in the log message.
The log messages now read like this:
```
2024-08-12T02:26:01.477844Z INFO firezone_tunnel::client: Activating resource name=IPerf3 address=10.0.32.101/32 sites=AWS Dev (Gateways track `main`)
2024-08-12T02:26:01.477904Z INFO firezone_tunnel::client: Activating resource name=*.slack.com address=*.slack.com sites=Vultr Stable (Latest Release Gateways)
2024-08-12T02:26:01.477942Z INFO firezone_tunnel::client: Activating resource name=*.slack-edge.com address=*.slack-edge.com sites=Vultr Stable (Latest Release Gateways)
2024-08-12T02:26:01.477984Z INFO firezone_tunnel::client: Activating resource name=*.spotify.com address=*.spotify.com sites=AWS Dev (Gateways track `main`)
```
I recently discovered that the metrics reporting to Google Cloud Metrics for the relays is actually working. Unfortunately, they are all bucketed together because we don't set the metadata correctly.
This PR aims to fix that by setting some useful default metadata for traces and metrics and, additionally, discovers the instance ID and name from GCE metadata.
Related: #2033.
Setting up a logger is something that pretty much every entrypoint needs to do, be it a test, a shared library embedded in another app, or a standalone application. Thus, it makes sense to introduce a dedicated crate that bundles together everything related to how we want to do logging.
This allows us to introduce convenience functions like `firezone_logging::test`, which lets you construct a logger for a test as a one-liner.
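A minimal sketch of what such a test helper could look like, built on `tracing-subscriber`; the name mirrors `firezone_logging::test`, but the signature and body here are assumptions, not the actual implementation:

```rust
use tracing_subscriber::EnvFilter;

/// Hypothetical test helper: installs a formatter that cooperates with
/// `cargo test`'s output capturing and applies the given filter directives.
pub fn test(directives: &str) {
    let _ = tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::new(directives))
        .with_test_writer()
        .try_init(); // ignore the error if a logger is already installed
}
```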
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.
Log filters can be overridden, i.e. a 2nd filter that matches the exact same scope overrides a previous one. Thus, with this design it is still possible to activate certain logs at runtime, even if they are silenced by default.
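A sketch of how that override behaviour can be composed with `tracing-subscriber`'s `EnvFilter` (the directive strings are examples, not our actual defaults):

```rust
use tracing_subscriber::EnvFilter;

fn build_filter(user_directives: &str) -> EnvFilter {
    // Preset directives: silence the noisy crate, everything else at `info`.
    let mut filter = EnvFilter::new("netlink_proto=off,info");

    // Directives supplied at runtime (e.g. via RUST_LOG) are appended last,
    // so a directive for the same target overrides the preset one.
    for directive in user_directives.split(',').filter(|d| !d.is_empty()) {
        if let Ok(directive) = directive.parse() {
            filter = filter.add_directive(directive);
        }
    }

    filter
}
```

With `build_filter("netlink_proto=debug")`, the `netlink_proto` logs are active again even though the preset directive turns them off.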
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Work for #6074, equivalent to #6166, for macOS.
macOS view:
<img width="547" alt="image"
src="https://github.com/user-attachments/assets/f465183e-247b-49b5-a916-3ecc5f0a02f4">
iOS (iPad) view:

Other than implementing the resource disabling, this PR also refactors the IPC between the network extension and the app so that it uses some form of structured IPC instead of relying on messages being deserializable to strings and matching on those.
One big difference from Android is that we don't introduce the concept of a `ResourceView` for Swift. The main reason is that on iOS the resources are bound to the view instead of just being a parameter for creating the view. If we modified the `disabled` property, it would update the UI unnecessarily; it would also update the `Store` value for the resource, which we would then need to copy over to the view again, making it easier for the two to go out of sync.
This will always run elevated in CI, so just check that it doesn't crash.
This came up during debugging while I was offline, and I just want CI to check for regressions, since there's a lot of `unsafe` code in the Windows implementation.
The `firezone-bin-shared` crate is meant to house non-tunnel-related things. That allows it to compile in parallel with everything else. It currently only depends on `connlib-shared` to access the `DEFAULT_MTU` constant. We can remove that dependency by requiring the MTU as a constructor parameter of `TunDeviceManager`.
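A rough sketch of the constructor change (the field layout is hypothetical; the point is that the MTU becomes a parameter instead of a `connlib-shared` constant):

```rust
pub struct TunDeviceManager {
    mtu: u32,
    // ... platform-specific handles elided ...
}

impl TunDeviceManager {
    /// The caller now decides the MTU; `bin-shared` no longer needs to
    /// import `DEFAULT_MTU` from `connlib-shared` for this.
    pub fn new(mtu: u32) -> Self {
        Self { mtu }
    }
}
```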
A longer write-up of the intended dependency structure is in #4470.
When forwarding DNS queries, we need to remember the original source socket in order to send the response back. Previously, this mapping was indexed by the DNS query ID alone. As it turns out, at least Windows doesn't have a global DNS query ID counter and may reuse IDs across different DNS servers. If that happens and two of these queries overlap, we match up the wrong responses.
In the best case, this produces bad DNS results on the client. In the worst case, the queries were for DNS servers with different IP versions, in which case we triggered a panic in connlib further down the stack where the IP packet for the response is created.
To fix this, we first and foremost remove the explicit `panic!` from the
`make::` functions in `ip-packet`. Originally, these functions were only
used in tests but we started to use them in production code too and
unfortunately forgot about this panic. By introducing a `Result`, all
call-sites are made aware that this can fail.
Second, we fix the actual indexing into the data structure for forwarded
DNS queries to also include the DNS server's socket. This ensures we
don't treat the DNS query IDs as globally unique.
Third, we replace the panicking path in
`try_handle_forwarded_dns_response` with a log statement, meaning if the
above assumption turns out wrong for some reason, we still don't panic
and simply don't handle the packet.
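A simplified sketch of the new indexing, assuming the mapping lives in a plain `HashMap` (the real data structure in connlib may differ):

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Key forwarded queries by query ID *and* upstream server, because query
/// IDs are not globally unique (Windows may reuse them across servers).
#[derive(Default)]
struct ForwardedQueries {
    original_source: HashMap<(u16, SocketAddr), SocketAddr>,
}

impl ForwardedQueries {
    fn on_query(&mut self, query_id: u16, upstream: SocketAddr, source: SocketAddr) {
        self.original_source.insert((query_id, upstream), source);
    }

    /// Returns the client socket to send the response to, or `None` if we
    /// don't know this query, in which case we log and drop the packet
    /// instead of panicking.
    fn on_response(&mut self, query_id: u16, upstream: SocketAddr) -> Option<SocketAddr> {
        self.original_source.remove(&(query_id, upstream))
    }
}
```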
I noticed we sometimes have a flaky integration test with an ICE timeout
in its logs. For example:
https://github.com/firezone/firezone/actions/runs/10278933741/job/28443578376
Analyzing this one more closely showed that it is caused by a race condition between client and gateway when they exchange their ICE candidates.
We send ICE candidates in batches, but because they are serialized to strings early, their ordering actually depends on the so-called "foundation" of the ICE candidates, which is simply a hash of several components. As a result, the ordering of these candidates can vary between test runs.
We should try ICE candidates in descending order of priority (i.e. best first). By introducing a helper collection, we can enforce this ordering before sending ICE candidates across.
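A minimal sketch of such a helper collection; the candidate representation and priority handling are simplified assumptions:

```rust
/// Collects ICE candidates and emits them best-first, independent of the
/// ordering that early string serialization would otherwise impose.
#[derive(Default)]
struct CandidateBatch {
    candidates: Vec<(u32, String)>, // (priority, serialized candidate)
}

impl CandidateBatch {
    fn push(&mut self, priority: u32, candidate: String) {
        self.candidates.push((priority, candidate));
    }

    fn into_sorted(mut self) -> Vec<String> {
        // Highest priority first, i.e. the best candidate is tried first.
        self.candidates.sort_by(|a, b| b.0.cmp(&a.0));
        self.candidates.into_iter().map(|(_, c)| c).collect()
    }
}
```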
LLMNR is a deprecated [0] protocol and we shouldn't advertise it on our TUN interface. With LLMNR, name resolution for hosts (i.e. single-label domains) that are not found via search domains on other interfaces (like a WiFi or Ethernet adapter) ends up failing with "refused" instead of the appropriate NXDOMAIN.
For example, my WiFi card has the `fritz.box` search domain assigned via DHCP. This allows me to look up hosts on my local network. Searching for a host `foo` that doesn't exist currently fails with "refused":
```
❯ host foo
Host foo not found: 5(REFUSED)
```
By disabling LLMNR, we get the expected NXDOMAIN:
```
❯ host foo
Host foo not found: 3(NXDOMAIN)
```
To make configuring things via `resolvectl` more ergonomic, I extracted a helper function.
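A sketch of what such a helper might look like, shelling out to `resolvectl` (error handling simplified; the function names are illustrative, not the actual ones):

```rust
use std::io;
use std::process::Command;

/// Run a single `resolvectl` sub-command and turn a non-zero exit code
/// into an error.
fn resolvectl(args: &[&str]) -> io::Result<()> {
    let status = Command::new("resolvectl").args(args).status()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("`resolvectl {}` exited with {status}", args.join(" ")),
        ));
    }
    Ok(())
}

/// Turn off LLMNR on the TUN interface so single-label lookups fall through
/// to regular DNS and yield NXDOMAIN instead of REFUSED.
fn disable_llmnr(tun_interface: &str) -> io::Result<()> {
    resolvectl(&["llmnr", tun_interface, "no"])
}
```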
Related: #6218.
[0]:
https://techcommunity.microsoft.com/t5/networking-blog/aligning-on-mdns-ramping-down-netbios-name-resolution-and-llmnr/ba-p/3290816
Currently, `connlib` depends on `hickory-resolver` to perform DNS
queries for non-resources. This is unnecessary. Instead of buffering the
original UDP DNS query, consulting hickory to resolve the name and
mapping the response back, we can simply take the UDP payload and send
it via our protected socket directly to the original upstream DNS
server.
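In essence, the forwarding path becomes a plain UDP relay. A minimal sketch using `tokio`, with the real socket-protection machinery elided:

```rust
use std::net::SocketAddr;
use tokio::net::UdpSocket;

/// Forward the client's raw DNS payload to the original upstream server and
/// return the answer unchanged.
async fn forward_dns_query(
    protected_socket: &UdpSocket, // bound outside the tunnel
    upstream: SocketAddr,         // the DNS server the client targeted
    query: &[u8],                 // raw UDP payload of the query
) -> std::io::Result<Vec<u8>> {
    protected_socket.send_to(query, upstream).await?;

    let mut buf = vec![0u8; 1500];
    let (len, _from) = protected_socket.recv_from(&mut buf).await?;
    buf.truncate(len);

    Ok(buf)
}
```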
This ensures `connlib` is as transparent as possible for DNS queries for
non-resources. Additionally, it removes a lot of error handling and
other cruft that we currently have to perform because we are using
hickory. For example, hickory will automatically retry a DNS query after
a certain timeout. However, the OS / client talking to `connlib` will
also retry after a certain timeout because it is making DNS queries over
an unreliable transport (UDP). It is thus unnecessary for us to do that
internally.
To correctly test this change, our test-suite needed some refactoring.
Specifically, DNS servers are now modelled as dedicated `Host`s that can
receive (UDP) traffic.
Lastly, we can remove our dependency on `hickory-proto` and
`hickory-resolver` everywhere and only use `domain` for parsing DNS
messages.
Resolves: #6141.
Related: #6033.
Related: #4800. (Impossible to happen with this design)
Instead of having one giant, composed strategy, we introduce a dedicated `stub_portal` strategy. It samples what the portal defines in production: sites, gateways and resources.
Based on a sampled portal, we can then sample gateways, a client and DNS
records for our resources.
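A simplified sketch of the idea using `proptest` (the real `StubPortal` has more structure; the names and shapes here are illustrative):

```rust
use proptest::prelude::*;

#[derive(Debug, Clone)]
struct StubPortal {
    sites: Vec<String>,
    gateways_per_site: usize,
    resource_domains: Vec<String>,
}

/// Sample what the portal would define in production; clients, gateways and
/// DNS records are then derived from a sampled `StubPortal`.
fn stub_portal() -> impl Strategy<Value = StubPortal> {
    (
        proptest::collection::vec("[a-z]{3,8}", 1..4),                // site names
        1..4usize,                                                    // gateways per site
        proptest::collection::vec("[a-z]{3,8}\\.example\\.com", 0..5), // resource domains
    )
        .prop_map(|(sites, gateways_per_site, resource_domains)| StubPortal {
            sites,
            gateways_per_site,
            resource_domains,
        })
}
```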
The `DnsServer` struct is quite nested. All it really contains (currently) is a `SocketAddr`. To make logs containing this structure easier to read, we only print the inner address in the `Debug` output.
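For illustration, this amounts to forwarding `Debug` to the inner address (field name assumed):

```rust
use std::fmt;
use std::net::SocketAddr;

struct DnsServer {
    address: SocketAddr,
}

impl fmt::Debug for DnsServer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the inner address, e.g. `1.1.1.1:53`, instead of the
        // full nested struct.
        self.address.fmt(f)
    }
}
```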
Without masquerading, packets sent by the gateway through the TUN interface use the wrong source address (the TUN device's address) instead of the address of the gateway's actual network interface.
We set this env variable in all our uses of the gateway, so we might as well remove it and always perform masquerading unconditionally.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Closes #5878
It won't work properly as admin (deep links will all fail), and this
improves UX by making it obvious that admin powers are no longer needed
for the GUI.
```[tasklist]
- [x] Write up `SAFETY` comments
```
Closes #5063, supersedes #5850
Other refactors and changes made as part of this:
- Adds the ability to disable DNS control on Windows
- Removes the spooky-action-at-a-distance `from_env` functions that used
to be buried in `tunnel`
- `FIREZONE_DNS_CONTROL` is now a regular `clap` argument again
---------
Signed-off-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Mitigates #5880.
This should fix the issue for all practical purposes, but we don't need
a channel there, so it does not close the ticket. A more permanent fix
would involve factoring out the callbacks or cheating and using a Mutex
inside the callbacks to do a swap-and-notify thing.
This affects both the Headless Client and the GUI Client's IPC service,
on both Linux and Windows.
Builds on top of #6164
Part of the effort towards
https://github.com/firezone/firezone/issues/6074
Prepares connlib to call `setDisableResource` from Android.
Furthermore, we add a `disablable` parameter for resources, which defaults to false for now. In the future, the portal will set it for the internet resource, and further down the line it may be used for other resources.
The `disablable` parameter only affects the UI.
This is just the API part for #6074.
We expose a new API, `set_disabled_resources`, which, given a set of resource IDs, does the following:
* Disconnects any active connection that depends only on a disabled resource
* Prevents any new connection to a disabled resource from being established
The `set_disabled_resources` API is purposely not stateful. In other
words, resources cannot be incrementally enabled or disabled. Instead,
clients always need to send the latest state, i.e. all resources that
should be disabled. `connlib` will figure out the diff and correctly
enable / disable resources as necessary. Thus, enabling a resource is
done by calling `set_disabled_resources` without the previously disabled
resource ID.
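A sketch of the diffing this implies, with resource IDs simplified to integers:

```rust
use std::collections::BTreeSet;

/// Given the previously disabled set and the full set submitted by the
/// client, compute which resources to disconnect and which to re-enable.
fn diff_disabled(
    previously_disabled: &BTreeSet<u64>,
    now_disabled: &BTreeSet<u64>,
) -> (Vec<u64>, Vec<u64>) {
    let newly_disabled = now_disabled.difference(previously_disabled).copied().collect();
    let re_enabled = previously_disabled.difference(now_disabled).copied().collect();

    (newly_disabled, re_enabled)
}
```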
Initially, this will only be used for the internet resource, but its use can be expanded to any other resource.
Bumps [serde](https://github.com/serde-rs/serde) from 1.0.203 to
1.0.204.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/serde-rs/serde/releases">serde's
releases</a>.</em></p>
<blockquote>
<h2>v1.0.204</h2>
<ul>
<li>Apply #[diagnostic::on_unimplemented] attribute on Rust 1.78+ to
suggest adding serde derive or enabling a "serde" feature flag
in dependencies (<a
href="https://redirect.github.com/serde-rs/serde/issues/2767">#2767</a>,
thanks <a
href="https://github.com/weiznich"><code>@weiznich</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="18dcae0a77"><code>18dcae0</code></a>
Release 1.0.204</li>
<li><a
href="58c307f9cc"><code>58c307f</code></a>
Alphabetize list of rustc-check-cfg</li>
<li><a
href="8cc4809414"><code>8cc4809</code></a>
Merge pull request <a
href="https://redirect.github.com/serde-rs/serde/issues/2769">#2769</a>
from dtolnay/onunimpl</li>
<li><a
href="1179158def"><code>1179158</code></a>
Update ui test with diagnostic::on_unimplemented from PR 2767</li>
<li><a
href="91aa40e749"><code>91aa40e</code></a>
Add ui test of unsatisfied serde trait bound</li>
<li><a
href="595019e979"><code>595019e</code></a>
Cut test_suite from workspace members in old toolchain CI jobs</li>
<li><a
href="b0d7917f88"><code>b0d7917</code></a>
Pull in trybuild 'following types implement trait' fix</li>
<li><a
href="8e6637a1e4"><code>8e6637a</code></a>
Merge pull request <a
href="https://redirect.github.com/serde-rs/serde/issues/2767">#2767</a>
from weiznich/feature/diagnostic_on_unimplemented</li>
<li><a
href="694fe05953"><code>694fe05</code></a>
Use the <code>#[diagnostic::on_unimplemented]</code> attribute when
possible</li>
<li><a
href="f3dfd2a237"><code>f3dfd2a</code></a>
Suppress dead code warning in test of unit struct remote derive</li>
<li>Additional commits viewable in <a
href="https://github.com/serde-rs/serde/compare/v1.0.203...v1.0.204">compare
view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
With the adoption of #5080, connlib is now resilient against temporarily
failed connections as they'll be immediately re-established. Thus, we no
longer need any of the patches that we are currently maintaining in our
str0m fork.
The only difference is an adjustment of the ICE timeout parameters but
those can be made configurable in str0m.
Related: https://github.com/algesten/str0m/pull/537.
This seems to fix #6033.
What **seems** to be happening is that sometimes responses are delayed and hickory caches the negative response.
We disable the cache and the multiple retry attempts, to be as transparent as possible until #6141 is implemented.
Furthermore, the lack of recursion available in responses can cause issues in some clients, and enabling it shouldn't cause any problems.
When a relay disconnects from the portal, either during a deployment or because of a network partition, the portal sends us a `relays_presence` event. This allows us to stop using that relay. Any connections that currently use it get cut, and the next packet establishes a new one.
In the case of relays being re-deployed, their state is gone entirely
and we will receive new relays to use. In the case of a network
partition, the relay would have retained its state but we have already
discarded ours locally. Only one allocation per client (identified by
its 3-tuple) is allowed, so making a new allocation on that relay would
fail.
In order to sync up this inconsistency, we delete our current allocation
and make a new one if we detect this case. To test this, we introduce a
new state transition to `tunnel_test` that simulates such a network
partition.
In addition, we also remove the "upsert" behaviour of relays. The
credentials of a relay can only change if it reboots. Rebooting would
trigger a `relays_presence` event and tell us to disconnect from that
relay. Thus, receiving a relay that we already know is guaranteed to use
the same credentials.
Removal of this upserting behaviour is essentially the fix for #6067.
Due to a portal bug (#6099), we may receive a relay as connected that is in fact shutting down. If a channel needs to be refreshed on exactly that relay - whilst we are trying to refresh its allocation as part of upserting - this causes a busy loop of attempting to queue a message but failing to do so because we haven't chosen an `active_socket` for that relay yet.
Fixes: #6067.
Bumps [tailwindcss](https://github.com/tailwindlabs/tailwindcss) from
3.4.6 to 3.4.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/releases">tailwindcss's
releases</a>.</em></p>
<blockquote>
<h2>v3.4.7</h2>
<h3>Fixed</h3>
<ul>
<li>Fix class detection in Slim templates with attached attributes and
ID (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/14019">#14019</a>)</li>
<li>Ensure attribute values in <code>data-*</code> and
<code>aria-*</code> modifiers are always quoted in the generated CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/14037">#14037</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tailwindlabs/tailwindcss/blob/v3.4.7/CHANGELOG.md">tailwindcss's
changelog</a>.</em></p>
<blockquote>
<h2>[3.4.7] - 2024-07-25</h2>
<h3>Fixed</h3>
<ul>
<li>Fix class detection in Slim templates with attached attributes and
ID (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/14019">#14019</a>)</li>
<li>Ensure attribute values in <code>data-*</code> and
<code>aria-*</code> modifiers are always quoted in the generated CSS (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/pull/14037">#14037</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9824cb64a0"><code>9824cb6</code></a>
Update version in package.json</li>
<li><a
href="aa6c10f67f"><code>aa6c10f</code></a>
Add missing heading to changelog</li>
<li><a
href="245058c7fd"><code>245058c</code></a>
Update changelog for v3.4.7</li>
<li><a
href="605d8cd5eb"><code>605d8cd</code></a>
Update CHANGELOG.md</li>
<li><a
href="680c55c11c"><code>680c55c</code></a>
Normalize attribute selector for <code>data-*</code> and
<code>aria-*</code> modifiers (<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/14037">#14037</a>)</li>
<li><a
href="866860e6a6"><code>866860e</code></a>
Print eventual lightning CSS parsing errors when the CSS matcher fail
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/14034">#14034</a>)</li>
<li><a
href="bdc87ae1d7"><code>bdc87ae</code></a>
Fix class detection in Slim templates with attached attributes and IDs
(<a
href="https://redirect.github.com/tailwindlabs/tailwindcss/issues/14019">#14019</a>)</li>
<li>See full diff in <a
href="https://github.com/tailwindlabs/tailwindcss/compare/v3.4.6...v3.4.7">compare
view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps
[@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node)
from 20.14.12 to 22.0.2.
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node">compare
view</a></li>
</ul>
</details>
<br />
[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This almost always indicates a user-impacting connectivity error. For customers troubleshooting their Gateways by grepping for `ERROR`, this will make these errors much easier to find.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
With the upcoming feature of full-route tunneling, aka an "Internet Resource", we need to expand the reference state machine in `tunnel_test`. In particular, packets to non-resources will now be routed to the gateway if we have previously activated the Internet resource.
This is reasonably easy to model, as we can see from the small diff. Because `connlib` doesn't actually support the Internet resource yet, the code snippet where it is added to the list of all possible resources to sample from is commented out.
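The reference-state rule boils down to something like this (simplified; the real model distinguishes more cases):

```rust
#[derive(Debug, PartialEq)]
enum ExpectedRoute {
    Tunnel, // packet is expected to reach a gateway
    System, // packet bypasses connlib
}

/// A packet to a non-resource destination only goes through the tunnel if
/// the Internet resource has been activated beforehand.
fn expected_route(matches_resource: bool, internet_resource_active: bool) -> ExpectedRoute {
    if matches_resource || internet_resource_active {
        ExpectedRoute::Tunnel
    } else {
        ExpectedRoute::System
    }
}
```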
When `tunnel_test` fails, it prints the initial state in verbose debug formatting. Most of the fields in `RefClient` track state _during_ the runtime of the test and are all empty initially. The same applies to `Host`.
To make this output easier to read and scroll through, we ignore some of these fields in the debug output.
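One way to do that is a hand-rolled `Debug` impl that only prints the configuration-style fields; a sketch with made-up field names:

```rust
use std::fmt;

struct RefClient {
    id: u64,
    // Runtime-only bookkeeping, empty in the initial state:
    connected_gateways: Vec<u64>,
    sent_dns_queries: Vec<u16>,
}

impl fmt::Debug for RefClient {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("RefClient")
            .field("id", &self.id)
            .finish_non_exhaustive() // runtime-only fields are omitted
    }
}
```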