This PR resolves more drift between staging and prod in the area of SSH
firewall rules:
- Adds IPv6 SSH IAP access to the prod firewall
- Removes SCTP and UDP port 22 access (neither is required by SSH)
- Enforces IAP for SSH access on staging
- Enables firewall logging for SSH on staging
Bumps [os_info](https://github.com/stanislav-tkach/os_info) from 3.9.2
to 3.10.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/stanislav-tkach/os_info/releases">os_info's
releases</a>.</em></p>
<blockquote>
<h2>os_info 3.10.0</h2>
<ul>
<li>Bluefin Linux support has been added. (<a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/394">#394</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/stanislav-tkach/os_info/blob/master/CHANGELOG.md">os_info's
changelog</a>.</em></p>
<blockquote>
<h2>[3.10.0] (2025-02-09)</h2>
<ul>
<li>Bluefin Linux support has been added. (<a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/394">#394</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0554ec580d"><code>0554ec5</code></a>
Merge pull request <a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/396">#396</a>
from stanislav-tkach/release-3-10-0</li>
<li><a
href="9a0980c375"><code>9a0980c</code></a>
Fix markdown indent</li>
<li><a
href="d189e7de3f"><code>d189e7d</code></a>
Release the 3.10.0 version</li>
<li><a
href="6d7ea4f231"><code>6d7ea4f</code></a>
Merge pull request <a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/395">#395</a>
from stanislav-tkach/fix-spellcheck</li>
<li><a
href="e81339bd5d"><code>e81339b</code></a>
Fix spellcheck</li>
<li><a
href="9c6d24ead9"><code>9c6d24e</code></a>
Merge pull request <a
href="https://redirect.github.com/stanislav-tkach/os_info/issues/394">#394</a>
from sargunv/feature/add-bluefin-support</li>
<li><a
href="a41a664650"><code>a41a664</code></a>
feat: Add support for Bluefin Linux</li>
<li>See full diff in <a
href="https://github.com/stanislav-tkach/os_info/compare/v3.9.2...v3.10.0">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The `wintun` crate will already shut down the session for us when the
last instance of `Session` gets dropped. Shutting down the session prior
to that results in an attempt to close an adapter that is no longer
present, causing WinTUN to log (unactionable) errors.
Every time we start a new session, our telemetry context potentially
changes, e.g. the user may sign into a new account. This should ensure
that both the IPC service and the GUI always use the most up-to-date
`account_slug` as part of Sentry events. In addition, this sets the
`account_slug` for clients that just signed in; previously, the
`account_slug` would only get populated on the next start of the client.
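A minimal sketch of the idea, assuming the `sentry` crate's scope API (the function and tag names here are illustrative):

```rust
/// Hedged sketch: refresh the telemetry context whenever a new session
/// starts so that both the IPC service and the GUI tag all subsequent
/// Sentry events with the latest `account_slug`.
fn update_telemetry_context(account_slug: &str) {
    sentry::configure_scope(|scope| {
        // Overwrites whatever slug a previously signed-in account left behind.
        scope.set_tag("account_slug", account_slug);
    });
}
```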
With the Internet Site changes now in, editing the Internet Resource is
impossible.
As such, the old instructions for using the Internet Resource no longer
apply, and we need to make sure the Internet Site and Internet Resource
are linked.
This migration ensures that's the case. However, if the Internet
Resource is already connected to another Site, we don't move it; the
migration only applies to Internet Resources that aren't connected to
any Site yet.
When flushing metrics to GCP, we sometimes get the following error:
```
{400, "{\n \"error\": {\n \"code\": 400,\n \"message\": \"One or more TimeSeries could not be written: timeSeries[0-51]: write for resource=gce_instance{zone:us-east1-d,instance_id:6130184649770384727} failed with: One or more points were written more frequently than the maximum sampling period configured for the metric.\",\n \"status\": \"INVALID_ARGUMENT\",\n \"details\": [\n {\n \"@type\": \"type.googleapis.com/google.monitoring.v3.CreateTimeSeriesSummary\",\n \"totalPointCount\": 52,\n \"successPointCount\": 48,\n \"errors\": [\n {\n \"status\": {\n \"code\": 9\n },\n \"pointCount\": 4\n }\n ]\n }\n ]\n }\n}\n"}
```
It would be helpful to know exactly which metrics are failing to flush
so we can further troubleshoot any issues.
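A purely illustrative sketch (Rust, with invented names; the actual GCP response shape and the reporting component may differ) of the kind of per-series logging this enables:

```rust
/// Hypothetical shape of a rejected time series, reduced to what we log.
struct FailedTimeSeries {
    metric_type: String, // e.g. a "custom.googleapis.com/..." metric
    point_count: u64,    // how many points were rejected
}

/// Hedged sketch: when GCP rejects part of a batch, log each failing
/// time series by name instead of only the opaque aggregate error above.
fn log_failed_flush(failed: &[FailedTimeSeries]) {
    for series in failed {
        tracing::warn!(
            metric_type = %series.metric_type,
            point_count = series.point_count,
            "failed to flush time series to GCP"
        );
    }
}
```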
By setting the `system_certs` cfg at compile-time, any TLS connections
from `phoenix-channel` will use the system-provided CA store instead of
the embedded one.
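A hedged sketch of what such a compile-time switch can look like (the `root_store` function, the crate choices, and the flag wiring are assumptions; `phoenix-channel`'s actual code may differ):

```rust
/// Hedged sketch: select the root CA store at compile time. When built
/// with `--cfg system_certs`, TLS connections verify against the
/// system-provided store; otherwise they use the embedded webpki roots.
#[cfg(system_certs)]
fn root_store() -> rustls::RootCertStore {
    let mut roots = rustls::RootCertStore::empty();
    for cert in rustls_native_certs::load_native_certs().expect("failed to load system certs") {
        let _ = roots.add(cert); // Skip individual certificates that don't parse.
    }
    roots
}

#[cfg(not(system_certs))]
fn root_store() -> rustls::RootCertStore {
    let mut roots = rustls::RootCertStore::empty();
    roots.extend(webpki_roots::TLS_SERVER_ROOTS.iter().cloned());
    roots
}
```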
Resolves: #8065
Co-authored-by: oddlama <oddlama@oddlama.org>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
We can gracefully handle `nil` items in places where we currently
don't. This PR updates all cases in MenuBar.swift to handle `nil`
items, such as the menu bar icons, which can in rare circumstances be
`nil` if they haven't loaded yet.
Alternative to #8128. If the user dismissed the unlock prompt or has
their keyring otherwise misconfigured, it is still useful to allow them
to sign in. They just won't stay signed in across reboots of the device.
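A minimal sketch of that behaviour, assuming the `keyring` crate (the entry names and boolean return are illustrative):

```rust
/// Hedged sketch: try to persist the token in the OS keyring, but treat
/// failure (dismissed unlock prompt, misconfigured keyring, ...) as
/// non-fatal so that signing in still works for the current boot.
fn store_token(token: &str) -> bool {
    let result = keyring::Entry::new("dev.firezone", "token")
        .and_then(|entry| entry.set_password(token));

    match result {
        Ok(()) => true, // Token persisted; sign-in survives reboots.
        Err(error) => {
            tracing::warn!(%error, "Failed to persist token; sign-in won't survive a reboot");
            false // Fall back to keeping the token in memory only.
        }
    }
}
```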
When we receive an inbound packet from the TUN device on the Gateway, we
perform a lookup in the NAT table to see if it needs to be translated
back to a DNS proxy IP.
At present, the non-existence of such a NAT entry results in the packet
being sent entirely unmodified, because that is what needs to happen for
CIDR resources. Whilst that is important, the same code path is
currently being executed for DNS resources whose NAT session expired!
Those packets should be dropped instead, which is what we do with this
PR.
To differentiate between never having had a NAT session and having one
that is now expired, we keep around all previous "outside" tuples of
NAT sessions. These have a very small memory footprint. The entire NAT
table is scoped to the connection to the given peer and will thus
eventually be freed once the peer disconnects. This allows us to
reliably and cheaply detect whether a packet is using an expired NAT
session. This check must be cheap because all traffic for CIDR resources
and the Internet Resource needs to perform it so that we know those
packets don't have to be translated.
This might be the source of some of the "Source not allowed" errors we
have been seeing in client logs.
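A simplified sketch of the resulting lookup (types and names are illustrative, not connlib's actual ones):

```rust
use std::collections::{HashMap, HashSet};
use std::net::SocketAddr;

/// Hedged sketch of the per-peer NAT table: `active` maps an "outside"
/// tuple back to the DNS proxy IP; `expired` remembers the outside
/// tuples of sessions that have since timed out.
struct NatTable {
    active: HashMap<SocketAddr, SocketAddr>,
    expired: HashSet<SocketAddr>,
}

enum Verdict {
    /// DNS resource with a live session: rewrite back to the proxy IP.
    Translate(SocketAddr),
    /// CIDR / Internet Resource traffic: forward unmodified.
    PassThrough,
    /// DNS resource whose NAT session expired: drop the packet.
    Drop,
}

impl NatTable {
    /// Must be cheap: every inbound packet goes through this lookup.
    fn classify(&self, outside: SocketAddr) -> Verdict {
        if let Some(proxy_ip) = self.active.get(&outside) {
            return Verdict::Translate(*proxy_ip);
        }
        if self.expired.contains(&outside) {
            return Verdict::Drop; // Expired session: don't leak the packet unmodified.
        }
        Verdict::PassThrough
    }
}
```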
At present, `connlib` communicates with its host app via callbacks.
These callbacks are executed synchronously as part of `connlib`'s
event loop, meaning `connlib` cannot do anything else whilst the
callback is executing in the host app. Additionally, this callback runs
within the `Future` that represents `connlib` and thus runs on a `tokio`
worker thread.
Attempting to interact with the session from within the callback can
lead to panics, for example when `Session::disconnect` is called, which
uses `Runtime::block_on`. This isn't allowed by `tokio`: you cannot
block on the execution of an async task from within one of the worker
threads.
To solve both of these problems, we introduce a thread-pool of size 1
that is responsible for executing `connlib` callbacks. Not only does
this allow `connlib` to perform more work, such as routing packets or
processing portal messages, it also means that the host app cannot
cause these panics within the `tokio` runtime, because the callbacks
run on a different thread.
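A sketch of the mechanism using a plain channel and thread (names are illustrative; the actual implementation may differ):

```rust
use std::sync::mpsc;
use std::thread;

/// Hedged sketch: a "thread-pool of size 1" dedicated to host-app callbacks.
struct CallbackDispatcher {
    tx: mpsc::Sender<Box<dyn FnOnce() + Send>>,
}

impl CallbackDispatcher {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();

        thread::spawn(move || {
            // Callbacks run here, off the tokio worker threads, so the host
            // app may even call `Runtime::block_on` (e.g. via
            // `Session::disconnect`) without panicking, while connlib's
            // event loop keeps routing packets in the meantime.
            for callback in rx {
                callback();
            }
        });

        Self { tx }
    }

    /// Called from connlib's event loop; never blocks on the host app.
    fn dispatch(&self, callback: impl FnOnce() + Send + 'static) {
        let _ = self.tx.send(Box::new(callback));
    }
}
```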
We appear to have caused a pretty big performance regression (~40%) in
037a2e64b6 (identified through
`git bisect`). Specifically, the regression appears to have been caused
by [`aef411a`
(#7605)](aef411abf5).
Weirdly enough, undoing just that on top of `main` doesn't fix the
regression.
My hypothesis is that using the same file descriptor for read AND write
interests on the same runtime causes issues because those interests are
occasionally cleared (i.e. on false-positive wake-ups).
In this PR, we spawn a dedicated thread each for the sending and
receiving operations of the TUN device. On Unix-based systems, a TUN
device is just a file descriptor and can therefore simply be duplicated
and read from & written to on different threads. Most importantly, we
only construct the `AsyncFd` _within_ the newly spawned thread and
runtime, because constructing an `AsyncFd` implicitly registers it with
the runtime active on the current thread.
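Roughly, the per-direction setup could look like this (a sketch of the description above; names are illustrative):

```rust
use std::os::fd::OwnedFd;
use std::thread;

use tokio::io::unix::AsyncFd;

/// Hedged sketch: one dedicated thread (with its own single-threaded
/// runtime) for the receive direction; the send half duplicates the
/// file descriptor and does the same with write interest.
fn spawn_recv_thread(fd: OwnedFd) -> std::io::Result<thread::JoinHandle<()>> {
    thread::Builder::new().name("TUN recv".to_owned()).spawn(move || {
        let runtime = tokio::runtime::Builder::new_current_thread()
            .enable_io()
            .build()
            .expect("failed to build runtime");

        runtime.block_on(async move {
            // Crucially, the `AsyncFd` is constructed *inside* this thread:
            // `AsyncFd::new` registers with the runtime active on the
            // current thread.
            let fd = AsyncFd::new(fd).expect("failed to register TUN fd");

            loop {
                let mut guard = fd.readable().await.expect("failed to await readability");
                // ... `read(2)` packets here and hand them to connlib over a channel ...
                guard.clear_ready(); // Re-arm after (possibly false-positive) wake-ups.
            }
        });
    })
}
```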
As a nice benefit, this allows us to get rid of a `future::select`.
Those are always kind of nasty because they cancel the future that
wasn't ready. My original intuition was that we drop packets due to
cancelled futures there, but that could not be confirmed in experiments.
Notifications on Apple platforms are delivered with best-effort
reliability and are not guaranteed.
They can also be queued up by the system so that, for example, it's
possible to issue a notification, quit the app, and then upon the next
launch of the app, receive the notification.
In this second case, if the user dismissed the notification, we would
crash. This is because we only track the `lastNotifiedVersion` in the
`NotificationAdapter` instance object and don't persist it to disk, yet
we assert the value is not nil when saving the user's `dismiss` action.
To fix this, we persist the `lastNotifiedVersion` to the `UserDefaults`
store and attempt to read this when the user is dismissing the
notification. If we can't read it for some reason, we still dismiss the
notification but won't prevent showing it again on the next update
check.
A minor bug is also fixed where the original author didn't correctly
call the function's `completionHandler`, and the unused
`lastDismissedVersion` instance variable left over from the original
author is removed as well.
Fixes
https://github.com/firezone/firezone/actions/runs/13293765777/job/37121384825
```
╷
│ Error: Failed to query available provider packages
│
│ Could not retrieve the list of available versions for provider
│ hashicorp/aws: locked provider registry.terraform.io/hashicorp/aws 5.64.0
│ does not match configured version constraint >= 3.29.0, >= 5.79.0; must use
│ terraform init -upgrade to allow selection of new versions
│
│ To see which modules are currently depending on hashicorp/aws and what
│ versions are specified, run the following command:
│ terraform providers
╵
```
In the window of time between when we check `AdapterState ==
.tunnelStarted` and when we call `setDns` in the Apple
`pathUpdateHandler`, it's possible that connlib disconnected. This
window could be non-trivial, since we read the system resolvers in
there, which hits the disk.
As such, we should always check that the `session` pointer is valid
just before use.
The `AdapterState` enum tracks two states: `tunnelStopped` and
`tunnelStarted`. In the `tunnelStarted` state, we populate a
`WrappedSession` object. This is redundant: connlib is either
`connected` and we have a `WrappedSession`, or it is not. Therefore we
can remove the `AdapterState` abstraction completely (it was left over
from a previous developer) and directly use a `WrappedSession?` object
to issue calls to connlib.
We set this to a valid `WrappedSession` upon connecting, and back to
`nil` as soon as connlib either `onDisconnect`s us, or the user
disconnects the tunnel.
Lastly, we avoid early-returning from queued workItems because we now
call connlib via `session?`, which will no-op if there is no session.
This allows whatever IPC call is running at the time (such as
fetchResources) to complete successfully, even though it will see a
"snapshotted" state of the Adapter/PacketTunnelProvider. In other
words, we no longer require the session pointer to be valid for things
that don't depend on its state.
Fixes #7882
With the dependency bump in #7995, we introduced a visual regression
that made all windows lose their styling.
The changelog for the v4 bump actually mentions some breaking changes
and an automated upgrade tool, but both the reviewer and the author of
the PR missed that.
This allows connections to the PostgreSQL database via the standard
socket, which, as opposed to TCP sockets, allows `peer` authentication
based on local Unix users. This removes the need for a password and is
much simpler to deploy when running components locally.
In the current form, `DATABASE_SOCKET_DIR` takes precedence over the
hostname if the environment variable is present. I found that
`compile_config!` somehow enforces a value to be present, which is
explicitly not what I want for some of these values (I think). I'd be
glad if anyone with more Elixir experience could guide me on how to
make this more idiomatic.
---------
Supersedes: #8044
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: oddlama <oddlama@oddlama.org>
To my knowledge, we don't rely on this particular functionality from
hackney. Unfortunately, we don't control the `hackney` version used by
our deps, and there is no non-vulnerable version available yet, so we
ignore the advisory for now.
A fuse has been set to fire one week from now.
OK, the reason why we're still getting the error `One or more points
were written more frequently than the maximum sampling period configured
for the metric.` is that metric points are identified by the metric's
labels, and so are "aggregated" more frequently than our API calls.
By adding the node name to the labels, we scope the metric to that node
and prevent inserting points more often than our API calls.
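Illustrative sketch only (Rust, invented names; the reporting component may be written elsewhere): GCP identifies a time series by its metric type plus label values, so including the node name gives each node its own series and thus its own sampling budget:

```rust
use std::collections::BTreeMap;

/// Hedged sketch: build the identity of a time series from its metric
/// type and labels. With `node_name` in the key, two nodes flushing the
/// same metric write to two distinct series instead of colliding.
fn series_key(metric_type: &str, node_name: &str, labels: &BTreeMap<String, String>) -> String {
    // BTreeMap keeps the label order deterministic.
    let mut key = format!("{metric_type}|node_name={node_name}");
    for (name, value) in labels {
        key.push_str(&format!("|{name}={value}"));
    }
    key
}
```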
After further investigation, it appears that the `NSImage` initializer
loads and decodes images *synchronously* from the disk. In the MenuBar,
we are "lazy-loading" these images, but since the menu is constructed as
part of app initialization, we are effectively loading these when the
app boots, in `FirezoneApp`.
After loading, these are cached, but the initial load can hang the UI
thread on app launch on slow systems.
Unfortunately, `NSImage` does not _formally_ conform to `Sendable`.
However, this restriction may not matter in many cases, such as when
`NSImage` instances are treated as read-only from only a single
thread.
As such, we wrap `NSImage` in our own struct and mark it `@unchecked
Sendable`. This allows us to load the images on a background thread and
assign them to their UI-thread counterparts asynchronously.
See further discussion:
-
https://forums.swift.org/t/why-cant-i-send-an-nsimage-across-actor-boundaries/76199
-
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html#//apple_ref/doc/uid/10000057i-CH12-126728
Related: #7771