[Step
2](https://cloud.google.com/sql/docs/postgres/pg-audit#set-pgaudit-flag-values)
of the pgaudit setup guide for Google Cloud SQL. It would be good to
have detailed pgaudit logs on the master application instance in case
things go wrong.
Notably, this prevents erroring out when the `pgaudit` extension is not
available (which is the default). Enabling the `pgaudit` extension for our dev
instance is left as a future endeavor.
Supersedes #5442
The submit button on the Settings -> DNS page has a couple of UX issues
with the new search domain section:
- It's ambiguous what the `Save` is actually saving
- The spacing makes it look like it's only saving upstream resolvers
This PR introduces a simple fix that addresses both issues by:
- Updating the button text to `Save DNS Settings`
- Increasing spacing between submit button and form elements
- Slightly decreasing spacing between the `search domain` and `upstream
resolvers` inputs
<img width="968" alt="Screenshot 2025-03-14 at 12 06 02 AM"
src="https://github.com/user-attachments/assets/651f54c8-3b5f-4747-ad3a-e2ae32eccbf0"
/>
Related #5248
I suspect that one issue with local discovery is that we respond to
LLMNR queries with NXDOMAIN if the domain isn't a resource. This is
probably wrong. LLMNR works over multicast so if a particular interface
can't respond to a query with records, it should probably not respond at
all.
Related: #8266
In order for search-domains to work on Windows, we need to set the
`SearchList` registry key for our interface. This will result in Windows
sending us a DNS query with the expanded domain name from the search
list which we can then process like normal DNS queries.
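For illustration, here is a minimal sketch of what setting that key could look like from Rust using the `winreg` crate. The per-interface registry path and the `set_search_list` helper are assumptions for illustration, not the actual implementation:

```rust
// Sketch only: sets the `SearchList` value for a network interface using the
// `winreg` crate. The exact per-interface registry path is an assumption.
use winreg::enums::HKEY_LOCAL_MACHINE;
use winreg::RegKey;

fn set_search_list(interface_guid: &str, search_domains: &[&str]) -> std::io::Result<()> {
    // Hypothetical per-interface path; Windows keeps per-interface DNS
    // settings under the Tcpip service parameters.
    let path = format!(
        r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{}",
        interface_guid
    );
    let (key, _disposition) = RegKey::predef(HKEY_LOCAL_MACHINE).create_subkey(&path)?;

    // `SearchList` is a comma-separated list of suffixes Windows tries when
    // expanding single-label names.
    key.set_value("SearchList", &search_domains.join(","))?;

    Ok(())
}
```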
Related: #8410
Our Posthog integration was so lenient with regard to errors that I
didn't even notice that we failed to deserialise them correctly.
In Posthog, I configured the feature flags with `kebab-case` but we
tried to deserialise them as `snake_case`.
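As a hedged sketch of the fix, the mismatch boils down to a serde `rename_all` attribute; the struct and flag names here are hypothetical:

```rust
// Sketch of the deserialisation mismatch: Posthog returns flag names in
// `kebab-case`, so the struct needs the matching `rename_all` attribute.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "kebab-case")] // was `snake_case`, which never matched
struct FeatureFlags {
    // Hypothetical flag; matches `stream-gateway-logs` on the wire.
    stream_gateway_logs: bool,
}
```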
In order to have the system expand search domains for us, we need to set
a very peculiar combination of configuration options in the
`NEDNSSettings` of the VPN configuration:
- We need to include our search domains in the list of `matchDomains`
- We need to set `matchDomainsNoSearch = false`
- We need to set the `searchDomains` field
Technically, we don't even need to set `searchDomains` itself.
Reading the docs in more detail for the `matchDomainsNoSearch` flag
explains why:
> A Boolean that specifies if the domains in the matchDomains list
should not be appended to the resolver’s list of search domains.
The double-negative here is confusing but essentially, what this says
is:
> If false, append the list of match domains to the resolver's search
domains.
That is exactly what we want: a search domain of e.g. `example.com`
should be appended to the list of search domains for the primary
resolver of non-scoped DNS queries.
I tested without setting `searchDomains` and it does still work: The
system will still expand the domain for us and send us an FQDN query of
e.g. `foo.example.com`. However, I figured not setting `searchDomains`
at all is quite confusing so I left it in there.
Related: #8410 (Fixes it for macOS)
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: thomas <firezone@firezones-MacBook-Air.fritz.box>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Why:
* This commit updates the 500 error page in the portal to have the same
look and feel as the 404 error page in order to be consistent within the
portal UI.
- Adds a simple text input to configure search domains ("default DNS
suffix") in the Settings -> DNS page.
- Sends the `search_domain` field as part of the client's `init` message
- Fixes a minor UI alignment inconsistency for the upstream resolvers
field so that the total form width and `New resolver` button width are
the same.
<img width="1137" alt="Screenshot 2025-03-09 at 10 56 56 PM"
src="https://github.com/user-attachments/assets/a1d5a570-8eae-4aa9-8a1c-6aaeb9f4c33a"
/>
Fixes #8365
When debugging Firezone, it is useful to use `tail -f` on the current
logfile to see what `connlib` is doing. This is quite annoying to do
however because the log file rolls over with every restart of the
application. As a small QoL improvement, we always symlink the latest
log file to a link called `latest`. Therefore, all one needs to do is
re-run the same `tail -f ./latest` command to get the new logs.
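A minimal sketch of the symlinking logic (assuming a Unix target; the function name is hypothetical):

```rust
// Sketch: after creating the new log file, point `latest` at it, replacing
// any previous link.
use std::io;
use std::path::Path;

fn symlink_latest(log_file: &Path, log_dir: &Path) -> io::Result<()> {
    let link = log_dir.join("latest");

    // `symlink` fails if the link already exists, so remove the old one first.
    match std::fs::remove_file(&link) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }

    std::os::unix::fs::symlink(log_file, &link)
}
```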
Resolves: #8388
Proptests found this one. It can't happen in practice because we don't
expose disabling arbitrary resources in the Client's UI; only the
Internet Resource can be enabled / disabled.
Most of the time, these flags are only read from and not written to.
By using a read-lock, we make sure that even when we use feature-flags
from multiple threads, they don't cause any contention.
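As a rough sketch of the pattern (with a hypothetical flag name): reads take a shared lock, which many threads can hold at once, while the occasional update takes the exclusive one.

```rust
// Sketch of the read-mostly access pattern: global flags behind an `RwLock`.
use std::sync::{LazyLock, RwLock};

#[derive(Default, Clone, Copy)]
struct FeatureFlags {
    stream_logs: bool, // hypothetical flag
}

static FLAGS: LazyLock<RwLock<FeatureFlags>> = LazyLock::new(Default::default);

fn stream_logs() -> bool {
    FLAGS.read().unwrap().stream_logs // shared lock: readers don't contend
}

fn update(new: FeatureFlags) {
    *FLAGS.write().unwrap() = new; // exclusive lock, taken rarely
}
```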
A sizeable chunk of Firezone's Rust components deal with parsing,
manipulating and emitting DNS queries and responses. The API surface of
DNS is quite large and to make handling of all corner-cases easier, we
depend on the `domain` library to do the heavy-lifting for us.
For better or worse, `domain` follows a lazy-parsing approach. Thus,
creating a new DNS message doesn't actually verify that it is in fact
valid. Within Firezone, we make several assumptions around DNS messages,
such as that they will only ever contain a single question.
Historically, DNS allows for multiple questions per query but in
practice, nobody uses that.
Due to how we handle DNS in Firezone, manipulating these messages
happens in multiple places. That combined with the lazy-parsing approach
from `domain` warrants having our own `dns-types` library that wraps
`domain` and provides us with types that offer the interface we need in
the rest of the codebase.
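As one illustration of the kind of invariant such a wrapper can enforce eagerly (this is not the actual `dns-types` API), here is a hedged sketch that rejects messages with more than one question by reading QDCOUNT straight out of the 12-byte DNS header; the type and error names are hypothetical:

```rust
// Sketch: validate eagerly at construction time instead of lazily on access.
// The wire format keeps QDCOUNT in bytes 4..6 of the 12-byte DNS header.
struct Query {
    bytes: Vec<u8>,
}

#[derive(Debug)]
enum ParseError {
    TooShort,
    MultipleQuestions(u16),
}

impl Query {
    fn parse(bytes: Vec<u8>) -> Result<Self, ParseError> {
        let header: &[u8; 12] = bytes
            .get(..12)
            .and_then(|h| h.try_into().ok())
            .ok_or(ParseError::TooShort)?;

        let qdcount = u16::from_be_bytes([header[4], header[5]]);
        if qdcount != 1 {
            return Err(ParseError::MultipleQuestions(qdcount));
        }

        Ok(Self { bytes })
    }

    fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```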
Resolves: #7019
Search domains are a way of performing a DNS lookup without typing the
fully-qualified domain name. For example, with a search domain of
`example.com`, performing a DNS query for `app` will automatically
expand the query to `app.example.com`. At present, this doesn't work
with Firezone because there is no way to configure an account-wide
search-domain.
With this PR, we extend the `Interface` message sent by the portal to
also include an optional `search_domain` field that must be a valid
domain name. If set, `connlib`'s DNS stub resolver will now append this
domain to all single-label queries and match the resulting domain
against all active DNS resources.
On Linux - with `systemd-resolved` as the DNS backend - we need to set
the search domain on the TUN interface as well and enable LLMNR in order
to be able to intercept these queries. However, `resolved` expands the
query for us, meaning that with this configuration we don't actually receive
a single-label query in `connlib`. Instead, we directly see
`app.example.com` when we type `host app` or `dig +search app` and have
`example.com` as our search domain.
macOS has a similar system but with a different fallback. There, the
operating system will first try all configured search domains on the
system (typically just the ones set prior to Firezone starting), and
send queries for the resulting FQDNs to all resolvers. If none of the resolvers
(including Firezone's stub resolver) return results, it sends the
single-label query directly to the primary resolver. To handle this
case, Firezone needs to know about the search-domain and expand it
itself when it receives the single-label query. In the future, we may
want to look into how we can configure macOS such that it performs this
expansion for us.
On Windows and Android, queries for a single-label domain will be
directly sent to Firezone's stub resolver where we then hit the same
codepath as explained above.
Specifically, the way this codepath works is that if we receive a
single-label query AND we have a search-domain set, we expand it and
match that particular query against our list of resources. In every
other case, we continue on with the single-label domain.
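A minimal sketch of that rule, with hypothetical names:

```rust
// Sketch of the expansion rule: only single-label names get the search
// domain appended; everything else passes through unchanged.
fn expand(query_name: &str, search_domain: Option<&str>) -> String {
    let name = query_name.trim_end_matches('.');
    let is_single_label = !name.is_empty() && !name.contains('.');

    match (is_single_label, search_domain) {
        // Single-label query AND a search domain is set: expand, e.g.
        // `app` + `example.com` -> `app.example.com`.
        (true, Some(domain)) => format!("{name}.{domain}"),
        // In every other case, continue with the name as-is.
        _ => name.to_owned(),
    }
}
```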
Related: #8365
Fixes: #8377
Introduces a simple `search_domain` field embedded into our existing
`Accounts.Account.Config` embedded schema. This will be sent to clients
to append to single-label DNS queries.
UI and API changes will come in subsequent PRs: this one adds the field
and (lots of) validations only.
Related: #8365
In order to more safely roll out certain changes, being able to
runtime-toggle features is crucial. For this purpose, we build a simple
integration with Posthog that allows us to evaluate feature flags based
on the Firezone ID of a Client or Gateway.
The feature flags are also set in a dedicated context for Sentry events.
This allows us to see which feature flags were active when a certain
error is logged to Sentry.
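For illustration, attaching such a context might look roughly like this with the `sentry` crate's scope API; the flag map and function name are assumptions:

```rust
// Sketch: attach the evaluated flags as a dedicated Sentry context so every
// event records which flags were active at the time.
use std::collections::BTreeMap;

fn record_feature_flags(flags: &BTreeMap<String, bool>) {
    let map: BTreeMap<String, sentry::protocol::Value> = flags
        .iter()
        .map(|(name, enabled)| (name.clone(), (*enabled).into()))
        .collect();

    sentry::configure_scope(|scope| {
        scope.set_context("feature_flags", sentry::protocol::Context::Other(map));
    });
}
```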
- Adds more actor groups to the existing `oidc_provider`
- Configures a rand seed so our seed data is reproducible across
machines
- Formats the seeds file to allow for some refactoring in a later PR
- Adds a `Mock` identity provider adapter with sync enabled
UDP is an unreliable transport and thus it can happen that a UDP DNS
query gets lost in transit. Our current algorithm for picking a
nameserver from all provided ones only uses UDP DNS and thus, we may run
into a scenario where we falsely claim to have no nameservers simply
because the UDP request or response got lost in transit.
To mitigate this, we also perform a TCP DNS query to every nameserver.
TCP is reliable and will perform retransmissions in case of packet loss.
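As a sketch, a TCP DNS exchange differs from UDP only in the 2-byte big-endian length prefix (RFC 1035, section 4.2.2); the helper below is illustrative, not the actual implementation, and assumes `query` is an already-encoded DNS message:

```rust
// Sketch of a DNS-over-TCP probe: write the length-prefixed query, then read
// the length-prefixed response.
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn probe_tcp(nameserver: SocketAddr, query: &[u8]) -> io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect_timeout(&nameserver, Duration::from_secs(5))?;

    // Length-prefix the query.
    stream.write_all(&(query.len() as u16).to_be_bytes())?;
    stream.write_all(query)?;

    // Read the length-prefixed response.
    let mut len = [0u8; 2];
    stream.read_exact(&mut len)?;
    let mut response = vec![0u8; u16::from_be_bytes(len) as usize];
    stream.read_exact(&mut response)?;

    Ok(response)
}
```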
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The DNS server added in #8285 was only a dummy DNS server: it added the
infrastructure to receive DNS queries on the IP of the TUN device at
port 53535 but returned SERVFAIL for all queries. For this
DNS server to be useful, we need to take those queries and replay them
towards a DNS server that is configured locally on the Gateway.
To achieve this, we parse `/etc/resolv.conf` during startup of the
Gateway and pass the contained nameservers into the tunnel. From there,
the Gateway's event-loop can receive the queries, feed them into the
already existing machinery for performing recursive DNS queries that we
use on the Client and resolve the records.
In its current implementation, we only use the first nameserver defined
in `/etc/resolv.conf`. If the lookup fails, we send back a SERVFAIL
error and log a message.
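A hedged sketch of the parsing step (the real code may well use a dedicated crate for this):

```rust
// Sketch: extract nameservers from `/etc/resolv.conf` at startup by keeping
// every `nameserver <ip>` line that parses as an IP address.
use std::net::IpAddr;

fn parse_nameservers(resolv_conf: &str) -> Vec<IpAddr> {
    resolv_conf
        .lines()
        .filter_map(|line| line.trim().strip_prefix("nameserver"))
        .filter_map(|rest| rest.trim().parse().ok())
        .collect()
}
```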
Resolves: #8221
Rather than the current behavior of raising a 500 when we receive
missing / invalid params in IdP auth callbacks, it would be helpful to
show the user which params were provided, in case the IdP has set
anything useful to aid the user.
For example, we recently received these params from `okta` for a pilot
account (and subsequently rendered a 500 to the user):
```
%{"account_id_or_slug" => "<redacted>", "error" => "access_denied", "error_description" => "User is not assigned to the client application.", "provider_id" => "<redacted>", "state" => "<redacted>"}
```
Prior to #8334, we had some logic within the test-suite to only reset
the TCP DNS client if the DNS mapping actually changed. This is
problematic because adding / removing CIDR resources from `connlib` may
cause packets to suddenly be re-routed to a different site. Consider the
case where the Internet Resource is active and we make a DNS query. The
query will be routed to the Internet site. If we then add a CIDR
resource to `connlib` that happens to match the DNS server that is set
as an upstream server, all new packets emitted by the TCP DNS client
will be routed to that new site. However, the DNS server we are talking
to doesn't recognise the new source port as it is routed via a different
Gateway.
This is in fact also a problem with TCP connections in general within
`connlib` when changes to the routing table happen; it is already
tracked in #7081.
To fix the tests, we need to always reset the DNS servers and the TCP
DNS client whenever any changes to the routes or the DNS mapping
happens.
Our portal instances are currently woefully underutilized. If anything, our DB instance
size matters the most.
Thus, this PR updates the environments pointer to a sha that:
- Decreases all portal instance sizes to `e2-micro`, which are about 10x
cheaper than our current `n4-standard-2`.
- Doubles the number of `api`, `web`, and `domain` instances from `2`
to `4` - if one of each goes down, we'll still have high availability
with 3 instances
- Removes some leftover client-logs cruft we no longer used
Fixes #8344
Bumps [postgrex](https://github.com/elixir-ecto/postgrex) from 0.19.3 to
0.20.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/elixir-ecto/postgrex/blob/master/CHANGELOG.md">postgrex's
changelog</a>.</em></p>
<blockquote>
<h2>v0.20.0 (2025-02-05)</h2>
<ul>
<li>
<p>Deprecations</p>
<ul>
<li>Deprecate <code>:search_path</code> and use <code>:parameters</code>
option instead</li>
</ul>
</li>
<li>
<p>Bug fixes</p>
<ul>
<li>Ensure <code>Duration</code> type returns same units as
<code>Postgrex.Interval</code></li>
<li>Call disconnect on protocol when reconnecting in
<code>Postgrex.ReplicationConnection</code></li>
<li>Call disconnect only if there is protocol in
<code>Postgrex.SimpleConnection</code></li>
</ul>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c2af85d8eb"><code>c2af85d</code></a>
Release v0.20.0 (with Elixir v1.19 warnings fixed)</li>
<li><a
href="b50103a939"><code>b50103a</code></a>
Release v0.20.0</li>
<li><a
href="51ccbdd1d5"><code>51ccbdd</code></a>
Update postgrex.ex</li>
<li><a
href="34a57fe359"><code>34a57fe</code></a>
Deprecate <code>:search_path</code> and use <code>:parameters</code>
option instead (<a
href="https://redirect.github.com/elixir-ecto/postgrex/issues/729">#729</a>)</li>
<li><a
href="928e43a816"><code>928e43a</code></a>
Have Duration return same units as Postgrex.Interval (<a
href="https://redirect.github.com/elixir-ecto/postgrex/issues/728">#728</a>)</li>
<li><a
href="a6f20205a3"><code>a6f2020</code></a>
Call disconnect on protocol when reconnecting in Replication connection
(<a
href="https://redirect.github.com/elixir-ecto/postgrex/issues/726">#726</a>)</li>
<li><a
href="9748fcbbd7"><code>9748fcb</code></a>
Update dependencies with warnings (<a
href="https://redirect.github.com/elixir-ecto/postgrex/issues/723">#723</a>)</li>
<li><a
href="c3097f429a"><code>c3097f4</code></a>
More safety checks around comments (<a
href="https://redirect.github.com/elixir-ecto/postgrex/issues/722">#722</a>)</li>
<li><a
href="6d9e2ca81a"><code>6d9e2ca</code></a>
Minor link correction and moduledoc cleanup (<a
href="https://redirect.github.com/elixir-ecto/postgrex/issues/720">#720</a>)</li>
<li><a
href="cebb02f923"><code>cebb02f</code></a>
Disconnect only if there is a protocol</li>
<li>See full diff in <a
href="https://github.com/elixir-ecto/postgrex/compare/v0.19.3...v0.20.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
At present, the TCP DNS server we use in `connlib` exposes an opaque
`SocketHandle` with each received query. This handle refers to the
socket that the query was received on. The response needs to be sent
back on the same socket because it effectively refers to the TCP stream
that was established.
We need to track this `SocketHandle` all the way through to our
user-space DNS client in `connlib` which actually resolves queries with
a DNS server. In order to be able to reuse this DNS client on the
Gateway where we receive DNS queries using a user-space socket (and thus
don't have such a `SocketHandle`), we need to remove this abstraction
from the public API of the TCP DNS server.
A TCP stream is effectively identified by the source and destination
socket address: A given 4-tuple (source IP, source port, destination IP,
destination port) can only ever hold a single TCP connection. As such,
returning the local and remote `SocketAddr` with the query is sufficient
to uniquely identify the socket.
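Sketched out, the replacement identifier might look like this; the type names are hypothetical:

```rust
// Sketch: instead of an opaque `SocketHandle`, each query carries the two
// socket addresses, which uniquely identify the underlying TCP stream.
use std::net::SocketAddr;

/// Identifies the TCP stream a DNS query arrived on.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct StreamId {
    local: SocketAddr,  // destination IP + port of the query
    remote: SocketAddr, // source IP + port of the query
}

struct IncomingQuery {
    stream: StreamId, // the response must go back on this exact stream
    message: Vec<u8>,
}
```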
Adds the following endpoints:
- `PUT /clients/:id` for updating the `name`
- `PUT /clients/:client_id/verify` for verifying a client
- `PUT /clients/:client_id/unverify` for unverifying a client
- `GET /clients` for listing clients in an account
- `GET /clients/:id` for getting a single client
- `DELETE /clients/:id` for deleting a client
Related: #8081
If the websocket connection between a relay and the portal experiences a
temporary network split, the portal will immediately send the
disconnected id of the relay to any connected clients and gateways, and
all relayed connections (and current allocations) will be immediately
revoked by connlib.
This tight coupling is needlessly disruptive. As we've seen in staging
and production logs, relay disconnects can happen randomly, and in the
vast majority of cases immediately reconnect. Currently we see about 1-2
dozen of these **per day**.
To better account for this, we introduce a debounce mechanism in the
portal for `relays_presence` disconnects that works as follows:
- When a relay disconnects, record its `stamp_secret` (this is somewhat
tricky as we don't get this at the time of disconnect - we need to cache
it by relay_id beforehand)
- If the same `relay_id` reconnects again with the same `stamp_secret`
within `relays_presence_debounce_timeout` -> no-op
- If the same `relay_id` reconnects again with a **different**
`stamp_secret` -> disconnect immediately
- If it doesn't reconnect, **then** send the `relays_presence` with the
disconnected_id after the `relays_presence_debounce_timeout`
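The portal itself is Elixir, but the decision logic is easy to sketch. The following Rust sketch (with hypothetical types) illustrates only the secret comparison on reconnect, not the timer that fires when no reconnect happens:

```rust
// Illustrative sketch of the debounce decision on relay reconnect.
use std::collections::HashMap;

type RelayId = u64; // stand-in for the real relay identifier
type StampSecret = String;

enum Action {
    /// Same secret within the timeout: relay state survived, do nothing.
    NoOp,
    /// Different secret: the relay restarted, notify clients immediately.
    DisconnectImmediately,
}

struct Debouncer {
    // Cached ahead of time; the secret isn't available at disconnect.
    secrets: HashMap<RelayId, StampSecret>,
}

impl Debouncer {
    fn on_reconnect(&mut self, relay: RelayId, secret: StampSecret) -> Action {
        let action = match self.secrets.get(&relay) {
            Some(cached) if *cached == secret => Action::NoOp,
            _ => Action::DisconnectImmediately,
        };
        self.secrets.insert(relay, secret);

        action
    }
}
```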
There are several ways connlib detects a relay is down:
1. Binding requests time out. These happen every 25s, so on average we
don't learn that a Relay is down for 12.5s + the backoff timer.
2. `relays_presence` - this is currently the fastest way to detect
relays are down. With this change, the caveat is we will now detect this
with a delay of `relays_presence_debounce_timer`.
Fixes #8301
The `e2-micro` instances we'll be rolling out have 1 GB of memory (which
should be plenty), but it would be helpful to be able to handle small
spikes without getting OOM-killed.
Related #8344
## Description
We want to resolve DNS queries of type SRV & TXT for DNS resources
within the network context of the site that is hosting the DNS resource
itself. This allows admins to e.g. deploy dedicated nameservers into
those sites and have them resolve their SRV and TXT records to names
that are scoped to that particular site.
SRV records themselves return more domains which - if they are
configured as DNS resources - will be intercepted and then routed to the
correct site.
Prior to this PR, SRV & TXT records got resolved by the DNS server
configured on the client (or the server defined in the Firezone portal),
even if the domain in question was a DNS resource. This effectively
meant that those SRV records had to be valid globally and could not be
specific to the site that the DNS resource is hosted in.
## Example
Say we have these wildcard DNS resources:
- `**.department-a.example.com`
- `**.department-b.example.com`
Each of these DNS resources is assigned to a different site. If we now
issue an SRV DNS query to `_my-service.department-a.example.com`, we may
receive back the following records:
- `_my-service.department-a.example.com. 86400 IN SRV 10 60 8080
my-service1.department-a.example.com.`
- `_my-service.department-a.example.com. 86400 IN SRV 10 60 8080
my-service2.department-a.example.com.`
- `_my-service.department-a.example.com. 86400 IN SRV 10 60 8080
my-service3.department-a.example.com.`
Notice how the SRV records point to domains that will also match the
wildcard DNS resource above! If that is the case, Firezone will also
intercept A & AAAA queries for this service (which are a natural
follow-up from an application making an SRV query). As a result, traffic
for `my-service1.department-a.example.com` will be routed to the same
site the DNS resource is defined in. If the returned domains don't match
the wildcard DNS resource, the traffic will either not be intercepted at
all (if it is not a DNS resource) or routed to whichever site defines
the corresponding DNS resource.
All of these scenarios may be what the admin wants. If the SRV records
defined for the DNS resource are globally valid (and e.g. not even
resources), then resolving them using the Client's system resolver may
be all that is needed. If the services are running in a dedicated site,
that traffic should indeed be routed to that site.
As such, Firezone itself cannot make any assumptions about the structure
of these records. The only thing that is enabled with this PR is
that IF the structure happens to match the same DNS resource, it allows
admins to deploy site-specific services that resolve their concrete
domains via SRV records.
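To make the matching concrete, here is a hedged sketch of a `**` wildcard check; the exact semantics of `**` in Firezone are an assumption here:

```rust
// Sketch: `**.department-a.example.com` matches any name that ends in
// `.department-a.example.com`, one or more labels deep.
fn matches_wildcard(pattern: &str, name: &str) -> bool {
    match pattern.strip_prefix("**.") {
        Some(suffix) => name
            .strip_suffix(suffix)
            .is_some_and(|prefix| prefix.ends_with('.') && prefix.len() > 1),
        None => pattern == name,
    }
}

fn main() {
    // `my-service1.department-a.example.com` is covered by the resource, so
    // the follow-up A/AAAA queries are intercepted and routed to its site.
    assert!(matches_wildcard(
        "**.department-a.example.com",
        "my-service1.department-a.example.com"
    ));
    assert!(!matches_wildcard(
        "**.department-a.example.com",
        "department-b.example.com"
    ));
}
```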
## Testing
The implementation is tested using our property-based testing framework.
In order to cover these cases, we introduce the notion of site-specific
DNS records which are sampled when we create each individual Gateway.
When selecting a domain to query for, all global DNS records and the
site-specific ones are merged and a domain name and query type is chosen
at random.
At present, this testing framework does not assert that the DNS response
itself is correct, i.e. that it actually returned the site-specific
record. We don't assert this for any other DNS queries, hence this is
left for a future extension. We do assert using our regression greps
that we hit the codepath of querying an SRV or TXT record for a DNS
resource.
Related: #8221