mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
feat(blog): Improving reliability for DNS Resources (#5469)
Need to make this post as a reference to link to from other places.

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
@@ -8,7 +8,6 @@ export default function _Page() {
 <Post
 authorName="Jamil Bou Kheir"
 authorTitle="Founder"
-authorEmail="jamil@firezone.dev"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="April 2024 Update: GA"
 date="2024-04-01"
@@ -14,7 +14,6 @@ export default function Page() {
 <Post
 authorName="Jeff Spencer"
 authorTitle=""
-authorEmail="jeff@firezone.dev"
 authorAvatarSrc={gravatar("jeff@firezone.dev")}
 title="Enterprises choose open source"
 date="December 6, 2023"
@@ -8,8 +8,7 @@ export default function Page() {
 return (
 <Post
 authorName="Jamil Bou Kheir"
-authorTitle="Founder & CEO"
-authorEmail="jamil@firezone.dev"
+authorTitle="Founder"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="Firezone 1.0"
 date="July 14, 2023"
@@ -8,7 +8,6 @@ export default function _Page() {
 <Post
 authorName="Gabriel Steinberg"
 authorTitle="Senior Backend Engineer"
-authorEmail="gabriel@firezone.dev"
 authorAvatarSrc={gravatar("gabriel@firezone.dev")}
 title="How DNS Works in Firezone"
 date="2024-05-08"
@@ -0,0 +1,18 @@
+"use client";
+import Post from "@/components/Blog/Post";
+import Content from "./readme.mdx";
+import gravatar from "@/lib/gravatar";
+
+export default function _Page() {
+  return (
+    <Post
+      authorName="Jamil Bou Kheir"
+      authorTitle="Founder"
+      authorAvatarSrc={gravatar("jamil@firezone.dev")}
+      title="Improving reliability for DNS Resources"
+      date="2024-06-20"
+    >
+      <Content />
+    </Post>
+  );
+}
@@ -0,0 +1,12 @@
+import _Page from "./_page";
+import { Metadata } from "next";
+
+export const metadata: Metadata = {
+  title: "Improving reliability for DNS Resources • Firezone Blog",
+  description:
+    "Client and Gateway versions 1.1 onwards include a more reliable DNS routing system.",
+};
+
+export default function Page() {
+  return <_Page />;
+}
@@ -0,0 +1,109 @@
**tl;dr**: [Upgrade your Gateway(s)](#how-to-upgrade) to 1.1.0 soon to improve
reliability for DNS Resources.

In our [How DNS works in Firezone](/blog/how-dns-works-in-firezone) post, we
covered how DNS Resources are resolved and routed reliably even when the IPs
they resolve to collide. The system described there works well for the vast
majority of our users across many kinds of networks.

But, as it turns out, not all networks are well-behaved (surprise!). Certain
networks in particular can cause DNS Resources to time out or fail to resolve
after a period of time.

This post describes why that happens, how we're resolving it, and the steps you
can take to upgrade.
## The case of the NAT reset

The issue was first discovered about a month ago during our internal dogfood
testing sessions. We noticed that after some time (typically 30 minutes to a few
hours), DNS Resources would become unresponsive and require the application to
issue another DNS query to perform the hole-punching dance and re-establish
connectivity.

This is odd behavior -- tunnels are designed to be kept alive indefinitely with
a periodic keep-alive sent from Client to Gateway.
### When tunnels drop

There are two obvious reasons why a tunnel might drop and need to be
re-established:

- The Client experienced a change in network connectivity (e.g. switching Wi-Fi
  networks), or
- The Gateway experienced a change in network connectivity (e.g. restarted by an
  admin)

A third, less obvious reason is when the network in between the Client and
Gateway is misbehaving.
### Google Cloud NAT

We dogfood Firezone internally across a variety of network conditions for both
Client and Gateway. After some investigation, we discovered a curious pattern:
the DNS Resource reliability issue only occurred for our Gateways running in
Google Cloud.

After running an overnight soak test, we discovered that the issue happened at
regular intervals. Precisely **every 30 minutes**, the WireGuard tunnel would
drop, and connectivity to the DNS Resource would be lost. Since new tunnels for
DNS Resources were established only at the time of resolution, the application
(`ping` in our case) would lose connectivity until it was restarted.

Google doesn't publish details on the session lifetimes for their NAT Gateways,
so we can't be sure if the problem is related to GCP or another router close to
GCP's datacenters (if you happen to know, please email us!).

But the goal of this post isn't to pick on Google -- some enterprise routers
behave similarly, under the guise of so-called "security" features, so the issue
could occur in other networks as well.
## The solution

The solution is simple yet subtle: instead of establishing the tunnel for a DNS
Resource at the time of resolution, we now wait until we see the first packet
for the Resource before performing the hole-punching dance to set up the
tunnel.

The stub resolver maintains a mapping of IPs to DNS Resources, so we know at
the packet level which DNS Resource the packet is for, even long after the query
has been resolved.

If the tunnel fails, the very next packet from the application will establish it
again, avoiding the need for another query (which the application may not make)
and thus avoiding the reliability issues detailed above.
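
The idea above can be illustrated with a minimal TypeScript sketch. It is not Firezone's actual implementation: the class names, the `100.96.0.x` placeholder range, and the handshake counter are all assumptions made for illustration. It only shows the ordering: resolution hands back a dummy IP without touching the network, and the tunnel is set up (or set up again) when a packet for that IP shows up.

```ts
// Illustrative sketch only: every name here is hypothetical, not Firezone's API.
type ResourceName = string;

class StubResolver {
  // Dummy IP handed to the application -> DNS Resource it stands for.
  private readonly ipToResource = new Map<string, ResourceName>();
  private readonly resourceToIp = new Map<ResourceName, string>();
  private nextHost = 1;

  // Answers a query immediately with a dummy IP. No tunnel is set up here.
  resolve(name: ResourceName): string {
    const existing = this.resourceToIp.get(name);
    if (existing !== undefined) return existing;
    const ip = `100.96.0.${this.nextHost++}`; // placeholder range, an assumption
    this.ipToResource.set(ip, name);
    this.resourceToIp.set(name, ip);
    return ip;
  }

  // Packet-level lookup: which DNS Resource is this destination IP for?
  lookup(ip: string): ResourceName | undefined {
    return this.ipToResource.get(ip);
  }
}

class Client {
  readonly resolver = new StubResolver();
  private readonly tunnels = new Set<ResourceName>();
  private handshakes = 0;

  // Called for each outgoing packet. Returns false if the destination is not
  // a known DNS Resource.
  sendPacket(dstIp: string): boolean {
    const name = this.resolver.lookup(dstIp);
    if (name === undefined) return false;
    if (!this.tunnels.has(name)) {
      // First packet, or the tunnel was dropped: (re)establish it now.
      this.handshakes += 1;
      this.tunnels.add(name);
    }
    return true;
  }

  // Simulates a NAT reset or other mid-path failure.
  dropTunnel(name: ResourceName): void {
    this.tunnels.delete(name);
  }

  handshakeCount(): number {
    return this.handshakes;
  }
}

const client = new Client();
const ip = client.resolver.resolve("gitlab.example.com");
client.sendPacket(ip); // first packet sets up the tunnel
client.dropTunnel("gitlab.example.com"); // e.g. the NAT session expires
client.sendPacket(ip); // next packet re-establishes it, no new DNS query
console.log(client.handshakeCount()); // prints 2
```

The application never has to re-resolve the name: because the IP-to-Resource mapping outlives the tunnel, a dropped tunnel costs one extra handshake on the next packet rather than an outage.
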
### NAT64 comes for free

One interesting edge case we hit implementing the above solution is that we
don't know the _actual_ IP of the DNS Resource until the tunnel to the Gateway
is established, at which point the Gateway resolves it.

Since the stub resolver now immediately returns a dummy IP when asked to do so,
it could return an IPv4 address for a Resource that has only `AAAA` records
defined, or vice versa. If the application chooses IPv4 to connect to the
Resource, packets would arrive at the Gateway and suddenly need to be translated
to IPv6.

So we added a NAT64 implementation to Gateways in 1.1.0 that handles this
on-the-fly, with no configuration required. That means your workforce can now
seamlessly connect to IPv6-only Resources even if they're on IPv4-only networks!
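
To make the address-family mismatch concrete, here is a small TypeScript sketch of the decision a Gateway has to make once it has resolved the Resource. This is an assumption-laden illustration, not the Gateway's real code: `familyOf`, `planRoute`, and the `Route` shape are hypothetical, and the actual packet rewriting is left out entirely.

```ts
// Illustrative sketch only: names and structure are assumptions.
type Family = "v4" | "v6";

// Crude family check: IPv6 literals contain ':', IPv4 literals do not.
function familyOf(ip: string): Family {
  return ip.indexOf(":") !== -1 ? "v6" : "v4";
}

interface Route {
  realIp: string; // address the Gateway forwards to
  translate: boolean; // true when the packet must change IP version
}

// Picks a destination for packets sent to `proxyIp`, given the addresses the
// Gateway resolved for the Resource. Same-family addresses are preferred.
function planRoute(proxyIp: string, resolved: string[]): Route | undefined {
  const wanted = familyOf(proxyIp);
  for (const ip of resolved) {
    if (familyOf(ip) === wanted) return { realIp: ip, translate: false };
  }
  const fallback = resolved[0];
  if (fallback === undefined) return undefined; // nothing resolved
  return { realIp: fallback, translate: true };
}

// IPv4 dummy IP, but the Resource only has an AAAA record:
// translation between IP versions is required.
console.log(planRoute("100.96.0.1", ["2001:db8::1"]));
```

When the Resource has a record in the same family as the dummy IP, no translation is needed; the translation path only kicks in for the mismatch described above.
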
## How to upgrade

Yesterday we released Gateway version 1.1.0, which includes the change. This
version is compatible with Client versions 1.0.x **and** 1.1.x. However, Client
versions 1.1.x **will not** be compatible with Gateway versions 1.0.x.

To give admins time to upgrade their Gateways, we are waiting to release the
1.1.0 Clients until **Thursday, June 27th**. We recommend upgrading your
Gateways to 1.1.0 as soon as possible to avoid any service disruptions caused by
end users upgrading their Clients prematurely.

Upgrading Gateway(s) usually takes only a couple of minutes --
[read the docs](/kb/administer/upgrading) to see how.
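
The compatibility rules in this section boil down to "the Gateway must be at least as new as the Client". A rough TypeScript sketch of that check follows. The function names are hypothetical, and treating 1.0.x Clients with 1.0.x Gateways as compatible is an assumption based on them being the existing pairing; versions outside 1.0.x and 1.1.x are not covered by this post, so the sketch conservatively returns `false` for them.

```ts
// Illustrative sketch only: the function names are hypothetical.
interface Version {
  major: number;
  minor: number;
}

// Parses "major.minor.patch"; returns undefined for anything else.
function parseVersion(v: string): Version | undefined {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) return undefined;
  return { major: Number(m[1]), minor: Number(m[2]) };
}

// True when a Client can talk to a Gateway, per the rules in this post.
// Only 1.0.x and 1.1.x are covered; other versions conservatively return false.
function isCompatible(client: string, gateway: string): boolean {
  const c = parseVersion(client);
  const g = parseVersion(gateway);
  if (c === undefined || g === undefined) return false;
  if (c.major !== 1 || g.major !== 1 || c.minor > 1 || g.minor > 1) return false;
  // The Gateway must be at least as new as the Client.
  return g.minor >= c.minor;
}

console.log(isCompatible("1.0.4", "1.1.0")); // true
console.log(isCompatible("1.1.0", "1.0.4")); // false
```

This is why upgrading Gateways first is the safe order: a 1.1.0 Gateway accepts both Client generations, while a 1.0.x Gateway does not accept 1.1.x Clients.
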
### Conclusion

That's all for now. If you have questions or hit issues, contact us via one of
the means [listed here](/support).
@@ -13,7 +13,6 @@ export default function Page() {
 <Post
 authorName="Jamil Bou Kheir"
 authorTitle="Founder"
-authorEmail="jamil@firezone.dev"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="January 2024 Update"
 date="2024-01-01"
@@ -8,7 +8,6 @@ export default function _Page() {
 <Post
 authorName="Jamil Bou Kheir"
 authorTitle="Founder"
-authorEmail="jamil@firezone.dev"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="March 2024 Update"
 date="2024-03-01"
@@ -8,7 +8,6 @@ export default function _Page() {
 <Post
 authorName="Jamil Bou Kheir"
 authorTitle="Founder"
-authorEmail="jamil@firezone.dev"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="May 2024 Update"
 date="2024-05-01"
@@ -21,9 +21,25 @@ export default function Page() {
 Announcements, insights, and more from the Firezone team.
 </p>
 <div className="mt-14 grid divide-y">
+<SummaryCard
+title="Improving reliability for DNS Resources"
+date="June 20, 2024"
+href="/blog/improving-reliability-for-dns-resources"
+authorName="Jamil Bou Kheir"
+authorAvatarSrc={gravatar("jamil@firezone.dev")}
+type="Announcement"
+>
+<p className="mb-2">
+We're making some changes to the way DNS Resources are routed in
+Firezone. These changes will be coming in Client and Gateway
+versions 1.1 and later. Continue reading to understand how these
+changes will affect your network and what you need to do to take
+advantage of them.
+</p>
+</SummaryCard>
 <SummaryCard
 title="Using Tauri to build a cross-platform security app"
-date="Jun 11, 2024"
+date="June 11, 2024"
 href="/blog/using-tauri"
 authorName="ReactorScram"
 authorAvatarSrc="/images/avatars/reactorscram.png"
@@ -35,7 +51,7 @@ export default function Page() {
 </p>
 </SummaryCard>
 <SummaryCard
-title="How DNS Works in Firezone"
+title="How DNS works in Firezone"
 date="May 8, 2024"
 href="/blog/how-dns-works-in-firezone"
 authorName="Gabriel Steinberg"
@@ -50,7 +66,7 @@ export default function Page() {
 </p>
 </SummaryCard>
 <SummaryCard
-title="May 2024 Update"
+title="May 2024 update"
 date="May 1, 2024"
 href="/blog/may-2024-update"
 authorName="Jamil Bou Kheir"
@@ -77,7 +93,7 @@ export default function Page() {
 </div>
 </SummaryCard>
 <SummaryCard
-title="April 2024 Update: GA"
+title="April 2024 update: GA"
 date="April 1, 2024"
 href="/blog/apr-2024-update"
 authorName="Jamil Bou Kheir"
@@ -112,7 +128,7 @@ export default function Page() {
 </ul>
 </SummaryCard>
 <SummaryCard
-title="March 2024 Update"
+title="March 2024 update"
 date="March 1, 2024"
 href="/blog/mar-2024-update"
 authorName="Jamil Bou Kheir"
@@ -136,7 +152,7 @@ export default function Page() {
 </ul>
 </SummaryCard>
 <SummaryCard
-title="Jaunary 2024 Update"
+title="January 2024 update"
 date="January 1, 2024"
 href="/blog/jan-2024-update"
 authorName="Jamil Bou Kheir"
@@ -12,8 +12,7 @@ export default function Page() {
 return (
 <Post
 authorName="Jamil Bou Kheir"
-authorTitle="Founder & CEO"
-authorEmail="jamil@firezone.dev"
+authorTitle="Founder"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="Firezone 0.5.0 Released!"
 date="July 25, 2022"
@@ -12,8 +12,7 @@ export default function Page() {
 return (
 <Post
 authorName="Jamil Bou Kheir"
-authorTitle="Founder & CEO"
-authorEmail="jamil@firezone.dev"
+authorTitle="Founder"
 authorAvatarSrc={gravatar("jamil@firezone.dev")}
 title="Firezone 0.6.0 Released!"
 date="October 17, 2022"
@@ -13,7 +13,6 @@ export default function Page() {
 <Post
 authorName="Jeff Spencer"
 authorTitle=""
-authorEmail="jeff@firezone.dev"
 authorAvatarSrc={gravatar("jeff@firezone.dev")}
 title="Secure remote access makes remote work a win-win"
 date="November 17, 2023"
@@ -7,7 +7,6 @@ export default function _Page() {
 <Post
 authorName="ReactorScram"
 authorTitle="Senior Systems Engineer"
-authorEmail="ReactorScram@users.noreply.github.com"
 authorAvatarSrc="/images/avatars/reactorscram.png"
 title="Using Tauri to build a cross-platform security app"
 date="2024-06-11"
@@ -3,7 +3,6 @@ import Image from "next/image";
 export default function Post({
 authorName,
 authorTitle,
-authorEmail,
 authorAvatarSrc,
 title,
 date,
@@ -11,7 +10,6 @@ export default function Post({
 }: {
 authorName: string;
 authorTitle: string;
-authorEmail: string;
 authorAvatarSrc: string;
 title: string;
 date: string;