fix(snownet): compare preshared_key on connection upsert (#9999)

By chance, I've discovered in a CI failure that we won't be able to
handshake a new session if the `preshared_key` changes. This makes a lot
of sense. The `preshared_key` needs to be the same on both ends as it is
a shared secret that gets mixed into the Noise handshake.

In following sequence of events, we would thus previously run into a
"failed to decrypt handshake packet" scenario:

1. Client requests a connection.
2. Gateway authorizes the connection.
3. Portal restarts / gets deployed. To my knowledge, this will rotate
the `preshared_key` to a new secret. Restarting the portal also cuts all
WebSockets and therefore, the Gateways response never arrives.
4. Client reconnects to the WebSocket, requests a new connection.
5. Gateway reuses the local connection but this connection still uses
the old `preshared_key`!
6. Client needs to wait for the Gateway's ICE timeout before it can
establish a new connection.

How exactly (3) happens doesn't matter. There are probably other
conditions as to where the WebSocket connections get cut and we cannot
complete our connection handshake.
This commit is contained in:
Thomas Eizinger
2025-07-26 07:14:58 +10:00
committed by GitHub
parent f1a5af356d
commit ce5650b554
6 changed files with 34 additions and 13 deletions

2
rust/Cargo.lock generated
View File

@@ -951,7 +951,7 @@ dependencies = [
[[package]]
name = "boringtun"
version = "0.6.1"
source = "git+https://github.com/firezone/boringtun?branch=master#5cb8ec28749fe6a81a33471cb77565376bcf1d67"
source = "git+https://github.com/firezone/boringtun?branch=master#b0fb537124d38d354647add1dfee5028b6cd811a"
dependencies = [
"aead",
"base64 0.22.1",

View File

@@ -247,15 +247,27 @@ where
return Ok(());
}
// Compare the ICE credentials and public key.
// Technically, just comparing the ICE credentials should be enough because the portal computes them deterministically based on Client/Gateway ID and their public keys.
// But better be safe than sorry.
let preshared_key = *session_key.expose_secret();
// Check if we already have a connection with the exact same parameters.
// In order for the connection to be same, we need to compare:
// - Local ICE credentials
// - Remote ICE credentials
// - Remote public key
// - Preshared key
//
// Only if all of those things are the same, will:
// - ICE be able to establish a connection
// - boringtun be able to handshake a session
if let Some(c) = self.connections.get_established_mut(&cid)
&& c.agent.local_credentials() == &local_creds
&& c.agent
.remote_credentials()
.is_some_and(|c| c == &remote_creds)
&& c.tunnel.remote_static_public() == remote
&& c.tunnel
.preshared_key()
.is_some_and(|key| key == preshared_key)
{
tracing::info!(local = ?local_creds, "Reusing existing connection");
@@ -295,15 +307,8 @@ where
self.seed_agent_with_local_candidates(cid, selected_relay, &mut agent);
let connection = self.init_connection(
cid,
agent,
remote,
*session_key.expose_secret(),
selected_relay,
now,
now,
);
let connection =
self.init_connection(cid, agent, remote, preshared_key, selected_relay, now, now);
self.connections.established.insert(cid, connection);

View File

@@ -38,6 +38,10 @@ export default function Android() {
Fixes an issue where connections in low-latency networks (between
Client and Gateway) would fail to establish reliably.
</ChangeItem>
<ChangeItem pull="9999">
Decreases connection setup time on flaky Internet connections in
certain edge cases.
</ChangeItem>
</Unreleased>
<Entry version="1.5.2" date={new Date("2025-06-30")}>
<ChangeItem pull="9621">

View File

@@ -46,6 +46,10 @@ export default function Apple() {
Fixes an issue where connections in low-latency networks (between
Client and Gateway) would fail to establish reliably.
</ChangeItem>
<ChangeItem pull="9999">
Decreases connection setup time on flaky Internet connections in
certain edge cases.
</ChangeItem>
</Unreleased>
<Entry version="1.5.4" date={new Date("2025-07-11")}>
<ChangeItem pull="9597">

View File

@@ -24,6 +24,10 @@ export default function GUI({ os }: { os: OS }) {
Fixes an issue where connections in low-latency networks (between
Client and Gateway) would fail to establish reliably.
</ChangeItem>
<ChangeItem pull="9999">
Decreases connection setup time on flaky Internet connections in
certain edge cases.
</ChangeItem>
</Unreleased>
<Entry version="1.5.5" date={new Date("2025-07-09")}>
<ChangeItem pull="9779">Fixes a rare crash during sign-in.</ChangeItem>

View File

@@ -23,6 +23,10 @@ export default function Headless({ os }: { os: OS }) {
Fixes an issue where connections in low-latency networks (between
Client and Gateway) would fail to establish reliably.
</ChangeItem>
<ChangeItem pull="9999">
Decreases connection setup time on flaky Internet connections in
certain edge cases.
</ChangeItem>
</Unreleased>
<Entry version="1.5.1" date={new Date("2025-07-04")}>
<ChangeItem pull="9564">