From bac5cfa4cbc21da6cdad79cd20e96d967cfde2fe Mon Sep 17 00:00:00 2001
From: Thomas Eizinger
Date: Wed, 2 Apr 2025 18:03:35 +1100
Subject: [PATCH] fix(connlib): set idle timer to be longer than ICE timeout (#8612)

Our idle connection detection works based on incoming and outgoing
packets, whichever one happened later. If we have not received or sent
packets for longer than `MAX_IDLE`, we transition into idle mode where
we configure our ICE agent to only send binding requests every 60
seconds.

Our ICE timeout in non-idle mode is just north of 10 seconds (the
formula is a bit tricky, so I don't have the exact number). This can
cause a problem whenever a Gateway disappears. We leave the idle mode as
soon as we send a packet through the Gateway. Thus, what we intended to
happen is that, as long as you keep trying to connect to the Gateway, we
will leave the idle mode, increase our rate of STUN bindings through the
ICE agent and detect within ~10s that the Gateway is gone.

What actually happens is this: if the resource you are trying to talk to
is a DNS resource (which is very likely) and the application starts off
with a DNS query, then we will reset the local DNS resource NAT state
and ping the Gateway to set up the NAT again (we do this to ensure we
don't have stale DNS entries on the Gateway). This message is only sent
once and all other packets are buffered. Thus, the connection will go
back to idle before the newly sent STUN binding requests can determine
that the connection is actually broken.

Resolves: #8551
---
 rust/connlib/snownet/src/node.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rust/connlib/snownet/src/node.rs b/rust/connlib/snownet/src/node.rs
index 88ad2d0e6..01db325da 100644
--- a/rust/connlib/snownet/src/node.rs
+++ b/rust/connlib/snownet/src/node.rs
@@ -1746,7 +1746,7 @@ where
 }
 
 fn idle_at(last_incoming: Instant, last_outgoing: Instant) -> Instant {
-    const MAX_IDLE: Duration = Duration::from_secs(10);
+    const MAX_IDLE: Duration = Duration::from_secs(20); // Must be longer than the ICE timeout otherwise we might not detect a failed connection early enough.
 
     last_incoming.max(last_outgoing) + MAX_IDLE
 }
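
The timing argument in the commit message can be sketched as a small standalone program. This is an illustrative model, not code from `snownet`: `idle_at` takes the idle window as a parameter so both values can be compared, `ASSUMED_ICE_TIMEOUT` is a made-up 11 seconds standing in for "just north of 10 seconds", and `detects_dead_gateway` is a hypothetical helper that assumes the ICE timeout starts counting from the single outgoing packet.

```rust
use std::time::{Duration, Instant};

/// Assumption: stand-in for the non-idle ICE timeout. The commit message
/// only says it is "just north of 10 seconds"; 11s is picked for illustration.
const ASSUMED_ICE_TIMEOUT: Duration = Duration::from_secs(11);

/// Same computation as `idle_at` in the patch, but with the idle window
/// passed in so that the old and new values can be compared side by side.
fn idle_at(last_incoming: Instant, last_outgoing: Instant, max_idle: Duration) -> Instant {
    last_incoming.max(last_outgoing) + max_idle
}

/// Hypothetical helper: after one outgoing packet and no replies, does the
/// (assumed) ICE timeout elapse before the connection falls back to idle?
fn detects_dead_gateway(
    last_incoming: Instant,
    last_outgoing: Instant,
    max_idle: Duration,
) -> bool {
    let ice_gives_up_at = last_outgoing + ASSUMED_ICE_TIMEOUT;

    ice_gives_up_at <= idle_at(last_incoming, last_outgoing, max_idle)
}

fn main() {
    let last_incoming = Instant::now();
    // A single packet (e.g. the DNS resource NAT setup) sent 60s after the
    // last packet we received; everything else is buffered.
    let last_outgoing = last_incoming + Duration::from_secs(60);

    for secs in [10, 20] {
        let max_idle = Duration::from_secs(secs);
        let detected = detects_dead_gateway(last_incoming, last_outgoing, max_idle);

        println!("MAX_IDLE = {secs}s, dead Gateway detected before idle: {detected}");
    }
}
```

Under these assumptions the 10 second window puts the connection back into idle mode before the ICE timeout can elapse, while the 20 second window leaves enough room for it.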