mirror of https://github.com/outbackdingo/firezone.git (synced 2026-01-27 10:18:54 +00:00)
fix(connlib): de-prioritise timeout handling (#6077)
`connlib`'s event loop performs work in a very particular order:

1. Local buffers of IP, UDP and DNS packets are emptied.
2. Time-sensitive tasks, if any, are performed.
3. New UDP packets are processed.
4. New IP packets (from the TUN device) are processed.

This priority ensures we don't accept more work (i.e. new packets) until we have finished processing existing work. As a result, we can keep local buffers small and processing latencies low.

I am not completely confident about the cause of #6067, but if the busy-loop originates from a bad timer, then the above priority means we never get to the part where we read new UDP or IP packets, and components such as the `PhoenixChannel` - which operate outside of `connlib`'s event loop - don't get any CPU time.

A naive fix for this problem is to just de-prioritise the polling of the timer within `Io::poll`. I say naive because without additional changes, this could delay the processing of time-sensitive tasks on a very busy client / gateway where packets are constantly arriving and thus we never[^1] reach the part where the timer gets polled.

To fix this, we make two distinct changes:

1. We pro-actively break from `connlib`'s event loop every 5000 iterations. This ensures that even on a very busy system, other components like the `PhoenixChannel` get a chance to do _some_ work once in a while.
2. In case we force-yield from the event loop, we call `handle_timeout` and immediately schedule a new wake-up. This ensures time also advances at regular intervals and we don't get wrongly suspended by the runtime.

These changes don't prevent any timer-loops by themselves. With a timer-loop, we still busy-loop for 5000 iterations and thus unnecessarily burn through some CPU cycles. The important bit however is that we stay operational and can accept packets and portal messages. Any of them might change the state such that the timer value changes, thus allowing `connlib` to self-heal from this loop.

Fixes: #6067.

[^1]: This is an assumption based on the possible control flow. In practice, I believe that reading from the sockets or the TUN device is a much slower operation than processing the packets. Thus, we should eventually hit the timer path too.
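For illustration, the sketch below shows the force-yield pattern these two changes implement, in isolation from the rest of `connlib`. The names used here (`State`, `poll_io`, the simplified `poll_next_event` signature) are hypothetical stand-ins rather than connlib's actual API; the real change is in the diff below.

```rust
use std::task::{Context, Poll};
use std::time::Instant;

const MAX_EVENTLOOP_ITERS: u32 = 5000;

/// Hypothetical stand-in for connlib's tunnel state.
struct State;

impl State {
    /// Stand-in for polling sockets, the TUN device and timers.
    fn poll_io(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }

    /// Stand-in for advancing all time-based state machines.
    fn handle_timeout(&mut self, _now: Instant) {}

    fn poll_next_event(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        // Bounded loop: do at most MAX_EVENTLOOP_ITERS units of work per `poll` call.
        for _ in 0..MAX_EVENTLOOP_ITERS {
            match self.poll_io(cx) {
                // Something is ready: handle it and keep draining buffered work.
                Poll::Ready(()) => continue,
                // Nothing is ready: yield to the runtime as usual.
                Poll::Pending => return Poll::Pending,
            }
        }

        // Budget exhausted: force-yield so other tasks (e.g. the portal connection)
        // get CPU time, but keep time advancing and make sure we get polled again.
        self.handle_timeout(Instant::now());
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}
```

The final `cx.waker().wake_by_ref()` is the important detail: returning `Poll::Pending` without arranging a wake-up would suspend the task indefinitely, so the force-yield must explicitly re-schedule itself.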
@@ -94,15 +94,6 @@ impl Io {
             return Poll::Ready(Ok(Input::DnsResponse(query, response)));
         }
 
-        if let Some(timeout) = self.timeout.as_mut() {
-            if timeout.poll_unpin(cx).is_ready() {
-                let deadline = timeout.deadline().into();
-                self.timeout.as_mut().take(); // Clear the timeout.
-
-                return Poll::Ready(Ok(Input::Timeout(deadline)));
-            }
-        }
-
         if let Poll::Ready(network) = self.sockets.poll_recv_from(ip4_buffer, ip6_buffer, cx)? {
             return Poll::Ready(Ok(Input::Network(network)));
         }
@@ -113,6 +104,15 @@ impl Io {
             return Poll::Ready(Ok(Input::Device(packet)));
         }
 
+        if let Some(timeout) = self.timeout.as_mut() {
+            if timeout.poll_unpin(cx).is_ready() {
+                let deadline = timeout.deadline().into();
+                self.timeout.as_mut().take(); // Clear the timeout.
+
+                return Poll::Ready(Ok(Input::Timeout(deadline)));
+            }
+        }
+
         Poll::Pending
     }
 
@@ -40,6 +40,14 @@ mod tests;
 const MAX_UDP_SIZE: usize = (1 << 16) - 1;
 const REALM: &str = "firezone";
 
+/// How many times we will at most loop before force-yielding from [`ClientTunnel::poll_next_event`] & [`GatewayTunnel::poll_next_event`].
+///
+/// It is obviously system-dependent how long it takes for the event loop to exhaust these iterations.
+/// On GitHub's standard Linux runners, 3000 iterations have been measured to take roughly 1s during an iperf run.
+/// With 5000, we could not reproduce force-yielding being needed.
+/// Thus, it is chosen as a safe upper boundary that is not meant to be hit (and thus doesn't affect performance), yet acts as a safeguard, just in case.
+const MAX_EVENTLOOP_ITERS: u32 = 5000;
+
 pub type GatewayTunnel = Tunnel<GatewayState>;
 pub type ClientTunnel = Tunnel<ClientState>;
 
@@ -93,7 +101,7 @@ impl ClientTunnel {
     }
 
     pub fn poll_next_event(&mut self, cx: &mut Context<'_>) -> Poll<Result<ClientEvent>> {
-        loop {
+        for _ in 0..MAX_EVENTLOOP_ITERS {
             if let Some(e) = self.role_state.poll_event() {
                 return Poll::Ready(Ok(e));
             }
@@ -164,6 +172,10 @@ impl ClientTunnel {
 
             return Poll::Pending;
         }
+
+        self.role_state.handle_timeout(Instant::now()); // Ensure time advances, even if we are busy handling packets.
+        cx.waker().wake_by_ref(); // Schedule another wake-up with the runtime to avoid getting suspended forever.
+        Poll::Pending
     }
 }
 
@@ -185,7 +197,7 @@ impl GatewayTunnel {
     }
 
     pub fn poll_next_event(&mut self, cx: &mut Context<'_>) -> Poll<Result<GatewayEvent>> {
-        loop {
+        for _ in 0..MAX_EVENTLOOP_ITERS {
             if let Some(other) = self.role_state.poll_event() {
                 return Poll::Ready(Ok(other));
             }
@@ -246,6 +258,10 @@ impl GatewayTunnel {
 
             return Poll::Pending;
         }
+
+        self.role_state.handle_timeout(Instant::now(), Utc::now()); // Ensure time advances, even if we are busy handling packets.
+        cx.waker().wake_by_ref(); // Schedule another wake-up with the runtime to avoid getting suspended forever.
+        Poll::Pending
     }
 }