firezone

mirror of https://github.com/outbackdingo/firezone.git synced 2026-03-22 00:41:55 +00:00

Author	SHA1	Message	Date
Thomas Eizinger	a66a18782e	chore(connlib): add context to IP packet parse errors (#10337) We are seeing some very strange IP packet parse errors coming from macOS devices. To better understand these, we extend the error messages with the source and destination IP as well as the L4 header. Related: #10335	2025-09-12 14:11:12 +00:00
Thomas Eizinger	33a75f6fee	chore(headless-client): don't make failures look like crashes (#10290 ) Returning an error from `main` by default prints a backtrace. This may lead users to believe that the program is crashing when in fact it is exiting in a controlled way but with an error (such as when we don't have Internet during startup). Printing the chain of errors ourselves resolves this.	2025-09-10 01:08:32 +00:00
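A minimal sketch of printing the chain of errors by hand, using only `std::error::Error::source`. `StartupError` and `format_error_chain` are hypothetical names for illustration; the headless client's actual error types and output format are not shown in this log.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical top-level error, used only for this illustration.
#[derive(Debug)]
struct StartupError {
    source: std::io::Error,
}

impl fmt::Display for StartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to start client")
    }
}

impl Error for StartupError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Renders an error and all of its sources as `outer: inner: innermost`.
fn format_error_chain(err: &dyn Error) -> String {
    let mut rendered = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        rendered.push_str(": ");
        rendered.push_str(&cause.to_string());
        current = cause.source();
    }
    rendered
}
```

A `main` that prints this string to stderr and returns a failure exit code, instead of returning the error itself, exits with a readable message rather than the default rendering of the returned error.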
Thomas Eizinger	03ac73ac00	fix(gateway): reset DNS resource NAT if proxy IPs change (#10310 ) In #10040, we decided to persist a peer's routing state on the Gateway across ICE sessions. This routing state also includes the DNS resource NAT. Prior to #10104 (which is not released yet), when a Client signs out and back in, it resets the proxy IP mapping for DNS resources and will start numbering them again from the front, i.e. starting from 100.96.0.1. With the state still being preserved on the Gateway, this represents a problem: We keep existing mappings around if there is still a NAT session for this proxy IP. However, if the proxy IP is actually for a different domain, this NAT session is meaningless. In fact, not replacing the IP is problematic as we will now route packets for the new proxy IP to the wrong destination. The persistent DNS resource mapping from #10104 fixes this. In this PR, we add an additional check to the Gateway where we detect whether the Client has started to re-assign proxy IPs and if so, we completely reset the DNS resource NAT state including all existing NAT sessions. Fixes #10268	2025-09-09 02:08:26 +00:00
Thomas Eizinger	ead1f40101	chore(gateway): only log skipped NAT entry if IP differs (#10285 ) When we resolve a DNS resource domain name on the Gateway, we establish the mapping between proxy IPs and resolved IPs in order to correctly NAT traffic. These domains are re-resolved every time the Client sees a DNS query for it. Thus, established connections could be interrupted if the IPs returned by consecutive DNS queries are different. Many SaaS products (GitHub for example) use DNS to load balance between different IPs. In order to not interrupt those connections, we check whether we have an open NAT session for an existing mapping every time we re-resolve DNS. This log is currently printed too often though because it doesn't take into account whether the IPs actually changed. If the IP is the same, we don't need to print this because the update is a no-op.	2025-09-04 21:12:46 +00:00
Thomas Eizinger	fb7b001cbf	chore(rust): fix unused variable warning (#10283 )	2025-09-03 01:17:11 +00:00
Thomas Eizinger	d718c5de8e	fix(connlib): retry packets on IO error 5 (#10279 ) Unfortunately, it isn't very easy to detect whether a socket supports GSO on Linux. Hence, `quinn-udp` simply probes for its support by trying to send GSO batches and effectively disables GSO by setting the `max-gso-segments` state variable to 1 if it encounters either EINVAL (-22) or EIO (-5). For EINVAL, `quinn-udp` has an internal retry mechanism. For EIO, the `Transmit` which is passed to `quinn-udp` needs to be re-chunked and thus cannot be automatically retried. In order to avoid dropping packets, we therefore add a once-off retry step to sending a datagram whenever we hit EIO on Linux or Android. If the error was due to GSO not being supported, the 2nd attempt should be successful and going forward, even the first one should be until we roam the socket (where this state variable gets reset). These packet drops have been causing flakiness in CI ever since we merged the eBPF tests. Those disable checksum offloading which appears to trigger these errors.	2025-09-02 21:31:57 +00:00
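The once-off retry described above can be sketched as follows. This is a simplified illustration: `send_with_eio_retry` is a hypothetical helper, and the closure stands in for the real send path, which also has to re-chunk the `Transmit` before the second attempt.

```rust
use std::io;

/// Raw OS error code for EIO, as quoted in the commit message above.
const EIO: i32 = 5;

/// Calls `send` and retries exactly once if the first attempt fails with EIO.
/// Any other error, or a second failure, is returned to the caller.
fn send_with_eio_retry<F>(mut send: F) -> io::Result<()>
where
    F: FnMut() -> io::Result<()>,
{
    match send() {
        Err(e) if e.raw_os_error() == Some(EIO) => send(),
        other => other,
    }
}
```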
Thomas Eizinger	e84bdc5566	refactor(connlib): periodically record queue depths (#10242) Instead of recording the queue depths on every event-loop tick, we now record them once a second by setting a Gauge. Not only is that a simpler instrument to work with, but it is also significantly more performant. The current version, when metrics are enabled, takes up quite a bit of CPU time. Resolves: #10237	2025-09-02 02:57:36 +00:00
Thomas Eizinger	a9e1b0fbfb	chore(connlib): print full error when failing to read IP packet (#10275 ) The error returned from `IpPacket::new` is an `anyhow::Error` but in order to return it from `async_io`, we need to wrap it in an `io::Error`. Printing an `io::Error` only prints the top-level error. To fix this, we re-wrap the `io::Error` in an `anyhow::Error` again and toggle "alternate" printing mode to see the full error chain.	2025-09-01 13:39:26 +00:00
Thomas Eizinger	0c2e54f54c	feat(connlib): persistent DNS resource records across sessions (#10104) When we receive a DNS query for a DNS resource in Firezone, we take the next available 4 IPs from the CG-NAT range and assign them to the domain name. For example, if `example.com` is a DNS resource and it is the first resource being queried in a Firezone session, we will assign the IPs `100.96.0.1` - `100.96.0.4` to it. If the user now restarts Firezone or signs out and back in, this state is lost and we assign those same IPs to the next DNS query coming in. This creates a problem for applications that do not re-query DNS very often or never. They expect these IPs to not change. Restarting software or signing out and back in is a common approach to fixing software problems, yet in this specific case, doing so may create even more problems for the user. To mitigate this, `ClientState` introduces a new event `DnsRecordsChanged` that gets emitted to the event-loop every time we assign new records. The event-loop then caches this in memory and reuses it in case a new session is initiated. The records are only stored in-memory and not on disk. Most likely, the tunnel process will be alive for the entire OS session. To verify this behaviour, we add a new `RestartClient` transition to our proptests. In the proptests, we already keep a mapping of all DNS names we ever resolved, including DNS resources. When generating IP traffic, we sample from this list of IPs and then expect the packet to be routed. By replacing the `ClientState` as part of this transition and re-seeding it with the previously exported DNS records, we can verify that packets to IPs resolved from a previous session still get successfully routed to the resource. Related: #5498	2025-09-01 07:29:28 +00:00
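A rough sketch of the idea: proxy IPs are handed out in blocks of four starting at `100.96.0.1`, and a new session can be re-seeded with previously exported records so that known domains keep their IPs. `ProxyIpAllocator` and its methods are hypothetical and much simpler than connlib's `ClientState`; in particular, the sketch assumes records are always allocated contiguously.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// Number of proxy IPs assigned per domain, as described above.
const IPS_PER_DOMAIN: u32 = 4;

/// Hypothetical allocator; not connlib's actual `ClientState`.
#[derive(Debug, Default, Clone, PartialEq)]
struct ProxyIpAllocator {
    records: BTreeMap<String, Vec<Ipv4Addr>>,
}

impl ProxyIpAllocator {
    /// Re-creates an allocator from records exported by a previous session.
    fn with_records(records: BTreeMap<String, Vec<Ipv4Addr>>) -> Self {
        Self { records }
    }

    /// Returns the existing IPs for `domain` or assigns the next block of four.
    fn resolve(&mut self, domain: &str) -> Vec<Ipv4Addr> {
        if let Some(ips) = self.records.get(domain) {
            return ips.clone();
        }
        let first = u32::from(Ipv4Addr::new(100, 96, 0, 1));
        let offset = self.records.len() as u32 * IPS_PER_DOMAIN;
        let ips: Vec<Ipv4Addr> = (0..IPS_PER_DOMAIN)
            .map(|i| Ipv4Addr::from(first + offset + i))
            .collect();
        self.records.insert(domain.to_owned(), ips.clone());
        ips
    }

    /// Snapshot of all records, standing in for the `DnsRecordsChanged` payload.
    fn export(&self) -> BTreeMap<String, Vec<Ipv4Addr>> {
        self.records.clone()
    }
}
```

Without the re-seeding step, a restarted allocator would hand `100.96.0.1` to whichever domain is queried first, which is exactly the problem described above.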
Thomas Eizinger	533f4c319b	feat(connlib): gracefully shutdown connections (#10076) Right now, connections cannot be actively closed in Firezone. The WireGuard tunnel and the ICE agent are coupled together, meaning only if either one of them fails will we clean up the connection. One exception here is when the Client roams. In that case, the Client simply clears its local memory completely and then re-establishes all necessary connections by re-requesting access. There are three cases where gracefully closing a connection is useful: 1. If an access authorization is revoked or expires and this was the last resource authorization for that peer, we don't currently remove the connection on the Gateway. Instead, the Client is still able to send packets but they'll be dropped because we don't have a peer state anymore. 2. If a Gateway gets restarted due to e.g. an upgrade or other maintenance work, it loses all its connections and every Client needs to wait for the ICE timeout (~15 seconds) before it can establish a new one. 3. If a Client has its access revoked for all resources it has access to in a particular site, we also don't remove this connection, even though it has become practically useless. All of these cases are fixed with this PR. Here we introduce a way to gracefully shut down a connection without forcing the other side into an ICE timeout. The graceful connection shutdown works by introducing a new "goodbye" p2p control protocol message. Like all our p2p control protocol messages, this is based on IP and therefore delivery is not guaranteed. In other words, this "goodbye" message is sent on a best-effort basis. In the case of shutdown, the Gateway will wait for all UDP packets to be flushed but will not resend them or wait for an ACK. If either end receives such a "goodbye" message, they simply remove the local peer and connection state just as if the connection had failed due to either ICE or WireGuard. For the Client, this means that the next packet for a resource will trigger a new access authorization request.	2025-09-01 06:30:13 +00:00
Thomas Eizinger	544ba11f21	chore(rust): allow `too_many_arguments` repo-wide (#10236) We always end up allowing this lint when it pops up, so we can just allow it for the whole repo. Most of the time, the reason for too many arguments is borrow-checker limitations of Rust where mutable references need to be tracked explicitly.	2025-08-22 13:21:07 +00:00
Thomas Eizinger	c70c88c856	build(deps): upgrade to opentelemetry 0.30 (#10239 )	2025-08-21 22:47:39 +00:00
Thomas Eizinger	99155490c5	chore(connlib): make UDP buffer sizes tunable at runtime (#10234 ) For easier benchmarking, we make the UDP socket send and receive buffers runtime-tunable. Related: #7452	2025-08-21 18:18:14 +00:00
Thomas Eizinger	f85ae75ae0	refactor(connlib): increase UDP queues on desktop platforms (#10235 ) On desktop platforms, we can easily afford to have larger queues here despite each item in there being 65k. Benchmarking showed that we do sometimes fill these up. Related: #7452	2025-08-21 08:56:14 +00:00
Thomas Eizinger	a109c1a2ef	feat(connlib): discard intermediate resource and TUN updates (#10223 ) Right now, the Client event-loops have a channel with 1000 items for sending new resource lists and updates to the TUN device to the host app. This is kind of unnecessary as we always only care about the last version of these. Intermediate updates that the host app doesn't process are effectively irrelevant. We've had an issue before where a bug in the portal caused us to receive many updates to resources which ended up crashing Client apps because this channel filled up. To be more resilient on this front, we refactor the Client event loop to use a `watch` channel for this. Watch channels only retain the last value that got sent into them.	2025-08-21 05:42:54 +00:00
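A std-only sketch of the "only the last value matters" behaviour described above. `LatestValue` is a hypothetical type for illustration and is not the `watch` channel used in the PR: a real watch channel also notifies the receiver when a new value arrives, which this sketch leaves out.

```rust
use std::sync::{Arc, Mutex};

/// Keeps only the most recently sent value; older, unread values are dropped.
#[derive(Clone)]
struct LatestValue<T> {
    slot: Arc<Mutex<Option<T>>>,
}

impl<T> LatestValue<T> {
    fn new() -> Self {
        Self {
            slot: Arc::new(Mutex::new(None)),
        }
    }

    /// Overwrites any value the receiver has not picked up yet.
    fn send(&self, value: T) {
        *self.slot.lock().unwrap() = Some(value);
    }

    /// Takes the most recent value, if one arrived since the last call.
    fn take(&self) -> Option<T> {
        self.slot.lock().unwrap().take()
    }
}
```

However many updates are sent before the host app gets around to reading, memory use stays constant and the reader sees only the newest resource list, unlike a bounded channel that can fill up.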
Thomas Eizinger	4e11112d9b	feat(connlib): improve throughput on higher latencies (#10231 ) Turns out the multi-threaded access of the TUN device on the Gateway causes packet reordering which makes the TCP congestion controller throttle the connection. Additionally, the default TX queue length of a TUN device on Linux is only 500 packets. With just a single thread and an increased TX queue length, we get a throughput performance of just over 1 GBit/s for a 20ms link between Client and Gateway with basically no packet drops: ``` Connecting to host 172.20.0.110, port 5201 [ 5] local 100.79.130.70 port 49546 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 116 MBytes 977 Mbits/sec 0 6.40 MBytes [ 5] 1.00-2.00 sec 137 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 2.00-3.00 sec 134 MBytes 1.13 Gbits/sec 0 6.40 MBytes [ 5] 3.00-4.00 sec 136 MBytes 1.14 Gbits/sec 47 6.40 MBytes [ 5] 4.00-5.00 sec 137 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 5.00-6.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 6.00-7.00 sec 138 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 7.00-8.00 sec 138 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 8.00-9.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 9.00-10.00 sec 138 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 10.00-11.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 11.00-12.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 12.00-13.00 sec 136 MBytes 1.14 Gbits/sec 0 6.40 MBytes [ 5] 13.00-14.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 14.00-15.00 sec 140 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 15.00-16.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 16.00-17.00 sec 137 MBytes 1.15 Gbits/sec 0 6.40 MBytes [ 5] 17.00-18.00 sec 139 MBytes 1.17 Gbits/sec 0 6.40 MBytes [ 5] 18.00-19.00 sec 138 MBytes 1.16 Gbits/sec 0 6.40 MBytes [ 5] 19.00-20.00 sec 136 MBytes 1.14 Gbits/sec 0 6.40 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-20.00 sec 
2.67 GBytes 1.15 Gbits/sec 47 sender [ 5] 0.00-20.02 sec 2.67 GBytes 1.15 Gbits/sec receiver iperf Done. ``` For further debugging in the future, we are now recording the send and receive queue depths of both the TUN device and the UDP sockets. Neither of those showed to be full in my testing which leads me to conclude that it isn't any buffer inside Firezone that is too small here. Related: #7452 --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2025-08-20 23:08:56 +00:00
Thomas Eizinger	da00848549	build(deps): bump to Rust 1.89 (#10208) Rust 1.89 comes with a new lint that wants us to explicitly refer to lifetimes, even if they are elided.	2025-08-18 05:04:55 +00:00
Thomas Eizinger	507a8957c2	chore(connlib): only debug-assert non-retransmitted DNS queries (#10136 ) When we receive the same TCP DNS query twice, we currently wrongly hit a debug assert.	2025-08-06 11:26:51 +00:00
Thomas Eizinger	2841fd0017	chore(connlib): spawn dedicated tasks for UDP send/recv (#10147 ) At the moment, `connlib`'s UDP thread spawns a single task for reading and writing to the UDP socket. It will always first try to write data before reading new data. To avoid scheduling issues, we split this into two dedicated tasks and insert ```rust tokio::task::yield_now().await; ``` into each loop. This allows the `tokio` runtime to schedule each of the tasks fairly even if one of them is very busy. For example, if we are very busy writing data (because we are receiving a lot of IP traffic), this ensures that we will occasionally also read from our socket to receive STUN control messages from our peers.	2025-08-06 07:38:01 +00:00
Thomas Eizinger	3e46727362	chore(snownet): improve logging of boringtun session index (#10135) Previously, boringtun's sender/receiver index of a session would just be rendered as a full u32. In reality, this u32 contains two pieces of information: The higher 24 bits identify the peer and the lower 8 bits identify the session with that peer. With the update to boringtun in https://github.com/firezone/boringtun/pull/112, we encode this logic in a dedicated type that prints this information separately. Here is what the logs now look like: ``` 2025-08-05T07:38:37.742Z DEBUG boringtun::noise: Received handshake_response local_idx=(3428714|1) remote_idx=(1937676|1) 2025-08-05T07:38:37.743Z DEBUG boringtun::noise: New session idx=(3428714|1) 2025-08-05T07:38:37.743Z DEBUG boringtun::noise: Sending keepalive local_idx=(3428714|1) ```	2025-08-05 13:08:32 +00:00
Thomas Eizinger	96579483d8	fix(phoenix-channel): timeout room join after 5s (#10130 ) If we fail to join a given room for longer than 5s, we fail the WebSocket connection and reconnect.	2025-08-05 02:00:26 +00:00
Thomas Eizinger	d1cbf4f76d	chore(snownet): fix relay sampling spam (#10127 ) When we disconnect from a relay, we currently spam `Failed to sample new relay for connection` until we connect to a new one.	2025-08-05 00:16:28 +00:00
Thomas Eizinger	27de29fee7	test(connlib): downgrade error (#10123 ) We can run into this when multiple DNS queries all need to be sent to the same Gateway and we don't have a connection yet. Hence, downgrade this error to a debug log.	2025-08-04 13:30:24 +00:00
Thomas Eizinger	1222be8fc9	fix(snownet): de-multiplex packets based on WG session index (#10109) Right now, `snownet` de-multiplexes WireGuard packets based on their source tuple (IP + port) to the _first_ connection that would like to handle this traffic. What appears to be happening based on observation from customer logs is that we sometimes dispatch the traffic to the wrong connection. The WireGuard packet format uses session indices to declare which session a packet is for. The local session index is selected during the handshake for a particular session. By associating the different session indices (we can have up to 8 in parallel per peer) with our Firezone-specific connection ID, we can change our de-multiplexing scheme to use these indices instead of the source tuple. This is especially important for Gateways as those talk to multiple different clients. The session index is a 32-bit integer where the top 24 bits identify the connection and the bottom 8 bits are used in a round-robin fashion to identify individual sessions within the connection. Thus, to find the correct connection, we right-shift the session index of an incoming packet to arrive back at the 24-bit connection identifier. In environments with a limited number of ports outside the NAT, a connection from a new Client may come from a source tuple of a previous Client. In such a case, we'd dispatch the packets to the wrong connection, causing the Client to not be able to handshake a tunnel.	2025-08-04 23:35:48 +10:00
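The bit layout described above (top 24 bits for the connection, bottom 8 bits for the session) can be sketched with plain shifts and masks. The function names are hypothetical and not taken from `snownet` or `boringtun`.

```rust
/// Number of low bits that identify the session within a connection.
const SESSION_BITS: u32 = 8;

/// Extracts the 24-bit connection identifier from a 32-bit session index.
fn connection_id(session_index: u32) -> u32 {
    session_index >> SESSION_BITS
}

/// Extracts the 8-bit per-connection session counter.
fn session_counter(session_index: u32) -> u8 {
    (session_index & 0xFF) as u8
}

/// Combines both parts back into a 32-bit session index.
fn session_index(connection_id: u32, session_counter: u8) -> u32 {
    (connection_id << SESSION_BITS) | u32::from(session_counter)
}
```

Because every session of a connection shares the same upper 24 bits, a lookup keyed on `connection_id(index)` finds the right connection regardless of which source tuple the packet arrived from.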
Thomas Eizinger	324edce754	feat(connlib): roll-over WireGuard session on connection upsert (#10102 ) When a Client upserts a connection to a Gateway, we currently assume that the connection is still intact. After all, it hasn't hit an ICE timeout, otherwise the connection would not be present in memory. If however the Gateway restarted or somehow lost its connection state and the Client hasn't noticed yet, then the upsert will be an _insert_ for the Gateway and ICE will create a new connection for us. In order to ensure that the WireGuard tunnel state and ICE are synchronized at all times, we also need to handshake a new session. `boringtun` maintains up to 8 concurrent sessions for us. This allows for a smooth roll-over where packets encrypted with the keys from previous sessions can still be decrypted. Thus, we can easily roll-over the session on every connection upsert without any trouble. To ensure that this doesn't happen _very_ rapidly, we debounce these proactive session roll-overs to happen at most every 20s. This follows the idea of MADR-0017. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jamil <jamilbk@users.noreply.github.com>	2025-08-04 00:20:28 +00:00
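The 20s debounce can be sketched as a small state machine over `Instant`s. `RolloverDebounce` is a hypothetical type for illustration; how `snownet` actually tracks this is not shown in the commit message.

```rust
use std::time::{Duration, Instant};

/// Minimum spacing between proactive session roll-overs, per the text above.
const MIN_ROLLOVER_INTERVAL: Duration = Duration::from_secs(20);

#[derive(Debug, Default)]
struct RolloverDebounce {
    last_rollover: Option<Instant>,
}

impl RolloverDebounce {
    /// Returns `true` and records `now` if a roll-over is allowed at `now`.
    /// Returns `false` if the previous roll-over was less than 20s ago.
    fn try_rollover(&mut self, now: Instant) -> bool {
        let allowed = match self.last_rollover {
            Some(last) => now.duration_since(last) >= MIN_ROLLOVER_INTERVAL,
            None => true,
        };
        if allowed {
            self.last_rollover = Some(now);
        }
        allowed
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the logic deterministic and easy to test.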
Thomas Eizinger	cd177a6448	fix(gateway): don't remove peer state on disconnect (#10040 ) When the connection to a Client disappears, the Gateway currently clears all state related to this peer. Whilst eagerly cleaning up memory can be good, in this case, it may lead to the Client thinking it has access to a resource when in reality it doesn't. Just because the connection to a Client failed doesn't mean their access authorizations are invalid. In case the Client reconnects, it should be able to just continue sending traffic. At the moment, this only works if the connection also failed on the Client and therefore, its view of the world in regards to "which resources do I have access to" was also reset. What we are seeing in Sentry reports though is that Clients are attempting to access these resources, thinking they have access but the Gateway denies it because it has lost the access authorization state.	2025-08-02 08:27:49 +00:00
Thomas Eizinger	a4eb6509c6	chore(snownet): fix buffering log message (#10060 ) What we are actually buffering here are unencrypted IP packets that are waiting for the tunnel to be established.	2025-08-01 22:51:49 +00:00
Thomas Eizinger	17a18fdfbb	feat(connlib): always use candidates in order of priority (#10063 ) To make things easier to debug, we enforce the order that candidates are processed in. We want candidates to be processed in the order of their inverse priority as higher priorities are better. For example, a host candidate has a higher priority than a relay candidate. This will make our logs more consistent because a `0-0` candidate pair is always a `host-host` pair. We enforce this with our own `IceCandidate` type which implements `PartialOrd` and `Ord`. This now moves the deserialisation for the portal messages to a `Deserialise` impl on this type. In order to ensure that a single faulty candidate doesn't invalidate the entire list, we use `serde_with` to skip over those elements that cannot be deserialised.	2025-08-01 01:57:29 +00:00
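A minimal sketch of ordering candidates by inverse priority through `Ord`. The struct below is a simplified, hypothetical stand-in for the PR's `IceCandidate` type, and the priority values in the usage note are illustrative only.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for a candidate: only its kind and priority.
#[derive(Debug, Clone, PartialEq, Eq)]
struct IceCandidate {
    kind: &'static str,
    prio: u32,
}

impl Ord for IceCandidate {
    /// Higher priority sorts first; the kind breaks ties so that the
    /// ordering stays consistent with the derived `Eq`.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .prio
            .cmp(&self.prio)
            .then_with(|| self.kind.cmp(other.kind))
    }
}

impl PartialOrd for IceCandidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```

Sorting a list that contains a relay candidate (`prio: 37492735`) and a host candidate (`prio: 2130706431`) puts the host candidate at index 0, which is what makes a `0-0` pair a `host-host` pair.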
Thomas Eizinger	52a9079d6a	feat(snownet): use in-flight channels to relay data (#10062 ) In #7548, we added a feature to Firezone where TURN channels get bound on-demand as they are needed. To ensure many communication paths work, we also proactively bind them as soon as we receive a candidate from a remote. When a new remote candidate gets added, str0m forms pairs with all the existing local candidates and starts testing these candidate pairs. For local relay candidates, this means sending a channel data message from the allocation. At the moment, this results in the following pattern in the logs: ``` Received candidate from remote cid=20af9d29-c973-4d77-909a-abed5d7a0234 candidate=Candidate(relay=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759/udp raddr=[59A533B0D4D3CB3717FD3D655E1D419E1C9C0772]:0 prio=37492735) No channel to peer, binding new one active_socket=462A7A508E3C99875E69C2519CA020330A6004EC:3478 peer=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759 Already binding a channel to peer active_socket=Some(462A7A508E3C99875E69C2519CA020330A6004EC:3478) peer=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759 class=success response from=462A7A508E3C99875E69C2519CA020330A6004EC:3478 method=channel bind rtt=9.928424ms tid=042F52145848D6C1574BB997 ``` What happens here is: 1. We receive a new candidate and proactively bind a channel (this is a silent operation and therefore not visible in the logs). 2. str0m formed new pairs for these candidates and starts testing them, triggering a new channel binding because the previous one isn't completed yet. 3. We refuse to make another channel binding because we see that we already have one in-flight. 4. The channel binding succeeds. What we do now is: If we want to send data to a peer through a channel, we check whether we have a connected OR an in-flight channel and send it in both cases. If the channel binding is still in-flight, we therefore just pipeline the channel data message just after it. 
Chances are that - assuming no packet re-orderings on the network - by the time our channel data message arrives at the relay that binding is active and can be relayed. This allows the very first binding attempt from str0m to already succeed instead of waiting for the timeout and sending another binding request. In addition, it makes these logs less confusing.	2025-07-31 04:10:38 +00:00
Thomas Eizinger	e07e45ed29	chore(snownet): allow filtering TURN traffic in logs (#10061 ) Our TURN traffic is fairly minimal for this to be okay on DEBUG (instead of TRACE). However, it can be quite noisy when one is just scanning through the logs. Putting it on another target allows us to filter those out later. Note that these only concern the TURN control protocol. Channel data messages are separate from this and not logged.	2025-07-31 03:46:59 +00:00
Thomas Eizinger	5753b72a5e	chore(snownet): fix typo in `PeerSocket` formatting (#10049 )	2025-07-30 22:58:22 +00:00
Thomas Eizinger	6c1c42ea22	chore(snownet): fix `handle_timeout` span (#10046 ) Spans only attach to logs of lower severity, i.e. a DEBUG span is only visible for DEBUG and TRACE statements. In order to see the connection ID here with our INFO statements, we need to make it an INFO span. Also, a span does nothing unless it is entered 🤦‍♂️	2025-07-30 03:12:11 +00:00
Thomas Eizinger	69f9a03ee8	refactor(connlib): simplify `IpPacket` struct (#9795 ) With the removal of the NAT64/46 modules, we can now simplify the internals of our `IpPacket` struct. The requirements for our `IpPacket` struct are somewhat delicate. On the one hand, we don't want to be overly restrictive in our parsing / validation code because there is a lot of broken software out there that doesn't necessarily follow RFCs. Hence, we want to be as lenient as possible in what we accept. On the other hand, we do need to verify certain aspects of the packet, like the payload lengths. At the moment, we are somewhat too lenient there which causes errors on the Gateway where we have to NAT or otherwise manipulate the packets. See #9567 or #9552 for example. To fix this, we make the parsing in the `IpPacket` constructor more restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully parse its headers and validate the payload lengths. This parsing allows us to then rely on the integrity of the packet as part of the implementation. This does create several code paths that can in theory panic but in practice, should be impossible to hit. To ensure that this does in fact not happen, we also tackle an issue that is long overdue: Fuzzing. Resolves: #6667 Resolves: #9567 Resolves: #9552	2025-07-29 04:42:57 +00:00
Thomas Eizinger	879f68cf73	refactor(connlib): use `extract_if` to expire resources (#10039 ) Rust 1.88 shipped a new std-function on `HashMap` to conditionally extract elements from a `HashMap`. This is handy for time-based expiry of resources on the Gateway.	2025-07-29 03:33:47 +00:00
Thomas Eizinger	5c3b15c1a9	chore(connlib): harmonise naming of IDs (#10038 ) When filtering through logs in Sentry, it is useful to narrow them down by context of a client, gateway or resource. Currently, these fields are sometimes called `client`, `cid`, `client_id` etc and the same for the Gateway and Resources. To make this filtering easier, name all of them `cid` for Client IDs, `gid` for Gateway IDs and `rid` for Resource IDs.	2025-07-29 03:33:09 +00:00
Thomas Eizinger	e81dc452f7	refactor(connlib): use a lock-free queue for the buffer pool (#9989) We use several buffer pools across `connlib` that are all backed by the same buffer-pool library. Within that library, we currently use another object-pool library to provide the actual pooling functionality. Benchmarking has shown that we spend quite a bit of time (a few % of total CPU time) fighting for the lock to either add or remove a buffer from the pool. This is unnecessary. By using a queue, we can remove buffers from the front and add buffers at the back, both of which can be implemented in a lock-free way such that they don't contend. Using the well-known `crossbeam-queue` library, we have such a queue directly available. I wasn't able to directly measure a performance gain in terms of throughput. What we can measure though, is how much time we spend dealing with our buffer pool vs everything else. If we compare the `perf` outputs that were recorded during an `iperf` run each, we can see that we spend about 60% less time dealing with the buffer pool than we did before. [Screenshots of the `perf` output. Before: https://github.com/user-attachments/assets/1698f28b-5821-456f-95fa-d6f85d901920 After: https://github.com/user-attachments/assets/4f26a2d1-03e3-4c0d-84da-82c53b9761dd] The number in the thousands on the left is how often the respective function was the currently executing function during the profiling run. Resolves: #9972	2025-07-28 21:39:11 +00:00
Thomas Eizinger	55304b3d2a	refactor(snownet): learn host candidates from TURN traffic (#9998 ) Presently, for each UDP packet that we process in `snownet`, we check if we have already seen this local address of ours and if not, add it to our list of host candidates. This is a safe way for ensuring that we consider all addresses that we receive data on as ones that we tell our peers that they should try and contact us on. Performance profiling has shown that hashing the socket address of each packet that is coming in is quite wasteful. We spend about 4-5% of our main thread time doing this. For comparison, decrypting packets is only about 30%. Most of the time, we will already know about this address and therefore, spending all this CPU time is completely pointless. At the same time though, we need to be sure that we do discover our local address correctly. Inspired by STUN, we therefore move this responsibility to the `allocation` module. The `allocation` module is responsible for interacting with our TURN servers and will yield server-reflexive and relay candidates as a result. It also knows, what the local address is that it received traffic on so we simply extend that to yield host candidates as well in addition to server-reflexive and relay candidates. 
On my local machine, this bumps us across the 3.5 Gbits/sec mark: ``` Connecting to host 172.20.0.110, port 5201 [ 5] local 100.93.174.92 port 57890 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 319 MBytes 2.67 Gbits/sec 18 548 KBytes [ 5] 1.00-2.00 sec 413 MBytes 3.46 Gbits/sec 4 884 KBytes [ 5] 2.00-3.00 sec 417 MBytes 3.50 Gbits/sec 4 1.10 MBytes [ 5] 3.00-4.00 sec 425 MBytes 3.56 Gbits/sec 415 785 KBytes [ 5] 4.00-5.00 sec 430 MBytes 3.60 Gbits/sec 154 820 KBytes [ 5] 5.00-6.00 sec 434 MBytes 3.64 Gbits/sec 251 793 KBytes [ 5] 6.00-7.00 sec 436 MBytes 3.66 Gbits/sec 123 811 KBytes [ 5] 7.00-8.00 sec 435 MBytes 3.65 Gbits/sec 2 788 KBytes [ 5] 8.00-9.00 sec 423 MBytes 3.55 Gbits/sec 0 1.06 MBytes [ 5] 9.00-10.00 sec 433 MBytes 3.63 Gbits/sec 8 1017 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-20.00 sec 8.21 GBytes 3.53 Gbits/sec 1728 sender [ 5] 0.00-20.00 sec 8.21 GBytes 3.53 Gbits/sec receiver iperf Done. ```	2025-07-28 21:38:39 +00:00
Thomas Eizinger	9c71026416	chore(connlib): gate more trace logs on `debug_assertions` (#10026 ) These are otherwise hit pretty often in the hot-path and slow packet routing down because tracing needs to evaluate whether it should log the statement.	2025-07-28 21:38:23 +00:00
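A small sketch of gating hot-path logging on `debug_assertions`. `log_wire` and the `Vec<String>` sink are hypothetical stand-ins for the real tracing calls; the point is only that `cfg!(debug_assertions)` is a compile-time constant, so in release builds the branch and the message construction can be optimised away entirely.

```rust
/// Records a wire-level log line, but only in builds with debug assertions.
/// The message is produced lazily so release builds never construct it.
fn log_wire(sink: &mut Vec<String>, message: impl FnOnce() -> String) {
    if cfg!(debug_assertions) {
        sink.push(message());
    }
}
```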
Thomas Eizinger	fb9a142a9e	chore(snownet): add back span in `handle_timeout` (#10025 ) Whilst entering and leaving a span for every packet is very expensive, doing the same whenever we make timeout related changes is just fine. Thus, we re-introduce a span removed in #9949 but only for the `handle_timeout` function. This gives us the context of the connection ID for not just our own logs, but also the ones from `boringtun`.	2025-07-28 04:14:39 +00:00
Thomas Eizinger	bfa77bf7fc	chore(snownet): log connection ID in more places (#10023 ) With the removal of the span in #9949, we now need to explicitly log the connection ID in a few more places to have the necessary context.	2025-07-28 02:01:01 +00:00
Thomas Eizinger	ce5650b554	fix(snownet): compare `preshared_key` on connection upsert (#9999) By chance, I've discovered in a CI failure that we won't be able to handshake a new session if the `preshared_key` changes. This makes a lot of sense. The `preshared_key` needs to be the same on both ends as it is a shared secret that gets mixed into the Noise handshake. In the following sequence of events, we would thus previously run into a "failed to decrypt handshake packet" scenario: 1. Client requests a connection. 2. Gateway authorizes the connection. 3. Portal restarts / gets deployed. To my knowledge, this will rotate the `preshared_key` to a new secret. Restarting the portal also cuts all WebSockets and therefore, the Gateway's response never arrives. 4. Client reconnects to the WebSocket, requests a new connection. 5. Gateway reuses the local connection but this connection still uses the old `preshared_key`! 6. Client needs to wait for the Gateway's ICE timeout before it can establish a new connection. How exactly (3) happens doesn't matter. There are probably other conditions under which the WebSocket connections get cut and we cannot complete our connection handshake.	2025-07-25 21:14:58 +00:00
Thomas Eizinger	f55c61c7cb	fix(snownet): always update `last_activity` idle timer (#10000 ) Previously, our idle timer was only driven by incoming and outgoing packets. To detect whether the tunnel is idle, we checked whether either the last incoming or last outgoing packet was more than 20s ago. For one, having two timestamps here is unnecessarily complex. We can simply combine them and always update this timestamp as `last_activity`. Two, recently, we have started to also take into account not only packets but other changes to the tunnel, such as an upsert of the connection or adding new candidate. What we failed to do though, is update these timestamps because their variable name was related to packets and not to any activity. The problem with not updating these timestamps however is that we will very quickly move out of "connected" back to "idle" because the old timestamps are still more than 20s ago. Hence, the previous fixes of moving out of idle on new candidates and connection upsert were ineffective. By combining and renaming the timestamps, it is now much more obvious that we need to update this timestamp in the respective handler functions which then grants us another 20s of non-idling. This is important for e.g. connection upserts to ensure the Gateway runs into an ICE timeout within a short amount of time, should there be something wrong with the connection that the Client just upserted.	2025-07-25 15:03:18 +00:00
Thomas Eizinger	d00c3b58cd	refactor(connlib): only enable `wire` logs in debug builds (#10002 ) As profiling shows, even if the log target isn't enabled, simply checking whether or not it is enabled is a significant performance hit. By guarding these behind `debug_assertions`, I was able to almost achieve 3.75 Gbits/s locally (when rebased onto #9998). Obviously, this doesn't quite translate into real-world improvements but it is nonetheless a welcome improvement. ``` Connecting to host 172.20.0.110, port 5201 [ 5] local 100.93.174.92 port 34678 connected to 172.20.0.110 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 401 MBytes 3.37 Gbits/sec 14 644 KBytes [ 5] 1.00-2.00 sec 448 MBytes 3.76 Gbits/sec 3 976 KBytes [ 5] 2.00-3.00 sec 453 MBytes 3.80 Gbits/sec 43 979 KBytes [ 5] 3.00-4.00 sec 449 MBytes 3.77 Gbits/sec 21 911 KBytes [ 5] 4.00-5.00 sec 452 MBytes 3.79 Gbits/sec 4 1.15 MBytes [ 5] 5.00-6.00 sec 451 MBytes 3.78 Gbits/sec 81 1.01 MBytes [ 5] 6.00-7.00 sec 445 MBytes 3.73 Gbits/sec 39 705 KBytes [ 5] 7.00-8.00 sec 436 MBytes 3.66 Gbits/sec 3 1016 KBytes [ 5] 8.00-9.00 sec 460 MBytes 3.85 Gbits/sec 1 956 KBytes [ 5] 9.00-10.00 sec 453 MBytes 3.80 Gbits/sec 0 1.19 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 4.34 GBytes 3.73 Gbits/sec 209 sender [ 5] 0.00-10.00 sec 4.34 GBytes 3.73 Gbits/sec receiver ``` I didn't want to remove the `wire` logs entirely because they are quite useful for debugging. However, they are also exactly this: A debugging tool. In a production build, we are very unlikely to turn these on which makes `debug_assertions` a good tool for keeping these around without interfering with performance.	2025-07-25 12:24:25 +00:00
Thomas Eizinger	e5ee8e3572	fix(connlib): wait for sockets to be closed before rebinding (#9996 ) Our `ThreadedUdpSocket` uses a background thread for the actual socket operation. It merely represents a handle to send and receive from these sockets but not the socket itself. Dropping the handle will shutdown the background thread but that is an asynchronous operation. In order to be sure that we can rebind the same port, we need to wait for the background thread to stop. We thus add a `Drop` implementation for the `ThreadedUdpSocket` that waits for its background thread to disappear before it continues. Resolves: #9992	2025-07-25 03:09:13 +00:00
Thomas Eizinger	9133d46bbd	fix(snownet): don't log unknown packet for disconnected relay (#9961 ) Currently, packets for allocations, i.e. from relays are parsed inside the `Allocation` struct. We have one of those structs for each relay that `snownet` is talking to. When we disconnect from a relay because it is e.g. not responding, then we deallocate this struct. As a result, messages that arrive from this relay can no longer be handled. This can happen when the response time is longer than our timeout. These packets then fall-through and end up being logged as "packet has unknown format". To prevent this, we make the signature on `Allocation` strongly-typed and expect a fully parsed `Message` to be given to us. This allows us to parse the message early and discard it with a DEBUG log in case we don't have the necessary local state to handle it. The functionality here is essentially the same, we just change the level at which this is logged from WARN to DEBUG. We have to make one additional adjustment to make this work: Guard all messages to be parsed by any `Allocation` to come from port 3478. This is the assigned port that all relays are expected to listen on. If we don't have any local state for a given address, we cannot decide whether it is a STUN message for an agent or a STUN message for a relay that we have disconnected from. Therefore, we need to de-multiplex based on the source port.	2025-07-25 00:32:43 +00:00
Thomas Eizinger	aebfcd56eb	fix(connlib): resend candidates on connection upsert (#9986 ) Due to network partitions between the Client and the Portal, it is possible that a Client requests a new connection, then disconnects from the portal and re-requests the connection once it is reconnected. On the Gateway, we would have already authorized the first request and initialised our ICE agents with our local candidates. The second time around, the connection would be reused. The Client however has lost its state and therefore, we need to tell it our candidates again. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>	2025-07-24 21:01:50 +00:00
Thomas Eizinger	cbe114bddc	fix(connlib): clear join requests on reconnect (#9985 ) Room join requests on the portal are only valid whilst we have a WebSocket connection. To make sure the portal processes all our requests correctly, we need to hold all other messages back while we are waiting to join the room. If the connection flaps while we are waiting to join a room, we may have a lingering join request that never gets fulfilled and thus blocks the sending of messages forever. --------- Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>	2025-07-24 20:41:26 +00:00
Thomas Eizinger	f9721a1da6	fix(snownet): only idle when we are fully connected (#9987 ) Now that we are capable of migrating a connection to another relay with #9979, our test suite exposed an edge-case: If we are in the middle of migrating a connection, it could be that the idle timer triggers because we have not seen any application traffic in the last 20s. Moving to idle mode drastically reduces the number of STUN bindings we send and if this happens whilst we are still checking candidates, the nomination doesn't happen in time for our boringtun handshake to succeed. Thus, we add a condition to our idle timer to not trigger unless ICE has completed and reports us as `connected`.	2025-07-24 12:37:47 +00:00
Thomas Eizinger	d7b9ecb60b	feat(gateway): update expiry of access authorizations on init (#9975 ) Resolves: #9971	2025-07-24 06:36:56 +00:00
Thomas Eizinger	301d2137e5	refactor(windows): share src IP cache across UDP sockets (#9976 ) When looking through customer logs, we see a lot of "Resolved best route outside of tunnel" messages. Those get logged every time we need to rerun our re-implementation of Windows' weighting algorithm as to which source interface / IP a packet should be sent from. Currently, this gets cached in every socket instance so for the peer-to-peer socket, this is only computed once per destination IP. However, for DNS queries, we make a new socket for every query. Using a new source port for DNS queries is recommended to avoid fingerprinting of DNS queries. Using a new socket also means that we need to re-run this algorithm every time we make a DNS query which is why we see this log so often. To fix this, we need to share this cache across all UDP sockets. Cache invalidation is one of the hardest problems in computer science and this instance is no different. This cache needs to be reset every time we roam as that changes the weighting of which source interface to use. To achieve this, we extend the `SocketFactory` trait with a `reset` method. This method is called whenever we roam and can then reset a shared cache inside the `UdpSocketFactory`. The "source IP resolver" function that is passed to the UDP socket now simply accesses this shared cache and inserts a new entry when it needs to resolve the IP. As an added benefit, this may speed up DNS queries on Windows a bit (although I haven't benchmarked it). It should certainly drastically reduce the amount of syscalls we make on Windows.	2025-07-24 01:36:53 +00:00
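
The shared source-IP cache described in the last entry (#9976) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the repository: `SrcIpCache`, `get_or_resolve` and `entries` are hypothetical names, and only the idea of a `reset` that is called on roaming comes from the commit message. Every socket would hold a clone of the same handle, so a destination is resolved once no matter how many short-lived DNS sockets are created.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Hypothetical cache mapping a destination IP to the source IP chosen for it.
/// Cloning only copies the `Arc`, so all clones share one underlying map.
#[derive(Clone, Default)]
pub struct SrcIpCache {
    inner: Arc<Mutex<HashMap<IpAddr, IpAddr>>>,
}

impl SrcIpCache {
    /// Returns the cached source IP for `dst`; `resolve` runs only on a cache miss.
    /// The lock is held while resolving, which keeps the sketch simple but
    /// serialises concurrent lookups.
    pub fn get_or_resolve(
        &self,
        dst: IpAddr,
        resolve: impl FnOnce(IpAddr) -> IpAddr,
    ) -> IpAddr {
        let mut map = self.inner.lock().unwrap();
        *map.entry(dst).or_insert_with(|| resolve(dst))
    }

    /// Drops all entries; intended to be called whenever the device roams,
    /// because roaming changes which source interface is preferred.
    pub fn reset(&self) {
        self.inner.lock().unwrap().clear();
    }

    /// Number of cached destinations (for inspection only).
    pub fn entries(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}
```

In this sketch a factory would keep one `SrcIpCache`, hand a clone to each socket it creates, and call `reset` from its own roaming hook. A plain `Mutex` is assumed to be sufficient because lookups are short; a real implementation may choose differently.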

1208 Commits