Mirror of https://github.com/outbackdingo/firezone.git
Synced 2026-01-27 10:18:54 +00:00
19bfb16c57dfcf0e361fb481c84167805532a7a1
2626 commits

f0a8eee164 build(deps): bump the react group in /rust/gui-client with 5 updates (#10089)
Bumps the react group in /rust/gui-client with 5 updates: react from 19.1.0 to 19.1.1, @types/react from 19.1.8 to 19.1.9, react-dom from 19.1.0 to 19.1.1, @types/react-dom from 19.1.6 to 19.1.7, and react-router from 7.7.0 to 7.7.1. Per the react changelog, 19.1.1 fixes Owner Stacks to work with ES2015 function.name semantics (facebook/react#33680).

960df4242c build(deps): bump the tauri group in /rust/gui-client with 2 updates (#10087)
Bumps the tauri group in /rust/gui-client with 2 updates: @tauri-apps/api and @tauri-apps/cli. Updates `@tauri-apps/api` from 2.6.0 to 2.7.0.

17a18fdfbb feat(connlib): always use candidates in order of priority (#10063)
To make things easier to debug, we enforce the order in which candidates are processed. We want candidates processed in order of their inverse priority, as higher priorities are better; for example, a host candidate has a higher priority than a relay candidate. This makes our logs more consistent because a `0-0` candidate pair is then always a `host-host` pair. We enforce this with our own `IceCandidate` type, which implements `PartialOrd` and `Ord`. The deserialisation of the portal messages moves to a `Deserialize` impl on this type. To ensure that a single faulty candidate doesn't invalidate the entire list, we use `serde_with` to skip over those elements that cannot be deserialised.
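
A minimal sketch of the two ideas, assuming hypothetical field names and using derived deserialisation in place of the manual `Deserialize` impl the commit describes; `serde_with::VecSkipError` drops list elements that fail to deserialise instead of failing the whole list:

```rust
use serde::Deserialize;
use serde_with::{serde_as, VecSkipError};

#[derive(Deserialize, PartialEq, Eq)]
struct IceCandidate {
    priority: u32,
    // ... remaining candidate fields elided
}

impl Ord for IceCandidate {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        // Inverse priority: host candidates (high priority) sort first.
        other.priority.cmp(&self.priority)
    }
}

impl PartialOrd for IceCandidate {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

#[serde_as]
#[derive(Deserialize)]
struct CandidatesMsg {
    // One malformed candidate no longer invalidates the entire message.
    #[serde_as(as = "VecSkipError<_>")]
    candidates: Vec<IceCandidate>,
}
```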

9b8efdcf08 chore(connlib): bump str0m (#10066)
This bumps our str0m dependency to include improvements that I've been making to the logs:
- https://github.com/algesten/str0m/pull/681
- https://github.com/algesten/str0m/pull/682

52a9079d6a feat(snownet): use in-flight channels to relay data (#10062)
In #7548, we added a feature to Firezone where TURN channels get bound on-demand as they are needed. To ensure many communication paths work, we also proactively bind them as soon as we receive a candidate from a remote. When a new remote candidate gets added, str0m forms pairs with all the existing local candidates and starts testing these candidate pairs. For local relay candidates, this means sending a channel data message from the allocation. At the moment, this results in the following pattern in the logs:

```
Received candidate from remote cid=20af9d29-c973-4d77-909a-abed5d7a0234 candidate=Candidate(relay=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759/udp raddr=[59A533B0D4D3CB3717FD3D655E1D419E1C9C0772]:0 prio=37492735)
No channel to peer, binding new one active_socket=462A7A508E3C99875E69C2519CA020330A6004EC:3478 peer=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759
Already binding a channel to peer active_socket=Some(462A7A508E3C99875E69C2519CA020330A6004EC:3478) peer=[3231E680683CFC98E69A12A60F426AA5E5F110CB]:62759
class=success response from=462A7A508E3C99875E69C2519CA020330A6004EC:3478 method=channel bind rtt=9.928424ms tid=042F52145848D6C1574BB997
```

What happens here is:

1. We receive a new candidate and proactively bind a channel (this is a silent operation and therefore not visible in the logs).
2. str0m forms new pairs for these candidates and starts testing them, triggering a new channel binding because the previous one isn't completed yet.
3. We refuse to make another channel binding because we see that we already have one in-flight.
4. The channel binding succeeds.

What we do now is: if we want to send data to a peer through a channel, we check whether we have a connected OR an in-flight channel and send it in both cases. If the channel binding is still in-flight, we therefore pipeline the channel data message just after it. Chances are that - assuming no packet re-orderings on the network - by the time our channel data message arrives at the relay, that binding is active and the data can be relayed. This allows the very first binding attempt from str0m to succeed instead of waiting for the timeout and sending another binding request. In addition, it makes these logs less confusing.
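
Roughly, the send-side check in code (a sketch; the state and type names are hypothetical):

```rust
/// State of a TURN channel to a peer.
enum ChannelState {
    /// ChannelBind request sent, success response not yet received.
    InFlight,
    /// ChannelBind succeeded; the relay forwards data on this channel.
    Bound,
}

struct Channel {
    number: u16,
    state: ChannelState,
}

impl Channel {
    /// Send through bound *and* in-flight channels: absent reordering, the
    /// binding completes before the pipelined channel-data message arrives.
    fn can_carry_data(&self) -> bool {
        matches!(self.state, ChannelState::Bound | ChannelState::InFlight)
    }
}
```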

e07e45ed29 chore(snownet): allow filtering TURN traffic in logs (#10061)
Our TURN traffic is minimal enough for logging it on DEBUG (instead of TRACE) to be okay. However, it can be quite noisy when one is just scanning through the logs. Putting it on a separate target allows us to filter it out later. Note that this only concerns the TURN control protocol. Channel data messages are separate from this and **not** logged.
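
For illustration, a sketch of how a dedicated `tracing` target enables this kind of filtering (the target name here is an assumption):

```rust
use std::net::SocketAddr;

fn log_turn_response(method: &str, from: SocketAddr) {
    // With a separate target, `RUST_LOG=debug,turn=off` silences TURN
    // control traffic without losing any other DEBUG logs.
    tracing::debug!(target: "turn", method, %from, "TURN control message");
}
```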

5753b72a5e chore(snownet): fix typo in PeerSocket formatting (#10049)

551e687cc7 chore(rust): bump boringtun (#10052)
This brings in https://github.com/firezone/boringtun/pull/109.

6c1c42ea22 chore(snownet): fix handle_timeout span (#10046)
Spans only attach to logs of equal or lower severity, i.e. a DEBUG span is only visible on DEBUG and TRACE statements. In order to see the connection ID here with our INFO statements, we need to make it an INFO span. Also, a span does nothing unless it is entered 🤦♂️
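
A minimal sketch of both fixes (the field name is assumed):

```rust
fn handle_timeout(cid: u64) {
    // An INFO span attaches to INFO, DEBUG and TRACE events;
    // a DEBUG span would not show up on INFO statements.
    let span = tracing::info_span!("handle_timeout", cid);
    let _entered = span.enter(); // without entering, the span never applies

    tracing::info!("connection timed out"); // now carries `cid` from the span
}
```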

2166c49033 chore(windows): remove noisy AccessDenied errors (#10043)
These don't really tell us much. It appears that Windows sometimes fails to access the pipe but then succeeds on the next attempt, which is why we have the retry loop in the first place. Logging a warning here just spams Sentry unnecessarily.

69f9a03ee8 refactor(connlib): simplify IpPacket struct (#9795)
With the removal of the NAT64/46 modules, we can now simplify the internals of our `IpPacket` struct. The requirements for our `IpPacket` struct are somewhat delicate. On the one hand, we don't want to be overly restrictive in our parsing / validation code because there is a lot of broken software out there that doesn't necessarily follow RFCs. Hence, we want to be as lenient as possible in what we accept. On the other hand, we do need to verify certain aspects of the packet, like the payload lengths. At the moment, we are somewhat too lenient there, which causes errors on the Gateway where we have to NAT or otherwise manipulate the packets. See #9567 or #9552 for example.

To fix this, we make the parsing in the `IpPacket` constructor more restrictive. If it is a UDP, TCP or ICMP packet, we attempt to fully parse its headers and validate the payload lengths. This parsing allows us to then rely on the integrity of the packet as part of the implementation. It does create several code paths that can in theory panic but in practice should be impossible to hit. To ensure that this does in fact not happen, we also tackle an issue that is long overdue: fuzzing.

Resolves: #6667
Resolves: #9567
Resolves: #9552
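
To illustrate the kind of stricter validation involved (a standalone sketch, not the actual `IpPacket` constructor), here is a length check per RFC 768, where the UDP length field must agree with the bytes actually present:

```rust
fn validate_udp(ip_payload: &[u8]) -> Result<(), &'static str> {
    const UDP_HEADER_LEN: usize = 8;

    if ip_payload.len() < UDP_HEADER_LEN {
        return Err("UDP header truncated");
    }

    // The UDP length field (header bytes 4-5) covers header + payload.
    let udp_len = u16::from_be_bytes([ip_payload[4], ip_payload[5]]) as usize;
    if udp_len < UDP_HEADER_LEN || udp_len > ip_payload.len() {
        return Err("UDP length field inconsistent with payload");
    }

    Ok(())
}
```

Validating this up front lets the rest of the implementation index into the payload without re-checking lengths on every access.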

879f68cf73 refactor(connlib): use extract_if to expire resources (#10039)
Rust 1.88 shipped a new std function on `HashMap` to conditionally extract elements from the map. This is handy for the time-based expiry of resources on the Gateway.
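
A sketch of what that expiry can look like (the types and field names are illustrative):

```rust
use std::collections::HashMap;
use std::time::Instant;

struct Authorization {
    expires_at: Instant,
}

fn expire_resources(resources: &mut HashMap<u64, Authorization>, now: Instant) {
    // `extract_if` removes exactly the entries the predicate matches and
    // yields them, leaving the rest of the map untouched.
    for (rid, _auth) in resources.extract_if(|_, auth| auth.expires_at <= now) {
        tracing::debug!(rid, "Access authorization expired");
    }
}
```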

5c3b15c1a9 chore(connlib): harmonise naming of IDs (#10038)
When filtering through logs in Sentry, it is useful to narrow them down by the context of a client, gateway or resource. Currently, these fields are sometimes called `client`, `cid`, `client_id` etc., and the same goes for Gateways and Resources. To make this filtering easier, we name all of them `cid` for Client IDs, `gid` for Gateway IDs and `rid` for Resource IDs.

e9c74b1bfe chore(connlib): treat Invalid Argument as unreachable hosts (#10037)
These appear to happen on systems that e.g. don't have IPv6 support or where the destination cannot be reached. It is a bit of a catch-all, but all the ones I am seeing in Sentry are false positives. To reduce the noise a bit, we log these on DEBUG now.

e81dc452f7 refactor(connlib): use a lock-free queue for the buffer pool (#9989)
We use several buffer pools across `connlib` that are all backed by the same buffer-pool library. Within that library, we currently use another object-pool library to provide the actual pooling functionality. Benchmarking has shown that we spend quite a bit of time (a few % of total CPU time) fighting for the lock to either add or remove a buffer from the pool. This is unnecessary. By using a queue, we can remove buffers from the front and add buffers at the back, both of which can be implemented in a lock-free way such that they don't contend. The well-known `crossbeam-queue` library gives us such a queue directly.

I wasn't able to directly measure a performance gain in terms of throughput. What we can measure, though, is how much time we spend dealing with our buffer pool vs everything else. Comparing the `perf` outputs recorded during an `iperf` run each, we spend about 60% less time dealing with the buffer pool than before.

|Before|After|
|---|---|
|(screenshot: perf output, 2025-07-24 20-27-50)|(screenshot: perf output, 2025-07-24 20-27-53)|

The number in the thousands on the left is how often the respective function was the currently executing function during the profiling run.

Resolves: #9972
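
A minimal sketch of such a pool on top of `crossbeam_queue::ArrayQueue`, a bounded lock-free MPMC queue (the capacity and buffer size are illustrative):

```rust
use crossbeam_queue::ArrayQueue;
use std::sync::Arc;

#[derive(Clone)]
struct BufferPool {
    queue: Arc<ArrayQueue<Vec<u8>>>,
    buf_size: usize,
}

impl BufferPool {
    fn new(capacity: usize, buf_size: usize) -> Self {
        Self {
            queue: Arc::new(ArrayQueue::new(capacity)),
            buf_size,
        }
    }

    /// Take a buffer from the front of the queue; allocate only if empty.
    fn pull(&self) -> Vec<u8> {
        self.queue.pop().unwrap_or_else(|| vec![0; self.buf_size])
    }

    /// Return a buffer to the back of the queue; drop it if the pool is full.
    fn put_back(&self, mut buf: Vec<u8>) {
        buf.clear();
        buf.resize(self.buf_size, 0);
        let _ = self.queue.push(buf);
    }
}
```

Because `pop` and `push` never take a lock, producers and consumers no longer contend the way they do with a mutex-protected pool.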

55304b3d2a refactor(snownet): learn host candidates from TURN traffic (#9998)
Presently, for each UDP packet that we process in `snownet`, we check if we have already seen this local address of ours and, if not, add it to our list of host candidates. This is a safe way of ensuring that we consider every address we receive data on as one that we tell our peers to try and contact us on. However, performance profiling has shown that hashing the socket address of each incoming packet is quite wasteful: we spend about 4-5% of our main-thread time doing this. For comparison, decrypting packets is only about 30%. Most of the time, we already know about the address, so spending all this CPU time is pointless.

At the same time, we need to be sure that we discover our local address correctly. Inspired by STUN, we therefore move this responsibility to the `allocation` module. The `allocation` module is responsible for interacting with our TURN servers and will yield server-reflexive and relay candidates as a result. It also knows what local address it received traffic on, so we simply extend it to yield host candidates in addition to server-reflexive and relay candidates.

On my local machine, this bumps us across the 3.5 Gbits/sec mark:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.93.174.92 port 57890 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   319 MBytes  2.67 Gbits/sec   18    548 KBytes
[  5]   1.00-2.00   sec   413 MBytes  3.46 Gbits/sec    4    884 KBytes
[  5]   2.00-3.00   sec   417 MBytes  3.50 Gbits/sec    4   1.10 MBytes
[  5]   3.00-4.00   sec   425 MBytes  3.56 Gbits/sec  415    785 KBytes
[  5]   4.00-5.00   sec   430 MBytes  3.60 Gbits/sec  154    820 KBytes
[  5]   5.00-6.00   sec   434 MBytes  3.64 Gbits/sec  251    793 KBytes
[  5]   6.00-7.00   sec   436 MBytes  3.66 Gbits/sec  123    811 KBytes
[  5]   7.00-8.00   sec   435 MBytes  3.65 Gbits/sec    2    788 KBytes
[  5]   8.00-9.00   sec   423 MBytes  3.55 Gbits/sec    0   1.06 MBytes
[  5]   9.00-10.00  sec   433 MBytes  3.63 Gbits/sec    8   1017 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-20.00  sec  8.21 GBytes  3.53 Gbits/sec  1728  sender
[  5]   0.00-20.00  sec  8.21 GBytes  3.53 Gbits/sec        receiver

iperf Done.
```

9c71026416 chore(connlib): gate more trace logs on debug_assertions (#10026)
These are otherwise hit pretty often in the hot path and slow packet routing down because tracing needs to evaluate whether it should log the statement.
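
A sketch of the gating pattern (the statement itself is illustrative):

```rust
fn route_packet(len: usize) {
    // Compiled out of release builds entirely: release code never even
    // evaluates whether the `trace!` target is enabled.
    #[cfg(debug_assertions)]
    tracing::trace!(len, "routing packet");

    // ... actual packet routing continues unconditionally
}
```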

1317bbb9e2 refactor(gui-client): replace tslink with tauri-specta (#10031)
Despite still being in development, the `tauri-specta` project already proves to be quite useful. It allows us to generate TypeScript bindings for our commands and events, creating a type-safe contract between the frontend and the backend. For example, this ensures that the TypeScript code actually calls a command with the required parameters and thus avoids runtime failures. Similarly, the frontend can listen for type-safe events without having to use any magic strings.

e6fc7e62da chore: publish apple-client 1.5.5 (#10035)

2309be11fc chore: publish headless-client 1.5.2 (#10029)

cf40f4dd96 chore: publish gateway 1.4.14 (#10030)

7b8daf4074 chore: publish gui-client 1.5.6 (#10028)

fb9a142a9e chore(snownet): add back span in handle_timeout (#10025)
Whilst entering and leaving a span for every packet is very expensive, doing the same whenever we make timeout-related changes is just fine. Thus, we re-introduce a span removed in #9949, but only for the `handle_timeout` function. This gives us the context of the connection ID not just for our own logs but also for the ones from `boringtun`.

bfa77bf7fc chore(snownet): log connection ID in more places (#10023)
With the removal of the span in #9949, we now need to explicitly log the connection ID in a few more places to have the necessary context.

ce5650b554 fix(snownet): compare preshared_key on connection upsert (#9999)
By chance, I've discovered in a CI failure that we won't be able to handshake a new session if the `preshared_key` changes. This makes a lot of sense: the `preshared_key` needs to be the same on both ends as it is a shared secret that gets mixed into the Noise handshake. In the following sequence of events, we would thus previously run into a "failed to decrypt handshake packet" scenario:

1. Client requests a connection.
2. Gateway authorizes the connection.
3. Portal restarts / gets deployed. To my knowledge, this will rotate the `preshared_key` to a new secret. Restarting the portal also cuts all WebSockets and therefore, the Gateway's response never arrives.
4. Client reconnects to the WebSocket, requests a new connection.
5. Gateway reuses the local connection but this connection still uses the old `preshared_key`!
6. Client needs to wait for the Gateway's ICE timeout before it can establish a new connection.

How exactly (3) happens doesn't matter. There are probably other conditions under which the WebSocket connections get cut and we cannot complete our connection handshake.
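
The fix, sketched (types and field names are hypothetical): a connection may only be reused if the preshared key matches, because the PSK is mixed into the Noise handshake on both ends.

```rust
struct Connection {
    preshared_key: [u8; 32],
    // ... ICE agent, WireGuard session, etc.
}

fn upsert(existing: Option<Connection>, psk: [u8; 32]) -> Connection {
    match existing {
        // Same PSK: safe to keep the existing connection.
        Some(conn) if conn.preshared_key == psk => conn,
        // PSK changed (or no connection yet): build a fresh one.
        _ => Connection { preshared_key: psk },
    }
}
```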

f55c61c7cb fix(snownet): always update last_activity idle timer (#10000)
Previously, our idle timer was only driven by incoming and outgoing packets. To detect whether the tunnel is idle, we checked whether either the last incoming or the last outgoing packet was more than 20s ago.

For one, having two timestamps here is unnecessarily complex. We can simply combine them and always update a single `last_activity` timestamp. Two, we have recently started to take into account not only packets but other changes to the tunnel, such as an upsert of the connection or adding a new candidate. What we failed to do, though, is update these timestamps, because their variable names related to packets and not to activity in general. The problem with not updating them is that we very quickly move out of "connected" back to "idle" because the old timestamps are still more than 20s in the past. Hence, the previous fixes of moving out of idle on new candidates and connection upserts were ineffective.

By combining and renaming the timestamps, it is now much more obvious that we need to update this timestamp in the respective handler functions, which then grants us another 20s of non-idling. This is important for e.g. connection upserts, to ensure the Gateway runs into an ICE timeout within a short amount of time should there be something wrong with the connection that the Client just upserted.
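
The combined timer, sketched:

```rust
use std::time::{Duration, Instant};

const IDLE_AFTER: Duration = Duration::from_secs(20);

struct IdleTimer {
    last_activity: Instant,
}

impl IdleTimer {
    /// Called for packets *and* control-plane events (connection upserts,
    /// new candidates); each call grants another 20s of non-idling.
    fn on_activity(&mut self, now: Instant) {
        self.last_activity = now;
    }

    fn is_idle(&self, now: Instant) -> bool {
        now.duration_since(self.last_activity) >= IDLE_AFTER
    }
}
```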

d00c3b58cd refactor(connlib): only enable wire logs in debug builds (#10002)
As profiling shows, even if the log target isn't enabled, simply checking whether or not it is enabled is a significant performance hit. By guarding these behind `debug_assertions`, I was able to almost achieve 3.75 Gbits/s locally (when rebased onto #9998). Obviously, this doesn't translate one-to-one into real-world improvements, but it is nonetheless welcome.

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.93.174.92 port 34678 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   401 MBytes  3.37 Gbits/sec   14    644 KBytes
[  5]   1.00-2.00   sec   448 MBytes  3.76 Gbits/sec    3    976 KBytes
[  5]   2.00-3.00   sec   453 MBytes  3.80 Gbits/sec   43    979 KBytes
[  5]   3.00-4.00   sec   449 MBytes  3.77 Gbits/sec   21    911 KBytes
[  5]   4.00-5.00   sec   452 MBytes  3.79 Gbits/sec    4   1.15 MBytes
[  5]   5.00-6.00   sec   451 MBytes  3.78 Gbits/sec   81   1.01 MBytes
[  5]   6.00-7.00   sec   445 MBytes  3.73 Gbits/sec   39    705 KBytes
[  5]   7.00-8.00   sec   436 MBytes  3.66 Gbits/sec    3   1016 KBytes
[  5]   8.00-9.00   sec   460 MBytes  3.85 Gbits/sec    1    956 KBytes
[  5]   9.00-10.00  sec   453 MBytes  3.80 Gbits/sec    0   1.19 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.34 GBytes  3.73 Gbits/sec  209  sender
[  5]   0.00-10.00  sec  4.34 GBytes  3.73 Gbits/sec       receiver
```

I didn't want to remove the `wire` logs entirely because they are quite useful for debugging. However, they are also exactly that: a debugging tool. In a production build, we are very unlikely to turn these on, which makes `debug_assertions` a good tool for keeping them around without interfering with performance.

e5ee8e3572 fix(connlib): wait for sockets to be closed before rebinding (#9996)
Our `ThreadedUdpSocket` uses a background thread for the actual socket operation. It merely represents a handle to send to and receive from these sockets, not the socket itself. Dropping the handle will shut down the background thread, but that is an asynchronous operation. To be sure that we can rebind the same port, we need to wait for the background thread to stop. We thus add a `Drop` implementation for `ThreadedUdpSocket` that waits for its background thread to disappear before it continues.

Resolves: #9992
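
The `Drop` idea, sketched (field names assumed; the shutdown signalling to the thread is elided):

```rust
use std::thread::JoinHandle;

struct ThreadedUdpSocket {
    worker: Option<JoinHandle<()>>,
}

impl Drop for ThreadedUdpSocket {
    fn drop(&mut self) {
        // Block until the I/O thread has actually exited; only then is the
        // port guaranteed to be free for rebinding.
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```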

9133d46bbd fix(snownet): don't log unknown packet for disconnected relay (#9961)
Currently, packets for allocations, i.e. from relays, are parsed inside the `Allocation` struct. We have one of those structs for each relay that `snownet` is talking to. When we disconnect from a relay because it is e.g. not responding, we deallocate this struct. As a result, messages that arrive from this relay can no longer be handled. This can happen when the response time is longer than our timeout. These packets then fall through and end up being logged as "packet has unknown format".

To prevent this, we make the signature on `Allocation` strongly typed and expect a fully parsed `Message` to be given to us. This allows us to parse the message early and discard it with a DEBUG log in case we don't have the necessary local state to handle it. The functionality is essentially the same; we just change the level this is logged at from WARN to DEBUG.

We have to make one additional adjustment for this to work: guard all messages to be parsed by any `Allocation` to come from port 3478. This is the assigned port that all relays are expected to listen on. If we don't have any local state for a given address, we cannot decide whether it is a STUN message for an agent or a STUN message for a relay that we have disconnected from. Therefore, we need to de-multiplex based on the source port.
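
The de-multiplexing step, sketched:

```rust
use std::net::SocketAddr;

/// Assigned port that all relays are expected to listen on.
const TURN_PORT: u16 = 3478;

enum Demux {
    /// TURN control traffic: parse as a `Message` and route it to the
    /// matching `Allocation`, or discard with a DEBUG log if none exists.
    Relay,
    /// Everything else is ICE / peer traffic.
    Peer,
}

fn demux(from: SocketAddr) -> Demux {
    if from.port() == TURN_PORT {
        Demux::Relay
    } else {
        Demux::Peer
    }
}
```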

d6c36b0d7b build(deps): bump flowbite-react from 0.11.8 to 0.11.9 in /rust/gui-client in the flowbite group (#9931)
Bumps the flowbite group in /rust/gui-client with 1 update: flowbite-react. Updates `flowbite-react` from 0.11.8 to 0.11.9.

aebfcd56eb fix(connlib): resend candidates on connection upsert (#9986)
Due to network partitions between the Client and the Portal, it is possible that a Client requests a new connection, then disconnects from the portal and re-requests the connection once it is reconnected. On the Gateway, we would have already authorized the first request and initialised our ICE agents with our local candidates. The second time around, the connection gets reused. The Client, however, has lost its state and therefore we need to tell it our candidates again.

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>

cbe114bddc fix(connlib): clear join requests on reconnect (#9985)
Room join requests on the portal are only valid whilst we have a WebSocket connection. To make sure the portal processes all our requests correctly, we need to hold all other messages back while we are waiting to join the room. If the connection flaps while we are waiting to join a room, we may have a lingering join request that never gets fulfilled and thus blocks the sending of messages forever.

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>

f9721a1da6 fix(snownet): only idle when we are fully connected (#9987)
Now that we are capable of migrating a connection to another relay with #9979, our test suite exposed an edge-case: if we are in the middle of migrating a connection, it could be that the idle timer triggers because we have not seen any application traffic in the last 20s. Moving to idle mode drastically reduces the number of STUN bindings we send, and if this happens whilst we are still checking candidates, the nomination doesn't happen in time for our boringtun handshake to succeed. Thus, we add a condition to our idle timer to not trigger unless ICE has completed and reports us as `connected`.

d7b9ecb60b feat(gateway): update expiry of access authorizations on init (#9975)
Resolves: #9971

dacc402721 chore(connlib): only log span field name into message (#9981)
When looking at logs, reducing noise is critical to making it easier to spot important information. When sending logs to Sentry, we currently append the fields of certain spans to the message to make the output similar to that of `tracing_subscriber::fmt`. The actual name of a field inside a span is separated from the span name by a colon. For example, here is a log message as we see it in Sentry today:

> handle_input:class=success response handle_input:from=C1A0479AA153FACA0722A5DF76343CF2BEECB10E:3478 handle_input:method=binding handle_input:rtt=34.7479ms handle_input:tid=BB30E859ED88FFDF0786B634 request=["Software(snownet; session=BCA42EF159C794F41AE45BF5099E54D3A193A7184C4D2C3560C2FE49C4C6CFB7)"] response=["Software(firezone-relay; rev=e4ba5a69)", "XorMappedAddress(B824B4035A78A6B188EF38BE13AA3C1B1B1196D6:52625)"]

Really, what we would like to see is only this:

> class=success response from=C1A0479AA153FACA0722A5DF76343CF2BEECB10E:3478 method=binding rtt=34.7479ms tid=BB30E859ED88FFDF0786B634 request=["Software(snownet; session=BCA42EF159C794F41AE45BF5099E54D3A193A7184C4D2C3560C2FE49C4C6CFB7)"] response=["Software(firezone-relay; rev=e4ba5a69)", "XorMappedAddress(B824B4035A78A6B188EF38BE13AA3C1B1B1196D6:52625)"]

The duplication of `handle_input:` is just noise. In our local log output, we already strip the name of the span to make it easier to read. Here we now also do the same for the logs reported to Sentry.

301d2137e5 refactor(windows): share src IP cache across UDP sockets (#9976)
When looking through customer logs, we see a lot of "Resolved best route outside of tunnel" messages. Those get logged every time we need to rerun our re-implementation of Windows' weighting algorithm for deciding which source interface / IP a packet should be sent from. Currently, the result gets cached in every socket instance, so for the peer-to-peer socket this is only computed once per destination IP. For DNS queries, however, we make a new socket for every query. Using a new source port for each DNS query is recommended to avoid fingerprinting of DNS queries, but a new socket also means that we re-run this algorithm on every DNS query, which is why we see this log so often.

To fix this, we share this cache across all UDP sockets. Cache invalidation is one of the hardest problems in computer science and this instance is no different: the cache needs to be reset every time we roam, as roaming changes the weighting of which source interface to use. To achieve this, we extend the `SocketFactory` trait with a `reset` method. This method is called whenever we roam and can then reset a shared cache inside the `UdpSocketFactory`. The "source IP resolver" function that is passed to the UDP socket now simply accesses this shared cache and inserts a new entry when it needs to resolve the IP.

As an added benefit, this may speed up DNS queries on Windows a bit (although I haven't benchmarked it). It should certainly drastically reduce the number of syscalls we make on Windows.
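
A sketch of the shared cache (the real `SocketFactory` trait and types differ; everything here is illustrative):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

#[derive(Clone, Default)]
struct UdpSocketFactory {
    // One route cache shared by every socket the factory hands out.
    src_by_dst: Arc<Mutex<HashMap<IpAddr, IpAddr>>>,
}

impl UdpSocketFactory {
    /// Called on roam: interface weighting changed, so cached results
    /// are stale and must be recomputed.
    fn reset(&self) {
        self.src_by_dst.lock().unwrap().clear();
    }

    /// Resolve the source IP for `dst`, running the (expensive) weighting
    /// algorithm only on a cache miss.
    fn resolve_src(&self, dst: IpAddr, compute: impl FnOnce() -> IpAddr) -> IpAddr {
        *self
            .src_by_dst
            .lock()
            .unwrap()
            .entry(dst)
            .or_insert_with(compute)
    }
}
```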

409459f11c chore(rust): bump boringtun (#9982)
Bumping the version to include https://github.com/firezone/boringtun/pull/105.

d244a99c58 feat(connlib): always use all candidates (#9979)
In #6876, we added functionality that would only make use of new remote candidates whilst we hadn't yet nominated a socket with the remote. The reason was that in the described edge-case, where relays reboot or get replaced whilst the client is partitioned from the portal (or we experience a connection hiccup), only one of the two peers, i.e. Client or Gateway, would migrate to the new relay, leaving the other one in an inconsistent state.

Looking at recent customer logs, I've been seeing a lot of these messages:

> Unknown connection or socket has already been nominated

For this particular customer, these are then very quickly followed by ICE timeouts, leaving the connection unusable. Considering that, I no longer think the above change was a good idea and we should instead always make use of all candidates that we are given. What we are seeing is that in deployment scenarios where the latency link between Client and Gateway is very short (5-10ms) yet the latency to the portal is longer (~30-50ms), we trigger a race condition where we temporarily nominate a _peer-reflexive_ candidate pair instead of a regular one. This happens because, with such a short latency link, Client and Gateway are _faster_ in sending back and forth several STUN bindings than the control plane is in delivering all the candidates. Due to the functionality added in #6876, this then results in us not accepting the candidates. It further appears that a nominated peer-reflexive candidate does not provide a stable connection, which is why we then run into an ICE timeout, requiring Firezone to establish a new connection only to have the same thing happen again. This is very disruptive for the user experience as the connection only works for a few moments at a time.

With #9793, we actually added a feature that is also at play here. Now that we don't immediately act on an ICE timeout, it is possible for both Client and Gateway to migrate a connection to a different relay, should the one that they are using get disconnected. In #9793, we added a timeout of 2s for this. To make this fully work, we need to patch str0m to transition to `Checking` early. Presently, str0m would directly transition from `Disconnected` to `Connected` in this case, which in some of the high-latency scenarios that we are testing in CI is not enough to recover the connection within 2s. By transitioning to `Checking` early, we abort this timer.

Related: https://github.com/algesten/str0m/pull/676

ecb2bbc86b feat(gateway): allow updating expiry of access authorization (#9973)
Resolves: #9966

fafe2c43ea fix(connlib): update the current socket when in idle mode (#9977)
In case we received a newly nominated socket from `str0m` whilst our connection was in idle mode, we mistakenly did not apply it and kept using the old one. ICE would still be functioning in this case because `str0m` would have updated its internal state, but we would be sending packets into Nirvana. I don't think this is likely to be hit in production though, as it would be quite unusual to receive a new nomination whilst the connection was completely idle.

091d5b56e0 refactor(snownet): don't memmove every packet (#9907)
When encrypting IP packets, `snownet` needs to prepare a buffer where the encrypted packet is going to end up. Depending on whether we are sending data via a relayed connection or directly, this buffer needs to be offset by 4 bytes to allow for the 4-byte channel-data header of the TURN protocol. At present, we always encrypt the packet first and then move it 4 bytes to the left on demand if we **don't** need to send it via a relay. Internally, this translates to a `memmove` instruction, which actually turns out to be very cheap (I couldn't measure a speed difference between this and `main`). All of this code has grown historically though, so I figured it is better to clean it up a bit: first evaluate whether we have a direct or relayed connection and, based on that, write the encrypted packet directly to the front of the buffer or offset it by 4 bytes.
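
The cleaned-up flow, sketched (the function shape is illustrative):

```rust
/// TURN channel-data messages put a 4-byte header in front of the payload.
const CHANNEL_DATA_HEADER: usize = 4;

fn encrypt_into(buf: &mut [u8], relayed: bool, encrypt: impl FnOnce(&mut [u8])) {
    // Decide the offset *before* encrypting, so the ciphertext lands exactly
    // where it will be sent from and never needs to be memmove'd afterwards.
    let offset = if relayed { CHANNEL_DATA_HEADER } else { 0 };
    encrypt(&mut buf[offset..]);
}
```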

3e6fc8fda7 refactor(rust): use spinlock-based buffer pool (#9951)
Profiling has shown that using a spinlock-based buffer pool is marginally (~1%) faster than the mutex-based one because it resolves contention quicker.
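
For context, the core of a spinlock is only a few lines (a generic sketch, not the pool's actual implementation):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    fn lock(&self) {
        // Spin until we win the compare-exchange; for the very short
        // critical sections of a buffer pool, this beats parking a thread.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```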

a11983e4b3 chore: publish gateway 1.4.13 (#9969)

6ae074005f refactor(connlib): don't check for enabled event (#9950)
Profiling has shown that checking whether the level is enabled is actually more expensive than checking whether the packet is a DNS packet. This improves performance by about 3%.

71e6b56654 feat(snownet): remove "connection ID" span (#9949)
At present, `snownet` uses a `tracing::Span` to attach the connection ID to various log messages. This requires the span to be entered and exited on every packet. Whilst profiling Firezone, I noticed that it takes between 10% and 20% of CPU time on the main thread. Previously, this wasn't a bottleneck as other parts of Firezone were not yet as optimised. With some changes earlier this year of a dedicated UDP thread and better GSO, this does appear to be a bottleneck now.

On `main`, I am currently getting the following numbers on my local machine:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.85.16.226 port 42012 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   251 MBytes  2.11 Gbits/sec   16    558 KBytes
[  5]   1.00-2.00   sec   287 MBytes  2.41 Gbits/sec    6    800 KBytes
[  5]   2.00-3.00   sec   284 MBytes  2.38 Gbits/sec    2    992 KBytes
[  5]   3.00-4.00   sec   287 MBytes  2.41 Gbits/sec    3   1.12 MBytes
[  5]   4.00-5.00   sec   290 MBytes  2.44 Gbits/sec    0   1.27 MBytes
[  5]   5.00-6.00   sec   300 MBytes  2.52 Gbits/sec    2   1.40 MBytes
[  5]   6.00-7.00   sec   295 MBytes  2.47 Gbits/sec    2   1.52 MBytes
[  5]   7.00-8.00   sec   304 MBytes  2.55 Gbits/sec    3   1.63 MBytes
[  5]   8.00-9.00   sec   290 MBytes  2.44 Gbits/sec   49   1.21 MBytes
[  5]   9.00-10.00  sec   288 MBytes  2.41 Gbits/sec   24   1023 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.81 GBytes  2.41 Gbits/sec  107  sender
[  5]   0.00-10.00  sec  2.81 GBytes  2.41 Gbits/sec       receiver
```

With this patch applied, the throughput goes up significantly:

```
Connecting to host 172.20.0.110, port 5201
[  5] local 100.85.16.226 port 41402 connected to 172.20.0.110 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   315 MBytes  2.64 Gbits/sec    7    619 KBytes
[  5]   1.00-2.00   sec   363 MBytes  3.05 Gbits/sec   11    847 KBytes
[  5]   2.00-3.00   sec   379 MBytes  3.18 Gbits/sec    1   1.07 MBytes
[  5]   3.00-4.00   sec   384 MBytes  3.22 Gbits/sec   44    981 KBytes
[  5]   4.00-5.00   sec   377 MBytes  3.16 Gbits/sec  116    911 KBytes
[  5]   5.00-6.00   sec   378 MBytes  3.17 Gbits/sec    3   1.10 MBytes
[  5]   6.00-7.00   sec   377 MBytes  3.16 Gbits/sec   48    929 KBytes
[  5]   7.00-8.00   sec   374 MBytes  3.14 Gbits/sec  151    947 KBytes
[  5]   8.00-9.00   sec   382 MBytes  3.21 Gbits/sec   36    833 KBytes
[  5]   9.00-10.00  sec   375 MBytes  3.14 Gbits/sec    1   1.06 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.62 GBytes  3.11 Gbits/sec  418  sender
[  5]   0.00-10.00  sec  3.61 GBytes  3.10 Gbits/sec       receiver
```

Resolves: #9948

4292ca7ae8 test(connlib): fix failing proptest (#9864)
This essentially just bumps the boringtun dependency to include https://github.com/firezone/boringtun/pull/104.

fbf96c261e chore(relay): remove spans (#9962)
These are flooding our monitoring infra and don't really add that much value. Pretty much all of the processing the relay does is one request in, one response out, and none of the spans are nested. We can therefore replicate the logging we do with spans almost 1-to-1 by adding the fields to each log message.

Resolves: #9954

f668202c83 build(deps): bump the sentry group in /rust/gui-client with 2 updates (#9929)
Bumps the sentry group in /rust/gui-client with 2 updates: @sentry/core and @sentry/react. Updates `@sentry/core` from 9.34.0 to 9.40.0. Highlights of 9.40.0 include debugId sync APIs between web workers and the main thread, the deprecation of the internal `logger` export in favour of `debug`, and an OpenAI integration for Node.

bc1a3df82b build(deps): bump react-router from 7.6.3 to 7.7.0 in /rust/gui-client in the react group (#9934)
Bumps the react group in /rust/gui-client with 1 update: react-router. Updates `react-router` from 7.6.3 to 7.7.0, which adds unstable RSC support along with a number of patch fixes.

0cd4b94691 build(deps): bump zbus from 5.8.0 to 5.9.0 in /rust (#9939)
Bumps zbus from 5.8.0 to 5.9.0. From the zbus 5.9.0 release notes:
- Remove deadlocks in Connection name request tasks, resulting in leaks under certain circumstances.
- When registering names, allow name replacement by default.
- Allow setting request name flags in `connection::Builder`.
- Proper `Default` impl for `RequestNameFlags`. This change is theoretically an API break for users who assumed the default value to be empty.
- Add `fdo::StartServiceReply` type. In 6.0 this will be the return type of `fdo::DBusProxy::start_service_by_name`. For now, it just provides a `TryFrom<u32>`.