Currently, the Gateway reads all nameservers from `/etc/resolv.conf` on
startup and evaluates the fastest one to use for SRV and TXT DNS queries
that are forwarded by the Client. If the machine has just booted and does
not have Internet connectivity yet, this evaluation fails, which leaves
the Gateway in a state where it cannot fulfill those queries.
To ensure we always use the fastest nameserver and to self-heal from
such situations, we add a 60s timer that refreshes this state.
Note that this will **not** re-read the nameservers from
`/etc/resolv.conf`; it still uses the same IPs read on startup.
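A rough sketch of such a refresh loop (the state type and the
`refresh_fastest` helper are illustrative, not the actual Gateway code):

```rust
use std::time::Duration;

/// Hypothetical state holding the nameservers read on startup and the
/// currently preferred (fastest) one.
struct DnsForwarderState {
    nameservers: Vec<std::net::IpAddr>,
    fastest: Option<std::net::IpAddr>,
}

impl DnsForwarderState {
    /// Probe all known nameservers and remember the fastest responder.
    /// Left empty here; the real code would issue test queries and
    /// measure round-trip times.
    async fn refresh_fastest(&mut self) {
        // ...
    }
}

async fn run_refresh_loop(mut state: DnsForwarderState) {
    // Re-evaluate every 60 seconds so a Gateway that started without
    // connectivity eventually self-heals.
    let mut interval = tokio::time::interval(Duration::from_secs(60));

    loop {
        interval.tick().await;
        state.refresh_fastest().await;
    }
}
```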
When the Gateway is handed an IP packet for a DNS resource that it
cannot route, it sends back an ICMP unreachable error. According to RFC
792 [0] (for ICMPv4) and RFC 4443 [1] (for ICMPv6), parts of the
original packet should be included in the ICMP error payload to allow
the sending party to correlate what could not be sent.
For ICMPv4, the RFC says:
```
Internet Header + 64 bits of Data Datagram
The internet header plus the first 64 bits of the original
datagram's data. This data is used by the host to match the
message to the appropriate process. If a higher level protocol
uses port numbers, they are assumed to be in the first 64 data
bits of the original datagram's data.
```
For ICMPv6, the RFC says:
```
As much of invoking packet as possible without the ICMPv6 packet exceeding the minimum IPv6 MTU
```
[0]: https://datatracker.ietf.org/doc/html/rfc792
[1]: https://datatracker.ietf.org/doc/html/rfc4443#section-3.1
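The truncation rule can be sketched as follows; this is illustrative
only and not the actual `ip-packet` API:

```rust
/// Maximum ICMP error payload per the two RFCs quoted above:
/// - ICMPv4: the original IP header plus the first 8 bytes of its payload.
/// - ICMPv6: as much of the invoking packet as fits into the minimum IPv6
///   MTU (1280 bytes) together with the IPv6 and ICMPv6 headers.
fn icmp_error_payload(original: &[u8], is_ipv6: bool, ip_header_len: usize) -> &[u8] {
    let max_len = if is_ipv6 {
        // 1280 (minimum IPv6 MTU) - 40 (IPv6 header) - 8 (ICMPv6 header)
        1280 - 40 - 8
    } else {
        // IPv4 header + 64 bits (8 bytes) of the original datagram's data
        ip_header_len + 8
    };

    &original[..original.len().min(max_len)]
}
```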
Presently, the network change detection on Windows is very naive and
simply emits a change event every time _anything_ changes. We can
optimise this and therefore improve the start-up time of Firezone by:
- Filtering out duplicate events
- Filtering out network change events for our own network adapter
This reduces the number of network change events to 1 during startup. As
far as I can tell from the code comments in this area, we explicitly
send this one to ensure we don't run into a race condition whilst we are
starting up.
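A minimal sketch of such a filter (the event type and its fields are
simplified assumptions, not the actual Windows API bindings):

```rust
/// Hypothetical, simplified view of a network change notification.
#[derive(Clone, PartialEq, Eq)]
struct NetworkChange {
    adapter_name: String,
    // ... further fields omitted
}

/// Filters the raw notification stream: drops exact duplicates and all
/// events that originate from our own TUN adapter.
struct ChangeFilter {
    last_event: Option<NetworkChange>,
    own_adapter: String,
}

impl ChangeFilter {
    fn accept(&mut self, event: NetworkChange) -> Option<NetworkChange> {
        // Ignore changes caused by our own adapter coming up or down.
        if event.adapter_name == self.own_adapter {
            return None;
        }

        // Ignore events that are identical to the previous one.
        if self.last_event.as_ref() == Some(&event) {
            return None;
        }

        self.last_event = Some(event.clone());
        Some(event)
    }
}
```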
Resolves: #8905
ECN information is helpful to allow the congestion controllers to more
easily fine-tune their send and receive windows. When a Firezone Client
receives an IP packet whose ECN bits signal an ECN-capable transport, we
mirror these bits onto the UDP datagram that carries the encrypted IP
packet.
When receiving a datagram with ECN bits set, the Gateway will then apply
these bits to the decrypted IP packet and pass it along towards its
destination.
This implementation is unfortunately a bit too naive. Not all devices on
the Internet support ECN and therefore we may receive a datagram that
has its ECN bits cleared even though the ECN bits on the inner IP packet
still signal an ECN-capable transport. In this case, we should _not_ override
the ECN bits and instead pass the IP packet along as is. Network devices
along the path between Gateway and Resource may still use these ECN bits
to signal congestion.
We fix this by making the `with_ecn` function on `IpPacket` private. It
is not meant to be used outside of the module. We supersede it with a
`with_ecn_from_transport` function that implements the above logic.
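Roughly, the guarded variant behaves like this sketch (the `Ecn` enum
and struct layout are simplified; only the function names match the ones
mentioned above):

```rust
/// Simplified ECN codepoints from the IP header's traffic-class / TOS
/// field (illustrative; not the crate's actual type).
#[derive(Clone, Copy, PartialEq, Eq)]
enum Ecn {
    NotEct, // 00: ECN not supported / cleared along the path
    Ect1,   // 01
    Ect0,   // 10
    Ce,     // 11: congestion experienced
}

struct IpPacket {
    ecn: Ecn,
    // ... payload omitted
}

impl IpPacket {
    /// Private helper that unconditionally overrides the ECN bits.
    fn with_ecn(mut self, ecn: Ecn) -> Self {
        self.ecn = ecn;
        self
    }

    /// Applies the ECN bits observed on the transport (the UDP datagram)
    /// to the decrypted packet, but only if the transport actually
    /// carried ECN information. If the bits were cleared along the path,
    /// the inner packet is passed on unchanged.
    fn with_ecn_from_transport(self, transport_ecn: Ecn) -> Self {
        if transport_ecn == Ecn::NotEct {
            return self;
        }

        self.with_ecn(transport_ecn)
    }
}
```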
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
Whenever the Gateway is instructed to (re)create the NAT for a DNS
resource, it performs a DNS query and then overwrites the existing
entries in the NAT table. Depending on how the DNS records are defined,
this may lead to a very bad user experience where connections are cut
regularly.
In particular, if a service utilises round-robin DNS where a DNS query
only ever returns a single entry yet that entry may change as soon as
the TTL expires, all connections for this particular DNS resource for a
Client get cut.
To fix this, we now first check for active NAT sessions for a given
proxy IP and only replace an entry if there is no open NAT session for it. The
NAT sessions have a TTL of 1 minute, meaning there needs to be at least
1 outgoing packet from the Client every minute to keep it open.
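Conceptually, the check looks like this sketch (types and field names
are illustrative, not the actual NAT table implementation):

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const SESSION_TTL: Duration = Duration::from_secs(60);

struct NatEntry {
    real_ip: IpAddr,
    last_outgoing_packet: Instant,
}

struct NatTable {
    entries: HashMap<IpAddr, NatEntry>, // keyed by proxy IP
}

impl NatTable {
    /// Called when the NAT for a DNS resource is (re)created after a
    /// fresh DNS query on the Gateway. Entries that still have an active
    /// session (i.e. saw an outgoing packet within the TTL) are left
    /// untouched so existing connections are not cut.
    fn update(&mut self, proxy_ip: IpAddr, resolved_ip: IpAddr, now: Instant) {
        if let Some(entry) = self.entries.get(&proxy_ip) {
            if now.duration_since(entry.last_outgoing_packet) < SESSION_TTL {
                return; // active session: keep the existing mapping
            }
        }

        self.entries.insert(
            proxy_ip,
            NatEntry {
                real_ip: resolved_ip,
                last_outgoing_packet: now,
            },
        );
    }
}
```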
In order to detect changes to DNS records of DNS resources, `connlib`
will recreate the DNS resource NAT whenever it receives a query for a
DNS resource. The way we implemented this was by clearing the local
state of the DNS resource NAT, which triggered us to perform the
handshake with the Gateway again upon the next packet for this resource.
The Gateway would then perform the DNS query and respond once it was
finished.
In order to not drop any packets, `connlib` has a buffer where it keeps
the packets that are arriving in the meantime. This works reasonably
well when the connection is first set up because we are only buffering a
TCP SYN or an equivalent handshake packet. Yet, when the connection is
in full use and the application just so happens to make another DNS
query, we halt the entire flow of packets until the NAT is confirmed
again. To prevent high memory use, this buffer is constrained to
32 packets, which is nowhere near enough when a connection is actively
transferring data (like a file upload).
In most cases, the DNS query on the Gateway will yield the exact same
results because the records haven't changed. Thus, there is no reason
for us to halt the flow of packets while we are _recreating_ the DNS
resource NAT; we simply keep forwarding them. That way, the handshake
happens in parallel to the actual packet flow and does not interrupt
anything in the happy-path case.
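As a rough illustration, the per-resource NAT on the client can be
pictured as a small state machine; the states and names below are
illustrative, not `connlib`'s actual types:

```rust
/// Hypothetical per-resource view of the client-side DNS resource NAT.
/// Before this change, a refresh put us back into `Pending`, halting and
/// buffering the whole flow; now it only moves us into `Refreshing`.
enum DnsResourceNat {
    /// First connection setup: buffer packets (at most 32) until the
    /// Gateway confirms the NAT.
    Pending { buffered: Vec<Vec<u8>> },
    /// NAT is confirmed; packets flow freely.
    Confirmed,
    /// A refresh handshake is in flight, but the existing mapping stays
    /// in place, so packets keep flowing in parallel.
    Refreshing,
}
```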
Currently, when `connlib`'s log file gets deleted, we write logs into
nirvana until the corresponding process gets restarted. This is painful
for users because they need to restart the IPC service or Network
Extension. Instead, we can simply check if the log file exists prior to
writing to it and re-create it if it doesn't.
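A minimal sketch of the idea, assuming a simple append-style writer
(this is not `connlib`'s actual tracing appender API):

```rust
use std::fs::{File, OpenOptions};
use std::io::Write;
use std::path::PathBuf;

/// Before writing, check whether the log file still exists on disk and
/// re-open (and thereby re-create) it if it doesn't.
struct ReopeningLogWriter {
    path: PathBuf,
    file: File,
}

impl ReopeningLogWriter {
    fn write_line(&mut self, line: &str) -> std::io::Result<()> {
        if !self.path.exists() {
            // The file was deleted out from under us; re-create it so we
            // stop writing logs "into nirvana".
            self.file = OpenOptions::new()
                .create(true)
                .append(true)
                .open(&self.path)?;
        }

        writeln!(self.file, "{line}")
    }
}
```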
Resolves: #6850
Related: #7569
Having multiple threads for reading and writing the TUN device can cause
packet re-orderings on the client. All other clients only use a single
TUN thread, so aligning this value means a more consistent behaviour of
Firezone across all platforms.
Sufficiently large receive buffers are important to sustain
high-throughput as latency increases. If the receive buffer in the
kernel is too small, packets need to be dropped on arrival.
Firefox uses 1MB in its QUIC stack [0]. `quic-go` recommends setting send
and receive buffers to 7.5 MB [1]. Power users of Firezone are likely
receiving a lot more traffic than the average Firefox user (especially
with Internet Resource activated) so setting it to 10 MB seems
reasonable. Sending packets is likely not as critical because we have
back-pressure through our system such that we will stop reading IP
packets when we cannot write to our UDP socket. The UDP socket sits in a
separate thread, and the threads are connected with dedicated queues
that act as another buffer. However, as the data below
shows, some systems have really small send buffers which are currently
likely a speed bottleneck because we need to suspend writing so
frequently.
Assuming a 50ms latency, the bandwidth-delay product tells us that we
can (in theory) saturate a 1.6 Gbps link with a 10MB receive buffer
(assuming the OS also has large enough buffer sizes in its TCP or QUIC
stack):
```
10 MB = 80 Mb
80 Mb / 0.05 s = 1600 Mbps
```
Experiments and research [2] show the following:
|OS|Receive buffer (default)|Receive buffer (this PR)|Send buffer (default)|Send buffer (this PR)|
|---|---|---|---|---|
|Windows|65KB|10MB|65KB|1MB|
|MacOS|786KB|8MB|9KB|1MB|
|Linux|212KB|212KB|212KB|212KB|
With the exception of Linux, the OSes appear to be quite generous with
how big they allow receive buffers to be. On Linux, these limits can be
changed by setting the `net.core.rmem_max` and `net.core.wmem_max`
parameters using `sysctl`.
Most of our users are on Windows and MacOS, meaning they immediately
benefit from this without having to change any system settings. Larger
client-side UDP receive buffers are critical for any "download" scenario,
which is likely the majority of Firezone's use cases.
On Windows, increasing this receive buffer almost doubles the throughput
in an iperf3 download test.
[0]: https://github.com/mozilla/neqo/pull/2470
[1]: https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes
[2]: https://unix.stackexchange.com/a/424381
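For illustration, requesting such buffer sizes could look like the
following sketch using the `socket2` crate; the exact sizes and error
handling in Firezone may differ, and the OS may silently cap the
requested values (e.g. to `net.core.rmem_max` on Linux):

```rust
use socket2::{Domain, Protocol, Socket, Type};

fn make_udp_socket() -> std::io::Result<Socket> {
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;

    // 10 MB receive buffer: enough to (in theory) saturate ~1.6 Gbps at
    // 50 ms latency, per the bandwidth-delay product above.
    socket.set_recv_buffer_size(10 * 1024 * 1024)?;
    // 1 MB send buffer: sending is less critical thanks to back-pressure.
    socket.set_send_buffer_size(1024 * 1024)?;

    Ok(socket)
}
```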
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
When calculating preferences for candidates, `str0m` currently always
prefers IPv6 over IPv4. This is as per the ICE spec. However, this can
lead to sub-optimal situations when a connection ends up using a TURN
server.
TURN allows a client to allocate an IPv4 and an IPv6 address in the same
allocation. This makes it possible for e.g. an IPv4-only client to
connect to an IPv6-only peer as long as the TURN server runs in
dual-stack AND the client requests an IPv6 address in addition to an
IPv4 address with the `ADDITIONAL-ADDRESS-FAMILY` attribute.
Assume that a client sits behind symmetric NAT and therefore needs to
rely on a TURN server to communicate with its peers. The TURN server as
well as all the peers operate in dual-stack mode.
The current priority calculation will yield a communication path that
uses IPv4 to talk to the TURN server (as that is the only one available)
but due to the preference ordering of IPv6 over IPv4, will use an IPv6
path to the peer, despite the peer also supporting IPv4.
This isn't a problem per se but makes our life unnecessarily difficult.
Our TURN servers use eBPF to efficiently deal with TURN's channel-data
messages. This however is at present only implemented for the IPv4 <>
IPv4 and IPv6 <> IPv6 path. Implementing the other paths is possible but
complicates the eBPF code because we need to also translate IP headers
between versions and not just update the source and destination IPs.
We have since patched `str0m` to extend the `Candidate::relayed`
constructor to also take a `base` address which is - similar to the
other candidate types - the address the client is sending from in order
to use this candidate. In the context of relayed candidates, this is the
address the client is using to talk to the TURN server. We can use this
information in the candidate's priority calculation to prefer candidates
that allow traffic to remain within one IP version, i.e. if the client
talks to the TURN server over IPv4, the candidate with an allocated IPv4
address will have a higher priority than the one with the IPv6 address
because we are applying a "punishment" factor as part of the
local-preference component in the priority formula.
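Sketched with the standard ICE priority formula (RFC 8445, section
5.1.2.1), the idea looks roughly like this; the concrete preference
values are illustrative and not the ones used in the actual `str0m`
patch:

```rust
use std::net::IpAddr;

/// Standard ICE priority formula (RFC 8445, section 5.1.2.1).
fn ice_priority(type_pref: u32, local_pref: u32, component_id: u32) -> u32 {
    (type_pref << 24) + (local_pref << 8) + (256 - component_id)
}

/// Sketch of the "punishment" applied to relayed candidates whose
/// allocated address uses a different IP version than the base address
/// (the address we use to talk to the TURN server).
fn relayed_local_preference(allocated: IpAddr, base: IpAddr) -> u32 {
    let same_version = allocated.is_ipv4() == base.is_ipv4();

    if same_version {
        65535 // traffic stays within one IP version: in-kernel eBPF path applies
    } else {
        32767 // cross-version relaying: still usable, but less preferred
    }
}
```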
Staying within the same IP version whilst relaying traffic allows our
TURN servers to use their in-kernel eBPF path, which results in a better
UX due to lower latency and higher throughput.
The final candidate ordering is ultimately decided by the controlling
ICE agent which in our case is the Firezone Client. Thus, we don't
necessarily need to update Gateways in order to test or benefit from
this; building a Client with this patch included is enough.
Related: https://github.com/algesten/str0m/pull/640
Related: https://github.com/algesten/str0m/pull/644
If another VPN has been activated on the system while Firezone is
active, Apple OSes will deactivate our configuration, and never
reactivate it.
We knew this already, and always activated the configuration when
starting during the sign in flow, but failed to also do this when
autoStarting on launch.
This PR ensures that during autoStart, we re-enable the
configuration as well.
Fixes #8813
Microsoft Intune's DMG provisioner currently fails unexpectedly when
trying to provision our published DMG file with the error:
> The DMG file couldn't be mounted for installation. Check the DMG file
if the error persists. (0x87D30139)
I ran the following verification commands locally, which all passed:
```
hdiutil verify -verbose <dmg>
hdiutil imageinfo -verbose <dmg>
hdiutil hfsanalyze -verbose <dmg>
hdiutil checksum -type SHA256 -verbose <dmg>
hdiutil info -verbose
hdiutil pmap -verbose <dmg>
```
So the most likely explanation is that Intune doesn't like the
`/Applications` shortcut in the DMG. This is a UX feature to make it
easy to drag the application to the /Applications folder upon opening the
DMG.
So we're publishing a PKG in addition to the DMG, which should be a
more reliable artifact for MDMs to use.
---------
Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
In #8480, we changed the location that `firezone-gateway` gets
downloaded to but forgot to update the knowledgebase with the new path.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Correctly implementing asynchronous IO is notoriously hard. In order to
not drop packets in the process, one has to ensure a given socket is
ready to accept packets, buffer them if that is not the case, suspend
everything else until the socket is ready, and then continue.
Until now, we did this because it was the only option to run the UDP
sockets on the same thread as the actual packet processing. That in turn
was motivated by wanting to pass around references of the received
packets for processing. Rust's borrow-checker does not allow passing such
references between threads, which forced us to have the sockets on the
same thread as the packet processing.
Like we already did in other places in `connlib`, this can be solved
through the use of buffer pools. Using a buffer pool, we can use heap
allocations to store the received packets without having to make a new
allocation every time we read new packets. Instead, we can have a
dedicated thread that is connected to `connlib`'s packet processing
thread via two channels (one for inbound and one for outbound packets).
These channels are bounded, which ensures backpressure is maintained in
case one of the two threads lags behind. These bounds also mean that we
have at most N buffers from the buffer pool in-flight (where N is the
capacity of the channel).
Within those dedicated threads, we can then use `async/await` notation
to suspend the entire task when a socket isn't ready for sending.
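A condensed sketch of the receive direction (the buffer pool, channel
bound, and helper names are illustrative; the outbound channel is
omitted):

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Illustrative stand-in for a buffer handed out by the buffer pool. The
/// real code reuses `connlib`'s existing buffer-pool abstraction instead
/// of allocating a fresh `Vec` per packet.
struct PooledBuffer(Vec<u8>);

/// Runs on the dedicated socket thread. Because the socket lives on its
/// own thread, we can use plain `async/await` and simply suspend whenever
/// it isn't ready, instead of hand-rolling readiness checks and buffering.
async fn udp_recv_task(
    socket: std::net::UdpSocket,
    inbound: SyncSender<PooledBuffer>, // bounded: provides backpressure
) -> std::io::Result<()> {
    socket.set_nonblocking(true)?;
    let socket = tokio::net::UdpSocket::from_std(socket)?;

    loop {
        let mut buffer = PooledBuffer(vec![0u8; 65536]); // would come from the pool

        let (len, _from) = socket.recv_from(&mut buffer.0).await?;
        buffer.0.truncate(len);

        // With a bounded channel, this call blocks once the
        // packet-processing thread lags behind, capping the number of
        // in-flight buffers at the channel's capacity.
        if inbound.send(buffer).is_err() {
            return Ok(()); // processing thread has shut down
        }
    }
}

/// Spawns the dedicated socket thread and returns the inbound half of the
/// channel pair for the packet-processing thread to consume.
fn spawn_socket_thread(socket: std::net::UdpSocket) -> Receiver<PooledBuffer> {
    let (tx, rx) = sync_channel(128); // hypothetical bound

    std::thread::spawn(move || {
        tokio::runtime::Builder::new_current_thread()
            .enable_io()
            .build()
            .expect("failed to build tokio runtime")
            .block_on(udp_recv_task(socket, tx))
    });

    rx
}
```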
Resolves: #8000
We are currently naively chunking our buffer into `segment_size *
max_gso_segments()`. `max_gso_segments` is by default 64. Assuming we
processed several IP packets, this would quickly balloon to a size that
the kernel cannot handle. For example, during an `iperf3` run, we
receive _a lot_ of packets at maximum MTU size (1280). With the overhead
that we are adding to the packet, this results in a UDP payload size of
1320.
```
1320 x 64 = 84480
```
That is way too large for the kernel to handle and it will fail the
`sendmsg` call with `EMSGSIZE`. Unfortunately, this error wasn't
surfaced because `quinn_udp` handles it internally, as it can also
happen as a result of MTU probes.
We've already patched `quinn_udp` in the past to move the handling of
more quinn-specific errors to the infallible `send` function. The same
is being done for this error in
https://github.com/quinn-rs/quinn/pull/2199.
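One way to avoid such oversized batches is to clamp the number of
segments per `sendmsg` call, roughly as in this sketch (the exact limit
handling in the actual change may differ):

```rust
/// Maximum UDP payload we can hand to the kernel in a single `sendmsg`
/// call (a 64 KiB IP datagram minus IP and UDP headers). Exact accounting
/// differs per address family; this is the conservative IPv4 value.
const MAX_UDP_PAYLOAD: usize = 65507;

/// Clamp `segment_size * segments` so it never exceeds what the kernel
/// accepts, instead of always using `max_gso_segments()` (64 by default).
fn gso_chunk_len(segment_size: usize, max_gso_segments: usize) -> usize {
    let max_segments = (MAX_UDP_PAYLOAD / segment_size).max(1);

    segment_size * max_gso_segments.min(max_segments)
}
```

For the `iperf3` example above (segment size 1320), this caps a batch at
49 segments, i.e. 64,680 bytes, instead of 84,480.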
Resolves: #8699
At present, the Gateway implements a NAT64 conversion that can convert
IPv4 packets to IPv6 and vice versa. Doing this efficiently creates a
fair amount of complexity within our `ip-packet` crate. In addition,
routing ICMP errors back through our NAT is also complicated by this
because we may have to translate the packet embedded in the ICMP error
as well.
The NAT64 module was originally conceived as a result of the new stub
resolver-based DNS architecture. When the Client resolves IPs for a
domain, it doesn't know whether the domain will actually resolve to IPv4
AND IPv6 addresses so it simply assigns 4 of each to every domain. Thus,
when receiving an IPv6 packet for such a DNS resource, the Gateway may
only have IPv4 addresses available and can therefore not route the
packet (unless it translates it).
This problem is not novel. In fact, an IP being unroutable or a
particular route disappearing happens all the time on the Internet. ICMP
was conceived to handle this problem and it is doing a pretty good job
at it. We can make use of that and simply return an ICMP unreachable
error back to the client whenever it picks an IP that we cannot map to
one that we resolved.
In this PR, we leave all of the NAT64 code intact and only add a
feature-flag that - when active - sends the aforementioned ICMP error. When
offline (and thus also in our tests), the feature-flag evaluates to
false. It is however set to `true` in the backend, meaning on staging
and later in production, we will send these ICMP errors.
Once this is rolled out and indeed proving to be working as intended, we
can simplify our codebase and rip out the NAT64 module. At that point,
we will also have to adapt the test-suite.
This is a regression introduced in c9f085c102. The `status` at this
point is still `nil` because we have not yet fully subscribed to VPN
status change updates from the system.
That actually shouldn't prevent us from trying to start the tunnel
anyway. If the `token` is missing from the Keychain, the tunnel process
will no-op. So we now always try to start a session on launch.
Fixes #8456
Before, we would receive an `NSError` object and the type-matching
wouldn't take effect at all, causing the default alert to show every
time. This solves that by introducing a `UserFriendlyError` protocol
which is more robust against the two main `Error` and `NSError`
variants.
Whenever a Resource's name, address_description, or assigned sites
change, it is not currently reflected in clients; the change only shows
up if the address changes as well.
This PR updates that behavior so that if any display fields are changed,
the `on_update_resources` callback is called, which properly updates the
resource list views in clients.
Fixes #8284