mirror of
https://github.com/outbackdingo/firezone.git
synced 2026-01-27 18:18:55 +00:00
DNS resolution is a critical part of `connlib`. If it is slow for whatever reason, users will notice this. To make sure we notice as well, we add `telemetry` spans to the client's and gateway's DNS resolution. For the client, this applies to all DNS queries that we forward to the upstream servers. For the gateway, this applies to all DNS resources. In addition to those IO operations, we also instrument the `match_resource_linear` function. This function operates in `O(n)` of all defined DNS resources. It _should_ be fast enough to not create an impact but it can't hurt to measure this regardless. Lastly, we also instrument `refresh_translations` on the gateway. Refreshing the DNS resolution of a DNS resource should really only happen, when the previous IP addresses become stale yet the user is still trying to send traffic to them. We don't actually have any data on how often that happens. By instrumenting it, we can gather some of this data. To make sure that none of these telemetry events and spans hurt the end-user performance, we introduce macros to `firezone-logging` that sample the creation of these events and spans at a rate of 1%. I ran a flamegraph and none of these even showed up. The most critical one here is probably the `match_resource_linear` span because it happens on every DNS query. Resolves: #7198. --------- Signed-off-by: Thomas Eizinger <thomas@eizinger.io>