Nobody looks at these logs, writing them uses unnecessary CPU + storage
on users devices. It also means we have 1 background thread less because
we need one less non-blocking writer.
One of Rust's promises is "if it compiles, it works". However, there are
certain situations in which this isn't true. In particular, when using
dynamic typing patterns where trait objects are downcast to concrete
types, having two versions of the same dependency can silently break
things.
This happened in #7379 where I forgot to patch a certain Sentry
dependency. A similar problem exists with our `tracing-stackdriver`
dependency (see #7241).
Lastly, duplicate dependencies increase the compile-times of a project,
so we should aim for having as few duplicate versions of a particular
dependency as possible in our dependency graph.
This PR introduces `cargo deny`, a linter for Rust dependencies. In
addition to linting for duplicate dependencies, it also enforces that
all dependencies are compatible with an allow-list of licenses and it
warns when a dependency is referred to from multiple crates without
introducing a workspace dependency. Thanks to existing tooling
(https://github.com/mainmatter/cargo-autoinherit), transitioning all
dependencies to workspace dependencies was quite easy.
Resolves: #7241.
Our logging library `tracing` supports structured logging. Structured
logging means we can include values within a `tracing::Event` without
having to immediately format it as a string. Processing these values -
such as errors - as their original type allows the various `tracing`
layers to capture and represent them as they see fit.
One of these layers is responsible for sending ERROR and WARN events to
Sentry, as part of which `std::error::Error` values get automatically
captured as so-called "sentry exceptions".
Unfortunately, there is a caveat: If an `std::error::Error` value is
included in an event that does not get mapped to an exception, the
`error` field is completely lost. See
https://github.com/getsentry/sentry-rust/issues/702 for details.
To work around this, we introduce a `err_with_sources` adapter that an
error and all its sources together into a string. For all
`tracing::debug!` statements, we then use this to report these errors.
It is really unfortunate that we have to do this and cannot use the same
mechanism, regardless of the log level. However, until this is fixed
upstream, this will do and gives us better information in the log
submitted to Sentry.
With #7105, all ERROR events from `tracing` get logged as exceptions in
Sentry and all WARN events get logged as "messages". We don't want to
fill up the user's harddrive with logs which means we have to be
somewhat conservative, what gets logged on INFO and above (with INFO
being the default log level). There are certain events though where it
would be useful to know, how often they happen because too many of them
can indicate a problem.
To solve this problem, we introduce a dedicated `telemetry` log target
that the tracing-sentry integration layer watches for. Events for the
`telemetry` log target that gets logged on TRACE will be sampled at a
rate of 1% and submitted as messages to Sentry.
Using the `sentry-tracing` integration, we can automatically capture
events based on what we log via `tracing`. The mapping is defined as
follows:
- ERROR: Gets captured as a fatal error
- WARN: Gets captured as a message
- INFO: Gets captured as a breadcrumb
- `_`: Does not get captured at all
If telemetry isn't active / configured, this integration does nothing.
It is therefore safe to just always enable it.
Currently, any failure in the `StubResolver` while processing a query
results only in a log but no response. For UDP DNS queries, that isn't
too bad because the client will simply retry. With the upcoming support
for TCP DNS queries in #6944, we should really always reply with a
message.
This PR refactors the handling of DNS messages to always generate a
reply. In the case of an error, we reply with SERVFAIL.
This opens up a few more refactorings where we can now collapse the
handling of some branches into the same. As part of that, I noticed the
recurring need for "unwrapping" a `Result<(), E>` and logging the error.
To make that easier, I introduced an extension trait that does exactly
that.
This PR introduces a custom logging format for all Rust-components. It
is more or less a copy of `tracing_subscriber::fmt::format::Compact`
with the main difference that span-names don't get logged.
Spans are super useful because they allow us to record contextual
values, like the current connection ID, for a certain scope. What is IMO
less useful about them is that in the default formatter configuration,
active spans cause a right-drift of the actual log message.
The actual log message is still what most accurately describes, what
`connlib` is currently doing. Spans only add contextual information that
the reader may use for further understand what is happening. This
optional nature of the utility of spans IMO means that they should come
_after_ the actual log message.
Resolves: #7014.
Setting up a logger is something that pretty much every entrypoint needs
to do, be it a test, a shared library embedded in another app or a
standalone application. Thus, it makes sense to introduce a dedicated
crate that allows us to bundle all the things together, how we want to
do logging.
This allows us to introduce convenience functions like
`firezone_logging::test` which allow you to construct a logger for a
test as a one-liner.
Crucially though, introducing `firezone-logging` gives us a place to
store a default log directive that silences very noisy crates. When
looking into a problem, it is common to start by simply setting the
log-filter to `debug`. Without further action, this floods the output
with logs from crates like `netlink_proto` on Linux. It is very unlikely
that those are the logs that you want to see. Without a preset filter,
the only alternative here is to explicitly turn off the log filter for
`netlink_proto` by typing something like
`RUST_LOG=netlink_proto=off,debug`. Especially when debugging issues
with customers, this is annoying.
Log filters can be overridden, i.e. a 2nd filter that matches the exact
same scope overrides a previous one. Thus, with this design it is still
possible to activate certain logs at runtime, even if they have silenced
by default.
I'd expect `firezone-logging` to attract more functionality in the
future. For example, we want to support re-loading of log-filters on
other platforms. Additionally, where logs get stored could also be
defined in this crate.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>