Currently, we have a homegrown benchmark suite that reports the results
of iperf runs in CI by comparing a run on `main` with one on the current
branch.
The results are posted as PR comments, which is noisy because it happens
on every PR regardless of the performance results. As a result, devs
tend to skim over them rather than actually consider them. To properly
track performance, we need to record benchmark results over time and use
statistics to detect regressions.
https://bencher.dev does exactly that: it supports various benchmark
harnesses to automatically collect benchmarks. In our case, we simply
use the generic JSON adapter to extract the relevant metrics from the
iperf results and report them to the Bencher backend.
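For illustration, a minimal sketch of such an adapter step, assuming the
`serde_json` crate; the benchmark name, measure name, and file paths are
made up, and the `end.sum_received.bits_per_second` field path applies to
a TCP run:

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // iperf3 is run with `--json`; the report path here is a placeholder.
    let report: serde_json::Value =
        serde_json::from_str(&fs::read_to_string("iperf3.json")?)?;

    // For a TCP run, the receiver-side throughput is reported under
    // `end.sum_received.bits_per_second`.
    let throughput = &report["end"]["sum_received"]["bits_per_second"];

    // Bencher's generic JSON adapter expects the Bencher Metric Format:
    // benchmark name -> measure name -> { "value": ... }.
    // "direct-tcp-client2server" and "throughput" are hypothetical names.
    let bmf = serde_json::json!({
        "direct-tcp-client2server": {
            "throughput": { "value": throughput }
        }
    });

    // This file is what `bencher run --adapter json` would then pick up.
    fs::write("results.json", serde_json::to_string_pretty(&bmf)?)?;

    Ok(())
}
```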
With these metrics in place, Bencher can plot the results over time and
alert us in case of regressions, using thresholds based on statistical
tests.
Resolves: #5818.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Now that we've ironed out the flakiness in the iperf tests, we should
increase the UDP send rate so we get a benchmark of how many packets we
can actually handle before dropping any.
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now-disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
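A rough sketch of that handling logic; all type and method names below
are hypothetical stand-ins, not the actual connlib/snownet API:

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical shape of the portal's `relays_presence` message.
struct RelaysPresence {
    connected: Vec<Relay>,
    disconnected_ids: Vec<u64>,
}

struct Relay {
    id: u64,
    addr: SocketAddr,
}

/// Stand-in for the connection state machine.
struct Node {
    /// Candidates we have signalled to the remote, keyed by relay.
    candidates: HashSet<(u64, SocketAddr)>,
}

impl Node {
    fn handle_relays_presence(&mut self, msg: RelaysPresence) {
        // Invalidate all candidates belonging to relays that are no
        // longer connected to the portal.
        self.candidates
            .retain(|(relay_id, _)| !msg.disconnected_ids.contains(relay_id));

        // Make allocations on the new relays. The resulting candidates
        // are signalled to the remote party; once a fresh pair is
        // nominated, the connection migrates to that socket.
        for relay in msg.connected {
            self.make_allocation(relay);
        }
    }

    fn make_allocation(&mut self, relay: Relay) {
        // Placeholder: in reality, this sends an ALLOCATE request to
        // the relay and emits the new candidates as events.
        self.candidates.insert((relay.id, relay.addr));
    }
}
```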
This still relies on #4613 until we have #4634.
Resolves: #4548.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
As part of #4568, we are adding a second relay, which exposed some
shortcomings of the current process-state assertions: they were running
outside the Docker containers and thus listed all relays as soon as
there was more than one.
The clients, gateway and relay all employ an internal design based on an
eventloop. This gives us a lot of control over how the various IO
components interact with each other. That control, however, is also a
source of bugs, the latest of which made the relay busy-loop once it
started relaying traffic.
Eventloops are notoriously hard to unit-test because they compose
various IO bits together. Instead of writing unit tests, we can assert
the process state after the performance tests: those generate a fair bit
of load on all our components, but once the load stops, the processes
should suspend.
The most effective tests survive even large refactorings, and for that,
they need to be coded against a stable API or property. Asserting that
the process sleeps when it is idle from the application's point of view
is such a property.
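For illustration, a minimal sketch of such an assertion, assuming a
Linux `/proc` filesystem and that the check runs inside the container;
the PID is a placeholder, and the real assertions live in our CI
scripts:

```rust
use std::fs;

/// Field 3 of `/proc/<pid>/stat` is the process state; "S" means the
/// process sleeps in an interruptible wait (e.g. parked on epoll)
/// instead of busy-looping.
fn assert_process_is_sleeping(pid: u32) {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).unwrap();

    // The comm field may contain spaces, so parse from the closing
    // parenthesis instead of naively splitting the whole line.
    let after_comm = stat.rsplit_once(')').unwrap().1;
    let state = after_comm.split_whitespace().next().unwrap();

    assert_eq!(state, "S", "process {pid} is in state {state}, not sleeping");
}

fn main() {
    // PID 1 is the entrypoint when run inside the container.
    assert_process_is_sleeping(1);
}
```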
Related: #4511.
Fixes some issues encountered after the merge of #4049:
- Fixes the performance tests to run using only `base_ref` and
`head_ref`, avoiding a dependence on `main`
- Fixes some typos
- Prevents a catch-22 condition where breaking compatibility meant we
wouldn't be able to deploy to production
- Lowers the UDP bandwidth to 50M; this fixes intermittent
file-descriptor issues caused by overloading iperf3 for more than 5
seconds
- Simplifies the iperf3 options to the minimum set that makes the tests
reliable
The iperf3 server sometimes hangs or takes a while to start up.
Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so that each runs in its own matrix job. This
ensures each performance test runs on a separate VM, unaffected by
previous test runs, eliminating the effect any residual network-buffer
state could have on a particular test.
It also makes sure the server is listening by using a `healthcheck`.
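The healthcheck itself is a compose-level concern, but the underlying
idea is just a connect-and-retry loop against the server's port. A
minimal sketch of that idea (5201 is iperf3's default control port; the
host and the timeout budget are made-up values):

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Block until something accepts TCP connections on `addr`, or panic
/// once `deadline` has elapsed.
fn wait_until_listening(addr: SocketAddr, deadline: Duration) {
    let start = Instant::now();

    while start.elapsed() < deadline {
        if TcpStream::connect_timeout(&addr, Duration::from_secs(1)).is_ok() {
            return;
        }
        thread::sleep(Duration::from_millis(250));
    }

    panic!("no listener on {addr} after {deadline:?}");
}

fn main() {
    let addr = "127.0.0.1:5201".parse().unwrap();
    wait_until_listening(addr, Duration::from_secs(30));
}
```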
So the cause of the flaky tests is that they don't wait long enough for
a connection to be established: both the test in #3666 and the `iperf`
tests have a timeout of 10 seconds.
Connections _should_ be established **very quickly** in CI. However, I
have a few guesses as to why they might not be, which would force us to
wait for a timeout before re-initiating a connection request:
- Packets arrive out of order, or too quickly for the WireGuard state
machine to establish a handshake.
- Too many ICE candidates are gathered (the gateway has 3 interfaces).
This PR:
- Refactors the iperf tests to be a little easier to maintain
- Ensures `integration-tests` run for at least 30 seconds before timing
out
In any case, we can debug / optimize this further after snownet is
merged, which might just solve the problem completely.