Currently, we have a homegrown benchmark suite that reports the results
of iperf runs in CI by comparing a run on `main` with one on the current
branch.
The results are posted as PR comments, which is noisy because it happens
on every PR regardless of the performance results. As a result, devs
tend to skim over them rather than actually consider them. To properly
track performance, we need to record benchmark results over time and use
statistics to detect regressions.
https://bencher.dev does exactly that: it supports various benchmark
harnesses to automatically collect benchmarks. In our case, we simply
use the generic JSON adapter to extract the relevant metrics from the
iperf results and report them to the Bencher backend.
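For illustration, a minimal sketch of such an adapter step, assuming the
`serde_json` crate; the benchmark name, measure name, and file paths are
made up, and the `end.sum_received.bits_per_second` field path applies to
a TCP run:

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // iperf3 is run with `--json`; the report path here is a placeholder.
    let report: serde_json::Value =
        serde_json::from_str(&fs::read_to_string("iperf3.json")?)?;

    // For a TCP run, the receiver-side throughput is reported under
    // `end.sum_received.bits_per_second`.
    let throughput = &report["end"]["sum_received"]["bits_per_second"];

    // Bencher's generic JSON adapter expects the Bencher Metric Format:
    // benchmark name -> measure name -> { "value": ... }.
    // "direct-tcp-client2server" and "throughput" are hypothetical names.
    let bmf = serde_json::json!({
        "direct-tcp-client2server": {
            "throughput": { "value": throughput }
        }
    });

    // This file is what `bencher run --adapter json` would then pick up.
    fs::write("results.json", serde_json::to_string_pretty(&bmf)?)?;

    Ok(())
}
```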
With these metrics in place, Bencher can plot the results over time and
alert us in case of regressions, using thresholds based on statistical
tests.
Resolves: #5818.
---------
Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Reactor Scram <ReactorScram@users.noreply.github.com>
Now that we've ironed out the flakiness in the iperf tests, we should
increase the UDP send rate so we get a benchmark of how many packets we
can actually handle before dropping any.
Whenever we receive a `relays_presence` message from the portal, we
invalidate the candidates of all now-disconnected relays and make
allocations on the new ones. This triggers signalling of new candidates
to the remote party and migrates the connection to the newly nominated
socket.
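A rough sketch of that handling logic; all type and method names below
are hypothetical stand-ins, not the actual connlib/snownet API:

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical shape of the portal's `relays_presence` message.
struct RelaysPresence {
    connected: Vec<Relay>,
    disconnected_ids: Vec<u64>,
}

struct Relay {
    id: u64,
    addr: SocketAddr,
}

/// Stand-in for the connection state machine.
struct Node {
    /// Candidates we have signalled to the remote, keyed by relay.
    candidates: HashSet<(u64, SocketAddr)>,
}

impl Node {
    fn handle_relays_presence(&mut self, msg: RelaysPresence) {
        // Invalidate all candidates belonging to relays that are no
        // longer connected to the portal.
        self.candidates
            .retain(|(relay_id, _)| !msg.disconnected_ids.contains(relay_id));

        // Make allocations on the new relays. The resulting candidates
        // are signalled to the remote party; once a fresh pair is
        // nominated, the connection migrates to that socket.
        for relay in msg.connected {
            self.make_allocation(relay);
        }
    }

    fn make_allocation(&mut self, relay: Relay) {
        // Placeholder: in reality, this sends an ALLOCATE request to
        // the relay and emits the new candidates as events.
        self.candidates.insert((relay.id, relay.addr));
    }
}
```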
This still relies on #4613 until we have #4634.
Resolves: #4548.
---------
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>
As part of #4568, we are adding a second relay, which exposed some
shortcomings of the current process-state assertions: they were running
outside the Docker containers and thus listed all relays as soon as
there was more than one.
The clients, gateway and relay all employ an internal design based on an
eventloop. This gives us a lot of control over how the various IO
components interact with each other. That control, however, is also a
source of bugs, the latest of which made the relay busy-loop once it
started relaying traffic.
Eventloops are notoriously hard to unit-test because they compose
various IO bits together. Instead of writing unit tests, we can assert
the process state after the performance tests: those generate a fair bit
of load on all our components, but once the load stops, the processes
should suspend.
The most effective tests survive even large refactorings, and for that,
they need to be coded against a stable API or property. Asserting that
the process sleeps when it is idle from the application's point of view
is such a property.
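For illustration, a minimal sketch of such an assertion, assuming a
Linux `/proc` filesystem and that the check runs inside the container;
the PID is a placeholder, and the real assertions live in our CI
scripts:

```rust
use std::fs;

/// Field 3 of `/proc/<pid>/stat` is the process state; "S" means the
/// process sleeps in an interruptible wait (e.g. parked on epoll)
/// instead of busy-looping.
fn assert_process_is_sleeping(pid: u32) {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).unwrap();

    // The comm field may contain spaces, so parse from the closing
    // parenthesis instead of naively splitting the whole line.
    let after_comm = stat.rsplit_once(')').unwrap().1;
    let state = after_comm.split_whitespace().next().unwrap();

    assert_eq!(state, "S", "process {pid} is in state {state}, not sleeping");
}

fn main() {
    // PID 1 is the entrypoint when run inside the container.
    assert_process_is_sleeping(1);
}
```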
Related: #4511.
Fixes some issues encountered after the merge of #4049:
- Fixes the performance tests to run using only `base_ref` and
`head_ref`, avoiding a dependence on `main`
- Fixes some typos
- Prevents a catch-22 condition where breaking compatibility meant we
wouldn't be able to deploy to production
- Lowers the UDP bandwidth to 50M; this fixes intermittent
file-descriptor issues caused by overloading iperf3 for more than 5
seconds
- Simplifies the iperf3 options to the minimum set that makes the tests
reliable
The iperf3 server sometimes hangs or takes a while to start up.
Rather than trying to reset the iperf3 state between performance tests,
this PR refactors them so that each runs in its own matrix job. This
ensures each performance test runs on a separate VM, unaffected by
previous test runs, eliminating the effect any residual network-buffer
state could have on a particular test.
It also makes sure the server is listening by using a `healthcheck`.
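The healthcheck itself is a compose-level concern, but the underlying
idea is just a connect-and-retry loop against the server's port. A
minimal sketch of that idea (5201 is iperf3's default control port; the
host and the timeout budget are made-up values):

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Block until something accepts TCP connections on `addr`, or panic
/// once `deadline` has elapsed.
fn wait_until_listening(addr: SocketAddr, deadline: Duration) {
    let start = Instant::now();

    while start.elapsed() < deadline {
        if TcpStream::connect_timeout(&addr, Duration::from_secs(1)).is_ok() {
            return;
        }
        thread::sleep(Duration::from_millis(250));
    }

    panic!("no listener on {addr} after {deadline:?}");
}

fn main() {
    let addr = "127.0.0.1:5201".parse().unwrap();
    wait_until_listening(addr, Duration::from_secs(30));
}
```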
So the cause of the flaky tests is that they don't wait long enough for
a connection to be established: both the test in #3666 and the `iperf`
tests have a timeout of 10 seconds.
Connections _should_ be established **very quickly** in CI. However, I
have a few guesses as to why they might not be, which would force us to
wait for a timeout before re-initiating a connection request:
- Packets arrive out of order, or too quickly for the WireGuard state
machine to establish a handshake.
- Too many ICE candidates are gathered (the gateway has 3 interfaces).
This PR:
- Refactors the iperf tests to be a little easier to maintain
- Ensures `integration-tests` run for at least 30 seconds before timing
out
In any case, we can debug / optimize this further after snownet is
merged, which might just solve the problem completely.