The test was flaky because it required the job succeeds 3 times with
pseudorandom 50% failure chance within 15 minutes, while there is an
exponential back-off delay (10s, 20s, 40s …) capped at 6 minutes before
recreating failed pods. As 7 consecutive failures (1/128 chance) could
take 20+ minutes, exceeding the timeout, the test failed intermittently
because of "timed out waiting for the condition".
This PR forces the Pods of a Job to be scheduled to a single node and
uses a hostPath volume instead of an emptyDir to persist data across new
Pods.
The redis version has been bumped to version 5.0.5, but the maximum version supported on
Windows is 3.2. This can lead to failing tests, the output and behaviour can be different
(see #80516). In order to prevent such failures, the amount of times the Redis image is
used can be reduced.
This commit uses the previously added agnhost guestbook subcommand as a replacement for the
Guestbook application created by the test "should create and stop a working application".
Adds AgnhostPrivate to test/utils/image/manifest. Some tests are trying to pull
the agnhost image from the private registry, meaning that we would need to
always build and push the agnhost image to both e2e and private registry
whenever we bump its version. Decoupling them would mean that we only need
to push the image to the e2e registry.
This PR moves functions from test/e2e/framework.util.go for making e2e
core framework small and simple:
- RestartKubeProxy: Moved to e2e network package
- CheckConnectivityToHost: Moved to e2e network package
- RemoveAvoidPodsOffNode: Move to e2e scheduling package
- AddOrUpdateAvoidPodOnNode: Move to e2e scheduling package
- UpdateDaemonSetWithRetries: Move to e2e apps package
- CheckForControllerManagerHealthy: Moved to e2e storage package
- ParseKVLines: Removed because of e9345ae5f0
- AddOrUpdateLabelOnNodeAndReturnOldValue: Removed because of ff7b07c43c
Tests should provide more verbose output when their core functionality
fails; in this case we were only logging that the job count did not
match but this is not sufficient for understanding a failure.
Many TestJig methods made the caller pass a serviceName argument, even
though the jig already has a name, and every caller was passing the
same name to each function as they had passed to NewTestJig().
Likewise, many methods made the caller pass a namespace argument, but
only a single test used more than one namespace, and it can easily be
rewritten to use two test jigs as well.