This is not necessarily a problem, some code might use a timeout and expect it
to trigger. Therefore this should only be an info message, not a
warning. Long-term it might be useful to have an API where the caller decides
whether this gets logged.
The caller should use short messages and leave it to the user of those to
provide more context (no pun intended...). When logging, "canceling context" is
that context.
Before:
scheduler_perf.go:1431: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
contexthelper.go:69:
WARNING: the operation ran for the configured 2s
After:
scheduler_perf.go:1431: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
contexthelper.go:69:
INFO: canceling context: the operation ran for the configured 2s
How exactly a test reacts when its context times out is unclear. In the case of
scheduler_perf, the apiserver started to shut down and the test failure then
was about not being able to reach the apiserver, which was a bit confusing.
To make it more obvious why the shutdown starts, a WARNING message gets added
to the test output by ktesting before cancellation and thus before any other
output related to that cancellation.
format.Object adds some white space in front of the value and a type identifier
in angle brackets. Both is distracting when printing simple values and can be
avoided by picking fmt.Sprintf for those types, plus trimming the result of
format.Object.
Before:
allocator.go:483: I0625 15:35:31.946980] Allocating one device currentClaim= <int>: 0 totalClaims= <int>: 1 currentRequest= <int>: 0 totalRequestsPerClaim= <int>: 1 currentDevice= <int>: 0 devicesPerRequest= <int>: 1 allDevices= <bool>: false adminAccess= <bool>: false
After:
allocator.go:483: I0625 15:35:04.371441] Allocating one device currentClaim=0 totalClaims=1 currentRequest=0 totalRequestsPerClaim=1 currentDevice=0 devicesPerRequest=1 allDevices=false adminAccess=false
The return type of ktesting.NewTestContext is now a TContext. Code
which combined it WithCancel often didn't compile anymore (cannot overwrite
ktesting.TContext with context.Context). This is a good thing because all of
that code can be simplified to let ktesting handle the cancelation.
Extending the duration and the allowed delta in f6682370b1 was still not enough
to make the unit test run reliably in pull-kubernetes-unit.
Now it uses the original, stricter timing again, but only when run locally. In
Prow (detected by checking the "CI" env variable), the duration check is
skipped.
The new TContext interface combines a normal context and the testing interface,
then adds some helper methods. The context gets canceled when the test is done,
but that can also be requested earlier via Cancel.
The intended usage is to pass a single `tCtx ktesting.TContext` parameter
around in all helper functions that get called by a unit or integration test.
Logging is also more useful: Log[f] and Fatal[f] output is prefixed with
"[FATAL] ERROR: " to make it stand out more from regular log output.
If this approach turns out to be useful, it could be extended further (for
example, with a per-test timeout) and might get moved to a staging repository
to enable usage of it in other staging repositories.
To allow other implementations besides testing.T and testing.B, a custom
ktesting.TB interface gets defined with the methods expected from the
actual implementation. One such implementation can be ginkgo.GinkgoT().
The upstream ktesting has to be very flexible to accommodate different ways of
using it. In Kubernetes, we can be opinionated and make certain choices, like
using klog flags, and only those.