Add an (admittedly pretty crude) CPU allocatable check.
A more incisive refactoring is needed, but we need
to unbreak CI first, so this seems the minimal, decently clean fix.
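
For illustration, a crude check of this kind can look roughly like the
sketch below; the helper name and the use of testing.T are assumptions
made for the example, not the actual test code.

    // Illustrative sketch only: skip when the node does not expose enough
    // allocatable CPU for the test workload.
    package e2esketch

    import (
        "context"
        "testing"

        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    func skipIfNotEnoughAllocatableCPU(t *testing.T, cs kubernetes.Interface, nodeName string, required resource.Quantity) {
        node, err := cs.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
        if err != nil {
            t.Fatalf("cannot get node %q: %v", nodeName, err)
        }
        alloc := node.Status.Allocatable[v1.ResourceCPU]
        if alloc.Cmp(required) < 0 {
            t.Skipf("node %q has %s allocatable CPU, the test needs %s",
                nodeName, alloc.String(), required.String())
        }
    }
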
Signed-off-by: Francesco Romani <fromani@redhat.com>
Our CI machines happen to have 1 fully allocatable CPU for test workloads.
This is really the bare minimum, but it should still be sufficient for the
tests to run. The CFS quota test, however, creates a series of pods (6 at
time of writing) and cleans them up only at the very end. This means pods
requiring resources accumulate on the CI machine node.
The fix implemented here is to just clean up after each subcase.
This way the CPU footprint of the test is equal to the highest single
requirement (say, 1000 millicores) rather than the sum of all the
subcases' requirements.
This doesn't change the test behavior, and makes it possible
to run it on very barebones machines.
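
Roughly, the per-subcase flow becomes something like the sketch below
(plain client-go, with a made-up helper name; the real test goes through
the e2e framework helpers instead):

    // Illustrative sketch: create a subcase's pod, run the checks, then
    // delete the pod and wait for it to be gone before starting the next
    // subcase, so CPU requests never pile up on a 1-CPU node.
    package e2esketch

    import (
        "context"
        "time"

        v1 "k8s.io/api/core/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/client-go/kubernetes"
    )

    func runSubcase(ctx context.Context, cs kubernetes.Interface, ns string, pod *v1.Pod) error {
        created, err := cs.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{})
        if err != nil {
            return err
        }
        // ... subcase assertions against the running pod go here ...

        // Clean up right away instead of at the very end of the test.
        if err := cs.CoreV1().Pods(ns).Delete(ctx, created.Name, metav1.DeleteOptions{}); err != nil {
            return err
        }
        return wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
            func(ctx context.Context) (bool, error) {
                _, err := cs.CoreV1().Pods(ns).Get(ctx, created.Name, metav1.GetOptions{})
                if apierrors.IsNotFound(err) {
                    return true, nil // pod gone, its CPU request is released
                }
                return false, nil
            })
    }
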
For unknown reasons, hack/make-rules/test-e2e-node.sh adds -timeout instead of
--timeout. Therefore the fallback code in test/e2e_node/remote/remote.go doesn't
find it and adds its own --timeout=60m after it. This effectively limits E2E
node test runs to 60 minutes, regardless of what is specified in the job:
W0206 09:53:51.425532 7151 remote.go:158] ginkgo flags are missing explicit --timeout (ginkgo defaults to 60 minutes)
I0206 09:53:51.425565 7151 remote.go:165] updated ginkgo flags: -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m
...
I0206 09:53:57.767096 7151 ssh.go:146] Running the command ssh, with args: ... timeout -k 30s 3600.000000s ./ginkgo -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m ...
Note that the timeout for the test was 60m in this case (hence the "timeout -k
30s 3600.000000s") but it could also be something larger.
With the device plugin node reboot test fixed, we can see in testgrid
[node-kubelet-containerd-flaky](https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-flaky)
that the test is passing consistently and we can remove the flaky label.
With the test not flaky anymore, we can validate new PRs against it
and ensure we don't cause regressions.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
We have an e2e test which tries to ensure device plugin assignments to pods
are kept across node reboots. This test has been permafailing for many weeks
at the time of writing (xref: #128443).
Problem is: closer inspection reveals the test was well intentioned, but
puzzling:
The test runs a pod, then restarts the kubelet, then _expects the pod to
end up in admission failure_ and yet _ensures the device assignment is
kept_! https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/test/e2e_node/device_plugin_test.go#L97
A reader can legitimately wonder whether this means the device will be
kept busy forever.
Luckily, this is not the case. The test, however, embodied the kubelet
behavior at the time, in turn caused by #103979:
the device manager used to record the last admitted pod and forcibly add
it to the list of active pods. The retention logic had space for exactly
one pod, the last one which attempted admission.
This retention prevented the cleanup code
(see: https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549
compare to: https://github.com/kubernetes/kubernetes/blob/v1.31.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549)
from clearing the registration, so the device was still (mis)reported
allocated to the failed pod.
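
In grossly simplified form, the old retention worked along these lines
(type and field names are made up for illustration, this is not the real
device manager code):

    package devmgrsketch

    import (
        v1 "k8s.io/api/core/v1"
    )

    // Sketch of the old behavior: the last pod that attempted admission is
    // force-included in the active set, so the cleanup pass never reclaims
    // its devices even if admission failed.
    type manager struct {
        lastAdmittedPod *v1.Pod
        podDevices      map[string][]string // pod UID -> device IDs (simplified)
    }

    func (m *manager) cleanupStale(activePods []*v1.Pod) {
        active := make(map[string]bool)
        for _, p := range activePods {
            active[string(p.UID)] = true
        }
        if m.lastAdmittedPod != nil {
            // This is the retention: the failed pod still counts as active,
            // so its device assignment survives the cleanup.
            active[string(m.lastAdmittedPod.UID)] = true
        }
        for uid := range m.podDevices {
            if !active[uid] {
                delete(m.podDevices, uid) // devices reclaimed here
            }
        }
    }
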
This fact was in turn leveraged by the test in question:
the test uses the podresources API to learn about the device assignment,
and because of the chain of events above the pod failed admission yet
was still reported as owning the device.
What happened, however, was that the next pod trying admission would
replace the previous pod in the device manager data, so the previous
pod was no longer forced into the active list, and its assignment was
correctly cleared once the cleanup code ran.
The cleanup code runs, among other things, every time the device
manager is asked to allocate devices and every time the podresources
API queries the device assignment.
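
For context, this is roughly how a test talks to the podresources API to
get the assignment; the GetV1Client helper and its exact signature are
recalled from the kubelet tree and may differ, so treat the snippet as a
sketch rather than working code:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        kubeletpodresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
        "k8s.io/kubernetes/pkg/kubelet/apis/podresources"
    )

    func main() {
        // Default kubelet podresources socket on Linux nodes.
        const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock"
        client, conn, err := podresources.GetV1Client(socket, 10*time.Second, 1024*1024)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        resp, err := client.List(context.TODO(), &kubeletpodresourcesv1.ListPodResourcesRequest{})
        if err != nil {
            log.Fatal(err)
        }
        for _, pod := range resp.GetPodResources() {
            for _, cnt := range pod.GetContainers() {
                for _, dev := range cnt.GetDevices() {
                    fmt.Printf("%s/%s container %s -> %s %v\n",
                        pod.GetNamespace(), pod.GetName(), cnt.GetName(),
                        dev.GetResourceName(), dev.GetDeviceIds())
                }
            }
        }
    }
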
Later, in PR https://github.com/kubernetes/kubernetes/pull/120661
the forced retention logic was removed from all the resource managers,
thus also from the device manager, and this is what caused the
permafailure.
Because of all the above, it should be evident that the e2e test was
actually enforcing a very specific, not-really-working-as-intended
behavior, which was also quite puzzling for users.
The best we can do is to fix the test to record and ensure that
pods which failed admission _do not_ retain their device assignment.
Unfortunately, we _cannot_ guarantee the desirable property that pods
which went running retain their device assignment across node reboots.
In the kubelet restart flow, all pods race to be admitted; there is no
order enforced between device plugin pods and application pods.
The property holds only if an application pod is lucky enough to _lose_
the race with both the device plugin (which must go running before the
app pod does) and _also_ with the kubelet (which needs to mark the
devices healthy before the pod tries admission).
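
The shape of the new assertion is roughly the following (the helper name
is illustrative): after the kubelet restart, a pod which failed admission
must not be reported by podresources as owning the device.

    package e2esketch

    import (
        "fmt"

        kubeletpodresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
    )

    // expectNoStaleAssignment fails if the podresources List response still
    // reports the given device resource as assigned to a pod which did not
    // pass admission.
    func expectNoStaleAssignment(resp *kubeletpodresourcesv1.ListPodResourcesResponse, failedPodName, resourceName string) error {
        for _, pod := range resp.GetPodResources() {
            if pod.GetName() != failedPodName {
                continue
            }
            for _, cnt := range pod.GetContainers() {
                for _, dev := range cnt.GetDevices() {
                    if dev.GetResourceName() == resourceName {
                        return fmt.Errorf("pod %q failed admission but is still reported as owning %q",
                            failedPodName, resourceName)
                    }
                }
            }
        }
        return nil
    }
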
Signed-off-by: Francesco Romani <fromani@redhat.com>
The "// import <path>" comment has been superseded by Go modules.
We don't have to remove them, but doing so has some advantages:
- They are used inconsistently, which is confusing.
- We can then also remove the (currently broken) hack/update-vanity-imports.sh.
- Last but not least, it would be a first step towards avoiding the k8s.io domain.
This commit was generated with
sed -i -e 's;^package \(.*\) // import.*;package \1;' $(git grep -l '^package.*// import' | grep -v 'vendor/')
Everything was included, except for
package labels // import k8s.io/kubernetes/pkg/util/labels
because that package is marked as "read-only".
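
As a concrete (made-up) example of what the sed rewrites, a package
clause like

    package devicemanager // import "k8s.io/kubernetes/pkg/kubelet/cm/devicemanager"

becomes

    package devicemanager

leaving the package name untouched and dropping only the vanity import
comment.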