Moving Scheduler interfaces to staging: Move the PodInfo and NodeInfo interfaces (together with the related types) to the staging repo, leaving the internal implementation in kubernetes/kubernetes/pkg/scheduler
As part of PR 132028 we added more e2e test coverage to validate
the fix and to check, as much as possible, that there are no regressions.
The issue and the fix become evident mostly when inspecting
memory allocation with the Memory Manager static policy enabled.
Quoting the commit message of bc56d0e45a:
```
The podresources API List implementation uses the internal data of the
resource managers as source of truth.
Looking at the implementation here:
https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/apis/podresources/server_v1.go#L60
we take care of syncing the device allocation data before querying the
device manager to return its pod->devices assignment.
This is needed because otherwise the device manager (and all the other
resource managers) would do the cleanup asynchronously, so the `List` call
would return incorrect data.
But we don't do this syncing for either CPUs or memory,
so when we report them we get stale data, as issue #132020 demonstrates.
For the CPU manager, however, we have the reconcile loop, which cleans up the stale data periodically.
Turns out this timing interplay was the reason the existing issue #119423 seemed fixed
(see: #119423 (comment)).
But it is really just timing: if in the reproducer we set the `cpuManagerReconcilePeriod` to a
very high value (>= 5 minutes), then the issue still reproduces against the current master branch
(https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/test/e2e_node/podresources_test.go#L983).
```
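For illustration only, here is a minimal Go sketch of the explicit sync the quoted message describes; the interface and method names are hypothetical stand-ins, not the real kubelet types:
```go
// Hypothetical sketch, not the actual kubelet code: query a resource manager
// only after forcing it to drop allocations for pods that are already gone,
// instead of relying on an asynchronous cleanup or reconcile loop.
package podresources

import "context"

// resourceManager stands in for the kubelet's CPU, memory and device managers.
type resourceManager interface {
	// RemoveStaleState drops allocations belonging to pods that no longer exist.
	RemoveStaleState()
	// Assignments returns the current pod -> resources snapshot.
	Assignments() map[string][]string
}

type listServer struct {
	managers []resourceManager
}

// List returns per-pod resource assignments. The key point from the quoted
// commit message: sync (drop stale state) *before* reading, otherwise the
// response can contain data for pods that were already deleted.
func (s *listServer) List(ctx context.Context) map[string][]string {
	out := map[string][]string{}
	for _, m := range s.managers {
		m.RemoveStaleState() // explicit sync before querying
		for pod, res := range m.Assignments() {
			out[pod] = append(out[pod], res...)
		}
	}
	return out
}
```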
The missing actor here is the memory manager. The memory manager has no
reconcile loop (which would implicitly fix the stale data problem) and no explicit
synchronization, so it is the unlucky one that reported stale data,
leading to the eventual understanding of the problem.
For this reason it was (and still is) important to exercise it during
the test.
Turns out the test is wrong, however, likely because of a hidden dependency
between the test expectations and the lane configuration (notably the
machine specs), so we disable the memory manager activation for the time
being, until we figure out a safe way to enable it.
Note this significantly weakens the signal for this specific test.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This avoids the overhead of the more complex conversion to v1beta1 and might
make it a bit more realistic to get rid of v1beta1 eventually.
The expected GVK must be set explicitly because when emulating 1.33,
v1beta1 is the default although the fixed storage version is v1beta2.
The feature gate hasn't been on by default before, therefore it does not get locked
to the new default (on) yet. This has some impact on the scheduler configuration
because the plugin is now enabled by default.
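If a cluster still needs the old behavior, the plugin can be disabled explicitly in the scheduler configuration. The following is only a rough sketch using the external kubescheduler.config.k8s.io/v1 Go types; the exact struct layout and plugin name are assumptions made here for illustration, not something introduced by this change:
```go
// Rough sketch (assumptions, not taken from this change): explicitly disable
// the DynamicResources plugin, which is now enabled by default.
package main

import (
	"fmt"

	configv1 "k8s.io/kube-scheduler/config/v1"
	"k8s.io/utils/ptr"
)

func main() {
	cfg := configv1.KubeSchedulerConfiguration{
		Profiles: []configv1.KubeSchedulerProfile{{
			SchedulerName: ptr.To("default-scheduler"),
			Plugins: &configv1.Plugins{
				MultiPoint: configv1.PluginSet{
					// Disable the plugin by name across all extension points.
					Disabled: []configv1.Plugin{{Name: "DynamicResources"}},
				},
			},
		}},
	}
	fmt.Printf("%+v\n", cfg)
}
```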
Because the feature is now GA, it no longer needs to be a label on E2E tests,
which wouldn't be possible anyway once the feature gate gets removed entirely.
Some tests do version emulation and need the DRA feature. In that combination
the --runtime-config-emulation-forward-compatible option is needed to allow
enabling the v1 API even though it's only available in 1.34.
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whichever API version is
enabled from v1beta1/v1beta2/v1.
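Conceptually the selection works like the sketch below; the discovery-based probing and the helper name are assumptions for illustration, not the actual helper code:
```go
// Sketch only: prefer the newest served resource.k8s.io API version,
// falling back to the older ones. Not the actual helper implementation.
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func pickDRAAPIVersion(dc discovery.DiscoveryInterface) (string, error) {
	for _, gv := range []string{
		"resource.k8s.io/v1",
		"resource.k8s.io/v1beta2",
		"resource.k8s.io/v1beta1",
	} {
		// The group version is usable if the server advertises resources for it.
		if _, err := dc.ServerResourcesForGroupVersion(gv); err == nil {
			return gv, nil
		}
	}
	return "", fmt.Errorf("no supported resource.k8s.io API version is enabled")
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	gv, err := pickDRAAPIVersion(dc)
	if err != nil {
		panic(err)
	}
	fmt.Println("using", gv)
}
```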
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
The sig-node tests have scenarios that exercise probes and
lifecycle handlers with post-start and pre-stop hooks
setting the host field to another pod.
At the baseline level such things won't be allowed because of
the PSA rules we are adding in this PR. Unsetting
the host field means the pod uses its own podIP for doing
the checks, and using that in the pre-stop and post-start
hooks is tricky because of timing issues between when the
container is actually up and when the test runs.
So I have changed those tests to be privileged so that they
can keep using the .host fields if they desire to.
See https://github.com/kubernetes/kubernetes/issues/133091
which is an issue opened to properly refactor these tests.
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
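For context, the probes in question look roughly like the sketch below when they target another pod via the optional host field; the helper name and values are illustrative only, not copied from the actual tests:
```go
// Illustrative only: an HTTP liveness probe that targets another pod's IP via
// the optional Host field, which is what the affected tests do. When Host is
// left unset, the kubelet probes the pod's own podIP instead.
package nodetests

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func livenessProbeAgainstOtherPod(otherPodIP string) *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Host: otherPodIP, // probing another pod's IP; the tests now run privileged to keep doing this
				Path: "/healthz",
				Port: intstr.FromInt32(8080),
			},
		},
		InitialDelaySeconds: 5,
		PeriodSeconds:       10,
	}
}
```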
The dual-stack integration tests already validate that we get the
expected Endpoints for single- and dual-stack Services. There is no
further "end to end" testing needed for Endpoints, given that
everything in a normal cluster would look at EndpointSlices, not
Endpoints.
There's no need to clarify how many nodes are used in the test because the
overall test names are still unique without that (verified with
`go test -v ./test/e2e -args -list-tests | grep -w DRA | wc -l`).
"must be possible for the driver to update the ResourceClaim.Status.Devices
once allocated" was also run as kubelet test although it only checks the
control plane.
Before:
[sig-node] [DRA] [FeatureGate:DynamicResourceAllocation] [Beta] [Feature:OffByDefault] control plane with single node [ConformanceCandidate] must be possible for the driver to update the ResourceClaim.Status.Devices once allocated [FeatureGate:DRAResourceClaimDeviceStatus] [Beta]
[sig-node] [DRA] [FeatureGate:DynamicResourceAllocation] [Beta] [Feature:OffByDefault] kubelet [Feature:DynamicResourceAllocation] on single node must be possible for the driver to update the ResourceClaim.Status.Devices once allocated [FeatureGate:DRAResourceClaimDeviceStatus] [Beta]
After:
[sig-node] [DRA] [FeatureGate:DynamicResourceAllocation] [Beta] [Feature:OffByDefault] control plane with single node [ConformanceCandidate] must be possible for the driver to update the ResourceClaim.Status.Devices once allocated [FeatureGate:DRAResourceClaimDeviceStatus] [Beta]
The new validation logic follows the API ratcheting principle: it will not be executed for already-stored invalid objects if the corresponding fields or array items are not modified (a sketch of this pattern follows the list below).
* Add FileKeyRef field and struct to the Pod API
* Add the implementation code in the kubelet.
* Add validation code
* Add basic functionality e2e tests
* Add code to drop disabled pod fields
* Update go.mod
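A minimal sketch of that ratcheting pattern, with hypothetical field names rather than the real pod validation code: the new check only runs when the relevant part of the object was modified, so objects already stored with values the new logic would reject keep passing validation on unrelated updates:
```go
// Hypothetical sketch of API ratcheting validation: run the new check only
// when the field changed relative to the stored object, so existing objects
// that would now be considered invalid are not broken by unrelated updates.
package validation

import (
	"reflect"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// fileKeySource is a stand-in for the real API struct.
type fileKeySource struct {
	FileKeyRef *string
}

func validateFileKeyRef(ref *string, fldPath *field.Path) field.ErrorList {
	var allErrs field.ErrorList
	if ref != nil && *ref == "" {
		allErrs = append(allErrs, field.Required(fldPath, "key must not be empty"))
	}
	return allErrs
}

// validateUpdate applies the new check only when the field was modified.
func validateUpdate(newSrc, oldSrc fileKeySource, fldPath *field.Path) field.ErrorList {
	if reflect.DeepEqual(newSrc.FileKeyRef, oldSrc.FileKeyRef) {
		return nil // unchanged: skip the new validation (ratcheting)
	}
	return validateFileKeyRef(newSrc.FileKeyRef, fldPath.Child("fileKeyRef"))
}
```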
Fix the utilities to enable multi-app-container tests,
which were previously quite hard to implement.
Add a consumer of the new utility to demonstrate the usage
and to initiate the basic coverage.
Signed-off-by: Francesco Romani <fromani@redhat.com>
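As an illustration only, a helper along these lines (the name and details are hypothetical, not the actual utility added here) builds a pod with several app containers for an e2e test:
```go
// Hypothetical helper: build a pod spec with multiple app containers, the
// kind of pod the improved utilities make easier to use in e2e tests.
package e2eutil

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func multiAppContainerPod(name string, containerNames ...string) *corev1.Pod {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
	}
	for _, cnt := range containerNames {
		pod.Spec.Containers = append(pod.Spec.Containers, corev1.Container{
			Name:    cnt,
			Image:   "registry.k8s.io/pause:3.10", // placeholder image
			Command: []string{"/pause"},
		})
	}
	return pod
}
```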
Add an e2e test to ensure that if the Get endpoint is asked
about a non-existent pod, it returns an error.
Likewise, add an e2e test for terminated pods, which should
not be returned because they neither consume nor hold resources,
much like `List` already does.
The expected usage pattern is to iterate over the list of
pods returned by `List`, but the endpoint must nevertheless
handle this case.
Signed-off-by: Francesco Romani <fromani@redhat.com>
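A rough sketch of what such a check can look like, assuming the v1 podresources gRPC client from k8s.io/kubelet; connection setup and the test framework wiring are omitted:
```go
// Sketch: ask the podresources Get endpoint about a pod that does not exist
// and expect an error back. The gRPC connection setup is elided.
package e2enode

import (
	"context"
	"fmt"

	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func expectGetFailsForMissingPod(ctx context.Context, cli podresourcesv1.PodResourcesListerClient) error {
	_, err := cli.Get(ctx, &podresourcesv1.GetPodResourcesRequest{
		PodNamespace: "default",
		PodName:      "does-not-exist",
	})
	if err == nil {
		return fmt.Errorf("expected an error for a non-existent pod, got none")
	}
	return nil
}
```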