As part of PR 132028 we added more e2e test coverage to validate
the fix and to check, as far as possible, that there are no regressions.
The issue and the fix become evident largely when inspecting
memory allocation with the Memory Manager static policy enabled.
Quoting the commit message of bc56d0e45a:
```
The podresources API List implementation uses the internal data of the
resource managers as source of truth.
Looking at the implementation here:
https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/apis/podresources/server_v1.go#L60
we take care of syncing the device allocation data before querying the
device manager to return its pod->devices assignment.
This is needed because otherwise the device manager (and all the other
resource managers) would do the cleanup asynchronously, so the `List` call
will return incorrect data.
But we don't do this syncing for either CPUs or memory,
so when we report these we get stale data, as issue #132020 demonstrates.
For the CPU manager, however, we have the reconcile loop, which cleans up stale data periodically.
It turns out this timing interplay was the reason the existing issue #119423 seemed fixed
(see: #119423 (comment)).
But it was only a matter of timing: if in the reproducer we set the `cpuManagerReconcilePeriod`
to a very high value (>= 5 minutes), the issue still reproduces against the current master branch
(https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/test/e2e_node/podresources_test.go#L983).
```
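The ordering described in the quoted message (sync all the resource managers first, then query their assignments) can be sketched in Go. All type and function names below are illustrative stand-ins, not the actual kubelet code:

```go
package main

import "fmt"

// resourceManager is a hypothetical stand-in for the kubelet's
// resource managers (device, CPU, memory).
type resourceManager interface {
	// UpdateAllocatedResources reclaims resources of terminated pods.
	UpdateAllocatedResources()
	// Assignments returns the pod -> resources map.
	Assignments() map[string][]string
}

// fakeManager keeps a stale entry around until it is explicitly synced,
// mimicking the asynchronous cleanup described above.
type fakeManager struct {
	assignments map[string][]string
	stale       string // pod whose resources should be reclaimed on sync
}

func (m *fakeManager) UpdateAllocatedResources() { delete(m.assignments, m.stale) }

func (m *fakeManager) Assignments() map[string][]string { return m.assignments }

// list mirrors the fixed ordering: sync every manager first, then query,
// so terminated pods do not show up with stale assignments.
func list(managers ...resourceManager) map[string][]string {
	for _, m := range managers {
		m.UpdateAllocatedResources()
	}
	out := map[string][]string{}
	for _, m := range managers {
		for pod, res := range m.Assignments() {
			out[pod] = append(out[pod], res...)
		}
	}
	return out
}

func main() {
	cpu := &fakeManager{
		assignments: map[string][]string{"running": {"cpu0"}, "terminated": {"cpu1"}},
		stale:       "terminated",
	}
	fmt.Println(list(cpu)) // only the running pod remains after the sync
}
```

Querying the managers without the sync pass would have returned the `terminated` entry, which is exactly the stale-data symptom the fix addresses.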
The missing actor here is the memory manager. The memory manager has no
reconcile loop (which would implicitly fix the stale-data problem) and no
explicit synchronization, so it is the unlucky one that reported stale data,
leading to the eventual understanding of the problem.
For this reason it was (and still is) important to exercise it during
the test.
It turns out, however, that the test is wrong, likely because of a hidden
dependency between the test expectations and the lane configuration (notably
the machine specs), so we disable memory manager activation for the time
being, until we figure out a safe way to enable it.
Note this significantly weakens the signal for this specific test.
Signed-off-by: Francesco Romani <fromani@redhat.com>
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whichever API version is
enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled, without v1, won't work.
Fix the utilities to enable multi-app-container tests,
which were previously quite hard to implement.
Add a consumer of the new utility to demonstrate its usage
and to start building basic coverage.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add an e2e test to ensure that if the Get endpoint is asked
about a non-existent pod, it returns an error.
Likewise, add an e2e test for terminated pods, which should
not be returned because they neither consume nor hold resources,
much like `List` behaves.
The expected usage pattern is to iterate over the list of
pods returned by `List`, but the endpoint must nevertheless
handle this case.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Introduce a mock net.Listener for tests that triggers a controlled
error on Close, enabling reliable simulation of gRPC server failures
in test scenarios.
Refactor StartPlugin and related test helpers to accept a variadic
list of options of any type, allowing both public and test-specific
options to be passed.
Refactor the DRA e2e_node test helpers and test cases to accept
variadic kubeletplugin.Option arguments.
This change improves test flexibility and maintainability, allowing
new options to be passed in the future without requiring widespread
code changes.
There are no functional changes to test coverage or behavior.
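The variadic-options pattern described above can be sketched like this; the option and function names are hypothetical, not the actual kubeletplugin API:

```go
package main

import "fmt"

// Option is a public, kubeletplugin-style functional option.
type Option func(*pluginConfig)

// testOption is a test-only knob that the public API does not know about.
type testOption string

type pluginConfig struct {
	endpoint string
	testOpts []testOption
}

// WithEndpoint is a public option (hypothetical name).
func WithEndpoint(ep string) Option {
	return func(c *pluginConfig) { c.endpoint = ep }
}

// startPlugin accepts options of any type: public Options are applied
// directly, test-specific options are collected for the test harness,
// and anything else is rejected.
func startPlugin(opts ...any) (*pluginConfig, error) {
	c := &pluginConfig{endpoint: "/var/lib/kubelet/plugins/default.sock"}
	for _, o := range opts {
		switch v := o.(type) {
		case Option:
			v(c)
		case testOption:
			c.testOpts = append(c.testOpts, v)
		default:
			return nil, fmt.Errorf("unsupported option type %T", o)
		}
	}
	return c, nil
}

func main() {
	c, err := startPlugin(WithEndpoint("/tmp/test.sock"), testOption("fail-on-close"))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.endpoint, len(c.testOpts))
}
```

New option kinds can be added by extending the type switch, without touching any call site.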
We want to fix and enhance the lanes which exercise
the podresources API tests. The first step is to clarify
the label and make it specific to the podresources API,
minimizing the clash and ambiguity with the "PodLevelResources"
feature.
Note that we change the label name, but the new name is backward
compatible (filtering for "Feature:PodResources" will still
match the tests). This turns out not to be a problem because
these tests are no longer called out explicitly in the lane
definitions; we want to change that ASAP.
The new name is more specific and allows us to clearly
call out tests for this feature in the lane definitions.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add more e2e tests to cover the interaction with the
core resource managers (CPU, memory) and to ensure
proper reporting.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Since KEP 4885
(https://github.com/kubernetes/enhancements/blob/master/keps/sig-windows/4885-windows-cpu-and-memory-affinity/README.md),
the memory manager is also supported on Windows.
Plus, we want to add podresources e2e tests which configure
the memory manager. Both facts suggest it's useful to build
the e2e memory manager tests on all OSes, not just on Linux.
However, since we are not sure we are ready to run these tests
everywhere, we tag them LinuxOnly to preserve most of the
old behavior.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The non-regression tests should check that the quota management
introduced in #127525 can be disabled, so we need to verify
the previous behaviour using integer quotas.
It seems the problem was just a bad rebase that wrongly duplicated
the tests; we fix it by removing the incorrect duplicates.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Whenever swap is provisioned on the node,
the kernel might be able to reclaim much more memory,
hence it is harder to bring the node under memory pressure.
This change adds another container that allocates
an amount of memory equal to the swap capacity, to help
bring the node under memory pressure.
Signed-off-by: Itamar Holder <iholder@redhat.com>
This small refactor:
- Adds swap log statistics.
- Adds a pre-creation pod modification function.
The latter can be used to perform
changes to pods before they are created.
Signed-off-by: Itamar Holder <iholder@redhat.com>