Commit Graph

3160 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
b99ca3f736 Merge pull request #132498 from ffromani/e2e-serial-node-cpumanager-fix-ordered
e2e: serial: node cpumanager parity with the old suite
2025-07-01 07:15:31 -07:00
Kubernetes Prow Robot
dcefe0ef41 Merge pull request #132058 from pohly/dra-kubelet-connection-monitoring
DRA kubelet: connection monitoring
2025-06-26 03:40:29 -07:00
Kubernetes Prow Robot
1e59323e60 Merge pull request #132065 from yuanwang04/SwapMetrics
Fix pod and container level swap metrics for CRI
2025-06-25 16:22:28 -07:00
Sascha Grunert
0028ea8e99 Improve containers lifecycle test output parsing
This should fix the following test when running it with CRI-O:

```
[It] [sig-node] [Feature:SidecarContainers] [Serial] Containers
Lifecycle when A node running restartable init containers reboots should
restart the containers in right order with the proper phase after the
node reboot
```

The issue is that we have prefixed "unable to retrieve container logs
for …" outputs in the message to be parsed. We now skip that part and
leave the current behavior untouched.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2025-06-25 08:51:29 +02:00
Francesco Romani
3b0fd32810 e2e: serial: cpumanager: continue on failure
The `ginkgo.ContinueOnFailure` decorator serves the use case
of the new cpumanager tests perfectly:

https://onsi.github.io/ginkgo/#failure-handling-in-ordered-containers

"""
You can override this behavior by decorating an Ordered container with
ContinueOnFailure. This is useful in cases where Ordered is being used
to provide shared expensive set up for a collection of specs.
When ContinueOnFailure is set, Ginkgo will continue running specs even
if an earlier spec in the Ordered container has failed.
"""

And this is exactly the case at hand. Previously, without this
decorator, subsequent failures were masked, which is dangerous and not
what we want.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-24 15:46:06 +02:00
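The masking effect described in the commit above can be sketched without Ginkgo itself. The following is a simplified, standard-library-only model, not Ginkgo's implementation: the `spec` type and the `runOrdered` function are hypothetical names invented for this illustration. It only mimics the documented behavior that an Ordered container skips the remaining specs after the first failure unless `ContinueOnFailure` is set.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for a single test spec.
type spec struct {
	name  string
	fails bool
}

// runOrdered models an Ordered container: after the first failure the
// remaining specs are skipped, unless continueOnFailure is true.
func runOrdered(specs []spec, continueOnFailure bool) (failed, skipped []string) {
	aborted := false
	for _, s := range specs {
		if aborted {
			skipped = append(skipped, s.name)
			continue
		}
		if s.fails {
			failed = append(failed, s.name)
			if !continueOnFailure {
				aborted = true
			}
		}
	}
	return failed, skipped
}

func main() {
	specs := []spec{{"a", true}, {"b", false}, {"c", true}}

	// Default behavior: the failure of "c" is never reported.
	fmt.Println(runOrdered(specs, false)) // [a] [b c]

	// With continue-on-failure: both failures are reported.
	fmt.Println(runOrdered(specs, true)) // [a c] []
}
```

In the default run the second failure stays hidden because its spec never executes, which is the masking the commit message warns about.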
Francesco Romani
f76e1381d0 e2e: node: fix quota disablement testcases
Initially we added minimal quota disablement e2e tests,
but since the emergence of https://github.com/kubevirt/kubevirt/issues/14965
it became clear that it is better to have full coverage.

This PR restores coverage parity with the old test suite.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-24 15:46:01 +02:00
Ed Bartosh
cf544da6f7 e2e_node: DRA: add tests for different socket setups
Added tests to verify DRA functionality with 2 different socket
configurations:
- the same socket is used for the registration and the DRA service
- 2 separate sockets are used for the registration and the DRA service

Used table-driven ginkgo specs to avoid code duplication:
https://onsi.github.io/ginkgo/#table-driven-tests

This change enhances the robustness of the DRA e2e tests by
validating its behavior with different socket setups.
2025-06-24 10:42:45 +02:00
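The table-driven approach mentioned in the commit above can be illustrated with a plain Go table. This is a sketch only: the `socketSetup` type, the `setups` function and the socket file names are hypothetical and do not come from the actual test code, which uses Ginkgo's table-driven specs rather than a hand-written loop.

```go
package main

import "fmt"

// socketSetup is a hypothetical description of one test case.
type socketSetup struct {
	name               string
	registrationSocket string
	serviceSocket      string
}

// sharesSocket reports whether registration and the DRA service
// use the same socket.
func (s socketSetup) sharesSocket() bool {
	return s.registrationSocket == s.serviceSocket
}

// setups returns the two configurations named in the commit message.
// The file names are made up for this example.
func setups() []socketSetup {
	return []socketSetup{
		{"same socket", "plugin.sock", "plugin.sock"},
		{"separate sockets", "registration.sock", "dra.sock"},
	}
}

func main() {
	// One shared test body runs once per table entry.
	for _, s := range setups() {
		fmt.Printf("%s: shared=%v\n", s.name, s.sharesSocket())
	}
}
```

Adding a third configuration then means adding one table entry instead of copying the test body.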
Ed Bartosh
7f6389e770 e2e_node: DRA: pass socket path as a parameter
Added an ability to specify the socket path for the DRA gRPC
service in the e2e node tests.

The PluginSocket option is added to allow setting the name
of the socket inside the directory where the DRA driver
creates the socket for the DRA gRPC calls. This is used by
the kubelet to connect to the DRA plugin.

The newDRAService and newRegistrar functions are updated to
accept a socketPath parameter, which is used to configure
the PluginDataDirectoryPath and PluginSocket options for the
DRA plugin.

This change enables more flexible configuration of the DRA
plugin in e2e tests, allowing for testing with different
socket paths.
2025-06-24 10:42:45 +02:00
Ed Bartosh
c90c2e0d40 kubelet: DRA: fix linter warnings
Fixed the following warnings:
dra_test.go:884:2: singleCaseSwitch: should rewrite switch statement to if statement (gocritic)
	switch podName {
	^
dra_test.go:686:4: SA4006: this value of kubeletPlugin is never used (staticcheck)
	kubeletPlugin = newDRAService(ctx, f.ClientSet, nodeName, driverName)
        ^
2025-06-24 10:42:45 +02:00
Ed Bartosh
4ee7374b24 DRA kubelet: add connection monitoring
This ensures that ResourceSlices get removed also when a plugin becomes
unresponsive without removing the registration socket.

Tests are from https://github.com/kubernetes/kubernetes/pull/131073 by Ed
with some modifications, the implementation is new.
2025-06-24 10:42:41 +02:00
Yuan Wang
c5f061e0df Fix pod and container level swap metrics for CRI
2025-06-23 17:57:12 +00:00
Kubernetes Prow Robot
54291a55c2 Merge pull request #132096 from pohly/dra-kubelet-refactoring
DRA kubelet: refactoring
2025-06-13 04:45:09 -07:00
Kubernetes Prow Robot
8afdc5583f Merge pull request #132215 from ffromani/e2e-serial-cpumgr-crio-fix
e2e: node: serial: fix cgroup path with crio
2025-06-11 04:04:57 -07:00
Francesco Romani
b39741b506 e2e: node: serial: fix cgroup path with crio
the path construction with crio is wrong (typo).

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-10 19:25:48 +02:00
Patrick Ohly
494a129d02 DRA kubelet: clarify plugin vs. driver name
The rest of the system logs information using "driverName" as key in structured
logging. The kubelet should do the same.

This also gets clarified in the code, together with using a
consistent name for a Plugin pointer: "plugin" instead of "client" or
"instance".

The New in NewDRAPluginClient made no sense because it's not constructing
anything, and it returns a plugin, not a client -> GetDRAPlugin.
2025-06-06 18:24:33 +02:00
Kubernetes Prow Robot
8bcc78c7bf Merge pull request #132067 from bzsuni/bz/npd/update/0.8.21
Update npd from v0.8.20 to v0.8.21
2025-06-05 11:58:38 -07:00
Kubernetes Prow Robot
6188e5cb7b Merge pull request #132101 from haircommander/restart-flake
e2e_node: verify restart looping container correctly
2025-06-04 13:40:50 -07:00
Kubernetes Prow Robot
6eaef7b0d6 Merge pull request #131969 from skitt/test-e2e-pkg-errors
test: drop dependency on github.com/pkg/errors
2025-06-04 12:16:38 -07:00
Peter Hunt
daae472fe1 e2e_node: verify restart looping container correctly
When a test verifies that a container has restarted, we use a continually exiting
container. Checking the number of restarts for equality, rather than with a less-than
comparison, introduces a race between the container restarting and the status observation.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2025-06-04 13:27:50 -04:00
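The race described in the commit above can be shown with a small sketch. The helper names and the notion of a recorded baseline are hypothetical and not taken from the e2e_node code. The assumption is only that a continually exiting container may restart again between the moment a baseline is recorded and the moment the status is observed.

```go
package main

import "fmt"

// restartedExactly is the fragile check: it passes only if exactly one
// restart happened since the baseline was recorded.
func restartedExactly(baseline, observed int32) bool {
	return observed == baseline+1
}

// restartedSince is the robust check: it passes if at least one restart
// happened since the baseline was recorded.
func restartedSince(baseline, observed int32) bool {
	return baseline < observed
}

func main() {
	baseline := int32(2)
	// The crash-looping container restarted twice before the status was read.
	observed := int32(4)

	fmt.Println(restartedExactly(baseline, observed)) // false
	fmt.Println(restartedSince(baseline, observed))   // true
}
```

The equality check fails here even though the container did restart, while the inequality accepts any later observation.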
Kubernetes Prow Robot
1c56fff49b Merge pull request #132077 from ffromani/e2e-node-cgroup-v2-only
e2e: node: cpumanager: require cgroup v2
2025-06-03 12:10:46 -07:00
Kubernetes Prow Robot
9819f760f0 Merge pull request #131991 from SergeyKanzhelev/clarifyTheTokenScope
Clarified the token scope and future plans for the next security scan…
2025-06-03 10:02:38 -07:00
Francesco Romani
7e7aa6d810 e2e: node: cpumanager: require cgroup v2
In general, the rewritten e2e cpumanager tests assume cgroup v2.
A limited set of these may be updated to work also with the
obsolete and declining cgroup v1, but these need to be reviewed
on a test-by-test basis.

To fix test failures, we add a top level require for cgroup v2,
skipping otherwise. This will fix the red lanes while we review
the testcases and the deprecation plan of the other tests.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-06-03 18:22:48 +02:00
bzsuni
b9d9dea03f Update npd from v0.8.20 to v0.8.21
Signed-off-by: bzsuni <bingzhe.sun@daocloud.io>
2025-06-03 16:08:29 +08:00
Sergey Kanzhelev
a512de6e09 Clarified the token scope and future plans for the next security scan to refer to it
2025-06-02 16:53:10 +00:00
Stephen Kitt
545fbc99c2 test: drop dependency on github.com/pkg/errors
The package is unmaintained, and the tests don't rely on the
functionality it provides on top of Golang errors (stack traces).

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2025-06-02 11:27:09 +02:00
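For context on the commit above, error wrapping without stack traces is available in the Go standard library. The sketch below uses only `errors` and `fmt`. The `checkNode` and `isNotReady` functions and the `errNotReady` value are hypothetical names for this example and are not taken from the test code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotReady is a hypothetical sentinel error.
var errNotReady = errors.New("node not ready")

// checkNode wraps the sentinel with context using the %w verb, which
// keeps the original error reachable for errors.Is and errors.As.
func checkNode(name string) error {
	return fmt.Errorf("checking node %q: %w", name, errNotReady)
}

// isNotReady reports whether err wraps errNotReady.
func isNotReady(err error) bool {
	return errors.Is(err, errNotReady)
}

func main() {
	err := checkNode("node-1")
	fmt.Println(err)             // checking node "node-1": node not ready
	fmt.Println(isNotReady(err)) // true
}
```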
Kubernetes Prow Robot
bfb5c3781a Merge pull request #131794 from ffromani/e2e-serial-node-cpumanager-fix-lowcpu
e2e: node: always declare testcase CPU requirements
2025-05-28 11:36:16 -07:00
Francesco Romani
d7b6049099 e2e: node: always declare testcase CPU requirements
The PR https://github.com/kubernetes/kubernetes/pull/130274 rewrote the
cpumanager tests assuming there are always at least 4 online CPUs,
adding checks for the tests which require more.

We still have, and likely we will have for the time being, lanes
which run on machines with 2 online CPUs.

Thus, every test which either reserves CPUs (--reserved-cpus) or runs
pods with exclusive CPU allocation must declare its requirements
and skip if the machine doesn't provide them.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-21 09:22:33 +02:00
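The "declare the requirement, then skip" pattern from the commit above can be sketched as follows. The `cpuRequirement` type and its `skipReason` method are hypothetical and greatly simplified. In particular, treating the need as the plain sum of reserved and exclusive CPUs is an assumption of this example, not the rule used by the real tests.

```go
package main

import "fmt"

// cpuRequirement is a hypothetical declaration of what a testcase needs.
type cpuRequirement struct {
	reserved  int // CPUs set aside, as with --reserved-cpus
	exclusive int // CPUs requested by pods with exclusive allocation
}

// skipReason returns a non-empty message when the machine does not have
// enough online CPUs, and an empty string when the test can run.
func (r cpuRequirement) skipReason(onlineCPUs int) string {
	needed := r.reserved + r.exclusive
	if onlineCPUs < needed {
		return fmt.Sprintf("requires %d online CPUs, found %d", needed, onlineCPUs)
	}
	return ""
}

func main() {
	req := cpuRequirement{reserved: 1, exclusive: 2}

	fmt.Printf("%q\n", req.skipReason(2)) // "requires 3 online CPUs, found 2"
	fmt.Printf("%q\n", req.skipReason(4)) // ""
}
```

A test would call such a check first and skip with the returned message, so a 2-CPU lane reports a skip rather than a failure.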
Ed Bartosh
b9e2a16083 e2e_node: dra: test plugin registration retry
Adds a DRA e2e_node test to verify that the kubelet plugin manager
retries plugin registration when the GetInfo call fails, and
successfully registers the plugin once GetInfo succeeds.

This ensures correct recovery and registration behavior for
DRA plugins in failure scenarios.
2025-05-16 21:53:35 +03:00
Ed Bartosh
ec7e732cbc e2e: dra: move gomega matchers to dedicated package
Moved gomega matcher definitions from test-driver/app
to a new test-driver/gomega package.
2025-05-15 20:55:17 +03:00
Morten Torkildsen
e262cccf23 Cleanup after rebase
2025-05-12 16:00:07 +00:00
Morten Torkildsen
ece35e5882 Update DRA e2e test framework to allow publishing advanced ResourceSlices
2025-05-12 15:56:24 +00:00
Kubernetes Prow Robot
48fcf418ce Merge pull request #131691 from pohly/dra-e2e-labeling
DRA E2E: revise test labeling
2025-05-12 05:43:16 -07:00
Patrick Ohly
b09d034a57 DRA E2E: revise test labeling
We only need one special "DynamicResourceAllocation" feature for the optional
node support of DRA (plugin registration, CDI support in the container
runtime). For individual features, the automatic labeling through
WithFeatureGate is sufficient.

To find DRA-related tests in a label filter, instead of plain-text "DRA" a
"DRA" label now gets added.

This change depends on an update of the DRA jobs.
2025-05-09 11:33:04 +02:00
Francesco Romani
13bd0b4ee8 e2e: node: rewrite the sidecar related tests
Rewrite the tests, porting them to the new layout and utilities.
We may add more cases and better integration in the future.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:05 +02:00
Francesco Romani
f4265638be e2e: node: factor out reservedCPUs
now that we have a minimal BeforeEach (and let's keep it this way)
we can factor out the reservedCPUs setting, since it's the same
code for each testcase.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
a8c8b0987d e2e: node: dissolve skipIfNotEnoughAllocatableCPUs
reuse existing building blocks at the cost of
a tiny, non-nested BeforeEach (which is still OK)
and some targeted duplication.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
32d4724ab8 e2e: node: add comment about reserved CPU
Let's have a single general comment instead of
copypasting the same text all around the tests.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
daf2fc7100 e2e: node: rewrite multi-pod tests
rewrite tests which check with multiple pods.
We may extend the coverage later on.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
ccc662c228 e2e: node: initial multi-container tests
Rewrite tests which exercise multiple containers within the
same pod. Preserve the existing testcases, add more.

Note basic coverage for mixed pods - some containers requiring
exclusive CPUs, some not, was already added with the initial batch.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
2419d9ccc5 e2e: node: rewrite: multi-cpus single-container pods
We have tests which cover the case in which a pod
with a single container requires multiple CPUs;
rewrite them, preserving the testcases and actually
adding coverage.

Add and use stricter checks along the way.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
e4726719a7 e2e: node: rewrite more compatibility tests
Complete the rewrite of the policy option compatibility tests,
rewriting the tests which check compatibility
between the `full-pcpus-only` and `distribute-cpus-across-numa` options.

All testcases are preserved.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
74fda8c700 e2e: node: rewrite compatibility tests
Rewrite the policy option compatibility tests.
We start with the tests which check the compatibility
between the `full-pcpus-only` and `strict-cpu-reservation`
options, because the former is the only GA option
at the time of writing.

All testcases are preserved.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
dd3f9b6074 e2e: node: rewrite CFS quota tests
Rewrite the e2e cpumanager tests about CFS quota management.
All testcases are preserved.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
b9ce058ab6 e2e: node: rewrite strict-cpu-reservation tests
rewrite the cpumanager e2e tests for the
`strict-cpu-reservation` policy option to fit
into the new layout.

All testcases are preserved.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Francesco Romani
3eb2e65fc3 e2e: node: rewrite cpumanager tests
Rewrite the cpumanager tests to make use of the lessons
learned, use more modern idioms, remove obsolete assumptions
and in general remove all the legacy which was accumulating
over the years.

The goal is to have a simpler, flatter and more maintainable
code layout, disentangling the net of dependencies and
making the tests more robust and easier to extend.

In short, this is all about maintainability. All the testcases
will be preserved, and a few others can be added along the way.

Comments in the code will explain the code layout decisions
and tradeoffs, and provide a good guide to add more tests
in the future.

Special care was taken to maximize the isolation between
tests, at the cost, in selected cases, of controlled and planned
code duplication.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-05-09 11:07:04 +02:00
Adrian Moisey
e5ffec242a Bump CNI to 1.7.1
2025-05-04 20:27:09 +02:00
Kubernetes Prow Robot
0b8133816b Merge pull request #131477 from pohly/golangci-lint@v2
golangci-lint v2
2025-05-02 23:03:55 -07:00
Patrick Ohly
9bada79de1 DRA node test: fix useless gomega.Consistently
Passing a constant value to gomega.Consistently means that it will not re-check
while running.

Found by linter after removing the suppression rule for the check. It was
disabled earlier because of a bug in the linter.
2025-05-02 12:51:02 +02:00
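The problem described in the commit above, a polling assertion that receives an already-evaluated value, can be modeled without gomega. The `consistently` function below is a hypothetical, simplified stand-in written for this illustration. It is not gomega's API and has no timing behavior; it only shows the difference between polling a function and re-reading a value that was computed once.

```go
package main

import "fmt"

// consistently is a simplified stand-in for a polling assertion: it calls
// poll the given number of times and fails on the first mismatch.
func consistently(poll func() int, want, attempts int) bool {
	for i := 0; i < attempts; i++ {
		if poll() != want {
			return false
		}
	}
	return true
}

func main() {
	counter := 0
	next := func() int { counter++; return counter }

	// The value is evaluated once, before polling starts.
	snapshot := next()

	// Useless check: every poll returns the same stale snapshot.
	stale := consistently(func() int { return snapshot }, 1, 3)

	// Meaningful check: every poll re-reads the changing state.
	live := consistently(next, 1, 3)

	fmt.Println(stale, live) // true false
}
```

The stale variant passes even though the underlying state changed, which is why a linter flags such calls.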
Kubernetes Prow Robot
ea08d4df93 Merge pull request #128946 from jackfrancis/SetPVCVACName-eventually-gomega
test: don't panic during an Eventually retry loop
2025-04-29 09:17:55 -07:00
Kubernetes Prow Robot
5a630b7289 Merge pull request #131042 from pacoxu/adjust-metric-range
adjust container_spec_memory_limit_bytes e2e to range: ppc64le is less
2025-04-23 15:58:51 -07:00