Commit Graph

28168 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
b3d00a026d Merge pull request #132756 from ylink-lfs/ci/redis_removal
ci: redis removal for e2e test dependency simplicity
2025-07-24 09:38:42 -07:00
Kubernetes Prow Robot
a11bc701e8 Merge pull request #132457 from ania-borowiec/depends_on_cluster_move_podinfo
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
2025-07-24 09:38:27 -07:00
Kubernetes Prow Robot
d21da29c9e Merge pull request #133170 from ffromani/e2e-node-podres-memmgr
e2e: podresources: disable memory manager integration
2025-07-24 07:56:48 -07:00
Kubernetes Prow Robot
b3e39344ff Merge pull request #132959 from ylink-lfs/test/e2e_named_port_con_case
test: add e2e case for mutating named port
2025-07-24 07:56:34 -07:00
Mayuka Channankaiah
ffe306d679 client-go, kubectl: Replace deprecated ErrWaitTimeout with recommended method (#132718)
* client-go: Replace deprecated ErrWaitTimeout with recommended method

* Fix UT and Integration tests

* IT test
2025-07-24 07:56:27 -07:00
Ania Borowiec
aecd37e6fb Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Francesco Romani
449763fb11 e2e: podresources: disable memory manager integration
As part of PR 132028 we added more e2e test coverage to validate
the fix, and to check as much as possible that there are no regressions.

The issue and the fix become evident largely when inspecting
memory allocation with the Memory Manager static policy enabled.
Quoting the commit message of bc56d0e45a
```
The podresources API List implementation uses the internal data of the
resource managers as source of truth.
Looking at the implementation here:
https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/apis/podresources/server_v1.go#L60
we take care of syncing the device allocation data before querying the
device manager to return its pod->devices assignment.
This is needed because otherwise the device manager (and all the other
resource managers) would do the cleanup asynchronously, so the `List` call
will return incorrect data.

But we do this syncing neither for CPUs nor for memory,
so when we report these we will get stale data as the issue #132020 demonstrates.

For CPU manager, we however have the reconcile loop which cleans the stale data periodically.
Turns out this timing interplay was actually the reason the existing issue #119423 seemed fixed
(see: #119423 (comment)).
But it's actually timing. If in the reproducer we set the `cpuManagerReconcilePeriod` to a very high
value (>= 5 minutes), then the issue still reproduces against the current master branch
(https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/test/e2e_node/podresources_test.go#L983).
```

The missing actor here is the memory manager. The memory manager has neither
a reconcile loop (which implicitly fixes the stale data problem) nor explicit
synchronization, so it is the unlucky one which reported stale data,
leading to the eventual understanding of the problem.

For this reason it was (and still is) important to exercise it during
the test.
It turns out, however, that the test is wrong, likely because of a hidden dependency
between the test expectations and the lane configuration (notably
machine specs), so we disable the memory manager activation for the time
being, until we figure out a safe way to enable it.

Note this significantly weakens the signal for this specific test.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-07-24 12:35:45 +02:00
Patrick Ohly
b768c1d1d5 DRA API: bump storage version to v1beta2
This avoids the overhead for the more complex conversion to v1beta1 and might
make it a bit more realistic to get rid of the v1beta1 eventually.

The expected GVK must be set explicitly because when emulating 1.33,
v1beta1 is the default although the fixed storage version is v1beta2.
2025-07-24 08:33:56 +02:00
Patrick Ohly
24de875ceb DRA: graduate DynamicResourceAllocation feature to GA
It hasn't been on-by-default before, therefore it does not get locked to the
new default on yet. This has some impact on the scheduler configuration
because the plugin is now enabled by default.

Because the feature is now GA, it doesn't need to be a label on E2E tests,
which wouldn't be possible anyway once it gets removed entirely.
2025-07-24 08:33:56 +02:00
Patrick Ohly
21d929f599 integration: use --runtime-config-emulation-forward-compatible
Some tests do version emulation and need the DRA feature. In that combination
the --runtime-config-emulation-forward-compatible option is needed to allow
enabling the V1 API although it's only available in 1.34.
2025-07-24 08:33:56 +02:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 are enabled without v1 won't work.
2025-07-24 08:33:45 +02:00
Patrick Ohly
cff91579e8 DRA API: v1 registration + tests 2025-07-24 08:30:25 +02:00
Kubernetes Prow Robot
01c5535387 Merge pull request #133085 from ritazh/DRAAdminAccess_beta
DRAAdminAccess: move to beta
2025-07-23 21:44:34 -07:00
Simran Kaur
c7d6c09683 List available endpoints for kube-apiserver (#132581)
Fix tests and formatting

Use ListedPaths for finding useful endpoints

Fix maps import

Update dependencies

Fix lint

Add option to pass listedpaths

Remove apiserver component check

Install statuz in genericapiserver

Register zpagesfeatures

Fix import order

Avoid adding non-debugging endpoints

Fix tests

Fix tests

fix tests

Sort paths

Sort in-place

Copy paths before sorting

Fix string initialization

Move sorting to later stage

Fix imports
2025-07-23 21:44:27 -07:00
Kubernetes Prow Robot
051dd70772 Merge pull request #133149 from ritazh/draadminaccess-test
draadminaccess test make it serial
2025-07-23 19:56:55 -07:00
Kubernetes Prow Robot
dd6fa8bafd Merge pull request #133129 from ffromani/podres-get-add-tests
node: podresources: improve test coverage for the `Get` endpoint
2025-07-23 19:56:40 -07:00
Kubernetes Prow Robot
6ad14ad876 Merge pull request #132991 from danwinship/endpoints-e2e-updates
Endpoints e2e updates for KEP-4974
2025-07-23 19:56:26 -07:00
Kubernetes Prow Robot
ca569e152d Merge pull request #132700 from pohly/dra-kubelet-grpc-v1
DRA kubelet: add v1 gRPC
2025-07-23 17:36:26 -07:00
Kubernetes Prow Robot
6ef2215eb7 Merge pull request #132558 from HirazawaUi/Implement-4762
KEP-4762: Allows setting any FQDN as the pod's hostname
2025-07-23 16:26:27 -07:00
Kubernetes Prow Robot
49cd87182c Merge pull request #125271 from tssurya/psa-probe-lifecycle-handler-host-option
Add PSA for blocking `.host` on pod probes
2025-07-23 15:16:27 -07:00
Surya Seetharaman
4c87e60d0d Tests using .host field in probes must be at privileged level
The sig-node tests have scenarios of doing probes and
lifecycle handler tests with post-start and pre-stop hooks
setting the host field to be another pod.

In baseline level such things won't be allowed because of
the PSA rules we are adding in this PR. So unsetting
the host field means it uses the podIP of self for doing
the checks and using that in the pre-stop and post-start
hooks is tricky because of the timing issues with when the
container is actually up vs. running the test.

So I have changed the tests to be privileged for them to
use the .host fields if they desire to.

See https://github.com/kubernetes/kubernetes/issues/133091
which is an issue opened to properly refactor these tests.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
2025-07-23 21:17:05 +02:00
Kubernetes Prow Robot
c41cc0a144 Merge pull request #129837 from danwinship/aggregated-apiserver-endpointslices
Port aggregated apiserver discovery to EndpointSlices
2025-07-23 10:30:28 -07:00
Dan Winship
765d84e9bf Test only EndpointSlices, not Endpoints, in dual-stack e2e tests
The dual-stack integration tests already validate that we get the
expected Endpoints for single- and dual-stack Services. There is no
further "end to end" testing needed for Endpoints, given that
everything in a normal cluster would look at EndpointSlices, not
Endpoints.
2025-07-23 13:19:07 -04:00
Kubernetes Prow Robot
e979287f29 Merge pull request #133117 from Phaow/bump-external-snapshotter
Bump external snapshotter for vgs tests
2025-07-23 09:22:27 -07:00
Rita Zhang
61cc6cf807 draadminaccess test make it serial
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-07-23 09:13:13 -07:00
Dan Winship
33b45c8383 Update "should proxy through a service and a pod" to look at EndpointSlices 2025-07-23 11:07:29 -04:00
HirazawaUi
8d65e1e98e Add e2e tests. 2025-07-23 22:57:11 +08:00
Kubernetes Prow Robot
49af85d86a Merge pull request #133110 from ritazh/DRAAdminAccess_upgradedowngradetest
DRAAdminAccess: add upgrade downgrade test
2025-07-23 07:08:28 -07:00
HirazawaUi
af6c97bd14 add Feature Gate. 2025-07-23 20:28:13 +08:00
Kubernetes Prow Robot
9148694d4b Merge pull request #133140 from KobayashiD27/fix-integ-dra-testFilterTimeout
DRA Updated to not directly change the global variable `claim`
2025-07-23 04:58:33 -07:00
Kubernetes Prow Robot
0d5921171a Merge pull request #133105 from togettoyou/scheduler_perf_132495
KEP-5229: Run Unschedulable scheduler_perf test case with SchedulerAsyncAPICalls feature gate enabled
2025-07-23 04:58:27 -07:00
Patrick Ohly
f6061605fb DRA E2E: run multi-node control plane tests also with two nodes
The tests should work also with only two nodes, which is the minimum required
for conformance testing.
2025-07-23 09:12:46 +02:00
Patrick Ohly
f0e2920898 DRA E2E: simplify "control plane" test names
There's no need to clarify how many nodes are used in the test because the
overall test names are still unique without that (verified with go test -v
./test/e2e -args -list-tests | grep -w DRA | wc -l).
2025-07-23 09:10:45 +02:00
Patrick Ohly
603751ee80 DRA E2E: remove redundant test
"must be possible for the driver to update the ResourceClaim.Status.Devices
once allocated" was also run as kubelet test although it only checks the
control plane.

Before:
    [sig-node] [DRA] [FeatureGate:DynamicResourceAllocation] [Beta] [Feature:OffByDefault] control plane with single node [ConformanceCandidate] must be possible for the driver to update the ResourceClaim.Status.Devices once allocated [FeatureGate:DRAResourceClaimDeviceStatus] [Beta]
    [sig-node] [DRA] [FeatureGate:DynamicResourceAllocation] [Beta] [Feature:OffByDefault] kubelet [Feature:DynamicResourceAllocation] on single node must be possible for the driver to update the ResourceClaim.Status.Devices once allocated [FeatureGate:DRAResourceClaimDeviceStatus] [Beta]

After:
    [sig-node] [DRA] [FeatureGate:DynamicResourceAllocation] [Beta] [Feature:OffByDefault] control plane with single node [ConformanceCandidate] must be possible for the driver to update the ResourceClaim.Status.Devices once allocated [FeatureGate:DRAResourceClaimDeviceStatus] [Beta]
2025-07-23 09:10:45 +02:00
Kobayashi,Daisuke
61bd5789be Updated to not directly change the global variable claim 2025-07-23 03:44:48 +00:00
ylink-lfs
4f0a5771ab test: add e2e case for mutating named port 2025-07-23 10:19:20 +08:00
Kubernetes Prow Robot
3e3f43f4b8 Merge pull request #132537 from lalitc375/hpa-validation
add validation logic for APIVersion fields of HPA
2025-07-22 19:04:27 -07:00
Kubernetes Prow Robot
4676341457 Merge pull request #133065 from natasha41575/dedupe-resize-test
dedupe fetching allocatable and available resources in node test
2025-07-22 17:56:27 -07:00
Kubernetes Prow Robot
aee92cd6c3 Merge pull request #132968 from wongchar/uncore-e2e-beta
cpumanager: expand test coverage for prefer-align-cpus-by-uncore-cache
2025-07-22 13:40:50 -07:00
Lalit Chauhan
f6aee63690 add validation logic for APIVersion fields of HPA
New validation logic follows the API ratcheting principle: it will not be executed for already stored invalid values if the corresponding field or array item is not modified.
2025-07-22 20:40:48 +00:00
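The ratcheting principle named in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HPA validation code: `TargetRef`, `validateAPIVersion`, and `validateUpdate` are hypothetical names, and the format check is deliberately simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// TargetRef is a hypothetical, simplified object reference that carries an
// APIVersion field.
type TargetRef struct {
	APIVersion string
	Kind       string
	Name       string
}

// validateAPIVersion is a simplified stand-in check: it accepts "version" or
// "group/version" and rejects empty segments or more than one "/".
func validateAPIVersion(v string) error {
	parts := strings.Split(v, "/")
	if len(parts) > 2 {
		return fmt.Errorf("invalid apiVersion %q: too many '/' separators", v)
	}
	for _, p := range parts {
		if p == "" {
			return fmt.Errorf("invalid apiVersion %q: empty segment", v)
		}
	}
	return nil
}

// validateUpdate applies ratcheting: when an old object exists and the field
// is unchanged, the new check is skipped so that already stored invalid data
// does not block unrelated updates. A nil oldRef means the object is created.
func validateUpdate(newRef TargetRef, oldRef *TargetRef) error {
	if oldRef != nil && oldRef.APIVersion == newRef.APIVersion {
		return nil
	}
	return validateAPIVersion(newRef.APIVersion)
}

func main() {
	stored := TargetRef{APIVersion: "apps/v1/extra", Kind: "Deployment", Name: "web"}

	// Unrelated update: the invalid APIVersion is untouched, so it passes.
	renamed := stored
	renamed.Name = "web-2"
	fmt.Println("unchanged field:", validateUpdate(renamed, &stored))

	// Changing the field to another invalid value is rejected.
	changed := stored
	changed.APIVersion = "apps//v1"
	fmt.Println("changed field:", validateUpdate(changed, &stored))

	// Creation has no old object, so the check always runs.
	fmt.Println("create:", validateUpdate(stored, nil))
}
```

The design choice is that stricter validation only takes effect when a field is written, so tightening a rule does not make existing stored objects impossible to update.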
Bing Hongtao
6f3b6b91f0 KEP-3721: Support for env files (#132626)
* Add FileKeyRef field and struct to the Pod API

* Add the implementation code in the kubelet.

* Add validation code

* Add basic functionality e2e tests

* add code for dropping disabled pod fields

* update go.mod
2025-07-22 13:40:42 -07:00
Kubernetes Prow Robot
08362f0650 Merge pull request #132429 from torredil/kep4876-beta
Promote sig-storage feature `MutableCSINodeAllocatableCount` to Beta
2025-07-22 13:40:34 -07:00
Kubernetes Prow Robot
2c9bb9e27b Merge pull request #133107 from pohly/dra-e2e-conformance-candidates
DRA E2E: revisit conformance classification of tests
2025-07-22 12:32:35 -07:00
Kubernetes Prow Robot
7bf8066a58 Merge pull request #133042 from rzlink/winoverlay
[KEP-5100] WinOverlay feature gate to GA
2025-07-22 12:32:27 -07:00
Rita Zhang
216f7485bd DRAAdminAccess: add upgrade downgrade test
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-07-22 11:54:34 -07:00
Kubernetes Prow Robot
52bc7515ca Merge pull request #132108 from rzlink/windsr
[KEP-5100] WinDSR feature gate to GA
2025-07-22 11:04:33 -07:00
Francesco Romani
303a7056ff e2e: node: podresources: enable multi-container tests
fix the utilities to enable multi-app-container tests,
which were previously quite hard to implement.

Add a consumer of the new utility to demonstrate the usage
and to initiate the basic coverage.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-07-22 19:58:29 +02:00
Francesco Romani
38a9a8a59d e2e: node: podresources: add tests for missing pod
add an e2e test to ensure that if the Get endpoint is asked
about a non-existing pod, it returns an error.
Likewise, add an e2e test for terminated pods, which should
not be returned because they neither consume nor hold resources,
much like `List` does.

The expected usage pattern is to iterate over the list of
pods returned by `List`, but nevertheless the endpoint must
handle this case.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-07-22 19:55:09 +02:00
Charles Wong
545b36ba29 fix uncore e2e check 2025-07-22 09:50:18 -05:00
Junhao Zou
ce2d979390 Run Unschedulable scheduler_perf test case with SchedulerAsyncAPICalls feature gate enabled 2025-07-22 17:35:48 +08:00