16144 Commits

Author SHA1 Message Date
Dan Winship
db01f94032 Remove patch/update from ServiceCIDR API conformance test
They were already listed in ineligible_endpoints.yaml, so we shouldn't
be testing them here anyway.
2025-08-20 09:19:55 -04:00
Kubernetes Prow Robot
5568780ca3 Merge pull request #133562 from nojnhuh/dra-e2e-slice-controller-flake
DRA: wait for stats to converge in "creates slices" e2e test
2025-08-19 04:21:36 -07:00
Kubernetes Prow Robot
29e16ff792 Merge pull request #133286 from yliaog/deviceplugin
added WithFlaky to the device plugin test case: supports extended resources together with ResourceClaim
2025-08-19 00:43:35 -07:00
Jon Huhn
bf6c86b562 DRA: wait for stats to converge in "creates slices" e2e test 2025-08-15 01:02:44 -05:00
yliao
edfa9a5bd2 added WithFlaky() to the device plugin test case: supports extended resources together with ResourceClaim 2025-08-13 17:57:33 +00:00
Roman Bednar
064b591617 improve CRD handling in VolumePopulator test
Test: provisioning should provision storage with any volume data source

During CSI certification test we observed that the test can fail
with a message:
"customresourcedefinitions.apiextensions.k8s.io \"volumepopulators.populator.storage.k8s.io\" already exists"

This is because the test does not consider that this CRD can be already
installed in the cluster.

The test was updated to handle the CRD better by creating it for the
duration of the test and removing it afterward. Otherwise, if the CRD
is already installed, the test will neither create nor remove it.
2025-08-11 13:50:03 +02:00
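
The create-if-missing, remove-only-if-created behavior described in the commit above can be sketched roughly as follows. This is an illustration, not the code from the commit: `fakeCluster`, `ensureCRD`, and `errAlreadyExists` are hypothetical names, and the in-memory map stands in for the real CRD API.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the API server's "already exists" error (hypothetical).
var errAlreadyExists = errors.New("already exists")

// fakeCluster is a hypothetical in-memory stand-in for the CRD API.
type fakeCluster struct {
	crds map[string]bool
}

// create registers the CRD, or fails if it is already present.
func (c *fakeCluster) create(name string) error {
	if c.crds[name] {
		return fmt.Errorf("crd %q: %w", name, errAlreadyExists)
	}
	c.crds[name] = true
	return nil
}

// remove deletes the CRD if present.
func (c *fakeCluster) remove(name string) {
	delete(c.crds, name)
}

// ensureCRD creates the CRD when it is missing and returns a cleanup function.
// The cleanup removes the CRD only when this call created it; a CRD that was
// already installed is left untouched.
func ensureCRD(c *fakeCluster, name string) (cleanup func(), err error) {
	err = c.create(name)
	switch {
	case err == nil:
		return func() { c.remove(name) }, nil
	case errors.Is(err, errAlreadyExists):
		return func() {}, nil
	default:
		return nil, err
	}
}

func main() {
	const name = "volumepopulators.populator.storage.k8s.io"
	preinstalled := &fakeCluster{crds: map[string]bool{name: true}}
	cleanup, _ := ensureCRD(preinstalled, name)
	cleanup()
	fmt.Println("still installed:", preinstalled.crds[name])
}
```

Returning a no-op cleanup for the pre-installed case keeps the caller's teardown path identical in both situations.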
yliao
b796918986 reduced numPods to 5 from 10 to fix flaky test (supports reusing resources) due to timeout. 2025-08-06 07:42:31 +00:00
Tim Allclair
01470d973b Fix memory limit decrease test on cri-o 2025-08-01 09:56:51 -07:00
Sunyanan Choochotkaew
7f052afaef KEP 5075: implement scheduler
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:52:49 +09:00
yliao
23d6f73e72 extended resource backed by DRA: test 2025-07-29 18:55:28 +00:00
Kubernetes Prow Robot
656360f67c Merge pull request #133254 from HirazawaUi/fixinvalid-validation
Fix incorrect validation on the kubelet
2025-07-29 10:12:34 -07:00
Luiz Oliveira
7fbf63a23f HPA support for pod-level resource specifications (#132430)
* HPA support for pod-level resource specifications

* Add e2e tests for HPA support for pod-level resource specifications
2025-07-29 09:02:26 -07:00
Kubernetes Prow Robot
1ce98e3c09 Merge pull request #133267 from danwinship/apiserver-proxy-flake
Fix apiserver service proxying e2e test flakiness
2025-07-28 23:18:37 -07:00
Kubernetes Prow Robot
1ef7677415 Merge pull request #133220 from toVersus/fix/downward-api-e2e-failure
[PodLevelResources] Add missing label to Downward API test
2025-07-28 21:06:29 -07:00
Kubernetes Prow Robot
dd4e4f1dd1 Merge pull request #133262 from BenTheElder/no-authenticated-image-pulling
remove broken test that depends on expired credential, remove hardcoded credential, add TODOs
2025-07-28 17:28:28 -07:00
Dan Winship
f9bb14fcf0 Fix apiserver service proxying e2e test flakiness
Also, fix its conformance description, which appears to have
accidentally been filled in with the description of the wrong test.
2025-07-28 19:42:04 -04:00
Benjamin Elder
8ace0fb89f remove failing test that depends on expired credential, remove credential, add TODOs
see: https://github.com/kubernetes/kubernetes/issues/130271
2025-07-28 15:43:43 -07:00
Yuan Wang
4b479da4b5 Remove the feature from e2e test 2025-07-28 16:33:20 +00:00
HirazawaUi
6997fbd1ed Fix incorrect validation on the kubelet 2025-07-29 00:02:20 +08:00
Tsubasa Nagasawa
48fd30113c [PodLevelResources] Add missing label to Downward API test 2025-07-26 07:51:58 +09:00
Kubernetes Prow Robot
a493bafd02 Merge pull request #133156 from ritazh/draadminaccess-update-flake
DRAAdminAccess: move metrics test from e2e to integration
2025-07-25 13:40:27 -07:00
Patrick Ohly
40a90df3b3 DRA E2E: remove stress test
The test needs to schedule 256 pods at once, which only works with three
nodes (default limit is 100, but could also be lower). It's also a stress test
which flaked recently.

For now it gets removed without a replacement. A similar integration test is in
development, but too big (needs some infrastructure changes in
test/integration/dra) to add during code freeze.
2025-07-25 12:45:01 +02:00
Kubernetes Prow Robot
6d4ca967f7 Merge pull request #132824 from roycaihw/psi-pressure-test
Extend E2E test coverage for PSI metrics under pressure
2025-07-25 00:32:27 -07:00
Kubernetes Prow Robot
72f9a9260a Merge pull request #130606 from Jpsassine/dra_device_health_status
Expose DRA device health in PodStatus
2025-07-24 20:14:27 -07:00
Kubernetes Prow Robot
b09f1bfe12 Merge pull request #132902 from haircommander/userns-metrics
KEP-127: kubelet: add metrics for userns pods
2025-07-24 19:08:41 -07:00
Eddie
727a6e6db5 Reject pod when attachment limit is exceeded (#132933)
* Reject pod when attachment limit is exceeded

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Record admission rejection

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Fix pull-kubernetes-linter-hints

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Fix AD Controller unit test failure

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Consolidate error handling logic in WaitForAttachAndMount

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Improve error context

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Update admissionRejectionReasons to include VolumeAttachmentLimitExceededReason

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Update status message

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Add TestWaitForAttachAndMountVolumeAttachLimitExceededError unit test

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Add e2e test

Signed-off-by: Eddie Torres <torredil@amazon.com>

* Fix pull-kubernetes-linter-hints

Signed-off-by: Eddie Torres <torredil@amazon.com>

---------

Signed-off-by: Eddie Torres <torredil@amazon.com>
2025-07-24 17:58:54 -07:00
Kubernetes Prow Robot
26045b2fab Merge pull request #132642 from yuanwang04/restart-rules
Implement container restart policy rules
2025-07-24 16:44:51 -07:00
Kubernetes Prow Robot
bd7fb738bd Merge pull request #132605 from toVersus/feat/downward-api-plresources
[PodLevelResources] Update Downward API defaulting for resource limits
2025-07-24 16:44:42 -07:00
Kubernetes Prow Robot
7912e5fd67 Merge pull request #131549 from carlory/KEP-3751-GA
[Kep-3751] Promote VolumeAttributesClass to GA
2025-07-24 16:44:27 -07:00
John-Paul Sassine
b7de71f9ce feat(kubelet): Add ResourceHealthStatus for DRA pods
This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.

The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
  stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
  the last known health of each device and handle plugin disconnections.
- An asynchronous update mechanism that triggers a pod sync when a
  device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
  expose the device health information to users via the Pod API.

Update vendor

KEP-4680: Fix lint, boilerplate, and codegen issues

Add another e2e test, add TODO for KEP4680 & update test infra helpers

Add Feature Gate e2e test

Fixing presubmits

Fix var names, feature gating, and nits

Fix DRA Health gRPC API according to review feedback
2025-07-24 23:23:18 +00:00
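
As a rough illustration of the cache and change-triggered sync described in the commit above, the sketch below keeps the last known health per device and reports when a value changes. All names (`healthCache`, `update`, `disconnect`) and the status strings are assumptions made for this example, as is the choice to mark devices unknown on plugin disconnect; the Kubelet's actual implementation may differ. The sketch is not goroutine-safe.

```go
package main

import "fmt"

// health is an illustrative status type; the values are assumptions.
type health string

const (
	healthy   health = "Healthy"
	unhealthy health = "Unhealthy"
	unknown   health = "Unknown"
)

// healthCache tracks the last known health per plugin and device (hypothetical).
type healthCache struct {
	devices map[string]map[string]health
}

func newHealthCache() *healthCache {
	return &healthCache{devices: map[string]map[string]health{}}
}

// update records the status and reports whether it differs from the cached
// value. A caller could use a true result to trigger a pod sync.
func (c *healthCache) update(plugin, device string, h health) bool {
	if c.devices[plugin] == nil {
		c.devices[plugin] = map[string]health{}
	}
	old, seen := c.devices[plugin][device]
	c.devices[plugin][device] = h
	return !seen || old != h
}

// disconnect marks every device of the plugin as unknown (an assumption about
// how disconnections are handled) and returns the devices whose status changed.
func (c *healthCache) disconnect(plugin string) []string {
	var changed []string
	for device, h := range c.devices[plugin] {
		if h != unknown {
			c.devices[plugin][device] = unknown
			changed = append(changed, device)
		}
	}
	return changed
}

func main() {
	c := newHealthCache()
	fmt.Println(c.update("gpu-driver", "gpu-0", healthy))   // first report: changed
	fmt.Println(c.update("gpu-driver", "gpu-0", healthy))   // same status: unchanged
	fmt.Println(c.update("gpu-driver", "gpu-0", unhealthy)) // status flipped: changed
	fmt.Println(c.disconnect("gpu-driver"))                 // devices now unknown
}
```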
Haowei Cai
252513a1b9 Add WithFeature and WithSerial, also check if cgroup v2 is used in test 2025-07-24 21:40:08 +00:00
Rita Zhang
c15a54f8c0 draadminaccess: move metrics test from e2e to integration
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-07-24 14:08:14 -07:00
Kubernetes Prow Robot
d9538c2c8c Merge pull request #133163 from pohly/revert-133110-DRAAdminAccess_upgradedowngradetest
Revert "DRAAdminAccess: add upgrade downgrade test"
2025-07-24 13:05:11 -07:00
carlory
94bf8fc8a9 Promoted API VolumeAttributesClass and VolumeAttributesClassList to storage.k8s.io/v1.
Promoted feature-gate `VolumeAttributesClass` to GA (on by default)

Signed-off-by: carlory <baofa.fan@daocloud.io>
2025-07-25 01:53:59 +08:00
Yuan Wang
b34f8782e2 Add e2e tests 2025-07-24 16:49:54 +00:00
Kubernetes Prow Robot
b3d00a026d Merge pull request #132756 from ylink-lfs/ci/redis_removal
ci: redis removal for e2e test dependency simplicity
2025-07-24 09:38:42 -07:00
Kubernetes Prow Robot
b3e39344ff Merge pull request #132959 from ylink-lfs/test/e2e_named_port_con_case
test: add e2e case for mutating named port
2025-07-24 07:56:34 -07:00
Patrick Ohly
c954e13255 Revert "DRAAdminAccess: add upgrade downgrade test" 2025-07-24 14:04:08 +02:00
Patrick Ohly
24de875ceb DRA: graduate DynamicResourceAllocation feature to GA
It hasn't been on by default before, so it is not yet locked to the new
default (on). This has some impact on the scheduler configuration
because the plugin is now enabled by default.

Because the feature is now GA, it doesn't need to be a label on E2E tests,
which wouldn't be possible anyway once it gets removed entirely.
2025-07-24 08:33:56 +02:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
2025-07-24 08:33:45 +02:00
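
The version selection described in the commit above might look roughly like the sketch below. The function names are hypothetical, and the preference order (newest first) is an assumption; the commit message only says the helper picks whichever version is enabled.

```go
package main

import "fmt"

// pickVersion returns the first enabled API version from a preference list.
// Preferring the newest version is an assumption made for this sketch.
func pickVersion(enabled map[string]bool) (string, bool) {
	for _, v := range []string{"v1", "v1beta2", "v1beta1"} {
		if enabled[v] {
			return v, true
		}
	}
	return "", false
}

// controlPlaneOK mirrors the constraint in the commit message: the control
// plane depends on v1, so v1 must be enabled.
func controlPlaneOK(enabled map[string]bool) bool {
	return enabled["v1"]
}

func main() {
	betaOnly := map[string]bool{"v1beta1": true, "v1beta2": true}
	v, ok := pickVersion(betaOnly)
	fmt.Println(v, ok)                    // a driver can still pick a beta version
	fmt.Println(controlPlaneOK(betaOnly)) // but the control plane needs v1
}
```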
Kubernetes Prow Robot
051dd70772 Merge pull request #133149 from ritazh/draadminaccess-test
draadminaccess test make it serial
2025-07-23 19:56:55 -07:00
Kubernetes Prow Robot
6ad14ad876 Merge pull request #132991 from danwinship/endpoints-e2e-updates
Endpoints e2e updates for KEP-4974
2025-07-23 19:56:26 -07:00
Kubernetes Prow Robot
ca569e152d Merge pull request #132700 from pohly/dra-kubelet-grpc-v1
DRA kubelet: add v1 gRPC
2025-07-23 17:36:26 -07:00
Kubernetes Prow Robot
6ef2215eb7 Merge pull request #132558 from HirazawaUi/Implement-4762
KEP-4762: Allows setting any FQDN as the pod's hostname
2025-07-23 16:26:27 -07:00
Tsubasa Nagasawa
a82187cf11 [PodLevelResources] Update Downward API defaulting for resource limits
Previously, when container-level resource limits were not specified and
the Downward API was used to set environment variables referencing them,
the node's allocatable resources were used as the fallback.
With the introduction of the Pod Level Resources feature, this behavior
is updated: if container-level resource limits are not specified,
the Downward API now uses the pod-level resource limits instead.
If neither container-level nor pod-level resource limits are specified,
the behavior remains unchanged. It falls back to the node's allocatable
resources.

Signed-off-by: Tsubasa Nagasawa <toversus2357@gmail.com>
2025-07-24 08:15:36 +09:00
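
The fallback order described in the commit above (container-level limit, then pod-level limit, then node allocatable) can be sketched as below. `effectiveLimit` and `ptr` are hypothetical helpers written for this illustration, with a nil pointer meaning "not specified"; the real code works on resource quantities and feature gates, which are omitted here.

```go
package main

import "fmt"

// effectiveLimit returns the value the Downward API would expose for a
// resource limit under the fallback order described in the commit message.
// A nil pointer means the limit was not specified at that level.
func effectiveLimit(containerLimit, podLimit *int64, nodeAllocatable int64) int64 {
	if containerLimit != nil {
		return *containerLimit
	}
	if podLimit != nil {
		return *podLimit
	}
	return nodeAllocatable
}

// ptr is a small helper for building optional values.
func ptr(v int64) *int64 { return &v }

func main() {
	fmt.Println(effectiveLimit(ptr(500), ptr(2000), 8000)) // container limit wins
	fmt.Println(effectiveLimit(nil, ptr(2000), 8000))      // falls back to pod level
	fmt.Println(effectiveLimit(nil, nil, 8000))            // falls back to node allocatable
}
```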
Kubernetes Prow Robot
49cd87182c Merge pull request #125271 from tssurya/psa-probe-lifecycle-handler-host-option
Add PSA for blocking `.host` on pod probes
2025-07-23 15:16:27 -07:00
Surya Seetharaman
4c87e60d0d Tests using the .host field in probes must be at privileged level
The sig-node tests include probe tests and lifecycle handler
tests with post-start and pre-stop hooks that set the host
field to another pod.

At the baseline level this is no longer allowed because of
the PSA rules we are adding in this PR. Unsetting the host
field makes the checks use the pod's own IP, and using that
in the pre-stop and post-start hooks is tricky because of
timing issues between when the container is actually up
and when the test runs.

So I have changed the tests to be privileged so that they
can use the .host field if they need to.

See https://github.com/kubernetes/kubernetes/issues/133091
which is an issue opened to properly refactor these tests.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
2025-07-23 21:17:05 +02:00
Kubernetes Prow Robot
c41cc0a144 Merge pull request #129837 from danwinship/aggregated-apiserver-endpointslices
Port aggregated apiserver discovery to EndpointSlices
2025-07-23 10:30:28 -07:00
Dan Winship
765d84e9bf Test only EndpointSlices, not Endpoints, in dual-stack e2e tests
The dual-stack integration tests already validate that we get the
expected Endpoints for single- and dual-stack Services. There is no
further "end to end" testing needed for Endpoints, given that
everything in a normal cluster would look at EndpointSlices, not
Endpoints.
2025-07-23 13:19:07 -04:00
Kubernetes Prow Robot
e979287f29 Merge pull request #133117 from Phaow/bump-external-snapshotter
Bump external snapshotter for vgs tests
2025-07-23 09:22:27 -07:00