Test: provisioning should provision storage with any volume data source
During CSI certification testing we observed that the test can fail
with the message:
"customresourcedefinitions.apiextensions.k8s.io \"volumepopulators.populator.storage.k8s.io\" already exists"
This happens because the test does not consider that this CRD may
already be installed in the cluster.
The test was updated to handle the CRD better: it creates the CRD for
the duration of the test and removes it afterward. If the CRD is
already installed, the test neither creates nor removes it.
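For reference, a minimal sketch of that logic, assuming a typed
apiextensions clientset; the helper name ensureVolumePopulatorCRD is
illustrative, not the actual test code:

```go
package provisioning

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ensureVolumePopulatorCRD creates the CRD only if it is missing and returns
// a cleanup function. A pre-installed CRD is neither created nor removed;
// a CRD created here is removed again by the returned cleanup.
func ensureVolumePopulatorCRD(ctx context.Context, client apiextensionsclient.Interface, crd *apiextensionsv1.CustomResourceDefinition) (func(), error) {
	_, err := client.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// Already installed in the cluster: leave it alone.
		return func() {}, nil
	}
	if err != nil {
		return nil, err
	}
	return func() {
		_ = client.ApiextensionsV1().CustomResourceDefinitions().Delete(ctx, crd.Name, metav1.DeleteOptions{})
	}, nil
}
```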
The test needs to schedule 256 pods at once, which only works with three
nodes (the default per-node pod limit is 100, but it could also be lower).
It is also a stress test which flaked recently.
For now it gets removed without a replacement. A similar integration test
is in development, but it is too big (it needs some infrastructure changes
in test/integration/dra) to add during code freeze.
* Reject pod when attachment limit is exceeded
* Record admission rejection
* Fix pull-kubernetes-linter-hints
* Fix AD Controller unit test failure
* Consolidate error handling logic in WaitForAttachAndMount
* Improve error context
* Update admissionRejectionReasons to include VolumeAttachmentLimitExceededReason
* Update status message
* Add TestWaitForAttachAndMountVolumeAttachLimitExceededError unit test
* Add e2e test
* Fix pull-kubernetes-linter-hints
---------
Signed-off-by: Eddie Torres <torredil@amazon.com>
This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.
The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
the last known health of each device and handle plugin disconnections.
- An asynchronous update mechanism that triggers a pod sync when a
device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
expose the device health information to users via the Pod API.
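As a rough illustration of the last point, a consumer could surface the
device health from the pod status roughly like this (a sketch against the
k8s.io/api/core/v1 types added for KEP-4680; exact field and constant
names may differ between Kubernetes versions):

```go
package drahealth

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// reportUnhealthyDevices prints every allocated device whose last reported
// health is not "Healthy", which can help explain why a pod is failing.
func reportUnhealthyDevices(pod *v1.Pod) {
	for _, cs := range pod.Status.ContainerStatuses {
		for _, rs := range cs.AllocatedResourcesStatus {
			for _, dev := range rs.Resources {
				if dev.Health != v1.ResourceHealthStatusHealthy {
					fmt.Printf("container %q: resource %q device %q reports %q\n",
						cs.Name, rs.Name, dev.ResourceID, dev.Health)
				}
			}
		}
	}
}
```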
Update vendor
KEP-4680: Fix lint, boilerplate, and codegen issues
Add another e2e test, add TODO for KEP4680 & update test infra helpers
Add Feature Gate e2e test
Fixing presubmits
Fix var names, feature gating, and nits
Fix DRA Health gRPC API according to review feedback
The feature gate hasn't been on by default before, therefore it does not
get locked to the new default (on) yet. This has some impact on the
scheduler configuration because the plugin is now enabled by default.
Because the feature is now GA, it no longer needs to be a label on E2E
tests, which wouldn't be possible anyway once the feature gate gets
removed entirely.
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with
all Kubernetes releases >= 1.32. The helper code picks whichever API
version is enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration
where only v1beta1 or v1beta2 is enabled without v1 won't work.
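A quick way to check a cluster against that requirement (a sketch using
client-go discovery; not part of the helper packages):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// Discovery fails for a group/version that the apiserver does not serve.
	if _, err := clientset.Discovery().ServerResourcesForGroupVersion("resource.k8s.io/v1"); err != nil {
		fmt.Println("resource.k8s.io/v1 is not enabled:", err)
		return
	}
	fmt.Println("resource.k8s.io/v1 is enabled")
}
```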
Previously, when container-level resource limits were not specified and
the Downward API was used to set environment variables referencing them,
the node's allocatable resources were used as the fallback.
With the introduction of the Pod Level Resources feature, this behavior
is updated: if container-level resource limits are not specified,
the Downward API now uses the pod-level resource limits instead.
If neither container-level nor pod-level resource limits are specified,
the behavior remains unchanged: it falls back to the node's allocatable
resources.
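In code terms, the fallback order now looks roughly like this (a
simplified sketch, not the actual kubelet implementation; pod.Spec.Resources
is the pod-level resources field introduced by the feature):

```go
package downwardapi

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// cpuLimitForDownwardAPI resolves the value reported for "limits.cpu":
// container-level limit -> pod-level limit -> node allocatable.
func cpuLimitForDownwardAPI(pod *v1.Pod, container *v1.Container, nodeAllocatable v1.ResourceList) resource.Quantity {
	if limit, ok := container.Resources.Limits[v1.ResourceCPU]; ok {
		return limit
	}
	if pod.Spec.Resources != nil {
		if limit, ok := pod.Spec.Resources.Limits[v1.ResourceCPU]; ok {
			return limit
		}
	}
	return nodeAllocatable[v1.ResourceCPU]
}
```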
Signed-off-by: Tsubasa Nagasawa <toversus2357@gmail.com>
The sig-node tests include scenarios that run probe and lifecycle-handler
tests with post-start and pre-stop hooks setting the host field to another
pod. At the baseline level such things won't be allowed because of the PSA
rules we are adding in this PR. Unsetting the host field means the probe
uses the pod's own podIP for the checks, and relying on that in the
pre-stop and post-start hooks is tricky because of timing issues between
when the container is actually up versus when the test runs.
So I have changed the tests to be privileged so that they can use the
.host field if they desire to.
See https://github.com/kubernetes/kubernetes/issues/133091
which is an issue opened to properly refactor these tests.
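For context, the difference boils down to whether the probe's .host field
is set (an illustrative sketch, not the actual e2e test code):

```go
package lifecycle

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// probeOtherPod sets .host to another pod's IP; under the baseline PSA
// rules added in this PR this is rejected, so the e2e tests that need it
// now run as privileged.
func probeOtherPod(otherPodIP string) *v1.Probe {
	return &v1.Probe{
		ProbeHandler: v1.ProbeHandler{
			HTTPGet: &v1.HTTPGetAction{
				Host: otherPodIP,
				Path: "/healthz",
				Port: intstr.FromInt32(8080),
			},
		},
	}
}

// probeSelf leaves .host unset, so the kubelet probes the pod's own podIP.
func probeSelf() *v1.Probe {
	return &v1.Probe{
		ProbeHandler: v1.ProbeHandler{
			HTTPGet: &v1.HTTPGetAction{
				Path: "/healthz",
				Port: intstr.FromInt32(8080),
			},
		},
	}
}
```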
Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
The dual-stack integration tests already validate that we get the
expected Endpoints for single- and dual-stack Services. There is no
further "end to end" testing needed for Endpoints, given that
everything in a normal cluster would look at EndpointSlices, not
Endpoints.