The test needs to schedule 256 pods at once, which only works with at least
three nodes (the default per-node pod limit is 100, but it could also be lower).
It is also a stress test that flaked recently.
For now, it is removed without a replacement. A similar integration test is in
development, but it is too big to add during code freeze because it needs some
infrastructure changes in test/integration/dra.
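The node count follows from ceiling division: with a per-node limit of 100 pods, 256 pods need ceil(256/100) = 3 nodes, and a lower limit needs more. A minimal Go sketch of that arithmetic (the `minNodes` helper is hypothetical and not part of the removed test):

```go
package main

// minNodes returns the smallest number of nodes that can hold the given
// number of pods when each node accepts at most podsPerNode pods.
// Integer ceiling division: ceil(a/b) == (a + b - 1) / b for a >= 0, b > 0.
func minNodes(pods, podsPerNode int) int {
	if podsPerNode <= 0 {
		panic("podsPerNode must be positive")
	}
	return (pods + podsPerNode - 1) / podsPerNode
}
```

For example, minNodes(256, 100) is 3, while a per-node limit of 50 would already require 6 nodes.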
* Reject pod when the volume attachment limit is exceeded
* Record admission rejection
* Fix pull-kubernetes-linter-hints
* Fix attach/detach (AD) controller unit test failure
* Consolidate error handling logic in WaitForAttachAndMount
* Improve error context
* Update admissionRejectionReasons to include VolumeAttachmentLimitExceededReason
* Update status message
* Add TestWaitForAttachAndMountVolumeAttachLimitExceededError unit test
* Add e2e test
* Fix pull-kubernetes-linter-hints
---------
Signed-off-by: Eddie Torres <torredil@amazon.com>
* apiextensions: Treat whitespace-only caBundle as empty for webhook client config and validation
- Updates webhookClientConfigForCRD to treat caBundle values containing only whitespace as empty, ensuring system trust roots are used in this case.
- Updates ValidateCABundle to treat a whitespace-only caBundle as valid, consistent with empty or nil values.
- Adds/updates unit tests to verify that a whitespace-only caBundle is handled the same as an empty or nil caBundle.
- Ensures consistent and user-friendly handling of caBundle across CRD conversion webhook configuration and validation.
* Revert validation logic
* Add integration test for webhook bypass
* Fix linting
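The whitespace handling can be sketched in Go: trim the bundle, and if nothing remains, fall back to the system trust roots instead of parsing it. The helpers `effectiveCABundle` and `rootCAs` are hypothetical illustrations, not the actual `webhookClientConfigForCRD` code.

```go
package main

import (
	"bytes"
	"crypto/x509"
	"errors"
)

// effectiveCABundle returns nil when caBundle is nil, empty, or contains only
// whitespace, signalling that system trust roots should be used instead.
// Otherwise it returns caBundle unchanged.
func effectiveCABundle(caBundle []byte) []byte {
	if len(bytes.TrimSpace(caBundle)) == 0 {
		return nil
	}
	return caBundle
}

// rootCAs builds the certificate pool for a webhook client: the system pool
// for an effectively empty bundle, otherwise a pool from the PEM data.
func rootCAs(caBundle []byte) (*x509.CertPool, error) {
	b := effectiveCABundle(caBundle)
	if b == nil {
		return x509.SystemCertPool()
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(b) {
		return nil, errors.New("caBundle contains no valid PEM certificates")
	}
	return pool, nil
}
```

Treating whitespace as empty avoids the surprising case where a bundle like "\n" is non-empty but contains no certificates at all.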
This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.
The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
the last known health of each device and handle plugin disconnections.
- An asynchronous update mechanism that triggers a pod sync when a
device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
expose the device health information to users via the Pod API.
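The health cache and the change-triggered pod sync described above can be sketched as a mutex-protected map that reports which devices changed. All names here (`Health`, `deviceKey`, `healthCache`) are hypothetical and do not reflect the kubelet's DRA manager types or the `v1alpha1.NodeHealth` API.

```go
package main

import "sync"

// Health is a hypothetical device health state.
type Health string

const (
	Healthy   Health = "Healthy"
	Unhealthy Health = "Unhealthy"
	Unknown   Health = "Unknown"
)

// deviceKey identifies a device; the chosen fields are an assumption.
type deviceKey struct {
	Driver, Pool, Device string
}

// healthCache remembers the last reported health of each device.
type healthCache struct {
	mu      sync.Mutex
	devices map[deviceKey]Health
}

func newHealthCache() *healthCache {
	return &healthCache{devices: make(map[deviceKey]Health)}
}

// Update records a batch of health reports and returns the devices whose
// health changed, so the caller can trigger a sync of the affected pods.
func (c *healthCache) Update(reports map[deviceKey]Health) []deviceKey {
	c.mu.Lock()
	defer c.mu.Unlock()
	var changed []deviceKey
	for k, h := range reports {
		if old, ok := c.devices[k]; !ok || old != h {
			c.devices[k] = h
			changed = append(changed, k)
		}
	}
	return changed
}

// MarkDriverUnknown is called when a plugin disconnects: every device of that
// driver becomes Unknown, and the devices that actually changed are returned.
func (c *healthCache) MarkDriverUnknown(driver string) []deviceKey {
	c.mu.Lock()
	defer c.mu.Unlock()
	var changed []deviceKey
	for k, h := range c.devices {
		if k.Driver == driver && h != Unknown {
			c.devices[k] = Unknown
			changed = append(changed, k)
		}
	}
	return changed
}

// Get returns the last known health, or Unknown for devices never reported.
func (c *healthCache) Get(k deviceKey) Health {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h, ok := c.devices[k]; ok {
		return h
	}
	return Unknown
}
```

Returning only the changed devices keeps pod syncs limited to real transitions; repeated identical reports from a `WatchResources`-style stream cause no extra work.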
Update vendor
KEP-4680: Fix lint, boilerplate, and codegen issues
Add another e2e test, add TODO for KEP-4680, and update test infra helpers
Add Feature Gate e2e test
Fix presubmits
Fix var names, feature gating, and nits
Fix DRA Health gRPC API according to review feedback
Pod-level hugepage resources are not propagated to the containers; only the pod-level cgroup values are propagated to containers that do not specify hugepage resources.
The aggregated container hugepage limits cannot be greater than the pod-level limits.
This was already enforced through the requests defaulted from the specified
limits; however, the validation did not make this explicit for both hugepage requests and limits.
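The aggregate rule can be sketched as a check that sums each hugepage resource across containers and compares it with the pod-level value; calling it once for requests and once for limits covers both. The function name, the plain byte-count maps (instead of `resource.Quantity`), and skipping resources absent at pod level are assumptions for illustration, not the actual validation code.

```go
package main

import (
	"fmt"
	"strings"
)

// validateHugePageAggregate checks that, for every hugepage resource (names
// starting with "hugepages-", e.g. "hugepages-2Mi"), the sum of the container
// values does not exceed the pod-level value. Values are byte counts.
func validateHugePageAggregate(pod map[string]int64, containers []map[string]int64) error {
	sums := make(map[string]int64)
	for _, c := range containers {
		for name, v := range c {
			if strings.HasPrefix(name, "hugepages-") {
				sums[name] += v
			}
		}
	}
	for name, sum := range sums {
		// Assumption: resources not set at pod level are not checked here.
		podValue, ok := pod[name]
		if ok && sum > podValue {
			return fmt.Errorf("sum of container %s (%d bytes) exceeds pod-level value (%d bytes)", name, sum, podValue)
		}
	}
	return nil
}
```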