kubernetes/hack/update-codegen.sh
John-Paul Sassine b7de71f9ce feat(kubelet): Add ResourceHealthStatus for DRA pods
This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.

The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
  stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
  the last known health of each device and handle plugin disconnections.
- An asynchronous update mechanism that triggers a pod sync when a
  device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
  expose the device health information to users via the Pod API.
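
The cache and change-triggered sync described in the list above can be sketched roughly as follows. This is a minimal illustration, not the Kubelet's code: the type and function names (`healthCache`, `update`, `markDisconnected`), the three status values, the `gpu.example.com` driver name, and the channel standing in for the pod-sync trigger are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// HealthStatus is a hypothetical status enum used only in this sketch.
type HealthStatus string

const (
	Healthy   HealthStatus = "Healthy"
	Unhealthy HealthStatus = "Unhealthy"
	Unknown   HealthStatus = "Unknown"
)

// deviceKey identifies a device by the driver (plugin) that reported it.
type deviceKey struct {
	Driver string
	Device string
}

// healthCache stores the last known health of each device.
type healthCache struct {
	mu      sync.Mutex
	devices map[deviceKey]HealthStatus
}

func newHealthCache() *healthCache {
	return &healthCache{devices: make(map[deviceKey]HealthStatus)}
}

// update records a health report and returns true only when the stored
// status changed, so callers can skip redundant pod syncs.
func (c *healthCache) update(driver, device string, s HealthStatus) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := deviceKey{Driver: driver, Device: device}
	if old, ok := c.devices[k]; ok && old == s {
		return false
	}
	c.devices[k] = s
	return true
}

// get returns the last known status, or Unknown for an unseen device.
func (c *healthCache) get(driver, device string) HealthStatus {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.devices[deviceKey{Driver: driver, Device: device}]; ok {
		return s
	}
	return Unknown
}

// markDisconnected sets every device of a driver to Unknown and returns
// the sorted names of the devices whose status actually changed.
func (c *healthCache) markDisconnected(driver string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var changed []string
	for k, s := range c.devices {
		if k.Driver == driver && s != Unknown {
			c.devices[k] = Unknown
			changed = append(changed, k.Device)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	cache := newHealthCache()
	// podSync stands in for whatever mechanism triggers a pod sync.
	podSync := make(chan string, 8)

	report := func(driver, device string, s HealthStatus) {
		if cache.update(driver, device, s) {
			podSync <- device // enqueue a sync only on a real change
		}
	}

	report("gpu.example.com", "gpu-0", Healthy)
	report("gpu.example.com", "gpu-0", Healthy) // duplicate, no sync
	report("gpu.example.com", "gpu-0", Unhealthy)
	fmt.Println("pending syncs:", len(podSync))

	// Plugin stream drops: its devices fall back to Unknown.
	fmt.Println("reset on disconnect:", cache.markDisconnected("gpu.example.com"))
	fmt.Println("gpu-0 is now:", cache.get("gpu.example.com", "gpu-0"))
}
```

Returning a boolean from `update` keeps the decision to sync with the caller, which is one simple way to avoid re-syncing pods when a plugin repeats an unchanged status.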

Update vendor

KEP-4680: Fix lint, boilerplate, and codegen issues

Add another e2e test, add TODO for KEP-4680, and update test infra helpers

Add Feature Gate e2e test

Fix presubmits

Fix var names, feature gating, and nits

Fix DRA Health gRPC API according to review feedback
2025-07-24 23:23:18 +00:00

38 KiB
Executable File