mirror of
https://github.com/outbackdingo/kubernetes.git
synced 2026-01-27 10:19:35 +00:00
This change introduces the ability for the Kubelet to monitor and report the health of devices allocated via Dynamic Resource Allocation (DRA). This addresses a key part of KEP-4680 by providing visibility into device failures, which helps users and controllers diagnose pod failures. The implementation includes: - A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources` stream that DRA plugins can optionally implement. - A health information cache within the Kubelet's DRA manager to track the last known health of each device and handle plugin disconnections. - An asynchronous update mechanism that triggers a pod sync when a device's health changes. - A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to expose the device health information to users via the Pod API. Update vendor KEP-4680: Fix lint, boilerplate, and codegen issues Add another e2e test, add TODO for KEP4680 & update test infra helpers Add Feature Gate e2e test Fixing presubmits Fix var names, feature gating, and nits Fix DRA Health gRPC API according to review feedback
38 KiB
Executable File
38 KiB
Executable File