Merge pull request #42942 from vishh/gpu-cont-fix

Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs

Fixes #42412

**Background**
Support for multiple GPUs is an experimental feature in v1.6. 
Container restarts were handled incorrectly which resulted in stranding of GPUs
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in same GPU being assigned to multiple pods.

**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.

**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.

**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever`  in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.

cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews 

Replaces #42560
This commit is contained in:
Kubernetes Submit Queue
2017-03-14 10:19:17 -07:00
committed by GitHub
7 changed files with 128 additions and 61 deletions

View File

@@ -185,6 +185,7 @@ pkg/kubelet/container
pkg/kubelet/envvars
pkg/kubelet/eviction
pkg/kubelet/eviction/api
pkg/kubelet/gpu/nvidia
pkg/kubelet/util/csr
pkg/kubelet/util/format
pkg/kubelet/util/ioutils