mirror of
https://github.com/outbackdingo/kubernetes.git
synced 2026-02-25 12:20:29 +00:00
Automatic merge from submit-queue

CRI: Handle cri in-place upgrade

Fixes https://github.com/kubernetes/kubernetes/issues/40051.

## How does this PR restart/remove legacy containers/sandboxes?

With this PR, dockershim converts legacy containers and infra containers and returns them as regular containers/sandboxes. We can then rely on the SyncPod logic to stop the legacy containers/sandboxes, and on the garbage collector to remove them.

To forcibly trigger a restart:
* For infra containers, we manually set `hostNetwork` to the opposite value to trigger a restart (see [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L389)).
* Application containers are restarted together with the infra container.

## How does this PR avoid extra overhead when there is no legacy container/sandbox?

Because legacy containers lack some labels, listing them requires an extra `docker ps`. We should not introduce a constant performance regression just for legacy container cleanup, so we added the `legacyCleanupFlag`:
* In `ListContainers` and `ListPodSandbox`, only do the extra `ListLegacyContainers` and `ListLegacyPodSandbox` when `legacyCleanupFlag` is `NotDone`.
* When dockershim starts, it checks whether there are legacy containers/sandboxes.
  * If there are none, it marks `legacyCleanupFlag` as `Done`.
  * If there are any, it leaves `legacyCleanupFlag` as `NotDone` and starts a goroutine that periodically checks whether legacy cleanup is done.

This makes sure that there is overhead only when there are legacy containers/sandboxes not cleaned up yet.

## Caveats

* In-place upgrade will cause kubelet to restart all running containers.
* RestartNever containers will not be restarted.
* The garbage collector sometimes keeps the legacy containers for a long time if there aren't too many containers on the node. In that case, dockershim will keep performing the extra `docker ps`, which introduces overhead.
  * Manually removing all legacy containers will fix this.
  * Should we garbage collect legacy containers/sandboxes in dockershim by ourselves? /cc @yujuhong
* Host ports will not be reclaimed, because legacy sandboxes have no checkpoint. https://github.com/kubernetes/kubernetes/pull/39903 /cc @freehan

/cc @yujuhong @feiskyer @dchen1107 @kubernetes/sig-node-api-reviews

**Release note**:

```release-note
We should mention the caveats of in-place upgrade in release note.
```
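
For reviewers who want a concrete picture of the `legacyCleanupFlag` gating described above, here is a minimal sketch. It is not the code in this PR: the `lister` type, its fields, the `legacyNotDone`/`legacyDone` constants, and `checkLegacyCleanup` are hypothetical names chosen for illustration, and the use of `sync/atomic` is an assumption about how the flag could be shared with the periodic checker. The sketch calls the check directly, where dockershim would run it once at startup and then from a goroutine.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Hypothetical states mirroring the NotDone/Done values named in the PR text.
const (
	legacyNotDone int32 = iota
	legacyDone
)

// lister is a hypothetical stand-in for the dockershim listing path.
type lister struct {
	cleanup     int32           // legacy cleanup flag, accessed atomically
	listRegular func() []string // the normal, label-filtered listing
	listLegacy  func() []string // the extra listing (the extra `docker ps`)
	legacyCalls int             // how often ListContainers paid the extra cost
}

// ListContainers performs the extra legacy listing only while the flag
// is still NotDone.
func (l *lister) ListContainers() []string {
	out := l.listRegular()
	if atomic.LoadInt32(&l.cleanup) == legacyNotDone {
		l.legacyCalls++
		out = append(out, l.listLegacy()...)
	}
	return out
}

// checkLegacyCleanup marks the flag Done once no legacy containers remain.
// It reports whether cleanup is done.
func (l *lister) checkLegacyCleanup() bool {
	if len(l.listLegacy()) == 0 {
		atomic.StoreInt32(&l.cleanup, legacyDone)
		return true
	}
	return false
}

func main() {
	legacy := []string{"legacy-infra"}
	l := &lister{
		listRegular: func() []string { return []string{"app"} },
		listLegacy:  func() []string { return legacy },
	}

	fmt.Println(l.ListContainers()) // [app legacy-infra]

	legacy = nil // simulate the garbage collector removing the legacy container
	l.checkLegacyCleanup()

	fmt.Println(l.ListContainers()) // [app]
}
```

Once the flag is `Done` it is never reset in this sketch, so every later `ListContainers` call skips the legacy listing entirely. That matches the goal stated above of paying the overhead only while legacy containers/sandboxes remain.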