node: cm: don't share containerMap instances between managers

Since the GA graduation of memory manager in https://github.com/kubernetes/kubernetes/pull/128517
we are sharing the initial container map across managers.

The intention of this sharing was not to actually share a data
structure, but
1. save the relatively expensive relisting from runtime
2. have all the managers share a consistent view - even though the
   chance for misalignement tend to be tiny.

The unwanted side effect though is now all the managers race
to modify a data shared, not thread safe data structure.

The fix is to clone (deepcopy) the computed map when passing it
to each manager. This restores the old semantic of the code.

This issue brings the topic of possibly managers go out of sync
since each of them maintain a private view of the world.
This risk is real, yet this is how the code worked for
most of the lifetime, so the plan is to look at this and evaluate
possible improvements later on.

Signed-off-by: Francesco Romani <fromani@redhat.com>
This commit is contained in:
Francesco Romani
2024-11-07 11:32:31 +01:00
parent 09e5e6269a
commit 2a99bfc3d1
4 changed files with 100 additions and 12 deletions

View File

@@ -575,13 +575,13 @@ func (cm *containerManagerImpl) Start(ctx context.Context, node *v1.Node,
}
// Initialize CPU manager
err := cm.cpuManager.Start(cpumanager.ActivePodsFunc(activePods), sourcesReady, podStatusProvider, runtimeService, containerMap)
err := cm.cpuManager.Start(cpumanager.ActivePodsFunc(activePods), sourcesReady, podStatusProvider, runtimeService, containerMap.Clone())
if err != nil {
return fmt.Errorf("start cpu manager error: %w", err)
}
// Initialize memory manager
err = cm.memoryManager.Start(memorymanager.ActivePodsFunc(activePods), sourcesReady, podStatusProvider, runtimeService, containerMap)
err = cm.memoryManager.Start(memorymanager.ActivePodsFunc(activePods), sourcesReady, podStatusProvider, runtimeService, containerMap.Clone())
if err != nil {
return fmt.Errorf("start memory manager error: %w", err)
}
@@ -643,7 +643,7 @@ func (cm *containerManagerImpl) Start(ctx context.Context, node *v1.Node,
}
// Starts device manager.
if err := cm.deviceManager.Start(devicemanager.ActivePodsFunc(activePods), sourcesReady, containerMap, containerRunningSet); err != nil {
if err := cm.deviceManager.Start(devicemanager.ActivePodsFunc(activePods), sourcesReady, containerMap.Clone(), containerRunningSet); err != nil {
return err
}