Since the GA graduation of the memory manager in https://github.com/kubernetes/kubernetes/pull/128517,
we have been sharing the initial container map across managers.
The intention of this sharing was not to actually share a data
structure, but to
1. save the relatively expensive relisting from the runtime
2. have all the managers share a consistent view - even though the
chance of misalignment tends to be tiny.
The unwanted side effect, though, is that all the managers now race
to modify a shared, non-thread-safe data structure.
The fix is to clone (deepcopy) the computed map when passing it
to each manager. This restores the old semantics of the code.
This issue raises the topic of the managers possibly going out of sync,
since each of them maintains a private view of the world.
This risk is real, yet this is how the code worked for
most of its lifetime, so the plan is to look at this and evaluate
possible improvements later on.
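A minimal sketch of the fix in Go, assuming a simplified ContainerMap
type (the real kubelet type is richer) and a hypothetical managers slice:

    // ContainerMap is a simplified stand-in for the map computed once
    // from the runtime.
    type ContainerMap map[string]string

    // Clone returns a deep copy, giving each manager a private view.
    func (m ContainerMap) Clone() ContainerMap {
        out := make(ContainerMap, len(m))
        for k, v := range m {
            out[k] = v
        }
        return out
    }

    // Pass a copy to each manager instead of the shared instance,
    // restoring the pre-GA semantics.
    for _, mgr := range managers {
        mgr.Start(initialContainers.Clone())
    }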
Signed-off-by: Francesco Romani <fromani@redhat.com>
We are still only calling NodeExpand after the volume is mounted.
Avoid depending on ASW from dswp.findAndAddNewPods(). It is weird to
determine desired state based on actual state.
Reusing types from the alpha in the beta made it possible to provide and use
both versions without conversion. The downside was that removing the alpha
would have been harder, if not impossible: DRA drivers could have continued to
use the alpha types and would have provided the beta interface automatically.
Now the two versions are completely separate gRPC APIs, although in practice
there are no differences besides the name. Support for the alpha API in kubelet
is provided via automatically generated conversion and manually written
interface wrappers.
Those are provided as part of the v1alpha4 package. The advantage of having all
of that in a central place is that it'll be easier to remove when no longer
needed.
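A hedged sketch of such an interface wrapper; the conversion helpers and
exact type names below are illustrative, not the generated code:

    // betaViaAlpha adapts an old alpha Node gRPC client to the beta
    // client interface used by the rest of kubelet.
    type betaViaAlpha struct {
        alpha drav1alpha3.NodeClient
    }

    func (c *betaViaAlpha) NodePrepareResources(
        ctx context.Context,
        req *drav1beta1.NodePrepareResourcesRequest,
        opts ...grpc.CallOption,
    ) (*drav1beta1.NodePrepareResourcesResponse, error) {
        // betaToAlphaRequest / alphaToBetaResponse stand in for the
        // automatically generated conversion functions.
        resp, err := c.alpha.NodePrepareResources(ctx, betaToAlphaRequest(req), opts...)
        if err != nil {
            return nil, err
        }
        return alphaToBetaResponse(resp), nil
    }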
Listing supported gRPC services (e.g. drav1alpha3.Node, drav1beta1.DRAPlugin)
during registration enables the kubelet to determine in advance which methods
it can call.
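For example, the registration info can carry gRPC service names instead of
bare version strings; the values below are illustrative, assuming the
pluginregistration PluginInfo type:

    // SupportedVersions now lists gRPC service names, so the kubelet
    // knows up front which methods are available.
    info := registerapi.PluginInfo{
        Type: registerapi.DRAPlugin,
        Name: "example.com/driver",
        SupportedVersions: []string{
            "v1alpha3.Node",     // legacy alpha gRPC service
            "v1beta1.DRAPlugin", // current gRPC service
        },
    }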
Versioning by Kubernetes release makes less sense because it doesn't say
anything about which gRPC service is supported. New ones might get added and
obsolete ones removed. Some services might be optional.
In the past, this versioning support wasn't really used: at least one version
had to be provided, and the kubelet tried to use the plugin with the highest
version. This version comparison gets dropped. In the unlikely situation
that different plugins register under the same name, the most recently
registered one is used.
Because advertising gRPC services is a new convention, plugins that only
report some version are treated as providing the old alpha gRPC service.
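A hedged sketch of the resulting selection logic; the function and the
service name strings are illustrative:

    // chooseService picks the gRPC service to use for a registered
    // plugin. Entries that are not known service names (e.g. a bare
    // version string like "1.0.0") fall back to the legacy alpha service.
    func chooseService(supportedVersions []string) string {
        for _, s := range supportedVersions {
            if s == "v1beta1.DRAPlugin" {
                return s
            }
        }
        return "v1alpha3.Node"
    }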
Rename the old CreateVolumeSpec to CreateVolumeSpecWithNodeMigration, which
extracts a volume.Spec with node-specific CSI migration applied.
Add a new CreateVolumeSpec that does the same, only without evaluating
node-specific CSI migration.
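A hedged sketch of the relationship between the two helpers; the parameter
lists and the helper functions are illustrative, not the actual signatures:

    // CreateVolumeSpec extracts the volume.Spec without evaluating
    // node-specific CSI migration.
    func CreateVolumeSpec(podVolume v1.Volume, pod *v1.Pod) (*volume.Spec, error) {
        return volumeSpecFromPodVolume(podVolume, pod) // hypothetical helper
    }

    // CreateVolumeSpecWithNodeMigration additionally applies node-specific
    // CSI migration, i.e. the old CreateVolumeSpec behavior.
    func CreateVolumeSpecWithNodeMigration(podVolume v1.Volume, pod *v1.Pod, nodeName types.NodeName) (*volume.Spec, error) {
        spec, err := CreateVolumeSpec(podVolume, pod)
        if err != nil {
            return nil, err
        }
        return translateSpecForNode(spec, nodeName) // hypothetical helper
    }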
The v1beta1 API is identical to the previous v1alpha4, which was erroneously
still called "v1alpha3" in a few places, including the gRPC interface
definition itself.
To simplify the transition, kubelet supports both v1alpha4 and v1beta1, picking
the more recent one automatically. All that DRA driver authors need to do to
implement v1beta1 is to update to the latest
k8s.io/dynamic-resource-allocation/kubeletplugin: it will automatically
register both API versions unless explicitly configured otherwise, which is
mostly just for testing.
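A hedged sketch of a driver's startup; the option shown is illustrative and
the exact kubeletplugin.Start signature should be checked against the package:

    // d implements the DRA gRPC service. Because the v1alpha4 and
    // v1beta1 types are identical, one implementation serves both.
    d := &myDriver{}

    // By default the helper registers both API versions with kubelet.
    plugin, err := kubeletplugin.Start(ctx, d,
        kubeletplugin.DriverName("example.com/driver"), // illustrative option
    )
    if err != nil {
        return err
    }
    defer plugin.Stop()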
DRA driver authors may replace their package import of v1alpha4 with v1beta1,
but they don't have to because the types in both packages are the same.