This ensures that ResourceSlices get removed also when a plugin becomes
unresponsive without removing the registration socket.
Tests are from https://github.com/kubernetes/kubernetes/pull/131073 by Ed
with some modifications, the implementation is new.
When the new RollingUpdate option is used, the DRA driver gets deployed such
that it uses unique socket paths and uses file locking to serialize gRPC
calls. This enables the kubelet to pick arbitrarily between two concurrently
instances. The handover is seamless (no downtime, no removal of ResourceSlices
by the kubelet).
For file locking, the fileutils package from etcd is used because that was
already a Kubernetes dependency. Unfortunately that package brings in some
additional indirect dependency for DRA drivers (zap, multierr), but those
seem acceptable.
The key difference is that the kubelet must remember all plugin instances
because it could always happen that the new instance dies and leaves only the
old one running.
The endpoints of each instance must be different. Registering a plugin with the
same endpoint as some other instance is not supported and triggers an error,
which should get reported as "not registered" to the plugin. This should only
happen when the kubelet missed some unregistration event and re-registers the
same instance again. The recovery in this case is for the plugin to shut down,
remove its socket, which should get observed by kubelet, and then try again
after a restart.
Currently, when a Kubelet Plugin is being added in the DesiredStateOfWorld,
a timestamp is saved in the PluginInfo. This timestamp is then updated on
subsequent plugin reregistrations.
The Reconciler, when it detects different timestamps for a Plugin in its
DesiredStateOfWorld and ActualStateOfWorld, it will then trigger a Plugin unregister
and then a new Plugin registration.
Basically, the timestamp is being used to detect whether or not a Plugin needs to
be reregistered or not. However, this can be an issue on Windows, where the time
measurements are not as fine-grained. time.Now() calls within the same ~1-15ms
window will have the same timestamp. This can mean that Plugin Reregistration events
can be missed on Windows [1]. Because of this, some of the Plugin registration unit
tests fail on Windows.
This commit updates the behaviour, instead of relying on different timestamps,
the Reconciler will check the set PluginInfo UUID to detect a Plugin Reregistration.
With this change, the unit tests mentioned above will also pass on Windows.
[1] https://github.com/golang/go/issues/8687
Currently, there are some unit tests that are failing on Windows due to
various reasons:
- paths not properly joined (filepath.Join should be used).
- Proxy Mode IPVS not supported on Windows.
- DeadlineExceeded can occur when trying to read data from an UDP
socket. This can be used to detect whether the port was closed or not.
- In Windows, with long file name support enabled, file names can have
up to 32,767 characters. In this case, the error
windows.ERROR_FILENAME_EXCED_RANGE will be encountered instead.
- files not closed, which means that they cannot be removed / renamed.
- time.Now() is not as precise on Windows, which means that 2
consecutive calls may return the same timestamp.
- path.Base() will return the same path. filepath.Base() should be used
instead.
- path.Join() will always join the paths with a / instead of the OS
specific separator. filepath.Join() should be used instead.
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Currently, the plugin Watcher checks if a file is a socket or not by
running mode&os.ModeSocket != 0, which can't be True on Windows.
util.IsUnixDomainSocket should be used instead.
v1.43.0 marked grpc.WithInsecure() deprecated so this commit moves to use
what is the recommended replacement:
grpc.WithTransportCredentials(insecure.NewCredentials())
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Due to recent os.Stat() issue for Windows 20H2, added the logic to use
os.Lstat() if os.Stat() fails for Windows
Change-Id: Ic9fc07ecc6e9c2984e854ac77121ff61c8e0d041