This ensures that ResourceSlices get removed also when a plugin becomes
unresponsive without removing the registration socket.
Tests are from https://github.com/kubernetes/kubernetes/pull/131073 by Ed
with some modifications, the implementation is new.
When the new RollingUpdate option is used, the DRA driver gets deployed such
that it uses unique socket paths and uses file locking to serialize gRPC
calls. This enables the kubelet to pick arbitrarily between two concurrently
instances. The handover is seamless (no downtime, no removal of ResourceSlices
by the kubelet).
For file locking, the fileutils package from etcd is used because that was
already a Kubernetes dependency. Unfortunately that package brings in some
additional indirect dependency for DRA drivers (zap, multierr), but those
seem acceptable.
The key difference is that the kubelet must remember all plugin instances
because it could always happen that the new instance dies and leaves only the
old one running.
The endpoints of each instance must be different. Registering a plugin with the
same endpoint as some other instance is not supported and triggers an error,
which should get reported as "not registered" to the plugin. This should only
happen when the kubelet missed some unregistration event and re-registers the
same instance again. The recovery in this case is for the plugin to shut down,
remove its socket, which should get observed by kubelet, and then try again
after a restart.
Currently, there are some unit tests that are failing on Windows due to
various reasons:
- paths not properly joined (filepath.Join should be used).
- Proxy Mode IPVS not supported on Windows.
- DeadlineExceeded can occur when trying to read data from an UDP
socket. This can be used to detect whether the port was closed or not.
- In Windows, with long file name support enabled, file names can have
up to 32,767 characters. In this case, the error
windows.ERROR_FILENAME_EXCED_RANGE will be encountered instead.
- files not closed, which means that they cannot be removed / renamed.
- time.Now() is not as precise on Windows, which means that 2
consecutive calls may return the same timestamp.
- path.Base() will return the same path. filepath.Base() should be used
instead.
- path.Join() will always join the paths with a / instead of the OS
specific separator. filepath.Join() should be used instead.
Currently, the plugin Watcher checks if a file is a socket or not by
running mode&os.ModeSocket != 0, which can't be True on Windows.
util.IsUnixDomainSocket should be used instead.
v1.43.0 marked grpc.WithInsecure() deprecated so this commit moves to use
what is the recommended replacement:
grpc.WithTransportCredentials(insecure.NewCredentials())
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Due to recent os.Stat() issue for Windows 20H2, added the logic to use
os.Lstat() if os.Stat() fails for Windows
Change-Id: Ic9fc07ecc6e9c2984e854ac77121ff61c8e0d041