kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-11-26 19:35:10 +00:00

Author	SHA1	Message	Date
rongfu.leng	d04a54c50b	optimize code, filter podUID is empty string Signed-off-by: rongfu.leng <lenronfu@gmail.com>	2024-09-13 01:48:14 +00:00
Kubernetes Prow Robot	14f2cab4de	Merge pull request #126976 from jsturtevant/socket-file-revert Revert "fix: handle socket file detection on Windows"	2024-09-03 18:31:16 +01:00
Kubernetes Prow Robot	a4ec0c039a	Merge pull request #126435 from bart0sh/PR151-Kubelet-devicemanager-stop-using-CDI-annotations Kubelet: stop using CDI annotations	2024-08-29 16:49:30 +01:00
James Sturtevant	3ca610757e	Revert "fix: handle socket file detection on Windows" This reverts commit `4060ee60c1`.	2024-08-28 10:31:58 -07:00
Ed Bartosh	ea3c6628b7	Kubelet: stop using CDI annotations Removing setting CDI annotations by the device manager as CRI field CDIDevices is mature enough to be used instead.	2024-07-29 18:26:27 +03:00
Paco Xu	78d3830d97	ignore order of containers status allocated resources	2024-07-29 16:48:00 +08:00
Sergey Kanzhelev	62f96d2748	set AllocatedResourcesStatus in the Pod Status	2024-07-24 00:29:35 +00:00
Kubernetes Prow Robot	9196650533	Merge pull request #123819 from fakecore/fc/master fix: handle socket file detection on Windows	2024-07-18 00:53:16 -07:00
Matthieu MOREL	f014b754fb	fix: enable empty and len rules from testifylint on pkg package Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>	2024-07-06 23:15:43 +00:00
Ed Bartosh	f53991d111	kube_features: DevicePluginCDIDevices: LockToDefault	2024-06-25 16:14:48 +03:00
Kubernetes Prow Robot	a8d51f4f05	Use a generic Set instead of a specified Set in kubelet Signed-off-by: bzsuni <bingzhe.sun@daocloud.io>	2024-06-04 14:25:43 +08:00
Kubernetes Prow Robot	1fd835ce59	Merge pull request #123398 from ffromani/remove-legacy-checkpoint node: devicemgr: remove obsolete pre-1.20 checkpoint file support	2024-04-29 14:46:53 -07:00
Marek Siarkowicz	3ee8178768	Cleanup defer from SetFeatureGateDuringTest function call	2024-04-24 20:25:29 +02:00
Francesco Romani	181fb0da51	node: devicemgr: remove obsolete pre-1.20 checkpoint file support In commit `2f426fdba6` we added compatibility (and tests) to deal with pre-1.20 checkpoint files. We are now well past the end of support for pre-1.20 kubelets, so we can get rid of this code. Signed-off-by: Francesco Romani <fromani@redhat.com>	2024-04-15 14:01:56 +02:00
HirazawaUi	10b6319e64	fix slow dra unit test	2024-03-16 22:21:15 +08:00
fakecore	4060ee60c1	fix: handle socket file detection on Windows Update socket file detection logic to use os.Stat as per upstream Go fix for https://github.com/golang/go/issues/33357. This resolves the issue where socket files could not be properly identified on Windows systems.	2024-03-08 18:16:10 +08:00
Kubernetes Prow Robot	70383f3701	Merge pull request #119561 from payall4u/fix-kubelet-panic-when-allocate-device Fix kubelet panic when allocate resource for pod.	2024-02-29 03:06:54 -08:00
Daniel Hu	1baf7d4586	Corrected some spelling and grammatical errors Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>	2024-01-27 10:10:25 +08:00
Daniel Hu	d652596e42	Remove redundant string conversions in print statements Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>	2024-01-15 09:57:35 +08:00
payall4u	d6b8a660b0	Fix kubelet panic when allocate resource for pod. Signed-off-by: payall4u <payall4u@qq.com>	2023-11-12 10:54:05 +08:00
Kubernetes Prow Robot	a5ff0324a9	Merge pull request #120461 from gjkim42/do-not-reuse-device-of-restartable-init-container Don't reuse the device of a restartable init container	2023-10-31 19:15:53 +01:00
Antonio Ojea	8e0be64b8f	remove data race on the devicemanager client plugin Change-Id: I45b85440a792e5ed2f75a344ec1f0332854d8d6d	2023-10-24 21:35:13 +00:00
Shiming Zhang	35f4d29d73	Fix unit test	2023-10-24 11:06:35 +08:00
Swati Sehgal	9a354fc9d0	node: sample-dp: Add retry to handle device plugin restart failure Add retry mechanism to handle cases where after kubelet restarts, the device plugin unix socket(s) were created but not ready to serve yet. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:10 +01:00
Swati Sehgal	d0d133298d	node: sample-dp: Use fsnotify for kubelet restart detection Add kubeletSocket file to fsnotify instead of polling and waiting for deletion of device plugin unix socket as a way of detecting kubelet restart. We need to ensure that the device plugin re-registers itself after kubelet restart depending on the configured registration mode (auto-registration or controller registration). Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:10 +01:00
Swati Sehgal	211d8cc80a	node: sample-dp: stubRegisterControlFunc for controlling registration If the user specifies the intent to control the registration process, we rely on registration triggers (deletion of control file) to prompt registration. This behaviour is expected to be consistent across kubelet restarts and therefore across the watch calls where we watch for changes to the unix socket, so we make this part of the Stub object instead of a parameter. Co-authored-by: Francesco Romani <fromani@redhat.com> Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:10 +01:00
Swati Sehgal	c4c9d61d66	node: sample-dp: Handle re-registration for controlled registrations In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the registration is triggered by deletion of the control file. This is applicable both when the registration happens for the first time and subsequent ones because of kubelet restarts. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:19:07 +01:00
Swati Sehgal	6714e678d3	node: sample-dp: register by default and re-register on restarts In issue: 115107 we added an environment variable to control the registration of sample device plugin to kubelet. The intent of this patch is to ensure that the default behaviour of the plugin is to register to kubelet (in case no environment variable is specified). In addition to that, we want to ensure that the plugin registers itself not just once. It should re-register itself to kubelet in case of node reboot or kubelet restarts. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-10-17 12:14:09 +01:00
Gunju Kim	d2b803246a	Don't reuse the device allocated to the restartable init container	2023-10-17 18:28:29 +09:00
Gunju Kim	a0610a97b3	pkg/kubelet/cm: Remove deprecated sets.String and sets.Int This removes deprecated sets.String and sets.Int - replace sets.String with sets.Set[string] - replace sets.Int with sets.Set[int] - replace sets.NewString with sets.New[string] - replace sets.NewInt with sets.New[int] - replace sets.(OLD).List with sets.List(NEW)	2023-09-27 22:02:15 +09:00
Kubernetes Prow Robot	bdcf812c95	Merge pull request #118254 from elezar/4009/add-cdi-devices-to-device-plugin Add CDI devices to device plugin API	2023-07-17 05:21:08 -07:00
Evan Lezar	b57c7e2fe4	Add CDI devices to device plugin API This change adds CDI device IDs to the ContainerAllocateResponse in the device plugin API. This allows a device plugin to specify CDI devices by their unique fully-qualified CDI device names using the related field in the CRI specification. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-17 11:53:09 +02:00
Francesco Romani	c635a7e7d8	node: devicemgr: topomgr: add logs One of the factors that makes issues #118559 and #109595 hard to debug and fix is that the devicemanager has very few logs in important flows, so it's unnecessarily hard to reconstruct the state from logs. We add minimal logs to improve troubleshooting while staying backport-friendly, deferring a more comprehensive review of logging to later PRs. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Francesco Romani	3bcf4220ec	kubelet: devices: skip allocation for running pods When kubelet initializes, it runs admission for pods and possibly allocates requested resources. We need to distinguish between node reboot (no containers running) versus kubelet restart (containers potentially running). Running pods should always survive kubelet restart. This means that device allocation on admission should not be attempted, because if a container requires devices and is still running when kubelet is restarting, that container already has devices allocated and working. Thus, we need to properly detect this scenario in the allocation step and handle it explicitly. We need to inform the devicemanager about which pods are already running. Note that if container runtime is down when kubelet restarts, the approach implemented here won't work. In this scenario, on kubelet restart containers will again fail admission, hitting https://github.com/kubernetes/kubernetes/issues/118559 again. This scenario should however be pretty rare. Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-07-12 13:25:36 +02:00
Evan Lezar	cd14e97ea8	Add a builder for ContainerAllocateResponse objects This change introduces a helper to construct ContainerAllocateResponse instances. Test cases are updated to use a new constructor accepting functional options allowing the response contents to be set based on the test requirements. This can then be extended to also test additional fields in the device plugin API such as annotations which are not currently covered or new fields. Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:48:26 +02:00
Kubernetes Prow Robot	484645e817	Merge pull request #116659 from claudiubelu/skip-flaky-tests-2 unit tests: Skip flaky tests on Windows (part 2)	2023-05-23 20:04:48 -07:00
Kubernetes Prow Robot	1241ddc567	Merge pull request #116376 from swatisehgal/device-mgr-recovery-wip node: device-mgr: Handle recovery flow by checking if healthy devices exist- attempt 2	2023-05-01 21:30:11 -07:00
Swati Sehgal	dc1a592632	node: device-mgr: Handle recovery by checking if healthy devices exist In case of node reboot/kubelet restart, the flow of events involves obtaining the state from the checkpoint file followed by setting the `healthDevices`/`unhealthyDevices` to its zero value. This is done to allow the device plugin to re-register itself so that capacity can be updated appropriately. During the allocation phase, we need to check if the resources requested by the pod have been registered AND healthy devices are present on the node to be allocated. Also we need to move this check above `needed==0`, where needed is required minus the devices allocated to the container (which is obtained from the checkpoint file), because even in cases where no additional devices have to be allocated (as they were pre-allocated), we still need to make sure the devices that were previously allocated are healthy. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-04-28 14:41:30 +01:00
Claudiu Belu	0979d55443	unit tests: Skip flaky tests on Windows (part 2) Some of the unit tests are currently flaky on Windows. This commit skips them until they are resolved.	2023-04-13 12:07:18 +00:00
Kubernetes Prow Robot	d0fc9d16ce	Merge pull request #114800 from haoruan/feature-8976-spew-sprintf-refactor Capture spew.Sprintf() with all our favorite config into a util func	2023-04-11 15:34:57 -07:00
Hao Ruan	f638e2849f	replaced spew.Sprintf with a util pretty print function	2023-03-27 09:24:22 +08:00
Todd Neal	4096c9209c	dedupe pod resource request calculation	2023-03-09 17:15:53 -06:00
David Porter	9c20cee504	Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"	2023-03-07 11:50:52 -08:00
Claudiu Belu	5ba74c81ca	unit tests: Skip flaky tests on Windows Some of the unit tests are currently flaky on Windows. This commit skips them until they are resolved.	2023-03-06 20:46:05 +00:00
Kubernetes Prow Robot	890d39f976	Merge pull request #114640 from swatisehgal/handle-device-mgr-recovery node: device-mgr: Handle recovery flow by checking if healthy devices exist	2023-03-06 07:10:28 -08:00
Swati Sehgal	5b2a3dbbdc	node: device-mgr: explicitly check if pre-allocated devices are healthy Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
Swati Sehgal	a799ffb571	node: device-mgr: unit-tests: admission failure due to unhealthy devices Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
Swati Sehgal	7ac399c205	node: device-mgr: Handle recovery by checking if healthy devices exist In case of node reboot/kubelet restart, the flow of events involves obtaining the state from the checkpoint file followed by setting the `healthDevices`/`unhealthyDevices` to its zero value. This is done to allow the device plugin to re-register itself so that capacity can be updated appropriately. During the allocation phase, we need to check if the resources requested by the pod have been registered AND healthy devices are present on the node to be allocated. Also we need to move this check above `needed==0`, where needed is required minus the devices allocated to the container (which is obtained from the checkpoint file), because even in cases where no additional devices have to be allocated (as they were pre-allocated), we still need to make sure the devices that were previously allocated are healthy. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2023-03-06 11:52:23 +00:00
huyinhou	88274d96fc	update code style Signed-off-by: huyinhou <huyinhou@bytedance.com>	2023-03-06 14:23:14 +08:00
huyinhou	32495ae3f1	add lock in generate topology hints function	2023-02-20 10:56:53 +08:00

266 Commits