kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-11-24 18:35:10 +00:00

Author	SHA1	Message	Date
Ed Bartosh	c0d922e786	DRA: Kubelet code cleanup	2024-07-24 00:27:52 +03:00
Ed Bartosh	59555c6a62	DRA: move dra/checkpoint/* to dra/state/*	2024-07-24 00:12:10 +03:00
Ed Bartosh	59daed75d6	DRA: refactor checkpointing Co-authored-by: Kevin Klues <klueska@gmail.com>	2024-07-24 00:10:30 +03:00
Patrick Ohly	877829aeaa	DRA kubelet: adapt to v1alpha3 API This adds the ability to select specific requests inside a claim for a container. NodePrepareResources is always called, even if the claim is not used by any container. This could be useful for drivers where that call has some effect other than injecting CDI device IDs into containers. It also ensures that drivers can validate configs. The pod resource API can no longer report a class for each claim because there is no such 1:1 relationship anymore. Instead, that API reports claim, API devices (with driver/pool/device as ID) and CDI device IDs. The kubelet itself doesn't extract that information from the claim. Instead, it relies on drivers to report this information when the claim gets prepared. This isolates the kubelet from API changes. Because of a faulty E2E test, kubelet was told to contact the wrong driver for a claim. This was not visible in the kubelet log output. Now changes to the claim info cache are getting logged. While at it, naming of variables and some existing log output gets harmonized. Co-authored-by: Oksana Baranova <oksana.baranova@intel.com> Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>	2024-07-22 18:09:34 +02:00
Patrick Ohly	b51d68bb87	DRA: bump API v1alpha2 -> v1alpha3 This is in preparation for revamping the resource.k8s.io completely. Because there will be no support for transitioning from v1alpha2 to v1alpha3, the roundtrip test data for that API in 1.29 and 1.30 gets removed. Repeating the version in the import name of the API packages is not really required. It was done for a while to support simpler grepping for usage of alpha APIs, but there are better ways for that now. So during this transition, "resourceapi" gets used instead of "resourcev1alpha3" and the version gets dropped from informer and lister imports. The advantage is that the next bump to v1beta1 will affect fewer source code lines. Only source code where the version really matters (like API registration) retains the versioned import.	2024-07-21 17:28:13 +02:00
Ed Bartosh	f24134d7b2	kubelet: DRA: add unit test for ClaimInfo and claimInfoCache	2024-05-03 13:30:31 +00:00
Kevin Klues	f80be2728e	kubelet: DRA: change key of claimInfo cache to "namespace/claimname" Signed-off-by: Kevin Klues <kklues@nvidia.com>	2024-05-03 13:23:29 +00:00
Kevin Klues	a8931c6c25	kubelet: DRA: update locking/checkpoint semantics of the claimInfo cache Signed-off-by: Kevin Klues <kklues@nvidia.com>	2024-05-03 13:23:27 +00:00
adrianc	08b942028f	DRA: call plugins for claims even if they exist in cache Today, DRA manager does not call plugin NodePrepareResource for claims that it previously successfully handled, that is, if claims are present in cache (checkpoint) even if the node rebooted. After a node reboot, it is required to call DRA plugin for resource claims so that plugins may prepare them again in case the resources don't persist across reboots. To achieve that, once kubelet is started, we call DRA plugins for claims once if a pod sandbox is required to be created during PodSync. Signed-off-by: adrianc <adrianc@nvidia.com>	2023-10-25 13:20:16 +03:00
Ed Bartosh	f6431c6138	DRA: don't query claims from API server When a pod is force-deleted, UnprepareResources fails to get a claim from the API server. PrepareResources should cache the claim info required by UnprepareResources so that UnprepareResources gets it from the cache instead of querying the API server.	2023-07-18 18:23:10 +03:00
Evan Lezar	f0e3c32fe5	Move CDI annotation code to utils package Signed-off-by: Evan Lezar <elezar@nvidia.com>	2023-07-11 11:47:53 +02:00
Moshe Levi	ffb07d1e78	kubelet dra: add lock to addCDIDevices Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-15 00:50:45 +02:00
Moshe Levi	2a568bcfc8	kubelet podresources: extend List to support Dynamic Resources and implement Get API Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-14 19:33:04 +02:00
Moshe Levi	9c57613912	Add ClassName to checkpoint state and in-memory cache Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-14 19:33:04 +02:00
Kevin Klues	685688c703	Update DRAManager to allow multiple plugins to process a single claim Right now, the v1alpha1 API only passes enough information for one plugin to process a claim, but the v1alpha2 API will allow for multiple plugins to process a claim. This commit prepares the code for this upcoming change. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2023-03-13 12:52:41 +00:00
Moshe Levi	e7256e08d3	kubelet dra: add checkpointing mechanism in the DRA Manager The checkpointing mechanism will repopulate DRA Manager in-memory cache on kubelet restart. This will ensure that the information needed by the PodResources API is available across a kubelet restart. The ClaimInfoState struct represents the DRA Manager in-memory cache state in the checkpoint. It is embedded in the ClaimInfo, which also includes the annotation field. The separation between the in-memory cache and the cache state in the checkpoint is so we won't be tied to the in-memory cache struct, which may change in the future. In the ClaimInfoState we save the minimal required fields to restore the in-memory cache. Signed-off-by: Moshe Levi <moshele@nvidia.com>	2023-03-10 12:22:15 +02:00
Ed Bartosh	abcb56defb	kubelet: do not enter termination status if pod might need to unprepare resources	2022-11-11 21:58:03 +01:00
Ed Bartosh	ae0f38437c	kubelet: add support for dynamic resource allocation Dependencies need to be updated to use github.com/container-orchestrated-devices/container-device-interface. It's not decided yet whether we will implement Topology support for DRA or not. Not having any topology-related code will help to avoid the wrong impression that DRA is used as a hint provider for the Topology Manager.	2022-11-11 21:58:03 +01:00

18 Commits
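Several commits above concern the kubelet's claim info cache: keying it by "namespace/claimname", guarding it with a lock, checkpointing a minimal ClaimInfoState so the cache is repopulated after a kubelet restart, and letting UnprepareResources read cached claim info instead of querying the API server. The following is a hypothetical Go sketch of that idea only; the type names, fields, and JSON checkpoint format are assumptions for illustration and are not the kubelet's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// ClaimInfoState is a hypothetical minimal record that gets checkpointed.
// Field names are illustrative assumptions, not the kubelet's real types.
type ClaimInfoState struct {
	Namespace  string   `json:"namespace"`
	ClaimName  string   `json:"claimName"`
	PodUIDs    []string `json:"podUIDs"`
	CDIDevices []string `json:"cdiDevices"`
}

// claimInfoCache maps "namespace/claimname" to claim state, guarded by a RWMutex.
type claimInfoCache struct {
	mu     sync.RWMutex
	claims map[string]*ClaimInfoState
}

func newClaimInfoCache() *claimInfoCache {
	return &claimInfoCache{claims: make(map[string]*ClaimInfoState)}
}

// cacheKey builds the "namespace/claimname" key.
func cacheKey(namespace, name string) string { return namespace + "/" + name }

func (c *claimInfoCache) add(s *ClaimInfoState) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.claims[cacheKey(s.Namespace, s.ClaimName)] = s
}

// get lets an unprepare path read claim info without contacting the API server.
func (c *claimInfoCache) get(namespace, name string) (*ClaimInfoState, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.claims[cacheKey(namespace, name)]
	return s, ok
}

// checkpoint serializes the cache state; writing it to disk is left out.
func (c *claimInfoCache) checkpoint() ([]byte, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	states := make([]ClaimInfoState, 0, len(c.claims))
	for _, s := range c.claims {
		states = append(states, *s)
	}
	return json.Marshal(states)
}

// restore rebuilds the in-memory cache from checkpoint data, e.g. after a restart.
func restore(data []byte) (*claimInfoCache, error) {
	var states []ClaimInfoState
	if err := json.Unmarshal(data, &states); err != nil {
		return nil, err
	}
	c := newClaimInfoCache()
	for i := range states {
		c.add(&states[i])
	}
	return c, nil
}

func main() {
	c := newClaimInfoCache()
	c.add(&ClaimInfoState{Namespace: "default", ClaimName: "gpu-claim",
		PodUIDs: []string{"uid-1"}, CDIDevices: []string{"vendor.com/gpu=gpu0"}})
	data, err := c.checkpoint()
	if err != nil {
		panic(err)
	}
	restored, err := restore(data)
	if err != nil {
		panic(err)
	}
	s, ok := restored.get("default", "gpu-claim")
	fmt.Println(ok, s.CDIDevices[0]) // prints "true vendor.com/gpu=gpu0"
}
```

Keeping the checkpointed struct separate from the in-memory cache type mirrors the rationale in commit e7256e08d3: the on-disk format need not change whenever the in-memory structure does.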