kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-12-04 15:15:36 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	20b12ad5c3	Merge pull request #129685 from swatisehgal/cpu-mgr-logs-improvements CPU Manager logging improvements	2025-02-07 03:50:02 -08:00
Swati Sehgal	7997c93cfd	node: cpu-mgr: Adhere to the message style guidelines Ensure that the log messages adhere to the message style guidelines as captured [here](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md#message-style-guidelines). Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2025-02-06 16:30:02 +00:00
Swati Sehgal	ca2c46a273	node: cpu-mgr: Add logs when CPU allocation is skipped CPU Allocation is skipped in CPU Manager with static policy in case the pod doesn't belong to Guaranteed QoS or the CPUs requested are not integral. We add logs to capture these skips. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2025-02-06 16:26:40 +00:00
Swati Sehgal	01a546fe53	node: cpu-mgr: Add logs on the happy path While debugging, it can be useful to have logs to indicate that things have gone as expected especially when it comes to important events like successful startup of CPU manager and successful allocation of resources. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2025-02-06 16:25:07 +00:00
vivzbansal	d1fac494f4	resolve merge conflicts	2025-01-27 19:42:13 +00:00
vivzbansal	763e810fb5	refactor code to add sidecar container support in IPPR	2024-11-07 21:20:48 +00:00
sphrasavath	3f459f20d6	Add takeByTopologyUnCoreCachePacked if policy option align-cpus-by-uncorecache is enabled. Adding new function to evaluate uncore cache id. Reverse allocation logic. Implement preferAlignByUncorecache within TakeByTopologyNUMAPacked, along with new test cases.	2024-11-05 15:22:47 +00:00
Kubernetes Prow Robot	a93e3e7ae1	Merge pull request #127483 from nokia/strict-cpu-reservation-core KEP-4540: Add CPUManager policy option to restrict reservedSystemCPUs to system daemons and interrupt processing	2024-10-30 01:21:47 +00:00
Francesco Romani	14ec0edd10	node: metrics: add metrics about cpu pool sizes Add metrics about the sizing of the cpu pools. Currently the cpumanager maintains 2 cpu pools: - shared pool: this is where all pods with non-exclusive cpu allocation run - exclusive pool: this is the union of the set of exclusive cpus allocated to containers, if any (requires static policy in use). By reporting the size of the pools, the users (humans or machines) can get better insights and more feedback about how the resources are actually allocated to the workload and how the node resources are used.	2024-10-24 15:35:51 +02:00
Mark Sasnal	f7e9766b4d	Added policy_options and policy_static unit tests	2024-10-23 15:05:46 -04:00
Francesco Romani	c025861e0c	node: metrics: add resource alignment metrics In order to improve the observability of the resource management in kubelet, cpu allocation and NUMA alignment, we add more metrics to report if resource alignment is in effect. The more precise reporting would probably be using pod status, but this would require more invasive and riskier changes, and possibly extra interactions to the APIServer. We start adding metrics to report if containers got their compute resources aligned. If metrics are growing, the assignment is working as expected; if metrics stay constant, perhaps at zero, no resource alignment is done. Extra fixes brought by this work - retroactively add labels for existing tests - running metrics test demands precise accounting to avoid flakes; ensure the node state is restored pristine between each test, to minimize the aforementioned risk of flakes. - The test pod command line was wrong, with this the pod could not reach Running state. That went unnoticed so far because no test using this utility function actually needed a pod in running state. Signed-off-by: Francesco Romani <fromani@redhat.com>	2024-10-23 08:05:38 +02:00
Jing Zhang	0365cf4b20	KEP-4540: Add CPUManager policy option strict-cpu-reservation Signed-off-by: Jing Zhang <jing.c.zhang.ext@nokia.com>	2024-10-21 11:57:17 -04:00
Francesco Romani	838f911dea	cpumanager: smtalign: fix error message Fix error message if availablePhysicalCPUs = 0. Without this change, the logic was mistakenly emitting the old error message, which is confusing for troubleshooting. Plus, a tiny quality of life improvement: cpumanager static policy wants to use `cpuGroupSize` multiple times. The value represents how many VCPUs per PCPUs the machine has. So, let's cache (and log!) the value in the policy data. We don't support dynamic update of the HW topology anyway. Signed-off-by: Francesco Romani <fromani@redhat.com>	2024-10-10 10:18:44 +02:00
Francesco Romani	a89c843edd	node: cpumgr: ErrorS -> InfoS Convert uncommon use of ErrorS(nil, ...) into more regular use of InfoS. Set the verbosiness level to make sure the message is still emitted in regular expected configuration. Signed-off-by: Francesco Romani <fromani@redhat.com>	2024-07-22 14:04:04 +02:00
Jiaxin Shan	6c85fd4ddd	KEP-4176: Add static policy option to distribute cpus across cores	2024-07-12 11:52:51 -07:00
Gunju Kim	8b5f30ef09	Don't reuse CPU set of a restartable init container	2023-10-06 22:16:15 +09:00
Ian K. Coolidge	cede96336a	Depend on k8s.io/utils cpuset Steps performed: $ find . -name '*.go' -exec sed -i 's|k8s.io/kubernetes/pkg/kubelet/cm/cpuset|k8s.io/utils/cpuset|g' {} \ $ ./hack/update-vendor.sh $ ./hack/update-gofmt.sh $ git rm -r pkg/kubelet/cm/cpuset/	2023-05-03 16:26:09 +00:00
Samuel Karp	ea74a2d877	cpumanager: fix typo in godoc Signed-off-by: Samuel Karp <samuelkarp@google.com>	2023-04-06 16:48:24 -07:00
vinay kulkarni	01b96e7704	Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources	2023-03-10 14:49:26 +00:00
Kubernetes Prow Robot	efe20f6c9b	Merge pull request #114114 from ffromani/full-pcpus-stricter-precheck-issue113537 node: cpumgr: stricter pre-check for the policy option full-pcpus-only	2023-03-02 09:04:56 -08:00
Francesco Romani	0e9b92090c	node: cpumgr: stricter precheck for full-pcpus-only In order to implement the `full-pcpus-only` cpumanager policy option, we leverage the implementation of the algorithm which picks CPUs. By design, CPUs are taken from the biggest chunk available (socket or NUMA zone) to physical cores, down to single cores. Leveraging this, if the requested CPU count is a multiple of the SMT level (commonly 2), we're guaranteed that only full physical cores will be taken. The hidden assumption here is this holds true by construction iff the user reserved CPUs (if any) considering full physical CPUs. IOW, if the user did intentionally or mistakenly reserve single threads which are not core siblings[1], then the simple check we implemented is not sufficient. An easy example can probably outline this better. With this setup: cores: [(0, 4), (1, 5), (2, 6), (3, 8)] (in parens: thread siblings). SMT level: 2 (each tuple is 2 elements) Reserved CPUs: 0,1 (explicit pick using `--reserved-cpus`) A container then requests 6 cpus. full-pcpus-only check: 6 % 2 == 0. Passed. The CPU allocator will take first full cores, (2,6) and (3,8), and will then pick the remaining single CPUs. The allocation will succeed, but it's incorrect. We can fix this case with a stricter precheck. We need to additionally consider all the core siblings of the reserved CPUs as unavailable when computing the free cpus, before starting the actual allocation. Doing so, we fall back to the intended behavior, and by construction all possible CPU allocations whose size is a multiple of the SMT level are now correct again. +++ [1] or thread siblings in the linux parlance, in any case: hyperthread siblings of the same physical core Signed-off-by: Francesco Romani <fromani@redhat.com>	2023-03-02 16:00:58 +01:00
Chen Wang	7db339dba2	This commit contains the following: 1. Scheduler bug-fix + scheduler-focussed E2E tests 2. Add cgroup v2 support for in-place pod resize 3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes. Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>	2023-02-24 18:21:21 +00:00
Vinay Kulkarni	f2bd94a0de	In-place Pod Vertical Scaling - core implementation 1. Core Kubelet changes to implement In-place Pod Vertical Scaling. 2. E2E tests for In-place Pod Vertical Scaling. 3. Refactor kubelet code and add missing tests (Derek's kubelet review) 4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature. 5. Fix corner-case where resize A->B->A gets ignored 6. Add cgroup v2 support to pod resize E2E test. KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>	2023-02-24 18:21:21 +00:00
Ian K. Coolidge	f3829c4be3	cpuset: Rename 'NewCPUSet' to 'New'	2023-01-06 23:32:51 +00:00
Ian K. Coolidge	e5143d16c2	cpuset: Make 'ToSlice*' methods look like 'set' methods In 'set', conversions to slice are done also, but with different names: ToSliceNoSort() -> UnsortedList() ToSlice() -> List() Reimplement List() in terms of UnsortedList to save some duplication.	2023-01-06 23:32:51 +00:00
Ian K. Coolidge	824bd57ad6	cpuset: Convert Union arguments to variadic This allows Union to implement UnionAll easily.	2023-01-06 23:32:50 +00:00
Francesco Romani	5e12338a22	node: cpumgr: address `golint` complaints Add docstrings and trivial fixes. Signed-off-by: Francesco Romani <fromani@redhat.com>	2022-11-02 18:41:42 +01:00
Kubernetes Prow Robot	d0e86111ef	Merge pull request #112855 from fromanirh/cpumanager-metrics node: metrics: cpumanager: add metrics about pinning	2022-10-31 03:12:56 -07:00
Francesco Romani	47d3299781	node: metrics: cpumanager: add pinning metrics In order to improve the observability of the cpumanager, add and populate metrics to track if the combination of the kubelet configuration and podspec would trigger exclusive core allocation and pinning. We should avoid leaking any node/machine specific information (e.g. core ids, even though this is admittedly an extreme example); tracking these metrics seems to be a good first step, because it allows us to get feedback without exposing details. Signed-off-by: Francesco Romani <fromani@redhat.com>	2022-10-27 14:40:40 +02:00
Garrybest	d446f5f90e	fix GetAllocatableCPUs in cpumanager Signed-off-by: Garrybest <garrybest@foxmail.com>	2022-10-27 19:57:06 +08:00
Arpit Singh	d92fd8392d	Adding unit test for align-by-socket policy option Also addressed MR comments as part of same commit.	2022-08-02 11:02:07 -07:00
Arpit Singh	06f347f645	Adding validity checks for topology manager align-by-socket	2022-08-02 11:02:07 -07:00
Arpit Singh	35849bf7fb	KEP-3327: Add CPUManager policy option to align CPUs by Socket instead of by NUMA node	2022-08-02 11:02:07 -07:00
Davanum Srinivas	a9593d634c	Generate and format files - Run hack/update-codegen.sh - Run hack/update-generated-device-plugin.sh - Run hack/update-generated-protobuf.sh - Run hack/update-generated-runtime.sh - Run hack/update-generated-swagger-docs.sh - Run hack/update-openapi-spec.sh - Run hack/update-gofmt.sh Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2022-07-26 13:14:05 -04:00
Kubernetes Prow Robot	03ee86c09c	Merge pull request #104837 from eggiter/fix-release-reused-cpus fix(cpumanager): Do not release CPUs of init containers while they are being reused in app containers	2022-01-06 11:46:38 -08:00
Kevin Klues	70e0f47191	Support full-pcpus-only with the new NUMA distribution policy option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 19:31:02 +00:00
Kevin Klues	462544d079	Split CPUManager takeByTopology() into two different algorithms The first implements the original algorithm which packs CPUs onto NUMA nodes if more than one NUMA node is required to satisfy the allocation. The second distributes CPUs across NUMA nodes if they can't all fit into one. The "distributing" algorithm is currently a noop and just returns an error of "unimplemented". A subsequent commit will add the logic to implement this algorithm according to KEP 2902: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-10-16 14:46:19 +00:00
eggiter	20d3bc32ac	fix(cpumanager): Do not release cpus of init containers while they are reused in app containers	2021-09-10 10:01:35 +08:00
Francesco Romani	23abdab2b7	smtalign: propagate policy options to policies Consume in the static policy the cpu manager policy options from the cpumanager instance. Validate in the none policy if any option is given, and fail if so - this is almost surely a configuration mistake. Add new cpumanager.Options type to hold the options and translate from user arguments to flags. Co-authored-by: Swati Sehgal <swsehgal@redhat.com> Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-07-08 23:15:37 +02:00
pacoxu	8d24c8d0ab	update structured log for cpumanager/cpu_manager.go	2021-03-16 09:40:53 +08:00
pacoxu	9e024e839b	update structured log for policy_static.go	2021-03-12 16:26:20 +08:00
Francesco Romani	6d33354e4c	node: podresources: implement GetAllocatableResources API Extend the podresources API implementing the GetAllocatableResources endpoint, as specified in the KEPs: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments https://github.com/kubernetes/enhancements/pull/2404 Signed-off-by: Francesco Romani <fromani@redhat.com>	2021-03-09 13:13:36 +01:00
sw.han	27b7bcb41c	Implement the cpumanager.GetPodTopologyHints() function Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>	2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk	6db58b2e92	Update logging to use a format util Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>	2020-11-12 12:25:55 +01:00
sw.han	f5997fe537	Add GetPodTopologyHints() interface to Topology/CPU/Device Manager Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>	2020-11-12 12:25:54 +01:00
Alexey Perevalov	e33ba9e974	Avoid using socket for hints Sockets don't affect performance the way NUMA nodes do: a NUMA node has a dedicated memory controller, whereas a socket is a physical extension point. The socket is a CPU-specific concept, and it's strange to merge the bitmasks of the device plugins and the cpu manager when the cpu manager takes the socket into account. Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>	2020-07-22 05:14:34 -04:00
Kevin Klues	00df26a985	Fix a bug whereby reusable CPUs and devices were not being honored Previously, it was possible for reusable CPUs and reusable devices (i.e. those previously consumed by init containers) to not be reused by subsequent init containers or app containers if the TopologyManager was enabled. This would happen because hint generation for the TopologyManager was not considering the reusable devices when it made its hint calculation. As such, it would sometimes: 1) Generate a hint for a different NUMA node, causing the CPUs and devices to be allocated from that node instead of the one where the reusable devices live; or 2) End up thinking there were not enough CPUs or devices to allocate and throw a TopologyAffinity admission error This patch fixes this by ensuring that reusable CPUs and devices are considered as part of TopologyHint generation. This functionality is difficult to unit test since it spans multiple components, but an e2e test will be added in a subsequent patch to test this functionality.	2020-07-20 11:41:13 +00:00
Davanum Srinivas	442a69c3bd	switch over k/k to use klog v2 Signed-off-by: Davanum Srinivas <davanum@gmail.com>	2020-05-16 07:54:27 -04:00
Kevin Klues	751b9f3e13	Update strategy used to reuse CPUs from init containers in CPUManager With the old strategy, it was possible for an init container to end up running without some of its CPUs being exclusive if it requested more guaranteed CPUs than the sum of all guaranteed CPUs requested by app containers. Unfortunately, this case was not caught by our unit tests because they didn't validate the state of the defaultCPUSet to ensure there was no overlap with CPUs assigned to containers. This patch updates the strategy to reuse the CPUs assigned to init containers across into app containers, while avoiding this edge case. It also updates the unit tests to now catch this type of error in the future.	2020-04-23 20:27:43 +00:00
nolancon	467f66580b	CPU Manager - Add check to policy.Allocate() for init containers If the container being allocated CPUs is an init container, release those CPUs back into the shared pool for re-allocation to the next container.	2020-02-27 07:24:33 +00:00
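
The stricter `full-pcpus-only` precheck described in commit 0e9b92090c can be illustrated with a small standalone sketch. This is not the kubelet code: the `core` type and the `freeCPUs` and `admits` helpers are hypothetical names introduced here, and the admission test is deliberately reduced to "request is a multiple of the SMT level and fits in the free set". The topology, reserved CPUs, and request size are copied from the commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// core lists the logical CPUs (thread siblings) of one physical core.
// Hypothetical type for this sketch; the kubelet uses its own topology types.
type core []int

// freeCPUs returns the logical CPUs considered available for exclusive
// allocation. With strict=false only the reserved CPUs are removed. With
// strict=true every sibling of a reserved CPU is removed as well, which is
// the idea behind the stricter precheck.
func freeCPUs(cores []core, reserved map[int]bool, strict bool) []int {
	var free []int
	for _, c := range cores {
		touched := false // does this core contain a reserved CPU?
		for _, cpu := range c {
			if reserved[cpu] {
				touched = true
			}
		}
		for _, cpu := range c {
			if reserved[cpu] || (strict && touched) {
				continue
			}
			free = append(free, cpu)
		}
	}
	sort.Ints(free)
	return free
}

// admits is a simplified stand-in for the precheck: the request must be a
// multiple of the SMT level and must fit in the free set.
func admits(request, smtLevel int, free []int) bool {
	return request%smtLevel == 0 && request <= len(free)
}

func main() {
	// Values taken verbatim from the commit message.
	cores := []core{{0, 4}, {1, 5}, {2, 6}, {3, 8}}
	reserved := map[int]bool{0: true, 1: true}
	const smtLevel, request = 2, 6

	naive := freeCPUs(cores, reserved, false)
	strict := freeCPUs(cores, reserved, true)
	fmt.Println("naive free CPUs:", naive, "admitted:", admits(request, smtLevel, naive))
	fmt.Println("strict free CPUs:", strict, "admitted:", admits(request, smtLevel, strict))
}
```

Under these assumptions the naive free set still contains CPUs 4 and 5, the orphaned siblings of the reserved CPUs, so a 6-CPU request is admitted even though only two full cores are free. The strict free set drops those siblings and the same request is rejected up front.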


79 Commits