1. Use pod-level resources when the feature is enabled and resources are set at pod level.
2. Edge case handling: when a pod defines only a CPU or a memory limit at pod level (but not both), and container-level requests/limits are unset, the pod-level requests stay empty for the resource that has no pod-level limit. The container's request for that resource is then set to the default request value from schedutil, as sketched below.
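A minimal sketch of that defaulting, assuming the pod-level spec.resources field introduced by the PodLevelResources feature and fallback values roughly matching schedutil's non-zero request defaults; effectiveRequest and the default constants are illustrative, not the actual implementation:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Illustrative fallbacks, roughly matching schedutil's non-zero defaults.
var (
	defaultCPURequest    = resource.MustParse("100m")
	defaultMemoryRequest = resource.MustParse("200Mi")
)

// effectiveRequest returns the pod-level request or limit if one is set,
// otherwise the scheduler's default request for that resource.
func effectiveRequest(pod *v1.Pod, name v1.ResourceName, fallback resource.Quantity) resource.Quantity {
	if pod.Spec.Resources != nil {
		if q, ok := pod.Spec.Resources.Requests[name]; ok {
			return q
		}
		if q, ok := pod.Spec.Resources.Limits[name]; ok {
			return q // a pod-level limit implies the request when requests are unset
		}
	}
	return fallback
}

func main() {
	// Pod-level limits define only CPU; memory has neither a pod-level value
	// nor container-level requests/limits.
	pod := &v1.Pod{
		Spec: v1.PodSpec{
			Resources: &v1.ResourceRequirements{
				Limits: v1.ResourceList{v1.ResourceCPU: resource.MustParse("2")},
			},
		},
	}
	fmt.Println("cpu:", effectiveRequest(pod, v1.ResourceCPU, defaultCPURequest).String())          // "2", from the pod-level limit
	fmt.Println("memory:", effectiveRequest(pod, v1.ResourceMemory, defaultMemoryRequest).String()) // "200Mi", the default
}
```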
All logic related to obtaining DRA objects and tracking in-memory modifications
to ResourceClaims is extracted into DefaultDRAManager, which
implements framework.SharedDRAManager.
This is intended to be a no-op in terms of the DRA plugin behavior.
A better place is the cel package because a) the name can become shorter
and b) it is tightly coupled with the compiler there.
Moving the compilation into the cache simplifies the callers.
"Allocated devices" are the ones which can be observed from the informer. "All
allocated devices" also includes those which are in flight and haven't been
written back to the apiserver.
The logic for skipping "admin access" was repeated in three different places. A
single foreachAllocatedDevices helper with a callback consolidates it into one
function, as sketched below.
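A minimal sketch of that pattern with simplified stand-in types; the claim and result structs and the way "all allocated devices" is assembled here are illustrative, not the actual allocator code:

```go
package main

import "fmt"

// Simplified stand-ins for the real DRA types.
type deviceResult struct {
	Driver, Pool, Device string
	AdminAccess          *bool
}

type claim struct {
	Name    string
	Results []deviceResult
}

// foreachAllocatedDevices invokes cb for every allocated device, skipping
// results that were granted via admin access: those don't count as "allocated"
// for scheduling purposes.
func foreachAllocatedDevices(claims []claim, cb func(deviceID string)) {
	for _, c := range claims {
		for _, r := range c.Results {
			if r.AdminAccess != nil && *r.AdminAccess {
				continue // admin access grants observation, not exclusive use
			}
			cb(r.Driver + "/" + r.Pool + "/" + r.Device)
		}
	}
}

func main() {
	yes := true
	observed := []claim{{Name: "from-informer", Results: []deviceResult{{Driver: "gpu.example.com", Pool: "node1", Device: "gpu-0"}}}}
	inFlight := []claim{{Name: "not-yet-written-back", Results: []deviceResult{{Driver: "gpu.example.com", Pool: "node1", Device: "gpu-1", AdminAccess: &yes}}}}

	// "All allocated devices" = informer-observed claims plus in-flight ones.
	allocated := map[string]bool{}
	foreachAllocatedDevices(append(observed, inFlight...), func(id string) { allocated[id] = true })
	fmt.Println(allocated) // map[gpu.example.com/node1/gpu-0:true]
}
```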
DeviceClasses and different requests are very likely to contain the same
expression string. We don't need to compile that over and over again.
To avoid hanging onto that cache longer than necessary, it's currently tied to
each PreFilter/Filter combination. It might make sense to move this up into the
scheduler plugin and thus reuse compiled expressions for different pods.
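A minimal sketch of such a cache, assuming a compilation result type and cache layout that stand in for the real CEL compiler integration:

```go
package main

import (
	"fmt"
	"sync"
)

// compiledExpr stands in for the real compilation result (program plus any error).
type compiledExpr struct {
	expr string
	err  error
}

// compileCache memoizes compilation keyed by the expression string, so the
// same selector used by many DeviceClasses and requests is compiled only once.
type compileCache struct {
	mu    sync.Mutex
	cache map[string]*compiledExpr
}

func newCompileCache() *compileCache {
	return &compileCache{cache: map[string]*compiledExpr{}}
}

func (c *compileCache) compile(expression string) *compiledExpr {
	c.mu.Lock()
	defer c.mu.Unlock()
	if result, ok := c.cache[expression]; ok {
		return result
	}
	result := &compiledExpr{expr: expression} // real code would invoke the CEL compiler here
	c.cache[expression] = result
	return result
}

func main() {
	// Tied to one PreFilter/Filter combination: create the cache per scheduling
	// attempt and drop it afterwards so compiled programs don't outlive the cycle.
	cache := newCompileCache()
	a := cache.compile(`device.attributes["gpu.example.com"].model == "a100"`)
	b := cache.compile(`device.attributes["gpu.example.com"].model == "a100"`)
	fmt.Println(a == b) // true: the second lookup reuses the cached result
}
```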
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
│ before │ after │
│ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │
PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 33.95 ± 4% 36.65 ± 2% +7.95% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.8 ± 2% 106.7 ± 3% ~ (p=0.177 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 100.7 ± 1% 119.7 ± 3% +18.82% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 90.78 ± 1% 121.10 ± 4% +33.40% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 50.51 ± 7% 63.72 ± 3% +26.17% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 103.7 ± 5% 110.2 ± 2% +6.32% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 28.50 ± 2% 28.16 ± 5% ~ (p=0.102 n=6)
geomean 64.99 73.15 +12.56%
Using unique strings instead of normal strings speeds up allocation with
structured parameters because maps that use those strings as keys no longer need
to build hashes of the string content. However, care must be taken to call
unique.Make as rarely as possible because it is costly.
Pre-allocating the map of allocated devices reduces the need to grow the map
when adding devices.
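A minimal sketch of both ideas using Go's unique package; the deviceID type, makeDeviceID helper, and expected count are illustrative:

```go
package main

import (
	"fmt"
	"unique"
)

// deviceID uses unique.Handle so map lookups compare small handles instead of
// hashing the full string content on every access.
type deviceID struct {
	Driver, Pool, Device unique.Handle[string]
}

// makeDeviceID calls unique.Make once per component; callers should hold on to
// the resulting ID rather than re-interning the same strings repeatedly.
func makeDeviceID(driver, pool, device string) deviceID {
	return deviceID{
		Driver: unique.Make(driver),
		Pool:   unique.Make(pool),
		Device: unique.Make(device),
	}
}

func main() {
	const expectedDevices = 1024
	// Pre-allocate the map so it doesn't have to grow while devices are added.
	allocated := make(map[deviceID]bool, expectedDevices)

	id := makeDeviceID("gpu.example.com", "node1", "gpu-0")
	allocated[id] = true
	fmt.Println(allocated[makeDeviceID("gpu.example.com", "node1", "gpu-0")]) // true
}
```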
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
│ before │ after │
│ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │
PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 18.06 ± 2% 33.30 ± 2% +84.31% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 104.7 ± 2% 105.3 ± 2% ~ (p=0.818 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 96.62 ± 1% 100.75 ± 1% +4.28% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 83.00 ± 2% 90.96 ± 2% +9.59% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 32.45 ± 7% 49.84 ± 4% +53.60% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 95.22 ± 7% 103.80 ± 1% +9.00% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 9.111 ± 10% 27.215 ± 7% +198.69% (p=0.002 n=6)
geomean 45.86 64.26 +40.12%
The Allocate call used to call back into the claim lister for each node. This
was significant work which showed up at the top of the CPU profile. It is okay
to list claims only once, during PreFilter, because Filter does not change the
claim statuses between Allocate calls.
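A minimal sketch of the idea, with a simplified struct standing in for the scheduler's cycle state; the names preFilterState, preFilter, and filterNode are illustrative:

```go
package main

import "fmt"

// allocatedClaim is a simplified stand-in for a listed ResourceClaim.
type allocatedClaim struct {
	Name    string
	Devices []string
}

// preFilterState stands in for the cycle state: the snapshot of allocated
// devices computed once per scheduling attempt.
type preFilterState struct {
	allocatedDevices map[string]bool
}

// preFilter lists claims exactly once and builds the snapshot that every
// per-node Filter/Allocate call reuses afterwards.
func preFilter(list func() []allocatedClaim) *preFilterState {
	state := &preFilterState{allocatedDevices: map[string]bool{}}
	for _, c := range list() {
		for _, d := range c.Devices {
			state.allocatedDevices[d] = true
		}
	}
	return state
}

// filterNode only reads the snapshot; Filter never mutates claim statuses, so
// the snapshot stays valid across all per-node Allocate calls.
func filterNode(state *preFilterState, candidate string) bool {
	return !state.allocatedDevices[candidate]
}

func main() {
	list := func() []allocatedClaim {
		return []allocatedClaim{{Name: "claim-a", Devices: []string{"gpu.example.com/node1/gpu-0"}}}
	}
	state := preFilter(list)
	fmt.Println(filterNode(state, "gpu.example.com/node1/gpu-0")) // false: already allocated
	fmt.Println(filterNode(state, "gpu.example.com/node2/gpu-0")) // true
}
```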
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
│ before │ after │
│ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │
PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 15.04 ± 0% 18.06 ± 2% +20.07% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.5 ± 1% 104.7 ± 2% ~ (p=0.485 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 95.83 ± 1% 96.62 ± 1% ~ (p=0.063 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 79.67 ± 3% 83.00 ± 2% +4.18% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 27.11 ± 5% 32.45 ± 7% +19.68% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 84.00 ± 3% 95.22 ± 7% +13.36% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 7.110 ± 6% 9.111 ± 10% +28.15% (p=0.002 n=6)
geomean 41.05 45.86 +11.73%
Introducing PDBs to preemption had disrupted the ordering of pods in the victims
list, which could lead to picking the wrong victim node, i.e. one with a
higher-priority pod on it.
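A minimal sketch of the kind of re-sorting that restores the expected order, using a simplified pod type; this is illustrative only, not the actual preemption victim-selection code:

```go
package main

import (
	"fmt"
	"sort"
)

// victim is a simplified stand-in for a pod selected for preemption.
type victim struct {
	Name     string
	Priority int32
}

// sortVictims restores descending-priority order after the victims have been
// split into PDB-violating and non-violating groups, so that comparisons of
// candidate nodes (which look at the highest-priority victim first) stay valid.
func sortVictims(victims []victim) {
	sort.Slice(victims, func(i, j int) bool {
		return victims[i].Priority > victims[j].Priority
	})
}

func main() {
	victims := []victim{{"low", 10}, {"high", 1000}, {"mid", 100}}
	sortVictims(victims)
	fmt.Println(victims) // [{high 1000} {mid 100} {low 10}]
}
```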
Using the "normal" logic for a feature gated field simplifies the
implementation of the feature gate.
There is one (entirely theoretical!) problem with updating from 1.31: if a claim
was allocated in 1.31 with admin access, the status field was not set because
it didn't exist yet. If a driver now follows the current definition of "unset =
off", then it will not grant admin access even though it should. This is
theoretical because drivers are only starting to support admin access with 1.32,
so there shouldn't be any claim where this problem could occur.
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
field gets cleared, and the same happens in the status. In both cases there
is one exception: when a claim or claim template that already has the field
set gets updated, the field is not cleared (see the sketch further below).
Also, allocating a claim with admin access is allowed regardless of the
feature gate and the field is not cleared. In practice, the scheduler
will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
with the field set gets rejected. This prevents running workloads
which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
allocated. The effect is the same.
The alternative would have been to ignore the fields in the claim controller and
the scheduler. That would be bad because a monitoring workload would then run,
blocking resources that probably were meant for production workloads.
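A minimal sketch of the apiserver-side clearing described in the first bullet above, assuming simplified claim types; the helper name dropDisabledAdminAccess and the structs are illustrative, not the actual strategy code:

```go
package main

import "fmt"

// Simplified stand-ins for the ResourceClaim spec fields involved.
type deviceRequest struct {
	Name        string
	AdminAccess *bool
}

type claimSpec struct {
	Requests []deviceRequest
}

// adminAccessInUse reports whether the old object already used the field; in
// that case an update must not clear it even when the gate is disabled.
func adminAccessInUse(spec *claimSpec) bool {
	if spec == nil {
		return false
	}
	for _, r := range spec.Requests {
		if r.AdminAccess != nil {
			return true
		}
	}
	return false
}

// dropDisabledAdminAccess clears adminAccess on create/update when the
// DRAAdminAccess gate is off, unless the old spec already had it set.
func dropDisabledAdminAccess(gateEnabled bool, newSpec, oldSpec *claimSpec) {
	if gateEnabled || adminAccessInUse(oldSpec) {
		return
	}
	for i := range newSpec.Requests {
		newSpec.Requests[i].AdminAccess = nil
	}
}

func main() {
	yes := true
	spec := &claimSpec{Requests: []deviceRequest{{Name: "gpu", AdminAccess: &yes}}}
	dropDisabledAdminAccess(false, spec, nil) // create with the gate disabled
	fmt.Println(spec.Requests[0].AdminAccess) // <nil>: the field was cleared
}
```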
Drivers need to know that because admin access may also grant additional
permissions. The allocator needs to ignore such results when determining which
devices are considered allocated.
In both cases it is conceptually cleaner to not rely on the content of the
ClaimSpec.