kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-11-07 22:03:22 +00:00

Author	SHA1	Message	Date
Yusuke Sakurai	5d278c138c	fix labelvalues for scheduler-perf	2025-02-10 10:00:52 +09:00
Patrick Ohly	e2ff03486d	scheduler_perf: add thresholds to DRA test cases They were enabled yesterday and executed seven times, with results that (so far) seem to be fairly stable with just one run that was slower across the board. The links in the YAML can be used to navigate to each test case quickly. The thresholds were chosen with a 20% security margin below what seems to be a common result.	2025-02-03 13:10:10 +01:00
Kubernetes Prow Robot	209538059e	Merge pull request #129885 from macsko/default_topology_spreading_scheduler_perf_test_case Add scheduler_perf test case for default PodTopologySpreading constraints	2025-01-30 05:05:32 -08:00
Kubernetes Prow Robot	07cc2308c6	Merge pull request #128836 from pohly/dra-scheduler-perf-enablement DRA: enable performance tracking with scheduler_perf	2025-01-30 03:07:23 -08:00
Maciej Skoczeń	274ad0391f	Add scheduler_perf test case for default PodTopologySpreading constraints	2025-01-30 08:55:24 +00:00
dom4ha	f150016fbe	feature: Make Unschedulable scheduler performance test parametrized with the number of initial nodes.	2025-01-23 00:48:02 +00:00
Kubernetes Prow Robot	bcd65ce240	Merge pull request #128667 from macsko/add_integration_tests_for_event_handling_scheduler_perf Add integration tests for event handling cases in scheduler_perf	2024-12-12 13:10:26 +01:00
Kubernetes Prow Robot	ab9171b0cf	Merge pull request #129040 from sanposhiho/patch-14 chore: ignore dat files generated by scheduler-perf	2024-12-12 05:29:13 +00:00
Kubernetes Prow Robot	078664b424	Merge pull request #129023 from zhifei92/cleanup-actiontype scheduler: Rename UpdatePodTolerations for code style consistency	2024-12-12 05:28:52 +00:00
Kensei Nakada	8f4e425daf	chore: ignore dat files generated by scheduler-perf	2024-11-30 22:23:15 +09:00
zhifei92	27608fa25d	refactor(scheduler): Rename UpdatePodTolerations for code style consistency.	2024-11-29 13:13:09 +08:00
dom4ha	67b74696f8	Adjust performance test threshold limits	2024-11-25 15:07:15 +00:00
Patrick Ohly	0ba8af9006	DRA: enable performance tracking with scheduler_perf The performance of the basic "fill up the cluster" scenario (SchedulingWithResourceClaimTemplate) and the steady-state scenario (SteadyStateClusterResourceClaimTemplate) are relevant. The large configurations should run long enough to provide meaningful results. Performance may be different with queueing hints enabled, so variants with that get added for those large configurations.	2024-11-18 14:34:31 +01:00
Patrick Ohly	ac3d43a8a6	scheduler_perf: work around incorrect gotestsum failure reports Because Go does not emit a "pass" action for benchmarks (https://github.com/golang/go/issues/66825#issuecomment-2343229005), gotestsum reports a successful benchmark run as failed (https://github.com/gotestyourself/gotestsum/issues/413#issuecomment-2343206787). We can work around that in each benchmark and sub-benchmark by emitting the output line that `go test` expects on stdout from the test binary for success.	2024-11-18 12:35:05 +01:00
Patrick Ohly	369a18a3a1	scheduler_perf: simplify flags, fix output The "disabled by label filter" message for benchmarks printed the pointer to the filter string, not the filter string itself. This mistake gets avoided and the code becomes simpler when not using pointers.	2024-11-18 12:32:59 +01:00
Maciej Skoczeń	de8e8c5404	Add integration tests for event handling cases in scheduler_perf	2024-11-13 13:17:48 +00:00
Kubernetes Prow Robot	8115baca00	Merge pull request #128666 from macsko/fix_scale_down_in_eventhandlingpodupdate_scheduler_perf_test_case Fix pod scale down failure in EventHandlingPodUpdate scheduler_perf test	2024-11-12 16:28:47 +00:00
Kubernetes Prow Robot	fb033826a8	Merge pull request #128170 from sanposhiho/async-preemption feature(KEP-4832): asynchronous preemption	2024-11-07 19:44:54 +00:00
Maciej Skoczeń	379bff8dc9	Fix pod scale down failure in EventHandlingPodUpdate scheduler_perf test case	2024-11-07 13:48:50 +00:00
Patrick Ohly	0301b6b504	scheduler_perf: fix steady-state pod creation/deletion This fixes an issue in TestSchedulerPerf/SteadyStateClusterResourceClaimTemplate: scheduler_perf.go:1542: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline That occurs when the test is almost done, but hasn't observed all scheduled pods yet. The previous attempt to address this error wasn't actually 100% correct. It covered the case when the context has already been canceled, but not this particular "will reach deadline soon".	2024-11-07 09:36:36 +01:00
Kensei Nakada	4a084d54d2	feat: set the threshold on the scheduler-perf test case	2024-11-07 14:09:35 +09:00
Kensei Nakada	4b92f6d398	fix the broken part due to the merge	2024-11-07 14:09:35 +09:00
Kensei Nakada	69a8d0ec0b	feature(KEP-4832): asynchronous preemption	2024-11-07 14:09:34 +09:00
Patrick Ohly	30f5282656	DRA API: rename DeviceCapacity.Quantity to DeviceCapacity.Value Based on review feedback (https://github.com/kubernetes/kubernetes/pull/127511#discussion_r1823521172).	2024-11-06 13:03:20 +01:00
Patrick Ohly	33ea278c51	DRA: use v1beta1 API No code is left which depends on the v1alpha3, except of course the code implementing that version.	2024-11-06 13:03:19 +01:00
Kubernetes Prow Robot	0fad78930f	Merge pull request #127904 from towca/jtuznik/dra-autoscaling DRA: allow Cluster Autoscaler to integrate with DRA scheduler plugin	2024-11-06 10:01:29 +00:00
Kubernetes Prow Robot	9bbb46d05f	Merge pull request #128566 from macsko/run_scheduler_perf_with_queueinghints_enabled_disabled Run scheduler_perf with QueueingHints both enabled and disabled	2024-11-05 14:53:29 +00:00
Kuba Tużnik	8d489425aa	scheduler/dynamicresources: extract obtaining and tracking in-memory modifications of DRA objects All logic related to obtaining DRA objects and tracking modifications to ResourceClaims in-memory is extracted to DefaultDRAManager, which implements framework.SharedDRAManager. This is intended to be a no-op in terms of the DRA plugin behavior.	2024-11-05 14:11:04 +01:00
Kubernetes Prow Robot	2bb886ce2a	Merge pull request #128482 from sanposhiho/scheduler-perf-ff fix: register QHint metrics only when available	2024-11-05 12:15:30 +00:00
Kubernetes Prow Robot	c69f150008	Merge pull request #127277 from pohly/dra-structured-performance kube-scheduler: enhance performance for DRA structured parameters	2024-11-05 10:05:29 +00:00
Kensei Nakada	0bf95100f1	fix: register QHint metrics only when available	2024-11-05 18:52:27 +09:00
Maciej Skoczeń	e44041ee47	Run scheduler_perf with QueueingHints both enabled and disabled	2024-11-05 09:13:03 +00:00
Patrick Ohly	7863d9a381	DRA scheduler: refactor CEL compilation cache A better place is the cel package because a) the name can become shorter and b) it is tightly coupled with the compiler there. Moving the compilation into the cache simplifies the callers.	2024-11-05 08:34:42 +01:00
Maciej Skoczeń	8371a35824	Split scheduler_perf config into subdirectories	2024-11-04 08:45:34 +00:00
Patrick Ohly	bc55e82621	DRA scheduler: maintain a set of allocated device IDs Reacting to events from the informer cache (indirectly, through the assume cache) is more efficient than repeatedly listing its content and then converting to IDs with unique strings. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 54.70 ± 6% 76.81 ± 6% +40.42% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 106.4 ± 4% 105.6 ± 2% ~ (p=0.413 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 120.0 ± 4% 118.9 ± 7% ~ (p=0.117 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 112.5 ± 4% 105.9 ± 4% -5.87% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 87.13 ± 4% 123.55 ± 4% +41.80% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 113.4 ± 2% 103.3 ± 2% -8.95% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 65.55 ± 3% 121.30 ± 3% +85.05% (p=0.002 n=6) geomean 90.81 106.8 +17.57%	2024-11-01 13:23:06 +01:00
Patrick Ohly	814c9428fd	DRA scheduler: cache compiled CEL expressions DeviceClasses and different requests are very likely to contain the same expression string. We don't need to compile that over and over again. To avoid hanging onto that cache longer than necessary, it's currently tied to each PreFilter/Filter combination. It might make sense to move this up into the scheduler plugin and thus reuse compiled expressions for different pods. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 33.95 ± 4% 36.65 ± 2% +7.95% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.8 ± 2% 106.7 ± 3% ~ (p=0.177 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 100.7 ± 1% 119.7 ± 3% +18.82% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 90.78 ± 1% 121.10 ± 4% +33.40% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 50.51 ± 7% 63.72 ± 3% +26.17% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 103.7 ± 5% 110.2 ± 2% +6.32% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 28.50 ± 2% 28.16 ± 5% ~ (p=0.102 n=6) geomean 64.99 73.15 +12.56%	2024-11-01 13:20:06 +01:00
Patrick Ohly	941d17b3b8	DRA scheduler: code cleanups Looking up the slice can be avoided by storing it when allocating a device. The AllocationResult struct is small enough that it can be copied by value. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 33.30 ± 2% 33.95 ± 4% ~ (p=0.288 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.3 ± 2% 105.8 ± 2% ~ (p=0.524 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 100.8 ± 1% 100.7 ± 1% ~ (p=0.738 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 90.96 ± 2% 90.78 ± 1% ~ (p=0.952 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 49.84 ± 4% 50.51 ± 7% ~ (p=0.485 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 103.8 ± 1% 103.7 ± 5% ~ (p=0.582 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 27.21 ± 7% 28.50 ± 2% ~ (p=0.065 n=6) geomean 64.26 64.99 +1.14%	2024-11-01 13:19:51 +01:00
Patrick Ohly	1246898315	DRA scheduler: ResourceSlice with unique strings Using unique strings instead of normal strings speeds up allocation with structured parameters because maps that use those strings as key no longer need to build hashes of the string content. However, care must be taken to call unique.Make as little as possible because it is costly. Pre-allocating the map of allocated devices reduces the need to grow the map when adding devices. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 18.06 ± 2% 33.30 ± 2% +84.31% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 104.7 ± 2% 105.3 ± 2% ~ (p=0.818 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 96.62 ± 1% 100.75 ± 1% +4.28% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 83.00 ± 2% 90.96 ± 2% +9.59% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 32.45 ± 7% 49.84 ± 4% +53.60% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 95.22 ± 7% 103.80 ± 1% +9.00% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 9.111 ± 10% 27.215 ± 7% +198.69% (p=0.002 n=6) geomean 45.86 64.26 +40.12%	2024-11-01 13:19:48 +01:00
dom4ha	ff584a76e0	Fix Unschedulable test by scheduling high priority churn pods to get processed right after they were injected (before the queued test pods)	2024-10-30 13:04:38 +00:00
Patrick Ohly	9a7e4ccab2	DRA admin access: add feature gate The new DRAAdminAccess feature gate has the following effects: - If disabled in the apiserver, the spec.devices.requests[*].adminAccess field gets cleared. Same in the status. In both cases the scenario that it was already set and a claim or claim template get updated is special: in those cases, the field is not cleared. Also, allocating a claim with admin access is allowed regardless of the feature gate and the field is not cleared. In practice, the scheduler will not do that. - If disabled in the resource claim controller, creating ResourceClaims with the field set gets rejected. This prevents running workloads which depend on admin access. - If disabled in the scheduler, claims with admin access don't get allocated. The effect is the same. The alternative would have been to ignore the fields in claim controller and scheduler. This is bad because a monitoring workload then runs, blocking resources that probably were meant for production workloads.	2024-10-29 09:50:11 +01:00
Kensei Nakada	b5d0745db3	Fix: use pod-high-priority.yaml to trigger preemption in PreemptionAsync test case	2024-10-26 14:16:24 +09:00
dom4ha	b3c4fe48e9	Tune PreemptionAsync and Unschedulable tests threshold and params.	2024-10-23 12:24:10 +00:00
Maciej Skoczeń	84e23fcc88	Add scheduler_perf test case for NodeUpdate event handling	2024-10-22 09:03:53 +00:00
Kensei Nakada	83f9e4b6df	cleanup: remove event list	2024-10-18 11:10:10 +10:00
Kubernetes Prow Robot	b1b4e5d397	Merge pull request #128003 from pohly/dra-classic-dra-removal DRA: remove "classic DRA"	2024-10-18 00:55:17 +01:00
dom4ha	b7f55a37a0	Bring back the smallest integration test	2024-10-17 15:41:36 +00:00
dom4ha	59458573ff	Remove unschedulable test and replace it with the new one.	2024-10-17 15:41:21 +00:00
dom4ha	f2c947e36d	Add UnschedulableAsync test in scheduler_perf to monitor impact of unschedulable pods on scheduler throughput	2024-10-17 15:35:21 +00:00
dom4ha	b2b41444f2	Add PreemptionBlocking test in scheduler_perf to monitor how long the preemption process (which blocks scheduling of regular nodes) takes.	2024-10-17 09:58:32 +00:00
Patrick Ohly	f84eb5ecf8	DRA: remove "classic DRA" This removes the DRAControlPlaneController feature gate, the fields controlled by it (claim.spec.controller, claim.status.deallocationRequested, claim.status.allocation.controller, class.spec.suitableNodes), the PodSchedulingContext type, and all code related to the feature. The feature gets removed because there is no path towards beta and GA and DRA with "structured parameters" should be able to replace it.	2024-10-16 23:09:50 +02:00

453 Commits