kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-11-06 13:18:21 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	fb033826a8	Merge pull request #128170 from sanposhiho/async-preemption feature(KEP-4832): asynchronous preemption	2024-11-07 19:44:54 +00:00
Patrick Ohly	0301b6b504	scheduler_perf: fix steady-state pod creation/deletion This fixes an issue in TestSchedulerPerf/SteadyStateClusterResourceClaimTemplate: scheduler_perf.go:1542: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline That occurs when the test is almost done, but hasn't observed all scheduled pods yet. The previous attempt to address this error wasn't actually 100% correct. It covered the case when the context has already been canceled, but not this particular "will reach deadline soon".	2024-11-07 09:36:36 +01:00
Kensei Nakada	4a084d54d2	feat: set the threashold on the scheduler-perf test case	2024-11-07 14:09:35 +09:00
Kensei Nakada	4b92f6d398	fix the broken part due to the merge	2024-11-07 14:09:35 +09:00
Kensei Nakada	69a8d0ec0b	feature(KEP-4832): asynchronous preemption	2024-11-07 14:09:34 +09:00
Patrick Ohly	30f5282656	DRA API: rename DeviceCapacity.Quantity to DeviceCapacity.Value Based on review feedback (https://github.com/kubernetes/kubernetes/pull/127511#discussion_r1823521172).	2024-11-06 13:03:20 +01:00
Patrick Ohly	33ea278c51	DRA: use v1beta1 API No code is left which depends on the v1alpha3, except of course the code implementing that version.	2024-11-06 13:03:19 +01:00
Kubernetes Prow Robot	0fad78930f	Merge pull request #127904 from towca/jtuznik/dra-autoscaling DRA: allow Cluster Autoscaler to integrate with DRA scheduler plugin	2024-11-06 10:01:29 +00:00
Kubernetes Prow Robot	9bbb46d05f	Merge pull request #128566 from macsko/run_scheduler_perf_with_queueinghints_enabled_disabled Run scheduler_perf with QueueingHints both enabled and disabled	2024-11-05 14:53:29 +00:00
Kuba Tużnik	8d489425aa	scheduler/dynamicresources: extract obtaining and tracking in-memory modifications of DRA objects All logic related to obtaining DRA objects and tracking modifications to ResourceClaims in-memory is extracted to DefaultDRAManager, which implements framework.SharedDRAManager. This is intended to be a no-op in terms of the DRA plugin behavior.	2024-11-05 14:11:04 +01:00
Kubernetes Prow Robot	2bb886ce2a	Merge pull request #128482 from sanposhiho/scheduler-perf-ff fix: register QHint metrics only when available	2024-11-05 12:15:30 +00:00
Kubernetes Prow Robot	c69f150008	Merge pull request #127277 from pohly/dra-structured-performance kube-scheduler: enhance performance for DRA structured parameters	2024-11-05 10:05:29 +00:00
Kensei Nakada	0bf95100f1	fix: register QHint metrics only when available	2024-11-05 18:52:27 +09:00
Maciej Skoczeń	e44041ee47	Run scheduler_perf with QueueingHints both enabled and disabled	2024-11-05 09:13:03 +00:00
Patrick Ohly	7863d9a381	DRA scheduler: refactor CEL compilation cache A better place is the cel package because a) the name can become shorter and b) it is tightly coupled with the compiler there. Moving the compilation into the cache simplifies the callers.	2024-11-05 08:34:42 +01:00
Maciej Skoczeń	8371a35824	Split scheduler_perf config into subdirectories	2024-11-04 08:45:34 +00:00
Patrick Ohly	bc55e82621	DRA scheduler: maintain a set of allocated device IDs Reacting to events from the informer cache (indirectly, through the assume cache) is more efficient than repeatedly listing it's content and then converting to IDs with unique strings. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 54.70 ± 6% 76.81 ± 6% +40.42% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 106.4 ± 4% 105.6 ± 2% ~ (p=0.413 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 120.0 ± 4% 118.9 ± 7% ~ (p=0.117 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 112.5 ± 4% 105.9 ± 4% -5.87% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 87.13 ± 4% 123.55 ± 4% +41.80% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 113.4 ± 2% 103.3 ± 2% -8.95% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 65.55 ± 3% 121.30 ± 3% +85.05% (p=0.002 n=6) geomean 90.81 106.8 +17.57%	2024-11-01 13:23:06 +01:00
Patrick Ohly	814c9428fd	DRA scheduler: cache compiled CEL expressions DeviceClasses and different requests are very likely to contain the same expression string. We don't need to compile that over and over again. To avoid hanging onto that cache longer than necessary, it's currently tied to each PreFilter/Filter combination. It might make sense to move this up into the scheduler plugin and thus reuse compiled expressions for different pods. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 33.95 ± 4% 36.65 ± 2% +7.95% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.8 ± 2% 106.7 ± 3% ~ (p=0.177 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 100.7 ± 1% 119.7 ± 3% +18.82% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 90.78 ± 1% 121.10 ± 4% +33.40% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 50.51 ± 7% 63.72 ± 3% +26.17% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 103.7 ± 5% 110.2 ± 2% +6.32% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 28.50 ± 2% 28.16 ± 5% ~ (p=0.102 n=6) geomean 64.99 73.15 +12.56%	2024-11-01 13:20:06 +01:00
Patrick Ohly	941d17b3b8	DRA scheduler: code cleanups Looking up the slice can be avoided by storing it when allocating a device. The AllocationResult struct is small enough that it can be copied by value. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 33.30 ± 2% 33.95 ± 4% ~ (p=0.288 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.3 ± 2% 105.8 ± 2% ~ (p=0.524 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 100.8 ± 1% 100.7 ± 1% ~ (p=0.738 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 90.96 ± 2% 90.78 ± 1% ~ (p=0.952 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 49.84 ± 4% 50.51 ± 7% ~ (p=0.485 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 103.8 ± 1% 103.7 ± 5% ~ (p=0.582 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 27.21 ± 7% 28.50 ± 2% ~ (p=0.065 n=6) geomean 64.26 64.99 +1.14%	2024-11-01 13:19:51 +01:00
Patrick Ohly	1246898315	DRA scheduler: ResourceSlice with unique strings Using unique strings instead of normal strings speeds up allocation with structured parameters because maps that use those strings as key no longer need to build hashes of the string content. However, care must be taken to call unique.Make as little as possible because it is costly. Pre-allocating the map of allocated devices reduces the need to grow the map when adding devices. goos: linux goarch: amd64 pkg: k8s.io/kubernetes/test/integration/scheduler_perf cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz │ before │ after │ │ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │ PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 18.06 ± 2% 33.30 ± 2% +84.31% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 104.7 ± 2% 105.3 ± 2% ~ (p=0.818 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 96.62 ± 1% 100.75 ± 1% +4.28% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 83.00 ± 2% 90.96 ± 2% +9.59% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 32.45 ± 7% 49.84 ± 4% +53.60% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 95.22 ± 7% 103.80 ± 1% +9.00% (p=0.002 n=6) PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 9.111 ± 10% 27.215 ± 7% +198.69% (p=0.002 n=6) geomean 45.86 64.26 +40.12%	2024-11-01 13:19:48 +01:00
dom4ha	ff584a76e0	Fix Unschedulable test by scheduling high priority churn pods to get processed right after they were injected (before the queued test pods)	2024-10-30 13:04:38 +00:00
Patrick Ohly	9a7e4ccab2	DRA admin access: add feature gate The new DRAAdminAccess feature gate has the following effects: - If disabled in the apiserver, the spec.devices.requests[*].adminAccess field gets cleared. Same in the status. In both cases the scenario that it was already set and a claim or claim template get updated is special: in those cases, the field is not cleared. Also, allocating a claim with admin access is allowed regardless of the feature gate and the field is not cleared. In practice, the scheduler will not do that. - If disabled in the resource claim controller, creating ResourceClaims with the field set gets rejected. This prevents running workloads which depend on admin access. - If disabled in the scheduler, claims with admin access don't get allocated. The effect is the same. The alternative would have been to ignore the fields in claim controller and scheduler. This is bad because a monitoring workload then runs, blocking resources that probably were meant for production workloads.	2024-10-29 09:50:11 +01:00
Kensei Nakada	b5d0745db3	Fix: use pod-high-priority.yaml to trigger preemption in PreemptionAsync test case	2024-10-26 14:16:24 +09:00
dom4ha	b3c4fe48e9	Tune PreemptionAsync and Unschedulable tests threshold and params.	2024-10-23 12:24:10 +00:00
Maciej Skoczeń	84e23fcc88	Add scheduler_perf test case for NodeUpdate event handling	2024-10-22 09:03:53 +00:00
Kensei Nakada	83f9e4b6df	cleanup: remove event list	2024-10-18 11:10:10 +10:00
Kubernetes Prow Robot	b1b4e5d397	Merge pull request #128003 from pohly/dra-classic-dra-removal DRA: remove "classic DRA"	2024-10-18 00:55:17 +01:00
dom4ha	b7f55a37a0	Bring back the smallest integration test	2024-10-17 15:41:36 +00:00
dom4ha	59458573ff	Remove unschedulable test and replace it with the new one.	2024-10-17 15:41:21 +00:00
dom4ha	f2c947e36d	Add UnschedulableAsync test in scheduler_perf to monitor impact of unschedulable pods on scheduler throughput	2024-10-17 15:35:21 +00:00
dom4ha	b2b41444f2	Add PreemptionBlocking test in scheduler_perf to monitor how long the preemption process (which blocks scheduling of regular nodes) takes.	2024-10-17 09:58:32 +00:00
Patrick Ohly	f84eb5ecf8	DRA: remove "classic DRA" This removes the DRAControlPlaneController feature gate, the fields controlled by it (claim.spec.controller, claim.status.deallocationRequested, claim.status.allocation.controller, class.spec.suitableNodes), the PodSchedulingContext type, and all code related to the feature. The feature gets removed because there is no path towards beta and GA and DRA with "structured parameters" should be able to replace it.	2024-10-16 23:09:50 +02:00
Kubernetes Prow Robot	e287784a8d	Merge pull request #128050 from macsko/add_pod_add_event_handling_scheduler_perf_test_case Add scheduler_perf test case for AssignedPodAdd event handling	2024-10-16 15:37:02 +01:00
Maciej Skoczeń	0db96a0ac3	Add scheduler_perf test case for AssignedPodAdd event handling	2024-10-16 07:45:50 +00:00
Kubernetes Prow Robot	558c0b6eaa	Merge pull request #128084 from macsko/fix_panic_when_defining_featuregates_only_on_workload_level_scheduler_perf Fix panic when setting feature gates only on workload level in scheduler_perf	2024-10-15 23:05:03 +01:00
Kubernetes Prow Robot	9872b17ccc	Merge pull request #127828 from macsko/add_template_parameters_to_createnodesop_in_scheduler_perf Add template parameters to createNodesOp in scheduler_perf	2024-10-15 20:43:04 +01:00
Maciej Skoczeń	cca6f8c800	Fix panic when defining feature gates only on workload level in scheduler_perf	2024-10-15 09:50:55 +00:00
Kubernetes Prow Robot	2f7df335ad	Merge pull request #127615 from macsko/add_node_add_event_benchmark_to_scheduler_perf Add scheduler_perf test case for NodeAdd event handling	2024-10-11 18:10:19 +01:00
Kubernetes Prow Robot	1b6c993cee	Merge pull request #127952 from macsko/allow_to_specify_feature_gates_on_workload_level_scheduler_perf Allow to set feature gates on workload level in scheduler_perf	2024-10-11 15:28:19 +01:00
Maciej Skoczeń	e676d0e76a	Allow to specify feature gates on workload level in scheduler_perf	2024-10-11 08:41:08 +00:00
Maciej Skoczeń	6dbb5d84b3	Move integration tests perf utils to scheduler_perf package	2024-10-11 08:27:08 +00:00
Maciej Skoczeń	25850caf8a	Add scheduler_perf test case for NodeAdd event handling	2024-10-11 07:40:06 +00:00
Maciej Skoczeń	930ebe16db	Add template parameters to createNodesOp in scheduler_perf	2024-10-09 08:51:04 +00:00
Maciej Skoczeń	98e4892b84	Add scheduler_perf test case for pod update events handling	2024-10-09 08:35:25 +00:00
Maciej Skoczeń	2a08ce5c68	Add scheduler_perf test case for AssignedPodDelete event handling	2024-10-02 09:16:28 +00:00
Kubernetes Prow Robot	ae617c3d20	Merge pull request #127781 from macsko/use_barrier_not_sleep_where_possible_in_scheduler_perf_test_cases Use barrier instead of sleep when possible in scheduler_perf test cases	2024-10-01 22:06:10 +01:00
Maciej Skoczeń	bae0eb91d4	Use barrier instead of sleep when possible in scheduler_perf test cases	2024-10-01 13:53:04 +00:00
Maciej Skoczeń	5e2552c2b0	Allow to filter pods using labels on barrier in scheduler_perf	2024-10-01 08:48:37 +00:00
Kubernetes Prow Robot	22a30e7cbb	Merge pull request #127700 from macsko/add_option_waitforpodsprocessed Add option to wait for pods to be attempted in barrierOp in scheduler_perf	2024-10-01 05:17:49 +01:00
Maciej Skoczeń	fdbf21e03a	Allow to filter pods using labels while collecting metrics in scheduler_perf	2024-09-30 13:32:12 +00:00

1 2 3 4 5 ...

435 Commits