kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-11-03 19:58:17 +00:00

Author	SHA1	Message	Date
Patrick Ohly	56bd8d86a5	test/integration: use default API groups in test apiserver The goal is to make the test apiserver behave as much like kube-apiserver as possible. This ensures that tests are as realistic as possible out-of-the-box. If a test needs a special setup, then that should be visible in the test because it passes additional flags or options. One historic deviation from that goal was enabling all API groups. That change (from `7185624688`) gets reverted and tests which happened to rely on this get updated.	2025-02-24 10:20:06 +01:00
Yusuke Sakurai	5d278c138c	fix labelvalues for scheduler-perf	2025-02-10 10:00:52 +09:00
Maciej Skoczeń	930ebe16db	Add template parameters to createNodesOp in scheduler_perf	2024-10-09 08:51:04 +00:00
Maciej Skoczeń	5e2552c2b0	Allow to filter pods using labels on barrier in scheduler_perf	2024-10-01 08:48:37 +00:00
Kubernetes Prow Robot	22a30e7cbb	Merge pull request #127700 from macsko/add_option_waitforpodsprocessed Add option to wait for pods to be attempted in barrierOp in scheduler_perf	2024-10-01 05:17:49 +01:00
Maciej Skoczeń	fdbf21e03a	Allow to filter pods using labels while collecting metrics in scheduler_perf	2024-09-30 13:32:12 +00:00
Maciej Skoczeń	928670061d	Allow to wait for pods to be attempted in barrierOp in scheduler_perf	2024-09-30 08:07:15 +00:00
Patrick Ohly	d100768d94	scheduler_perf: track and visualize progress over time This is useful to see whether pod scheduling happens in bursts and how it behaves over time, which is relevant in particular for dynamic resource allocation where it may become harder at the end to find the node which still has resources available. Besides "pods scheduled" it's also useful to know how many attempts were needed, so schedule_attempts_total also gets sampled and stored. To visualize the result of one or more test runs, use: gnuplot.sh *.dat	2024-09-25 11:09:15 +02:00
Patrick Ohly	385599f0a8	scheduler_perf + DRA: measure pod scheduling at a steady state The previous tests were based on scheduling pods until the cluster was full. This is a valid scenario, but not necessarily realistic. More realistic is how quickly the scheduler can schedule new pods when some old pods finished running, in particular in a cluster that is properly utilized (= almost full). To test this, pods must get created, scheduled, and then immediately deleted. This can run for a certain period of time. Scenarios with empty and full cluster have different scheduling rates. This was previously visible for DRA because the 50% percentile of the scheduling throughput was lower than the average, but one had to guess in which scenario the throughput was lower. Now this can be measured for DRA with the new SteadyStateClusterResourceClaimTemplateStructured test. The metrics collector must watch pod events to figure out how many pods got scheduled. Polling misses pods that already got deleted again. There seems to be no relevant difference in the collected metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions), before vs. after: SchedulingThroughput/Average: 157.1 ± 0% vs. 157.1 ± 0% (~, p=0.329 n=6); SchedulingThroughput/Perc50: 48.99 ± 8% vs. 47.52 ± 9% (~, p=0.937 n=6); SchedulingThroughput/Perc90: 463.9 ± 16% vs. 460.1 ± 13% (~, p=0.818 n=6); SchedulingThroughput/Perc95: 463.9 ± 16% vs. 460.1 ± 13% (~, p=0.818 n=6); SchedulingThroughput/Perc99: 463.9 ± 16% vs. 460.1 ± 13% (~, p=0.818 n=6)	2024-09-25 09:45:39 +02:00
Maciej Skoczeń	3047ab73f5	Reset only metrics configured in collector before the createPodsOp	2024-09-06 08:26:20 +00:00
Kubernetes Prow Robot	a1fc2551ba	Merge pull request #126144 from likakuli/cleanup-unusedparamters cleanup: remove scheduler_perf unused parameters	2024-08-22 19:29:40 +01:00
Maciej Skoczeń	5894e201fa	Measure metrics only during a specific op in scheduler_perf	2024-08-13 12:34:06 +00:00
Patrick Ohly	b51d68bb87	DRA: bump API v1alpha2 -> v1alpha3 This is in preparation for revamping the resource.k8s.io completely. Because there will be no support for transitioning from v1alpha2 to v1alpha3, the roundtrip test data for that API in 1.29 and 1.30 gets removed. Repeating the version in the import name of the API packages is not really required. It was done for a while to support simpler grepping for usage of alpha APIs, but there are better ways for that now. So during this transition, "resourceapi" gets used instead of "resourcev1alpha3" and the version gets dropped from informer and lister imports. The advantage is that the next bump to v1beta1 will affect fewer source code lines. Only source code where the version really matters (like API registration) retains the versioned import.	2024-07-21 17:28:13 +02:00
likakuli	ef9e1c39e9	cleanup: remove unused parameters Signed-off-by: likakuli <1154584512@qq.com>	2024-07-17 16:27:12 +08:00
Maciej Skoczeń	ad59b4026e	Increase API server timeout in scheduler_perf tests	2024-07-10 07:34:59 +00:00
kerthcet	e106b3a31f	Log the error margin to avoid failures in schedule_perf Signed-off-by: kerthcet <kerthcet@gmail.com>	2024-07-01 18:22:31 +08:00
Kubernetes Prow Robot	0fd6746b2a	Merge pull request #125518 from pohly/scheduler-perf-cleanup-fix scheduler_perf: shut down apiserver clients before apiserver	2024-06-16 10:03:29 -07:00
kerthcet	1ffa1e17cd	Remove noisy log in scheduler_perf Signed-off-by: kerthcet <kerthcet@gmail.com>	2024-06-12 11:53:35 +08:00
Patrick Ohly	246e2aedf5	scheduler_perf: shut down apiserver clients before apiserver The cancellation of the context happened after the cleanup of the apiserver, so clients using that context were kept running. That wasn't the intent and causes a slow shutdown because the apiserver delays its shutdown when it has active clients. The fix is to create a new cancellation context and to use that for the clients. The automatic cancellation of it then happens before the apiserver cleanup.	2024-06-05 11:00:46 +02:00
Kensei Nakada	c72b688e12	support `scheduler_plugin_execution_duration_seconds` in scheduler_perf	2024-04-27 08:22:53 +00:00
Patrick Ohly	c46ae1b26a	scheduler_perf: use ktesting.TContext + staging StartTestServer ktesting.TContext combines several different interfaces. This makes the code simpler because fewer parameters need to be passed around. An intentional side effect is that the apiextensions client interface becomes available, which makes it possible to use CRDs. This will be needed for future DRA tests. Support for CRDs depends on starting the apiserver via k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the CRD extensions. As discussed on Slack, the long-term goal is to replace the in-tree StartTestServer with the one in staging, so this is going in the right direction.	2024-02-11 10:51:38 +01:00
Kensei Nakada	5310abe14a	make scheduler_perf usable from other repositories	2023-12-01 12:43:08 +00:00
Patrick Ohly	c74d045c4b	scheduler_perf: show name of one pending pod in error message If pods get stuck, then giving the name of one makes it possible to search for it in the log output. Without the name it's hard to figure out which pods got stuck.	2023-09-04 09:54:26 +02:00
Patrick Ohly	6b01ece580	scheduler-perf: fix perfdash display problem perfdash expects all data items to have the same set of labels. It then renders drop-down buttons for each label with all values found for each label. Previously, data items that didn't have a label didn't match any label filter in perfdash and couldn't get selected because perfdash doesn't have "unset" in its drop-down menus. To avoid that, scheduler-perf now collects all labels and then adds missing labels with "not applicable" as value: { "data": { "Average": 939.7071223010004, "Perc50": 927.7987421383649, "Perc90": 2166.153846153846, "Perc95": 2363.076923076923, "Perc99": 2520.6153846153848 }, "unit": "ms", "labels": { "Metric": "scheduler_pod_scheduling_duration_seconds", "Name": "SchedulingBasic/5000Nodes/namespace-2", "extension_point": "not applicable", "result": "not applicable" } }, ... { "data": { "Average": 1.1172570650000004, "Perc50": 1.1418367346938776, "Perc90": 1.5500000000000003, "Perc95": 1.6410256410256412, "Perc99": 3.7333333333333334 }, "unit": "ms", "labels": { "Metric": "scheduler_framework_extension_point_duration_seconds", "Name": "SchedulingBasic/5000Nodes/namespace-2", "extension_point": "Score", "result": "not applicable" } },	2023-07-03 21:16:53 +02:00
Patrick Ohly	cecebe8ea2	scheduler_perf: add TestScheduling integration test This runs workloads that are labeled as "integration-test". The apiserver and scheduler are only started once per unique configuration, followed by each workload using that configuration. This makes execution faster. In contrast to benchmarking, we care less about starting with a clean slate for each test.	2023-06-28 09:22:25 +02:00
Patrick Ohly	dfd646e0a8	scheduler_perf: fix namespace deletion Merely deleting the namespace is not enough: - Workloads might rely on the garbage collector to get rid of obsolete objects, so we should run it to be on the safe side. - Pods must be force-deleted because kubelet is not running. - Finally, the namespace controller is needed to get rid of deleted namespaces.	2023-06-28 09:22:25 +02:00
kerthcet	0616d15712	Fix perf-test by increasing the error margin Signed-off-by: kerthcet <kerthcet@gmail.com>	2023-05-17 12:14:06 +08:00
Kubernetes Prow Robot	8b33eaa0a7	Merge pull request #116207 from pohly/dra-scheduler-perf scheduler_perf: dynamic resource allocation test cases	2023-05-10 10:58:59 -07:00
Patrick Ohly	034528a9f0	scheduler perf: add DynamicResourceAllocation test cases The default scheduler configuration must be based on the v1 API where the plugin is enabled by default. Then if (and only if) the DynamicResourceAllocation feature gate for a test is set, the corresponding API group also gets enabled. The normal dynamic resource claim controller is started if needed to create ResourceClaims from ResourceClaimTemplates. Without the upcoming optimizations in the scheduler, scheduling with dynamic resources is fairly slow. The new test cases take around 15 minutes wall clock time on my desktop.	2023-05-04 13:08:06 +02:00
Kante Yin	a7035f5459	Pass Context to StartTestServer Signed-off-by: Kante Yin <kerthcet@gmail.com>	2023-05-04 10:25:09 +08:00
Kubernetes Prow Robot	47f1bd9f80	Merge pull request #117649 from SataQiu/scheduler-remove-v1beta2-20230427 scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration component config	2023-05-03 09:54:41 -07:00
Kubernetes Prow Robot	aece6838e8	Merge pull request #117232 from pohly/scheduler-perf-code-cleanups scheduler_perf: code cleanups	2023-05-03 09:54:13 -07:00
SataQiu	1f7c07f355	scheduler: remove deprecated v1beta2 KubeSchedulerConfiguration	2023-05-03 21:43:19 +08:00
Patrick Ohly	b3e0bc8864	scheduler_perf: let the test decide which informers are needed This will change when adding dynamic resource allocation test cases. Instead of changing mustSetupScheduler and StartScheduler for that, let's return the informer factory and create informers as needed in the test.	2023-04-27 15:31:40 +02:00
Patrick Ohly	78b8af9fed	scheduler_perf: update throughputCollector The previous solution had some shortcomings: - It was based on the assumption that the goroutine gets woken up at regular intervals. This is not actually guaranteed. Now the code keeps track of the actual start and end of an interval and verifies that assumption. - If no pod was scheduled (unlikely, but could happen), then "0 pods/s" got recorded. In such a case, the metric was always either zero or >= 1. A better solution is to extend the interval until some pod gets scheduled. With the larger time interval it is then possible to also track, for example, 0.5 pods/s.	2023-04-26 08:11:50 +02:00
Kubernetes Prow Robot	2bfaaf21c1	Merge pull request #117197 from pohly/scheduler-perf-cleanup scheduler perf: remove cleanup func	2023-04-11 21:17:57 -07:00
Patrick Ohly	a869a89825	scheduler perf: remove cleanup func b.Cleanup may as well get called inside the function instead of leaving that to the caller.	2023-04-11 09:43:45 +02:00
sarab	8d18ae6fc2	Use the generic Set in scheduler	2023-04-09 11:34:17 +05:30
Kante Yin	3d0894fabf	Fix failure(context canceled) in scheduler_perf benchmark (#114843) * Fix failure in scheduler_perf benchmark Signed-off-by: Kante Yin <kerthcet@gmail.com> * Fatal when error in cleaning up nodes in scheduler perf tests Signed-off-by: Kante Yin <kerthcet@gmail.com> * Use derived context to better organize the codes Signed-off-by: Kante Yin <kerthcet@gmail.com> * Change log level to 2 in scheduler perf-test Signed-off-by: Kante Yin <kerthcet@gmail.com> --------- Signed-off-by: Kante Yin <kerthcet@gmail.com>	2023-01-30 16:21:00 -08:00
kerthcet	d6ffb47832	Replace klog with benchmark log in scheduler_perf Signed-off-by: kerthcet <kerthcet@gmail.com>	2022-11-09 09:11:55 +08:00
Wojciech Tyczyński	5b042f0bf4	Remove RunAnAPIServer from integration tests	2022-07-25 17:52:31 +02:00
Kensei Nakada	4af3c5efeb	Skip adding data to avoid "json: unsupported value: NaN" panic when data is NaN	2022-05-05 16:11:22 +00:00
Kubernetes Prow Robot	21c0f6f6ff	Merge pull request #107677 from pohly/scheduler-integration-benchmark scheduler integration benchmark improvements	2022-02-14 01:23:28 -08:00
Patrick Ohly	e1e84c8e5f	scheduler_perf: run with -v=0 by default This provides a mechanism for overriding the forced increase of the klog verbosity to 4 when starting the apiserver and uses that for the scheduler_perf benchmark. Other tests run as before. A global variable was used because adding an explicit parameter to several helper functions would have caused a lot of code churn (test -> integration/util.StartApiserver -> integration/framework.RunAnAPIServerUsingServer -> integration/framework.startAPIServerOrDie).	2022-02-11 16:58:33 +01:00
ahrtr	fe95aa614c	io/ioutil has already been deprecated in golang 1.16, so replace all ioutil with io and os	2022-02-03 05:32:12 +08:00
kerthcet	75a255d2ed	remove scheduler component config v1beta1 Signed-off-by: kerthcet <kerthcet@gmail.com>	2021-09-28 13:13:17 +08:00
Dave Chen	dda8090037	Format json file with proper indentation Signed-off-by: Dave Chen <dave.chen@arm.com>	2021-09-07 16:14:34 +08:00
Dave Chen	63b4710f38	Don't expose struct from prometheus client library	2021-08-27 22:21:24 +08:00
Dave Chen	58ab18bc1e	Add the metric data for different extension points Signed-off-by: Dave Chen <dave.chen@arm.com>	2021-08-23 13:43:48 +08:00
Wei Huang	55765f1b49	sched: support HistogramVec in scheduler performance test	2021-07-26 20:27:37 -07:00
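
Commit 6b01ece580 above describes collecting all labels across data items and then adding each missing label with "not applicable" as its value, so that every item carries the same label set. The following is a minimal Go sketch of that idea, not the code from the commit: the DataItem type and the fillMissingLabels function are hypothetical names introduced here for illustration, and the real scheduler_perf types may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// DataItem is a simplified, hypothetical stand-in for one perfdash data item.
type DataItem struct {
	Data   map[string]float64
	Unit   string
	Labels map[string]string
}

// fillMissingLabels first collects every label key used by any item, then
// sets each key that an item lacks to the given placeholder value.
func fillMissingLabels(items []DataItem, placeholder string) {
	keys := map[string]struct{}{}
	for _, item := range items {
		for k := range item.Labels {
			keys[k] = struct{}{}
		}
	}
	for i := range items {
		if items[i].Labels == nil {
			items[i].Labels = map[string]string{}
		}
		for k := range keys {
			if _, ok := items[i].Labels[k]; !ok {
				items[i].Labels[k] = placeholder
			}
		}
	}
}

func main() {
	items := []DataItem{
		{Unit: "ms", Labels: map[string]string{
			"Metric": "scheduler_pod_scheduling_duration_seconds",
			"Name":   "SchedulingBasic/5000Nodes/namespace-2",
		}},
		{Unit: "ms", Labels: map[string]string{
			"Metric":          "scheduler_framework_extension_point_duration_seconds",
			"Name":            "SchedulingBasic/5000Nodes/namespace-2",
			"extension_point": "Score",
		}},
	}
	fillMissingLabels(items, "not applicable")

	// Print labels in sorted key order so the output is deterministic.
	for _, item := range items {
		names := make([]string, 0, len(item.Labels))
		for k := range item.Labels {
			names = append(names, k)
		}
		sort.Strings(names)
		for _, k := range names {
			fmt.Printf("%s=%q ", k, item.Labels[k])
		}
		fmt.Println()
	}
}
```

Two passes keep the logic simple: the label set is only known after every item has been seen, so filling cannot happen while collecting.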


122 Commits