The "disabled by label filter" message for benchmarks printed the pointer to
the filter string, not the filter string itself. This mistake gets avoided and
the code becomes simpler when not using pointers.
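A minimal sketch of the underlying Go pitfall, with made-up message text and
variable names rather than the actual scheduler_perf code: formatting a
*string prints the pointer address instead of the string.

    package main

    import "fmt"

    func main() {
        labelFilter := "perf-scheduling=true"
        filterPtr := &labelFilter

        // With a pointer, %v prints an address like 0xc000010250.
        fmt.Printf("disabled by label filter %v\n", filterPtr)
        // Dereferencing works, but not using a pointer at all is simpler.
        fmt.Printf("disabled by label filter %v\n", *filterPtr)
        fmt.Printf("disabled by label filter %v\n", labelFilter)
    }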
This fixes an issue in
TestSchedulerPerf/SteadyStateClusterResourceClaimTemplate:
scheduler_perf.go:1542: FATAL ERROR: op 7: delete scheduled pods: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline
The error occurs when the test is almost done but has not yet observed all
scheduled pods. The previous attempt to address it was not entirely correct:
it covered the case where the context had already been canceled, but not this
particular "will reach the deadline soon" case.
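A hedged sketch of the failure mode (not of the fix): the client-side rate
limiter from golang.org/x/time/rate fails immediately once the wait it would
impose extends past the context deadline, so this error shows up while
ctx.Err() is still nil, i.e. before the context is actually canceled.

    package main

    import (
        "context"
        "fmt"
        "time"

        "golang.org/x/time/rate"
    )

    func main() {
        // 1 request per second, burst of 1.
        limiter := rate.NewLimiter(rate.Limit(1), 1)
        ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
        defer cancel()

        _ = limiter.Wait(ctx) // consumes the single burst token immediately

        // The next token becomes available after ~1s, which is past the 500ms
        // deadline, so Wait fails right away even though ctx.Err() is still nil.
        if err := limiter.Wait(ctx); err != nil {
            fmt.Println("ctx.Err():", ctx.Err()) // <nil>
            fmt.Println("Wait:", err)            // rate: Wait(n=1) would exceed context deadline
        }
    }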
This is useful to see whether pod scheduling happens in bursts and how it
behaves over time, which is relevant in particular for dynamic resource
allocation, where it may become harder toward the end to find a node which
still has resources available.
Besides "pods scheduled" it is also useful to know how many attempts were
needed, so schedule_attempts_total also gets sampled and stored.
To visualize the result of one or more test runs, use:
gnuplot.sh *.dat
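A hedged sketch of the sampling idea; the collector function, the sample
callback, and the output format are assumptions, not the actual
scheduler_perf code:

    package main

    import (
        "fmt"
        "io"
        "os"
        "time"
    )

    // collectTimeSeries writes one "<seconds> <scheduled pods> <schedule attempts>"
    // line per interval until stop is closed, suitable as gnuplot input.
    func collectTimeSeries(out io.Writer, interval time.Duration, stop <-chan struct{}, sample func() (pods, attempts int64)) {
        start := time.Now()
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case now := <-ticker.C:
                pods, attempts := sample()
                fmt.Fprintf(out, "%.1f %d %d\n", now.Sub(start).Seconds(), pods, attempts)
            }
        }
    }

    func main() {
        // Fake sampler standing in for the real pod count and schedule_attempts_total.
        var pods, attempts int64
        sample := func() (int64, int64) {
            pods += 10
            attempts += 12
            return pods, attempts
        }

        stop := make(chan struct{})
        go func() { time.Sleep(3 * time.Second); close(stop) }()
        collectTimeSeries(os.Stdout, time.Second, stop, sample)
    }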
Having to schedule 4999 pods to simulate a "full" cluster is slow. Creating
claims and then allocating them more or less like the scheduler would when
scheduling pods is much faster and in practice has the same effect on the
dynamicresources plugin, because that plugin looks at claims, not pods.
This allows defining the "steady state" workloads with a higher number of
devices ("claimsPerNode") again, which was prohibitively slow before.
The previous tests were based on scheduling pods until the cluster was
full. This is a valid scenario, but not necessarily a realistic one.
More realistic is how quickly the scheduler can schedule new pods when some
old pods have finished running, in particular in a cluster that is properly
utilized (= almost full). To test this, pods must get created, scheduled, and
then immediately deleted. This can run for a certain period of time.
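A hedged sketch of that steady-state loop; the callbacks stand in for the
real createPods, wait-for-scheduled, and deletePods steps and are not the
scheduler_perf API:

    package example

    import (
        "context"
        "time"
    )

    // runSteadyState keeps churning a fixed number of pods until the
    // configured duration has elapsed or the context ends.
    func runSteadyState(ctx context.Context, duration time.Duration, count int,
        createPods func(context.Context, int) error,
        waitScheduled func(context.Context, int) error,
        deletePods func(context.Context) error,
    ) error {
        deadline := time.Now().Add(duration)
        for time.Now().Before(deadline) && ctx.Err() == nil {
            if err := createPods(ctx, count); err != nil {
                return err
            }
            if err := waitScheduled(ctx, count); err != nil {
                return err
            }
            if err := deletePods(ctx); err != nil {
                return err
            }
        }
        return ctx.Err()
    }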
Scenarios with an empty cluster and a full cluster have different scheduling
rates. This was previously visible for DRA because the 50th percentile of the
scheduling throughput was lower than the average, but one had to guess in
which scenario the throughput was lower. Now this can be measured for DRA with
the new SteadyStateClusterResourceClaimTemplateStructured test.
The metrics collector must watch pod events to figure out how many pods got
scheduled; polling would miss pods that have already been deleted again.
There seems to be no relevant difference in the collected
metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions):
                              │   before    │    after    │    vs base      │
SchedulingThroughput/Average  │ 157.1 ±  0% │ 157.1 ±  0% │ ~ (p=0.329 n=6) │
SchedulingThroughput/Perc50   │ 48.99 ±  8% │ 47.52 ±  9% │ ~ (p=0.937 n=6) │
SchedulingThroughput/Perc90   │ 463.9 ± 16% │ 460.1 ± 13% │ ~ (p=0.818 n=6) │
SchedulingThroughput/Perc95   │ 463.9 ± 16% │ 460.1 ± 13% │ ~ (p=0.818 n=6) │
SchedulingThroughput/Perc99   │ 463.9 ± 16% │ 460.1 ± 13% │ ~ (p=0.818 n=6) │
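A hedged sketch of the event-based counting; the function and the field
handling below are illustrative, not the actual metrics collector code:

    package example

    import (
        "sync/atomic"

        v1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    // watchScheduledPods counts scheduling events via an informer so that pods
    // which get deleted right after being scheduled are still counted, which
    // polling the pod list would miss. Read the result with atomic.LoadInt64.
    func watchScheduledPods(client kubernetes.Interface, namespace string, stop <-chan struct{}) *int64 {
        var scheduled int64
        factory := informers.NewSharedInformerFactoryWithOptions(client, 0, informers.WithNamespace(namespace))
        podInformer := factory.Core().V1().Pods().Informer()
        podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                if pod := obj.(*v1.Pod); pod.Spec.NodeName != "" {
                    atomic.AddInt64(&scheduled, 1)
                }
            },
            UpdateFunc: func(oldObj, newObj interface{}) {
                oldPod, newPod := oldObj.(*v1.Pod), newObj.(*v1.Pod)
                // Count the transition from "not scheduled" to "scheduled".
                if oldPod.Spec.NodeName == "" && newPod.Spec.NodeName != "" {
                    atomic.AddInt64(&scheduled, 1)
                }
            },
        })
        factory.Start(stop)
        factory.WaitForCacheSync(stop)
        return &scheduled
    }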
Before, only the first decoding error was reported, which typically was the
"invalid opcode" error from the createAny operation:
scheduler_perf.go:900: parsing test cases error: error unmarshaling JSON: while decoding JSON: cannot unmarshal {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into any known op type: invalid opcode "createPods"; expected "createAny"
Now the opcode is determined first; then decoding into exactly the matching
operation type is attempted and validated. Unknown fields are an error.
In the case above, decoding a string into time.Duration failed:
scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into *benchmark.createPodsOp: json: cannot unmarshal string into Go struct field createPodsOp.Duration of type time.Duration
Typos now also produce clear errors:
scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: unknown opcode "sleeep" in {"duration":"5s","opcode":"sleeep"}
scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"countParram":"$deletingPods","deletePodsPerSecond":50,"opcode":"createPods"} into *benchmark.createPodsOp: json: unknown field "countParram"
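A hedged sketch of the two-pass decoding; the op types and their fields are
trimmed stand-ins for the real scheduler_perf ops:

    package example

    import (
        "bytes"
        "encoding/json"
        "fmt"
    )

    type createPodsOp struct {
        Opcode    string `json:"opcode"`
        Count     int    `json:"count"`
        Namespace string `json:"namespace"`
    }

    type sleepOp struct {
        Opcode   string `json:"opcode"`
        Duration string `json:"duration"`
    }

    func decodeOp(data []byte) (interface{}, error) {
        // First pass: determine the opcode only.
        var header struct {
            Opcode string `json:"opcode"`
        }
        if err := json.Unmarshal(data, &header); err != nil {
            return nil, err
        }

        // Second pass: decode into exactly the matching op type and reject
        // unknown fields so that typos in field names become errors.
        var op interface{}
        switch header.Opcode {
        case "createPods":
            op = &createPodsOp{}
        case "sleep":
            op = &sleepOp{}
        default:
            return nil, fmt.Errorf("unknown opcode %q in %s", header.Opcode, string(data))
        }
        decoder := json.NewDecoder(bytes.NewReader(data))
        decoder.DisallowUnknownFields()
        if err := decoder.Decode(op); err != nil {
            return nil, fmt.Errorf("decoding %s into %T: %w", string(data), op, err)
        }
        return op, nil
    }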
Reconfiguring the logging infrastructure with a per-test output file mimics
the behavior of per-test output (log output captured and shown only on
failures) while still using the normal logging code, which is important for
benchmarking.
To enable this behavior, the ARTIFACT env variable must be set.
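A hedged sketch of the per-test redirection; the file naming, cleanup, and
exact klog calls are assumptions, not the actual implementation:

    package example

    import (
        "os"
        "path/filepath"
        "strings"
        "testing"

        "k8s.io/klog/v2"
    )

    // redirectKlogOutput sends klog output to a per-test file and keeps the
    // file only if the test failed, roughly mimicking per-test output capture.
    func redirectKlogOutput(tb testing.TB) {
        artifacts := os.Getenv("ARTIFACT")
        if artifacts == "" {
            return
        }
        path := filepath.Join(artifacts, strings.ReplaceAll(tb.Name(), "/", "_")+".log")
        file, err := os.Create(path)
        if err != nil {
            tb.Fatalf("create log file: %v", err)
        }
        klog.LogToStderr(false)
        klog.SetOutput(file)
        tb.Cleanup(func() {
            klog.Flush()
            klog.LogToStderr(true)
            file.Close()
            if !tb.Failed() {
                os.Remove(path) // discard output of passing tests
            }
        })
    }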
Several pods sharing the same claim is not a common configuration, but it can
be useful and thus should get tested.
Before, the createPods and createAny operations were not able to do this
because each generated object was the same. What is needed are different,
predictable names for the claims (from createAny) and different references to
those claims in the pods (from createPods). Now text/template processing with
the index number of the pod or claim as input is used to inject these varying
fields. A "div" function is needed to make several different pods use the
same claim.
While at it, some existing test cases get cleaned up a bit (removal of
incorrect comments, adding comments for testing with queuing hints).
This is not relevant for namespaced objects, but matters for the cluster-scoped
ResourceClass during unit testing. It works right now because there is only
one such unit test, but will fail once a second one gets added.
Instead of passing a boolean flag down into all functions where it might be
needed, it is now a context value.
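A hedged sketch of the context-value pattern; the key and accessor names are
made up:

    package example

    import "context"

    // flagKey is unexported so that other packages cannot collide with it.
    type flagKey struct{}

    // withFlag stores the flag in the context instead of threading a bool
    // parameter through every function signature.
    func withFlag(ctx context.Context, enabled bool) context.Context {
        return context.WithValue(ctx, flagKey{}, enabled)
    }

    // flagFrom reads the flag; a context without the value means "false".
    func flagFrom(ctx context.Context) bool {
        enabled, _ := ctx.Value(flagKey{}).(bool)
        return enabled
    }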
With a dynamic client and a REST mapper it is possible to load arbitrary YAML
files and create the objects defined in them. This is simpler than adding
specific Go code for each supported type.
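A hedged sketch of the approach; the helper is made up, but it uses the
standard dynamic client, RESTMapper, and unstructured APIs from client-go and
apimachinery:

    package example

    import (
        "context"
        "fmt"

        "k8s.io/apimachinery/pkg/api/meta"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/client-go/dynamic"
        "sigs.k8s.io/yaml"
    )

    // createFromYAML decodes arbitrary YAML into an unstructured object, maps
    // its GroupVersionKind to a resource, and creates it with the dynamic client.
    func createFromYAML(ctx context.Context, client dynamic.Interface, mapper meta.RESTMapper, data []byte, namespace string) error {
        var obj unstructured.Unstructured
        if err := yaml.UnmarshalStrict(data, &obj.Object); err != nil {
            return fmt.Errorf("decoding YAML: %w", err)
        }

        // The apiVersion in the YAML must be correct, otherwise the RESTMapper
        // cannot find a matching resource.
        gvk := obj.GroupVersionKind()
        mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
        if err != nil {
            return fmt.Errorf("mapping %s: %w", gvk, err)
        }

        resource := client.Resource(mapping.Resource)
        if mapping.Scope.Name() == meta.RESTScopeNameNamespace {
            // Namespaced object: create it in the given namespace.
            _, err = resource.Namespace(namespace).Create(ctx, &obj, metav1.CreateOptions{})
        } else {
            // Cluster-scoped object (like ResourceClass): no namespace.
            _, err = resource.Create(ctx, &obj, metav1.CreateOptions{})
        }
        return err
    }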
Because the version now matters, the incorrect versions in the DRA YAMLs were
found and fixed.