364 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
08dd9951f5 Merge pull request #126886 from pohly/scheduler-perf-output
scheduler_perf: output
2024-08-26 22:23:40 +01:00
Kubernetes Prow Robot
8bbc0636b9 Merge pull request #126911 from macsko/scheduler_perf_throughput_fixes
Fix wrong throughput threshold for one scheduler_perf test case
2024-08-26 18:42:17 +01:00
Kubernetes Prow Robot
0bcbc3b77a Merge pull request #124003 from carlory/scheduler-rm-non-csi-limit
kube-scheduler remove non-csi volumelimit plugins
2024-08-26 12:02:13 +01:00
Maciej Skoczeń
7a88548755 Add workload name to failed threshold log 2024-08-26 07:44:52 +00:00
Maciej Skoczeń
71c9b9e2b0 Fix wrong throughput threshold for SchedulingRequiredPodAntiAffinityWithNSSelector test 2024-08-26 07:40:04 +00:00
Maciej Skoczeń
48dc6ff43c Disable scheduler_perf performance DRA tests 2024-08-26 07:35:36 +00:00
Kubernetes Prow Robot
605e94f6df Merge pull request #126871 from macsko/set_thresholds_in_scheduler_perf
Set scheduling throughput thresholds in scheduler_perf tests
2024-08-23 16:39:54 +01:00
Maciej Skoczeń
48a8cb2bc5 Document throughput thresholds in scheduler_perf readme 2024-08-23 14:22:48 +00:00
Patrick Ohly
bf1188d292 scheduler_perf: only store log output after failures
Reconfiguring the logging infrastructure with a per-test output file mimics
the behavior of per-test output (log output captured only on failures) while
still using the normal logging code, which is important for benchmarking.

To enable this behavior, the ARTIFACT env variable must be set.
2024-08-23 16:02:45 +02:00
Maciej Skoczeń
d0e3fc3561 Set scheduling throughput thresholds in scheduler_perf tests 2024-08-23 12:48:28 +00:00
Kubernetes Prow Robot
a1fc2551ba Merge pull request #126144 from likakuli/cleanup-unusedparamters
cleanup: remove scheduler_perf unused parameters
2024-08-22 19:29:40 +01:00
Maciej Skoczeń
77372cf3cf Label short workloads in scheduler_perf tests 2024-08-20 10:04:30 +00:00
Maciej Skoczeń
09fc399837 Add label to select short workloads in scheduler_perf tests 2024-08-20 10:04:30 +00:00
Maciej Skoczeń
a2cd8aa539 Make smaller workloads for scheduler_perf integration tests 2024-08-20 10:04:25 +00:00
Kubernetes Prow Robot
983875b2f5 Merge pull request #126337 from macsko/add_larger_scheduler_perf_test_cases
Add larger scheduler_perf test cases
2024-08-16 09:44:38 -07:00
Maciej Skoczeń
3b7b50a2cc Create fresh etcd instance for each workload in scheduler_perf 2024-08-16 08:19:52 +00:00
Maciej Skoczeń
5894e201fa Measure metrics only during a specific op in scheduler_perf 2024-08-13 12:34:06 +00:00
carlory
cba2b3f773 kube-scheduler remove non-csi volumelimit plugins 2024-08-05 15:02:32 +08:00
Maciej Skoczeń
1747483922 Add larger scheduler_perf test cases 2024-07-25 14:20:51 +00:00
Maciej Skoczeń
c15cdf7431 Init etcd and apiserver per test case in scheduler_perf integration tests 2024-07-23 09:10:01 +00:00
Patrick Ohly
9f36c8d718 DRA: add DRAControlPlaneController feature gate for "classic DRA"
In the API, the effect of the feature gate is that alpha fields get dropped on
create. They get preserved during updates if already set. The
PodSchedulingContext registration is *not* restricted by the feature gate.
This enables deleting stale PodSchedulingContext objects after disabling
the feature gate.
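
The "dropped on create, preserved on update if already set" rule can be sketched like this. The `claimSpec` type, its `Controller` field, and `dropDisabledFields` are hypothetical stand-ins chosen for illustration; the real API types and helper names may differ.

```go
package main

import "fmt"

// claimSpec is a hypothetical stand-in for an API object with one
// feature-gated alpha field.
type claimSpec struct {
	Controller string
}

// dropDisabledFields clears the gated field in newSpec unless the gate is
// enabled or the stored object (oldSpec) already uses the field.
// On create, oldSpec is nil.
func dropDisabledFields(newSpec, oldSpec *claimSpec, gateEnabled bool) {
	if gateEnabled {
		return
	}
	if oldSpec != nil && oldSpec.Controller != "" {
		// Field is already in use: preserve it during updates.
		return
	}
	newSpec.Controller = ""
}

func main() {
	created := &claimSpec{Controller: "example.com/driver"}
	dropDisabledFields(created, nil, false)
	fmt.Printf("create with gate off: %q\n", created.Controller)

	old := &claimSpec{Controller: "example.com/driver"}
	updated := &claimSpec{Controller: "example.com/driver"}
	dropDisabledFields(updated, old, false)
	fmt.Printf("update with gate off: %q\n", updated.Controller)
}
```

Preserving fields that are already set means that disabling the gate does not silently corrupt existing objects.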

The scheduler checks the new feature gate before setting up an informer for
PodSchedulingContext objects and when deciding whether it can schedule a
pod. If any claim depends on a control plane controller, the scheduler bails
out, leading to:

    Status:       Pending
    ...
      Warning  FailedScheduling             73s   default-scheduler  0/1 nodes are available: resourceclaim depends on disabled DRAControlPlaneController feature. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

The rest of the changes prepare for testing the new feature separately from
"structured parameters". The goal is to have base "dra" jobs which just enable
and test those, then "classic-dra" jobs which add DRAControlPlaneController.
2024-07-22 18:09:34 +02:00
Patrick Ohly
599fe605f9 DRA scheduler: adapt to v1alpha3 API
The structured parameter allocation logic was written from scratch in
staging/src/k8s.io/dynamic-resource-allocation/structured where it might be
useful for out-of-tree components.

Besides the new features (amount, admin access) and API it now supports
backtracking when the initial device selection doesn't lead to a complete
allocation of all claims.

Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
Co-authored-by: John Belamaric <jbelamaric@google.com>
2024-07-22 18:09:34 +02:00
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reservedFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
likakuli
ef9e1c39e9 cleanup: remove unused parameters
Signed-off-by: likakuli <1154584512@qq.com>
2024-07-17 16:27:12 +08:00
Kubernetes Prow Robot
a6460c4f3e Merge pull request #126036 from macsko/scheduler_perf_throughput_thresholds
Allow to set scheduling throughput thresholds in scheduler_perf tests
2024-07-16 21:43:13 -07:00
Maciej Skoczeń
767d2a3e5e Allow to set scheduling throughput thresholds in scheduler_perf tests 2024-07-15 08:06:21 +00:00
Maciej Skoczeń
ad59b4026e Increase API server timeout in scheduler_perf tests 2024-07-10 07:34:59 +00:00
Kubernetes Prow Robot
a2a5b67442 Merge pull request #125822 from kerthcet/fix/schedule_perf-failure
Log the error margin to avoid failures in schedule_perf
2024-07-01 05:07:27 -07:00
kerthcet
e106b3a31f Log the error margin to avoid failures in schedule_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-07-01 18:22:31 +08:00
Kubernetes Prow Robot
fa75b4371d Merge pull request #125550 from sanposhiho/scheduler_perf-gated
add a test case with gated pod-affinity pods to scheduler_perf
2024-06-29 07:06:42 -07:00
Kensei Nakada
d6d55196ae add a test case with PodAffinity gated pods to scheduler_perf 2024-06-29 03:35:10 +00:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Maciej Skoczeń
7532e74117 Don't fail on churn delete in scheduler_perf tests when context canceled 2024-06-19 08:08:13 +00:00
Maciej Skoczeń
05b2c14d64 Measure performance of scheduling when many gated pods 2024-06-18 12:39:21 +00:00
Maciej Skoczeń
c09440c691 Add possibility to delete pods at specified frequency in scheduler_perf tests 2024-06-18 09:40:50 +00:00
Kubernetes Prow Robot
5df8e15a84 Merge pull request #125562 from pohly/scheduler-perf-default-verbosity
scheduler_perf: fix setting default verbosity
2024-06-18 02:16:07 -07:00
Kubernetes Prow Robot
3b90ae4f58 Merge pull request #124548 from pohly/dra-scheduler-perf-structured-parameters
scheduler_perf: add DRA structured parameters test with shared claims
2024-06-18 02:15:58 -07:00
Patrick Ohly
381c28407e scheduler_perf: fix setting default verbosity
It needs to be set twice, once for ktesting+klog, once for
component-base/logs. The latter was not done before and thus quite a bit of log
output was produced with verbosity 0.
2024-06-18 08:44:16 +02:00
Patrick Ohly
d88a153086 scheduler_perf: add DRA structured parameters test with shared claims
Several pods sharing the same claim is not common, but can be useful and thus
should get tested.

Before, createPods and createAny operations were not able to do this because
each generated object was the same. What we need are different, predictable
names of the claims (from createAny) and different references to those in the
pods (from createPods). Now text/template processing with the index number of
the pod or claim, respectively, as input is used to inject these varying fields. A
"div" function is needed to use the same claim in several different pods.

While at it, some existing test cases get cleaned up a bit (removal of
incorrect comments, adding comments for testing with queuing hints).
2024-06-17 10:13:22 +02:00
Kubernetes Prow Robot
0fd6746b2a Merge pull request #125518 from pohly/scheduler-perf-cleanup-fix
scheduler_perf: shut down apiserver clients before apiserver
2024-06-16 10:03:29 -07:00
kerthcet
1ffa1e17cd Remove noisy log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-06-12 11:53:35 +08:00
Patrick Ohly
246e2aedf5 scheduler_perf: shut down apiserver clients before apiserver
The cancellation of the context happened after the cleanup of the apiserver, so
clients using that context were kept running. That wasn't the intent and causes
a slow shutdown because the apiserver delays its shutdown when it has active
clients.

The fix is to create a new cancellation context and to use that for the
clients. The automatic cancellation of it then happens before the apiserver
cleanup.
2024-06-05 11:00:46 +02:00
Kensei Nakada
ef9e14db79 scheduler_perf: measure the degradation of daemonset scheduling 2024-06-05 02:36:31 +00:00
kerthcet
e678496c6e reorganize the scheduler_perf testcases
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-31 16:47:19 +08:00
Lubomir I. Ivanov
5e290ebc90 switch k/k to pause version 3.10 2024-05-24 10:02:51 +03:00
Kubernetes Prow Robot
ade0d2140a Merge pull request #124578 from sanposhiho/scheduler_perf_scheduler_plugin_execution_duration_seconds
support `scheduler_plugin_execution_duration_seconds` in scheduler_perf
2024-05-05 06:40:44 -07:00
Kensei Nakada
c72b688e12 support scheduler_plugin_execution_duration_seconds in scheduler_perf 2024-04-27 08:22:53 +00:00
Marek Siarkowicz
3ee8178768 Cleanup defer from SetFeatureGateDuringTest function call 2024-04-24 20:25:29 +02:00
Patrick Ohly
a0add8d2c7 dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00