Commit Graph

348 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
983875b2f5 Merge pull request #126337 from macsko/add_larger_scheduler_perf_test_cases
Add larger scheduler_perf test cases
2024-08-16 09:44:38 -07:00
Maciej Skoczeń
3b7b50a2cc Create fresh etcd instance for each workload in scheduler_perf 2024-08-16 08:19:52 +00:00
Maciej Skoczeń
5894e201fa Measure metrics only during a specific op in scheduler_perf 2024-08-13 12:34:06 +00:00
Maciej Skoczeń
1747483922 Add larger scheduler_perf test cases 2024-07-25 14:20:51 +00:00
Maciej Skoczeń
c15cdf7431 Init etcd and apiserver per test case in scheduler_perf integration tests 2024-07-23 09:10:01 +00:00
Patrick Ohly
9f36c8d718 DRA: add DRAControlPlaneController feature gate for "classic DRA"
In the API, the effect of the feature gate is that alpha fields get dropped on
create. They get preserved during updates if already set. The
PodSchedulingContext registration is *not* restricted by the feature gate.
This enables deleting stale PodSchedulingContext objects after disabling
the feature gate.

The scheduler checks the new feature gate before setting up an informer for
PodSchedulingContext objects and when deciding whether it can schedule a
pod. If any claim depends on a control plane controller, the scheduler bails
out, leading to:

    Status:       Pending
    ...
      Warning  FailedScheduling             73s   default-scheduler  0/1 nodes are available: resourceclaim depends on disabled DRAControlPlaneController feature. no new claims to deallocate, preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

The rest of the changes prepare for testing the new feature separately from
"structured parameters". The goal is to have base "dra" jobs which just enable
and test those, then "classic-dra" jobs which add DRAControlPlaneController.
2024-07-22 18:09:34 +02:00
Patrick Ohly
599fe605f9 DRA scheduler: adapt to v1alpha3 API
The structured parameter allocation logic was written from scratch in
staging/src/k8s.io/dynamic-resource-allocation/structured where it might be
useful for out-of-tree components.

Besides the new features (amount, admin access) and API it now supports
backtracking when the initial device selection doesn't lead to a complete
allocation of all claims.

Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
Co-authored-by: John Belamaric <jbelamaric@google.com>
2024-07-22 18:09:34 +02:00
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reserverFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Kubernetes Prow Robot
a6460c4f3e Merge pull request #126036 from macsko/scheduler_perf_throughput_thresholds
Allow to set scheduling throughput thresholds in scheduler_perf tests
2024-07-16 21:43:13 -07:00
Maciej Skoczeń
767d2a3e5e Allow to set scheduling throughput thresholds in scheduler_perf tests 2024-07-15 08:06:21 +00:00
Maciej Skoczeń
ad59b4026e Increase API server timeout in scheduler_perf tests 2024-07-10 07:34:59 +00:00
Kubernetes Prow Robot
a2a5b67442 Merge pull request #125822 from kerthcet/fix/schedule_perf-failure
Log the error margin to avoid failures in schedule_perf
2024-07-01 05:07:27 -07:00
kerthcet
e106b3a31f Log the error margin to avoid failures in schedule_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-07-01 18:22:31 +08:00
Kubernetes Prow Robot
fa75b4371d Merge pull request #125550 from sanposhiho/scheduler_perf-gated
add a test case with gated pod-affinity pods to scheduler_perf
2024-06-29 07:06:42 -07:00
Kensei Nakada
d6d55196ae add a test case with PodAffinity gated pods to scheduler_perf 2024-06-29 03:35:10 +00:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Maciej Skoczeń
7532e74117 Don't fail on churn delete in scheduler_perf tests when context canceled 2024-06-19 08:08:13 +00:00
Maciej Skoczeń
05b2c14d64 Measure performance of scheduling when many gated pods 2024-06-18 12:39:21 +00:00
Maciej Skoczeń
c09440c691 Add possibility to delete pods at specified frequency in scheduler_perf tests 2024-06-18 09:40:50 +00:00
Kubernetes Prow Robot
5df8e15a84 Merge pull request #125562 from pohly/scheduler-perf-default-verbosity
scheduler_perf: fix setting default verbosity
2024-06-18 02:16:07 -07:00
Kubernetes Prow Robot
3b90ae4f58 Merge pull request #124548 from pohly/dra-scheduler-perf-structured-parameters
scheduler_perf: add DRA structured parameters test with shared claims
2024-06-18 02:15:58 -07:00
Patrick Ohly
381c28407e scheduler_perf: fix setting default verbosity
It needs to be set twice, once for ktesting+klog, once for
component-base/logs. The latter was not done before and thus quite a bit of log
output was produced with verbosity 0.
2024-06-18 08:44:16 +02:00
Patrick Ohly
d88a153086 scheduler_perf: add DRA structured parameters test with shared claims
Several pods sharing the same claim is not common, but can be useful and thus
should get tested.

Before, createPods and createAny operations were not able to do this because
each generated object was the same. What we need are different, predictable
names of the claims (from createAny) and different references to those in the
pods (from createPods). Now text/template processing with the index number of
the pod respectively claim as input is used to inject these varying fields. A
"div" function is needed to use the same claim in several different pods.

While at it, some existing test cases get cleaned up a bit (removal of
incorrect comments, adding comments for testing with queuing hints).
2024-06-17 10:13:22 +02:00
Kubernetes Prow Robot
0fd6746b2a Merge pull request #125518 from pohly/scheduler-perf-cleanup-fix
scheduler_perf: shut down apiserver clients before apiserver
2024-06-16 10:03:29 -07:00
kerthcet
1ffa1e17cd Remove noisy log in scheduler_perf
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-06-12 11:53:35 +08:00
Patrick Ohly
246e2aedf5 scheduler_perf: shut down apiserver clients before apiserver
The cancellation of the context happened after the cleanup of the apiserver, so
clients using that context were kept running. That wasn't the intent and causes
a slow shutdown because the apiserver delays its shutdown when it has active
clients.

The fix is to create a new cancellation context and to use that for the
clients. The automatic cancellation of it then happens before the apiserver
cleanup.
2024-06-05 11:00:46 +02:00
Kensei Nakada
ef9e14db79 scheduler_perf: measure the degradation of daemonset scheduling 2024-06-05 02:36:31 +00:00
kerthcet
e678496c6e reorganize the scheduler_perf testcases
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-05-31 16:47:19 +08:00
Lubomir I. Ivanov
5e290ebc90 switch k/k to pause version 3.10 2024-05-24 10:02:51 +03:00
Kubernetes Prow Robot
ade0d2140a Merge pull request #124578 from sanposhiho/scheduler_perf_scheduler_plugin_execution_duration_seconds
support `scheduler_plugin_execution_duration_seconds` in scheduler_perf
2024-05-05 06:40:44 -07:00
Kensei Nakada
c72b688e12 support scheduler_plugin_execution_duration_seconds in scheduler_perf 2024-04-27 08:22:53 +00:00
Marek Siarkowicz
3ee8178768 Cleanup defer from SetFeatureGateDuringTest function call 2024-04-24 20:25:29 +02:00
Patrick Ohly
a0add8d2c7 dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00
Patrick Ohly
0b6a0d686a dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.

The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00
Patrick Ohly
4ed2b3eaeb scheduler_perf: test DRA with structured parameters 2024-03-07 22:21:58 +01:00
Kubernetes Prow Robot
55d1518126 Merge pull request #123588 from pohly/scheduler-perf-any-cleanup
scheduler_perf: automatically delete created objects
2024-03-04 04:49:12 -08:00
Patrick Ohly
eb6abf0462 scheduler_perf: automatically delete created objects
This is not relevant for namespaced objects, but matters for the cluster-scoped
ResourceClass during unit testing. This works right now because there is only
one such unit test, but will fail when adding a second one.

Instead of passing a boolean flag down into all functions where it might be
needed, it's now a context value.
2024-03-04 09:54:38 +01:00
Patrick Ohly
d6851ec735 scheduler_perf: fail when input YAML is invalid
The YAML files get decoded into an unstructured object, without validation, and
then sent to the apiserver with a generic client. The default behavior is to
issue a warning to the client, which gets logged by client-go. What we want
instead is an error that causes the test to fail in a clean way right at the
beginning.
2024-02-29 09:53:16 +01:00
Patrick Ohly
da0c9a93ae scheduler_perf: use dynamic client to create arbitrary objects
With a dynamic client and a rest mapper it is possible to load arbitrary YAML
files and create the object defined by it. This is simpler than adding specific
Go code for each supported type.

Because the version now matters, the incorrect version in the DRA YAMLs were
found and fixed.
2024-02-11 10:51:38 +01:00
Patrick Ohly
c46ae1b26a scheduler_perf: use ktesting.TContext + staging StartTestServer
ktesting.TContext combines several different interfaces. This makes the code
simpler because less parameters need to be passed around.

An intentional side effect is that the apiextensions client interface becomes
available, which makes it possible to use CRDs. This will be needed for future
DRA tests.

Support for CRDs depends on starting the apiserver via
k8s.io/kubernetes/cmd/kube-apiserver/app/testing because only that enables the
CRD extensions. As discussed on Slack, the long-term goal is to replace the
in-tree StartTestServer with the one in staging, so this is going in the right
direction.
2024-02-11 10:51:38 +01:00
Kensei Nakada
f29d6970c9 doc(scheduler_perf): enrich the documentation 2024-01-15 08:50:08 +00:00
Kensei Nakada
74a6a4581f fix by linters 2023-12-02 09:58:34 +00:00
Kensei Nakada
5310abe14a make scheduler_perf usable from other repositories 2023-12-01 12:43:08 +00:00
Kubernetes Prow Robot
4b9e15e0fe Merge pull request #120873 from pohly/dra-e2e-test-driver-enhancements
e2e dra: enhance test driver
2023-10-10 13:32:55 +02:00
Kubernetes Prow Robot
44cfd556b3 Merge pull request #120339 from pohly/scheduler-perf-dra-driver-names
scheduler_perf: use different log names for different DRA drivers
2023-10-02 06:32:56 -07:00
Kubernetes Prow Robot
5cc92713d1 Merge pull request #120335 from pohly/scheduler-perf-pod-name
scheduler_perf: show name of one pending pod in log
2023-10-02 06:32:45 -07:00
Patrick Ohly
36146ad686 e2e dra: enhance test driver
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
  `leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
  command line flag overrides the config, but only when set explicitly.
  This makes it possible to pre-define complete driver setups where the
  name is associated with certain resource availability. This will be
  used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
  via node labels. This will be used for testing cluster autoscaling.
2023-09-25 19:50:33 +02:00
Junhao Zou
43c05e98ca cleanup: Replace the deprecated NewMemCacheClient with memory.NewMemCacheClient 2023-09-08 11:57:46 +08:00
Patrick Ohly
c74d045c4b scheduler_perf: show name of one pending pod in error message
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
2023-09-04 09:54:26 +02:00