Commit Graph

5145 Commits

Author SHA1 Message Date
Kuba Tużnik
8d489425aa scheduler/dynamicresources: extract obtaining and tracking in-memory modifications of DRA objects
All logic related to obtaining DRA objects and tracking modifications
to ResourceClaims in-memory is extracted to DefaultDRAManager, which
implements framework.SharedDRAManager.

This is intended to be a no-op in terms of the DRA plugin behavior.
2024-11-05 14:11:04 +01:00
Kubernetes Prow Robot
2bb886ce2a Merge pull request #128482 from sanposhiho/scheduler-perf-ff
fix: register QHint metrics only when available
2024-11-05 12:15:30 +00:00
Kubernetes Prow Robot
c69f150008 Merge pull request #127277 from pohly/dra-structured-performance
kube-scheduler: enhance performance for DRA structured parameters
2024-11-05 10:05:29 +00:00
Kensei Nakada
0bf95100f1 fix: register QHint metrics only when available 2024-11-05 18:52:27 +09:00
Maciej Skoczeń
e44041ee47 Run scheduler_perf with QueueingHints both enabled and disabled 2024-11-05 09:13:03 +00:00
Patrick Ohly
7863d9a381 DRA scheduler: refactor CEL compilation cache
A better place is the cel package because a) the name can become shorter
and b) it is tightly coupled with the compiler there.

Moving the compilation into the cache simplifies the callers.
2024-11-05 08:34:42 +01:00
Kubernetes Prow Robot
bc79d3ba87 Merge pull request #128396 from ritazh/deprecate-EnforceMountableSecretsAnnotation
deprecate EnforceMountableSecretsAnnotation in 1.32
2024-11-05 06:07:40 +00:00
Joe Betz
a0f419fe56 Add integration tests
Co-authored-by: cici37 <cicih@google.com>
Co-authored-by: Alexander Zielensk <alexzielenski@gmail.com>
2024-11-04 21:40:54 -05:00
Joe Betz
3a1733f302 Add MutatingAdmissionPolicy API
This is closely aligned with ValidatingAdmissionPolicy
except that instead of validations that can fail with
messages, there are mutations, which can be defined
either with as an ApplyConfiguration or JSONPatch.

Co-authored-by: cici37 <cicih@google.com>
2024-11-04 21:40:38 -05:00
yongruilin
105a3a2fd8 test: Add integration test for allow-metric-label 2024-11-04 13:37:40 -08:00
Rita Zhang
e7cdc59555 deprecate EnforceMountableSecretsAnnotation in 1.32
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2024-11-04 13:13:32 -08:00
Kubernetes Prow Robot
57438d0b8f Merge pull request #128411 from macsko/split_scheduler_perf_tests
Split scheduler_perf config into subdirectories
2024-11-04 20:17:36 +00:00
Maciej Skoczeń
8371a35824 Split scheduler_perf config into subdirectories 2024-11-04 08:45:34 +00:00
googs1025
960e702596 flake(apiserver): fix TestReconcilerAPIServerLeaseMultiCombined 2024-11-02 08:46:20 +08:00
Kubernetes Prow Robot
3f5d0ee2cf Merge pull request #128497 from benluddy/cbor-request-contenttype-circuit-breaker
KEP-4222: Fall back to JSON request encoding after CBOR 415.
2024-11-01 20:05:34 +00:00
Ben Luddy
1745dfdd15 Fall back to JSON request encoding after CBOR 415.
If a client is configured to encode request bodies to CBOR, but the server does not support CBOR,
the server will respond with HTTP 415 (Unsupported Media Type). By feeding this response back to the
RESTClient, subsequent requests can fall back to JSON, which is assumed to be acceptable.
2024-11-01 12:54:04 -04:00
Ben Luddy
faf07915e1 Add integration test for CBOR-enabled dynamic client watches. 2024-11-01 12:03:30 -04:00
Patrick Ohly
bc55e82621 DRA scheduler: maintain a set of allocated device IDs
Reacting to events from the informer cache (indirectly, through the assume
cache) is more efficient than repeatedly listing it's content and then
converting to IDs with unique strings.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      54.70 ± 6%                     76.81 ± 6%  +40.42% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     106.4 ± 4%                     105.6 ± 2%        ~ (p=0.413 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     120.0 ± 4%                     118.9 ± 7%        ~ (p=0.117 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      112.5 ± 4%                     105.9 ± 4%   -5.87% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      87.13 ± 4%                    123.55 ± 4%  +41.80% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      113.4 ± 2%                     103.3 ± 2%   -8.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      65.55 ± 3%                    121.30 ± 3%  +85.05% (p=0.002 n=6)
    geomean                                                                                                90.81                          106.8       +17.57%
2024-11-01 13:23:06 +01:00
Patrick Ohly
814c9428fd DRA scheduler: cache compiled CEL expressions
DeviceClasses and different requests are very likely to contain the same
expression string. We don't need to compile that over and over again.

To avoid hanging onto that cache longer than necessary, it's currently tied to
each PreFilter/Filter combination. It might make sense to move this up into the
scheduler plugin and thus reuse compiled expressions for different pods.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.95 ± 4%                     36.65 ± 2%   +7.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.8 ± 2%                     106.7 ± 3%        ~ (p=0.177 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.7 ± 1%                     119.7 ± 3%  +18.82% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.78 ± 1%                    121.10 ± 4%  +33.40% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      50.51 ± 7%                     63.72 ± 3%  +26.17% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.7 ± 5%                     110.2 ± 2%   +6.32% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      28.50 ± 2%                     28.16 ± 5%        ~ (p=0.102 n=6)
    geomean                                                                                                64.99                          73.15       +12.56%
2024-11-01 13:20:06 +01:00
Patrick Ohly
941d17b3b8 DRA scheduler: code cleanups
Looking up the slice can be avoided by storing it when allocating a device.
The AllocationResult struct is small enough that it can be copied by value.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                       after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base              │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.30 ± 2%                     33.95 ± 4%       ~ (p=0.288 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.3 ± 2%                     105.8 ± 2%       ~ (p=0.524 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.8 ± 1%                     100.7 ± 1%       ~ (p=0.738 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.96 ± 2%                     90.78 ± 1%       ~ (p=0.952 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      49.84 ± 4%                     50.51 ± 7%       ~ (p=0.485 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.8 ± 1%                     103.7 ± 5%       ~ (p=0.582 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      27.21 ± 7%                     28.50 ± 2%       ~ (p=0.065 n=6)
    geomean                                                                                                64.26                          64.99       +1.14%
2024-11-01 13:19:51 +01:00
Patrick Ohly
1246898315 DRA scheduler: ResourceSlice with unique strings
Using unique strings instead of normal strings speeds up allocation with
structured parameters because maps that use those strings as key no longer need
to build hashes of the string content. However, care must be taken to call
unique.Make as little as possible because it is costly.

Pre-allocating the map of allocated devices reduces the need to grow the map
when adding devices.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                         │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base                │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                     18.06 ±  2%                     33.30 ± 2%   +84.31% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                    104.7 ±  2%                     105.3 ± 2%         ~ (p=0.818 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                    96.62 ±  1%                    100.75 ± 1%    +4.28% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                     83.00 ±  2%                     90.96 ± 2%    +9.59% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                     32.45 ±  7%                     49.84 ± 4%   +53.60% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                     95.22 ±  7%                    103.80 ± 1%    +9.00% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                     9.111 ± 10%                    27.215 ± 7%  +198.69% (p=0.002 n=6)
    geomean                                                                                               45.86                           64.26        +40.12%
2024-11-01 13:19:48 +01:00
Kubernetes Prow Robot
c0e0785fe4 Merge pull request #128427 from dom4ha/scheduler-perf
Fix Unschedulable test by using high priority churn pods to get processed right after they were injected
2024-10-30 22:23:25 +00:00
Kubernetes Prow Robot
dc1d7f41ef Merge pull request #128456 from benluddy/nondeterministic-response-encoding
KEP-4222: Allow nondeterministic object encoding in HTTP response bodies.
2024-10-30 20:13:27 +00:00
Ben Luddy
dee76a460e Allow nondeterministic object encoding in HTTP response bodies. 2024-10-30 15:10:16 -04:00
Kubernetes Prow Robot
16f9fdc705 Merge pull request #128273 from benluddy/cbor-apply
KEP-4222: Support CBOR encoding for apply requests.
2024-10-30 17:25:25 +00:00
Ben Luddy
37ed906a33 Support application/apply-patch+cbor in patch requests. 2024-10-30 12:21:15 -04:00
dom4ha
ff584a76e0 Fix Unschedulable test by scheduling high priority churn pods to get processed right after they were injected (before the queued test pods) 2024-10-30 13:04:38 +00:00
Kubernetes Prow Robot
d001d5684e Merge pull request #128417 from tenzen-y/self-nominate-job-controller-reviewer
Self nominate tenzen-y as a reviewer for the Job controller
2024-10-30 11:21:39 +00:00
Kubernetes Prow Robot
a18b50e7e4 Merge pull request #128373 from mimowo/job-cover-negative-codes
Job Pod Failure policy - cover testing of negative exit codes
2024-10-30 11:21:31 +00:00
Kubernetes Prow Robot
daef8c2419 Merge pull request #127266 from pohly/dra-admin-access-in-status
DRA API: AdminAccess in DeviceRequestAllocationResult + DRAAdminAccess feature gate
2024-10-30 03:41:25 +00:00
Yuki Iwai
eca7ee877a Self nominate tenzen-y as a reviewer for the Job controller
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-10-30 01:14:47 +09:00
Patrick Ohly
9a7e4ccab2 DRA admin access: add feature gate
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
  field gets cleared. Same in the status. In both cases the scenario
  that it was already set and a claim or claim template get updated
  is special: in those cases, the field is not cleared.

  Also, allocating a claim with admin access is allowed regardless of the
  feature gate and the field is not cleared. In practice, the scheduler
  will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
  with the field set gets rejected. This prevents running workloads
  which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
  allocated. The effect is the same.

The alternative would have been to ignore the fields in claim controller and
scheduler. This is bad because a monitoring workload then runs, blocking
resources that probably were meant for production workloads.
2024-10-29 09:50:11 +01:00
Ben Luddy
f831368428 Test response content negotiation with CBOR enablement. 2024-10-28 19:21:21 -04:00
Kubernetes Prow Robot
a15840a34b Merge pull request #128348 from sanposhiho/patch-12
Fix: use pod-high-priority.yaml to trigger preemption in PreemptionAsync test case
2024-10-28 22:19:17 +00:00
Ben Luddy
67b9dc1f3e Wire client feature gates affecting RESTClient content config. 2024-10-28 13:15:28 -04:00
Michal Wozniak
cad648035a Job Pod Failure policy - cover testing of negative exit codes 2024-10-28 07:24:26 +01:00
Kensei Nakada
b5d0745db3 Fix: use pod-high-priority.yaml to trigger preemption in PreemptionAsync test case 2024-10-26 14:16:24 +09:00
Kubernetes Prow Robot
eb10b9b979 Merge pull request #128107 from alaypatel07/kep-4017-integration-tests
[KEP-4017]: update e2e and integration test for PodIndexLabel
2024-10-25 20:18:52 +01:00
Kubernetes Prow Robot
119f114f01 Merge pull request #128196 from richabanker/move-version
Move k8s.io/apiserver/pkg/util/version to component-base
2024-10-25 18:33:01 +01:00
Alay Patel
0aa065ab7e update e2e and integration test for PodIndexLabel
Signed-off-by: Alay Patel <alayp@nvidia.com>
2024-10-25 12:17:19 -04:00
Richa Banker
9274a584b8 Split k8s.io/component-base/registry and add into k8s.io/component-base/version and k8s.io/component-base/featuregate 2024-10-24 19:09:30 -07:00
Kubernetes Prow Robot
5147eebf22 Merge pull request #128243 from benluddy/cbor-dynamic-integration
KEP-4222: Add CBOR variant of admission webhook integration test.
2024-10-25 01:04:53 +01:00
Kubernetes Prow Robot
b7a85a9db3 Merge pull request #128262 from dom4ha/scheduler-perf
Tune PreemptionAsync and Unschedulable tests threshold and params.
2024-10-24 21:24:52 +01:00
Ben Luddy
77401d7073 Add CBOR variant of admission webhook integration test.
The existing admission webhook integration test provides good coverage of serving built-in resources
and custom resources, including subresources. Serialization concerns, including roundtrippability,
of built-in types have existing test coverage; the CBOR variant of the admission webhook integration
test additionally exercises client and server codec wiring.
2024-10-24 13:27:39 -04:00
Ben Luddy
ea13190d8b Add test-only client feature gates for CBOR.
As with the apiserver feature gate for CBOR as a serving and storage encoding, the client feature
gates for CBOR are being initially added through a test-only feature gate instance that is not wired
to environment variables or to command-line flags and is intended only to be enabled
programmatically from integration tests. The test-only instance will be removed as part of alpha
graduation and replaced by conventional client feature gating.
2024-10-24 13:27:39 -04:00
Ben Luddy
0cad1a89b6 Wire test-only feature gate for CBOR serving.
To mitigate the risk of introducing a new protocol, integration tests for CBOR will be written using
a test-only feature gate instance that is not wired to runtime options. On alpha graduation, the
test-only feature gate instance will be replaced by a normal feature gate in the existing apiserver
feature gate instance.
2024-10-24 13:27:36 -04:00
Aldo Culquicondor
5fab6175b7 Remove alculquicondor from job approvers
Change-Id: I2b1514ff70108602a589522cbb63dcdc88849313
2024-10-23 17:58:55 +00:00
dom4ha
b3c4fe48e9 Tune PreemptionAsync and Unschedulable tests threshold and params. 2024-10-23 12:24:10 +00:00
Kubernetes Prow Robot
061be57eb1 Merge pull request #125808 from bzsuni/cleanup/PollUntil
Use PollUntilContextCancel to replace PollUntil in test
2024-10-23 01:17:58 +01:00
Kubernetes Prow Robot
c326b0d2a3 Merge pull request #123818 from carlory/cleanup-deployment
remove stale comment
2024-10-23 01:17:37 +01:00