Commit Graph

1263 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
b8c95e1954 Merge pull request #129688 from cpanato/update-main-go124
[go] Bump images, dependencies and versions to go 1.24.0
2025-02-27 09:10:30 -08:00
Kubernetes Prow Robot
facb1a8c55 Merge pull request #129905 from ania-borowiec/129778_replace_equal
Replace reflect.DeepEqual with cmp.Diff in pkg/scheduler tests
2025-02-26 08:24:30 -08:00
googs1025
239aad8e4b chore(scheduler): use framework.Features in scheduler plugins 2025-02-26 19:16:07 +08:00
Jordan Liggitt
8090db5dcf Switch to private instances of rand for seeding for tests 2025-02-26 11:27:10 +01:00
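The idea behind this commit can be illustrated with Go's `math/rand`. A private `*rand.Rand` built from a fixed seed gives a reproducible sequence without reseeding the package-level source that other tests share. This is a minimal sketch, not code from the commit; `newTestRand` and `sample` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newTestRand (hypothetical helper) returns a private random source with a
// fixed seed. Tests using it get a reproducible sequence without reseeding the
// global math/rand source that other tests also draw from.
func newTestRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// sample draws n values in [0, 100) from r.
func sample(r *rand.Rand, n int) []int {
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	a := sample(newTestRand(42), 5)
	b := sample(newTestRand(42), 5)
	fmt.Println(a, b) // both slices hold the same five numbers
}
```

Unlike the global source, a `*rand.Rand` is not safe for concurrent use, so each test or goroutine needs its own instance.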
Ania Borowiec
4205f04ce3 Replace uses of reflect.DeepEqual with cmp.Diff in pkg/scheduler tests 2025-02-26 09:27:51 +00:00
Kubernetes Prow Robot
4032177faf Merge pull request #129557 from googs1025/feature/add_QueueingHint_for_VolumeAttachment_deletion_events
feature(scheduler): add queueinghint for volumeattachment deletion
2025-02-22 00:10:26 -08:00
googs1025
86f504284c feature(scheduler): add queueinghint for volumeattachment deletion 2025-02-22 14:57:41 +08:00
googs1025
004c5f5a39 chore: remove unnecessary check for node is zero 2025-02-18 10:24:26 +08:00
Kubernetes Prow Robot
fc268ecd09 Merge pull request #129823 from googs1025/chore/log_improve
fix(dra plugin): when there is no resourceclaim, return directly
2025-02-02 16:28:56 -08:00
googs1025
ed826dddfe fix(dra plugin): when there is no resourceclaim, return directly 2025-01-29 08:47:52 +08:00
Davanum Srinivas
4e05bc20db Linter to ensure go-cmp/cmp is used ONLY in tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-01-24 20:49:14 -05:00
Kubernetes Prow Robot
2056dbca18 Merge pull request #129697 from saza-ku/remove-pvc-not-found
chore: remove duplicated test case of `pvc not found`
2025-01-24 09:35:21 -08:00
googs1025
27ec5de863 chore(scheduler): improve log output for podtopologyspread filter plugin 2025-01-24 19:29:23 +08:00
saza-ku
e26fb1c393 remove duplicated test case of pvc not found 2025-01-20 01:52:35 +09:00
Kubernetes Prow Robot
0b789d7cca Merge pull request #129427 from macsko/improve_map_in_interpodaffinity_prefilter
Improve topologyToMatchedTermCount map in InterPodAffinity PreFilter
2025-01-10 10:40:33 -08:00
Maciej Skoczeń
2d82687114 Improve topologyToMatchedTermCount map in InterPodAffinity PreFilter 2025-01-10 10:55:49 +00:00
Paco Xu
2653caa248 fix dra test lint 2025-01-09 10:42:40 +08:00
googs1025
77eae7c34f feature(scheduler): remove dra plugin resourceslice QueueingHintFn 2025-01-08 16:24:28 +08:00
Kubernetes Prow Robot
1c2b2cce10 Merge pull request #129119 from macsko/fix_podtopologyspread_for_multiple_constraints_with_the_same_key
Fix PodTopologySpread matching pods counts for constraints with the same topologyKey
2025-01-01 11:04:14 +01:00
Maciej Skoczeń
c3a54926a4 Fix PodTopologySpread matching pods counts for constraints with the same topologyKey 2024-12-30 09:35:24 +00:00
Kubernetes Prow Robot
078664b424 Merge pull request #129023 from zhifei92/cleanup-actiontype
scheduler: Rename UpdatePodTolerations for code style consistency
2024-12-12 05:28:52 +00:00
zhifei92
27608fa25d refactor(scheduler): Rename UpdatePodTolerations for code style consistency. 2024-11-29 13:13:09 +08:00
googs1025
c725e18e07 feature(scheduler): more fine-grained QHints for interpodaffinity plugin 2024-11-14 20:00:38 +08:00
ndixita
6db40446de Scheduler changes:
1. Use pod-level resource when feature is enabled and resources are set at pod-level
2. Edge case handling: When a pod defines only CPU or memory limits at pod-level (but not both), and container-level requests/limits are unset, the pod-level requests stay empty for the resource without a pod-limit. The container's request for that resource is then set to the default request value from schedutil.
2024-11-08 03:00:54 +00:00
Kensei Nakada
d4d91d4ace fix: use set methods 2024-11-07 14:09:35 +09:00
Kensei Nakada
a95b8b5085 fix: use Activate always 2024-11-07 14:09:35 +09:00
Kensei Nakada
677792663f fix: register Pod/Delete event at the preemption plugin 2024-11-07 14:09:35 +09:00
Kensei Nakada
fe3119fa69 make sure DefaultPreemption implements PreEnqueuePlugin 2024-11-07 14:09:35 +09:00
Kensei Nakada
69a8d0ec0b feature(KEP-4832): asynchronous preemption 2024-11-07 14:09:34 +09:00
Patrick Ohly
33ea278c51 DRA: use v1beta1 API
No code is left which depends on the v1alpha3, except of course the code
implementing that version.
2024-11-06 13:03:19 +01:00
Kubernetes Prow Robot
0fad78930f Merge pull request #127904 from towca/jtuznik/dra-autoscaling
DRA: allow Cluster Autoscaler to integrate with DRA scheduler plugin
2024-11-06 10:01:29 +00:00
Kubernetes Prow Robot
f81a68f488 Merge pull request #128377 from tallclair/allocated-status-2
[FG:InPlacePodVerticalScaling] Implement AllocatedResources status changes for Beta
2024-11-05 23:21:49 +00:00
Kuba Tużnik
8d489425aa scheduler/dynamicresources: extract obtaining and tracking in-memory modifications of DRA objects
All logic related to obtaining DRA objects and tracking modifications
to ResourceClaims in-memory is extracted to DefaultDRAManager, which
implements framework.SharedDRAManager.

This is intended to be a no-op in terms of the DRA plugin behavior.
2024-11-05 14:11:04 +01:00
Patrick Ohly
7863d9a381 DRA scheduler: refactor CEL compilation cache
A better place is the cel package because a) the name can become shorter
and b) it is tightly coupled with the compiler there.

Moving the compilation into the cache simplifies the callers.
2024-11-05 08:34:42 +01:00
Tim Allclair
81df195819 Stop using status.AllocatedResources to aggregate resources 2024-11-01 14:02:58 -07:00
Patrick Ohly
6f07fa3a5e DRA scheduler: update some stale comments 2024-11-01 13:23:42 +01:00
Patrick Ohly
ae6b5522ea DRA scheduler: rename variable
"Allocated devices" are the ones which can be observed from the informer. "All
allocated devices" also includes those which are in flight and haven't been
written back to the apiserver.
2024-11-01 13:23:42 +01:00
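The distinction drawn in this commit can be sketched as a set union: devices observed as allocated through the informer, plus devices allocated in flight but not yet written back. The `DeviceID` type and `allAllocatedDevices` function below are hypothetical simplifications, not the scheduler's actual types.

```go
package main

import "fmt"

// DeviceID is a hypothetical identifier for one device in a ResourceSlice.
type DeviceID struct {
	Driver, Pool, Device string
}

// allAllocatedDevices returns the union of devices observed as allocated via
// the informer and devices allocated in flight, i.e. assumed by the scheduler
// but not yet written back to the apiserver.
func allAllocatedDevices(observed, inFlight map[DeviceID]struct{}) map[DeviceID]struct{} {
	all := make(map[DeviceID]struct{}, len(observed)+len(inFlight))
	for id := range observed {
		all[id] = struct{}{}
	}
	for id := range inFlight {
		all[id] = struct{}{}
	}
	return all
}

func main() {
	gpu0 := DeviceID{"gpu.example.com", "pool-a", "gpu-0"}
	gpu1 := DeviceID{"gpu.example.com", "pool-a", "gpu-1"}
	observed := map[DeviceID]struct{}{gpu0: {}}
	inFlight := map[DeviceID]struct{}{gpu0: {}, gpu1: {}}
	fmt.Println(len(allAllocatedDevices(observed, inFlight))) // 2
}
```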
Patrick Ohly
0130ebba1d DRA scheduler: refactor "allocated devices" lookup
The logic for skipping "admin access" was repeated in three different places. A
single foreachAllocatedDevices with a callback puts it into one function.
2024-11-01 13:23:28 +01:00
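The callback refactoring described here can be sketched as follows. The function is named after the commit's `foreachAllocatedDevices`, but its signature and the `Claim`/`DeviceResult` types are hypothetical simplifications; the point is that the admin-access rule is written once and every caller supplies only what to do with each device.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the DRA allocation types.
type DeviceID struct{ Driver, Pool, Device string }

type DeviceResult struct {
	ID          DeviceID
	AdminAccess *bool // nil or false: normal allocation
}

type Claim struct {
	Name    string
	Results []DeviceResult
}

// foreachAllocatedDevices invokes cb for every device allocated by the given
// claims. The "skip admin access" rule lives here once instead of being
// repeated at each call site.
func foreachAllocatedDevices(claims []Claim, cb func(DeviceID)) {
	for _, claim := range claims {
		for _, r := range claim.Results {
			if r.AdminAccess != nil && *r.AdminAccess {
				// Mirrors the commit: allocations with admin access are skipped.
				continue
			}
			cb(r.ID)
		}
	}
}

func main() {
	yes := true
	claims := []Claim{
		{Name: "workload", Results: []DeviceResult{{ID: DeviceID{"drv", "pool", "dev-0"}}}},
		{Name: "monitoring", Results: []DeviceResult{{ID: DeviceID{"drv", "pool", "dev-1"}, AdminAccess: &yes}}},
	}
	allocated := map[DeviceID]struct{}{}
	foreachAllocatedDevices(claims, func(id DeviceID) { allocated[id] = struct{}{} })
	fmt.Println(len(allocated)) // 1: dev-1 is skipped
}
```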
Patrick Ohly
bd7ff9c4c7 DRA scheduler: update some log strings 2024-11-01 13:23:11 +01:00
Patrick Ohly
bc55e82621 DRA scheduler: maintain a set of allocated device IDs
Reacting to events from the informer cache (indirectly, through the assume
cache) is more efficient than repeatedly listing its content and then
converting to IDs with unique strings.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      54.70 ± 6%                     76.81 ± 6%  +40.42% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     106.4 ± 4%                     105.6 ± 2%        ~ (p=0.413 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     120.0 ± 4%                     118.9 ± 7%        ~ (p=0.117 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      112.5 ± 4%                     105.9 ± 4%   -5.87% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      87.13 ± 4%                    123.55 ± 4%  +41.80% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      113.4 ± 2%                     103.3 ± 2%   -8.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      65.55 ± 3%                    121.30 ± 3%  +85.05% (p=0.002 n=6)
    geomean                                                                                                90.81                          106.8       +17.57%
2024-11-01 13:23:06 +01:00
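The event-driven approach in this commit can be sketched as a set that is updated incrementally from add/update/delete callbacks rather than rebuilt by listing. The `allocatedDevices` type and its methods are hypothetical, and the sketch assumes each device is allocated to at most one claim (otherwise a delete could drop a device another claim still uses).

```go
package main

import (
	"fmt"
	"sync"
)

type DeviceID struct{ Driver, Pool, Device string }

// Claim is a hypothetical, simplified claim listing its allocated devices.
type Claim struct {
	Name    string
	Devices []DeviceID
}

// allocatedDevices is updated incrementally from callbacks (as an informer or
// assume cache would deliver them) instead of re-listing all claims for every
// scheduling attempt. Assumes a device belongs to at most one claim.
type allocatedDevices struct {
	mu  sync.RWMutex
	ids map[DeviceID]struct{}
}

func newAllocatedDevices() *allocatedDevices {
	return &allocatedDevices{ids: make(map[DeviceID]struct{})}
}

// add and remove expect the caller to hold a.mu.
func (a *allocatedDevices) add(c *Claim) {
	for _, id := range c.Devices {
		a.ids[id] = struct{}{}
	}
}

func (a *allocatedDevices) remove(c *Claim) {
	for _, id := range c.Devices {
		delete(a.ids, id)
	}
}

func (a *allocatedDevices) OnAdd(c *Claim) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.add(c)
}

// OnUpdate swaps old for new under one lock so readers never see a half-update.
func (a *allocatedDevices) OnUpdate(oldC, newC *Claim) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.remove(oldC)
	a.add(newC)
}

func (a *allocatedDevices) OnDelete(c *Claim) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.remove(c)
}

func (a *allocatedDevices) Has(id DeviceID) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	_, ok := a.ids[id]
	return ok
}

func main() {
	dev := DeviceID{"drv", "pool", "dev-0"}
	set := newAllocatedDevices()
	set.OnAdd(&Claim{Name: "c1", Devices: []DeviceID{dev}})
	fmt.Println(set.Has(dev)) // true
	set.OnDelete(&Claim{Name: "c1", Devices: []DeviceID{dev}})
	fmt.Println(set.Has(dev)) // false
}
```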
Patrick Ohly
814c9428fd DRA scheduler: cache compiled CEL expressions
DeviceClasses and different requests are very likely to contain the same
expression string. We don't need to compile that over and over again.

To avoid hanging onto that cache longer than necessary, it's currently tied to
each PreFilter/Filter combination. It might make sense to move this up into the
scheduler plugin and thus reuse compiled expressions for different pods.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.95 ± 4%                     36.65 ± 2%   +7.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.8 ± 2%                     106.7 ± 3%        ~ (p=0.177 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.7 ± 1%                     119.7 ± 3%  +18.82% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.78 ± 1%                    121.10 ± 4%  +33.40% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      50.51 ± 7%                     63.72 ± 3%  +26.17% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.7 ± 5%                     110.2 ± 2%   +6.32% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      28.50 ± 2%                     28.16 ± 5%        ~ (p=0.102 n=6)
    geomean                                                                                                64.99                          73.15       +12.56%
2024-11-01 13:20:06 +01:00
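Caching by expression string can be sketched as a memoizing map. To keep the example runnable without cel-go, real CEL compilation is replaced by an injected stand-in function; `exprCache`, `CompiledExpr` and `compileFunc` are hypothetical names. Compile errors are cached too, so a broken expression is not recompiled on every lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// CompiledExpr is a placeholder for a compiled CEL program.
type CompiledExpr struct {
	Source string
}

// compileFunc stands in for a real CEL compiler (hypothetical).
type compileFunc func(expr string) (*CompiledExpr, error)

type cacheEntry struct {
	expr *CompiledExpr
	err  error
}

// exprCache memoizes compilation results keyed by expression string, so the
// same selector appearing in several DeviceClasses or requests is compiled once.
type exprCache struct {
	mu      sync.Mutex
	compile compileFunc
	entries map[string]cacheEntry
}

func newExprCache(c compileFunc) *exprCache {
	return &exprCache{compile: c, entries: make(map[string]cacheEntry)}
}

// Get returns the cached result or compiles and stores it. Holding the lock
// while compiling keeps the sketch simple but serializes compilation.
func (c *exprCache) Get(expr string) (*CompiledExpr, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[expr]; ok {
		return e.expr, e.err
	}
	compiled, err := c.compile(expr)
	c.entries[expr] = cacheEntry{expr: compiled, err: err}
	return compiled, err
}

func main() {
	calls := 0
	cache := newExprCache(func(expr string) (*CompiledExpr, error) {
		calls++
		return &CompiledExpr{Source: expr}, nil
	})
	for _, e := range []string{`device.driver == "a"`, `device.driver == "a"`, `device.driver == "b"`} {
		if _, err := cache.Get(e); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("compilations:", calls) // 2
}
```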
Patrick Ohly
941d17b3b8 DRA scheduler: code cleanups
Looking up the slice can be avoided by storing it when allocating a device.
The AllocationResult struct is small enough that it can be copied by value.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                       after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base              │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.30 ± 2%                     33.95 ± 4%       ~ (p=0.288 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.3 ± 2%                     105.8 ± 2%       ~ (p=0.524 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.8 ± 1%                     100.7 ± 1%       ~ (p=0.738 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.96 ± 2%                     90.78 ± 1%       ~ (p=0.952 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      49.84 ± 4%                     50.51 ± 7%       ~ (p=0.485 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.8 ± 1%                     103.7 ± 5%       ~ (p=0.582 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      27.21 ± 7%                     28.50 ± 2%       ~ (p=0.065 n=6)
    geomean                                                                                                64.26                          64.99       +1.14%
2024-11-01 13:19:51 +01:00
Patrick Ohly
1246898315 DRA scheduler: ResourceSlice with unique strings
Using unique strings instead of normal strings speeds up allocation with
structured parameters because maps that use those strings as key no longer need
to build hashes of the string content. However, care must be taken to call
unique.Make as little as possible because it is costly.

Pre-allocating the map of allocated devices reduces the need to grow the map
when adding devices.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                         │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base                │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                     18.06 ±  2%                     33.30 ± 2%   +84.31% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                    104.7 ±  2%                     105.3 ± 2%         ~ (p=0.818 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                    96.62 ±  1%                    100.75 ± 1%    +4.28% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                     83.00 ±  2%                     90.96 ± 2%    +9.59% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                     32.45 ±  7%                     49.84 ± 4%   +53.60% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                     95.22 ±  7%                    103.80 ± 1%    +9.00% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                     9.111 ± 10%                    27.215 ± 7%  +198.69% (p=0.002 n=6)
    geomean                                                                                               45.86                           64.26        +40.12%
2024-11-01 13:19:48 +01:00
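The unique-strings idea can be sketched with Go's `unique` package (Go 1.23 or later). A `unique.Handle[string]` compares and hashes by pointer, so a map keyed by handles avoids hashing string content. The `DeviceID` layout and `MakeDeviceID` helper are hypothetical; as the commit notes, `unique.Make` is costly, so IDs should be built once (for example when a ResourceSlice is processed) rather than on every lookup.

```go
package main

import (
	"fmt"
	"unique"
)

// DeviceID (hypothetical) holds interned strings: comparing and hashing a
// unique.Handle works on a pointer, not on the string bytes.
type DeviceID struct {
	Driver, Pool, Device unique.Handle[string]
}

// MakeDeviceID interns each component. unique.Make is comparatively costly,
// so callers should build IDs once and reuse them.
func MakeDeviceID(driver, pool, device string) DeviceID {
	return DeviceID{
		Driver: unique.Make(driver),
		Pool:   unique.Make(pool),
		Device: unique.Make(device),
	}
}

func (id DeviceID) String() string {
	return id.Driver.Value() + "/" + id.Pool.Value() + "/" + id.Device.Value()
}

func main() {
	// Pre-sizing the map avoids regrowth while devices are added.
	allocated := make(map[DeviceID]struct{}, 16)
	allocated[MakeDeviceID("gpu.example.com", "pool-a", "gpu-0")] = struct{}{}
	_, ok := allocated[MakeDeviceID("gpu.example.com", "pool-a", "gpu-0")]
	fmt.Println(ok) // true
}
```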
Patrick Ohly
7de6d070f2 DRA scheduler: avoid listing claims during Filter
The Allocate call used to call back into the claim lister for each node. This
was significant work which showed up at the top of the CPU profile. It's
okay to list only once during PreFilter because the Filter call does not change
the claim status between Allocate calls.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      15.04 ± 0%                    18.06 ±  2%  +20.07% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.5 ± 1%                    104.7 ±  2%        ~ (p=0.485 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     95.83 ± 1%                    96.62 ±  1%        ~ (p=0.063 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      79.67 ± 3%                    83.00 ±  2%   +4.18% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      27.11 ± 5%                    32.45 ±  7%  +19.68% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      84.00 ± 3%                    95.22 ±  7%  +13.36% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      7.110 ± 6%                    9.111 ± 10%  +28.15% (p=0.002 n=6)
    geomean                                                                                                41.05                         45.86        +11.73%
2024-11-01 12:43:17 +01:00
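The pattern in this commit, listing once in PreFilter and reusing the snapshot in every per-node Filter call, can be sketched like this. `countingLister`, `stateData`, `preFilter` and `filter` are hypothetical; the filter body is a placeholder that only reports how many claims it considered.

```go
package main

import "fmt"

// Claim is a hypothetical, simplified ResourceClaim.
type Claim struct{ Name string }

// countingLister is a hypothetical claim lister that counts List calls.
type countingLister struct {
	claims    []Claim
	listCalls int
}

func (l *countingLister) List() []Claim {
	l.listCalls++
	return append([]Claim(nil), l.claims...)
}

// stateData is what PreFilter stores in the cycle state for later Filter calls.
type stateData struct {
	allocatedClaims []Claim
}

// preFilter lists claims once per scheduling cycle.
func preFilter(l *countingLister) *stateData {
	return &stateData{allocatedClaims: l.List()}
}

// filter evaluates one node against the snapshot taken in preFilter instead of
// calling back into the lister. Placeholder check: returns the claim count.
func filter(s *stateData, node string) int {
	return len(s.allocatedClaims)
}

func main() {
	lister := &countingLister{claims: []Claim{{"a"}, {"b"}}}
	state := preFilter(lister)
	for _, node := range []string{"node-1", "node-2", "node-3"} {
		fmt.Println(node, filter(state, node))
	}
	fmt.Println("List calls:", lister.listCalls) // 1
}
```

Reusing the snapshot is only valid because, as the commit message states, Filter does not change claim status between Allocate calls.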
Kubernetes Prow Robot
223ac36b50 Merge pull request #128399 from JesseStutler/dra
Refactor the dynamicResources struct to DynamicResources
2024-11-01 00:33:27 +00:00
Kubernetes Prow Robot
daef8c2419 Merge pull request #127266 from pohly/dra-admin-access-in-status
DRA API: AdminAccess in DeviceRequestAllocationResult + DRAAdminAccess feature gate
2024-10-30 03:41:25 +00:00
Kubernetes Prow Robot
988769933e Merge pull request #128307 from NoicFank/bugfix-scheduler-preemption
bugfix(scheduler): preemption picks wrong victim node with higher priority pod on it
2024-10-29 19:05:02 +00:00
NoicFank
68f7a7c682 bugfix(scheduler): preemption picks wrong victim node with higher priority pod on it.
Introducing PDB handling to preemption disrupted the ordering of pods in the victims,
which could lead to picking the wrong victim node with a higher-priority pod on it.
2024-10-29 19:50:55 +08:00
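A simplified model of this bug: assume, as the commit message implies, that node selection treats the first victim on each node as its highest-priority one. If PDB handling appends victims in a different order, that assumption breaks. The `Pod` type, `sortVictims` and `pickNode` below are hypothetical and reduce selection to a single criterion (lowest highest-priority victim), which is only part of the real logic.

```go
package main

import (
	"fmt"
	"sort"
)

type Pod struct {
	Name     string
	Priority int32
}

// sortVictims orders victims by descending priority so that victims[0] is
// the highest-priority pod that would be evicted.
func sortVictims(victims []Pod) {
	sort.SliceStable(victims, func(i, j int) bool {
		return victims[i].Priority > victims[j].Priority
	})
}

// pickNode returns the node whose highest-priority victim (victims[0]) has the
// lowest priority, breaking ties by node name. Assumes non-empty victim lists.
func pickNode(candidates map[string][]Pod) string {
	best := ""
	var bestPrio int32
	for node, victims := range candidates {
		top := victims[0].Priority
		if best == "" || top < bestPrio || (top == bestPrio && node < best) {
			best, bestPrio = node, top
		}
	}
	return best
}

func main() {
	candidates := map[string][]Pod{
		// The PDB-protected, high-priority victim ended up last.
		"node-a": {{"low", 1}, {"pdb-protected", 100}},
		"node-b": {{"medium", 50}},
	}
	for _, victims := range candidates {
		sortVictims(victims)
	}
	fmt.Println(pickNode(candidates)) // node-b
}
```

Without the sort, `node-a` would be chosen because its first victim has priority 1, even though preempting there evicts a priority-100 pod.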
Patrick Ohly
4419568259 DRA: treat AdminAccess as a new feature gated field
Using the "normal" logic for a feature gated field simplifies the
implementation of the feature gate.

There is one (entirely theoretic!) problem with updating from 1.31: if a claim
was allocated in 1.31 with admin access, the status field was not set because
it didn't exist yet. If a driver now follows the current definition of "unset =
off", then it will not grant admin access even though it should. This is
theoretic because drivers are starting to support admin access with 1.32, so
there shouldn't be any claim where this problem could occur.
2024-10-29 10:22:31 +01:00
Patrick Ohly
9a7e4ccab2 DRA admin access: add feature gate
The new DRAAdminAccess feature gate has the following effects:
- If disabled in the apiserver, the spec.devices.requests[*].adminAccess
  field gets cleared. Same in the status. In both cases the scenario
  that it was already set and a claim or claim template gets updated
  is special: in those cases, the field is not cleared.

  Also, allocating a claim with admin access is allowed regardless of the
  feature gate and the field is not cleared. In practice, the scheduler
  will not do that.
- If disabled in the resource claim controller, creating ResourceClaims
  with the field set gets rejected. This prevents running workloads
  which depend on admin access.
- If disabled in the scheduler, claims with admin access don't get
  allocated. The effect is the same.

The alternative would have been to ignore the fields in claim controller and
scheduler. This is bad because a monitoring workload then runs, blocking
resources that probably were meant for production workloads.
2024-10-29 09:50:11 +01:00
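The apiserver behavior in the first bullet follows the usual pattern for feature-gated fields: clear the field when the gate is off, unless the existing object already uses it. The sketch below covers only that part, with hypothetical, simplified `ClaimSpec`/`DeviceRequest` types and a `dropDisabledAdminAccess` helper; a nil `AdminAccess` means "off", matching the "unset = off" semantics discussed in the commit above.

```go
package main

import "fmt"

// DeviceRequest and ClaimSpec are simplified, hypothetical stand-ins for
// spec.devices.requests in a ResourceClaim.
type DeviceRequest struct {
	Name        string
	AdminAccess *bool // unset (nil) means "off"
}

type ClaimSpec struct {
	Requests []DeviceRequest
}

// adminAccessInUse reports whether any request already sets the field.
func adminAccessInUse(spec *ClaimSpec) bool {
	if spec == nil {
		return false
	}
	for _, r := range spec.Requests {
		if r.AdminAccess != nil {
			return true
		}
	}
	return false
}

// dropDisabledAdminAccess clears AdminAccess when the feature gate is off,
// except on updates of objects that already used the field.
// oldSpec is nil on create.
func dropDisabledAdminAccess(gateEnabled bool, newSpec, oldSpec *ClaimSpec) {
	if gateEnabled || adminAccessInUse(oldSpec) {
		return
	}
	for i := range newSpec.Requests {
		newSpec.Requests[i].AdminAccess = nil
	}
}

func main() {
	on := true
	created := &ClaimSpec{Requests: []DeviceRequest{{Name: "monitor", AdminAccess: &on}}}
	dropDisabledAdminAccess(false, created, nil)
	fmt.Println(created.Requests[0].AdminAccess == nil) // true: cleared on create
}
```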