146 Commits

Author SHA1 Message Date
Abu Kashem
747a295cac fix flake in dra test 'TestPlugin'
TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue
flakes due to the assumed cache not been synced with the initial
store. The test waits until the registered handler used by the
assumed cache has synced before proceeding with the test
2025-08-18 15:54:03 -04:00
Abu Kashem
c8ab780edb dra plugin: assume claim after api call in bindClaim 2025-08-13 16:35:35 -04:00
yliao
2a026f6d65 1/ added retries to AssumeClaimAfterAPICall for the object which is not present in the cache (dynamicresources.go)
2/ modified the assume cache verification to not error out as long as
the expected claim is in the cache, no matter its latest and api object
are different or not. (dynamicresources_test.go).
3/ fixed nil panic as seen from https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133321/pull-kubernetes-integration/1952472629470302208
2025-08-06 07:08:58 +00:00
yliao
0a12f00e9d fix nil panic in hasBindingConditions, it cannot assume claim has allocations 2025-07-30 14:44:41 +09:00
Sunyanan Choochotkaew
7f052afaef KEP 5075: implement scheduler
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:52:49 +09:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Kobayashi,Daisuke
e8c3af1f5c KEP-5007 DRA Device Binding Conditions: Implement scheduler logic 2025-07-29 11:34:30 +00:00
Kubernetes Prow Robot
a11bc701e8 Merge pull request #132457 from ania-borowiec/depends_on_cluster_move_podinfo
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
2025-07-24 09:38:27 -07:00
Ania Borowiec
aecd37e6fb Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Kubernetes Prow Robot
89a01ec72a Merge pull request #133019 from pohly/dra-scheduler-plugin-owners
DRA scheduler plugin: add pohly as approver
2025-07-24 03:42:33 -07:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes release >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 are enabled without the v1 won't work.
2025-07-24 08:33:45 +02:00
Ed Bartosh
c2a06e7912 DRA: skip flaky test case on Windows
Added a skipOnWindows flag to DynamicResources scheduler test case
to skip test that relies on nanosecond timer precision.
Windows timer granularity is much coarser than Linux, which causes
the test to fail often.
2025-07-23 11:06:11 +03:00
Patrick Ohly
bc338e7505 DRA scheduler: implement filter timeout and cancellation
The intent is to catch abnormal runtimes with the generously large default
timeout of 10 seconds.

We have to set up a context with the configured timeout (optional!), then
ensure that both CEL evaluation and the allocation logic itself properly
returns the context error. The scheduler plugin then can convert that into
"unschedulable".

The allocator and thus Filter now also check for context cancellation by the
scheduler. This happens when enough nodes have been found.
2025-07-17 21:18:28 +02:00
Patrick Ohly
025c606e39 DRA scheduler: add plugin configuration
The only option is the filter timeout.
The implementation of it follows in a separate commit.
2025-07-17 16:47:47 +02:00
Patrick Ohly
a2a3839a8e DRA scheduler: add pohly as approver
This is meant for simple changes, like code cleanup or API changes of the
allocator code. For more complex changes and new features, SIG Scheduling
approvers will be required to approve, as before.
2025-07-17 09:43:44 +02:00
yliao
dd3691b169 refactor allocator, removed claimsToAllocate from NewAllocator(), instead, passed it through Allocate() 2025-07-16 15:11:11 +00:00
Kubernetes Prow Robot
ab685237f0 Merge pull request #132391 from sanposhiho/pre-bind-pre-flight
feat: add PreBindPreFlight and implement in in-tree plugins
2025-07-15 04:06:23 -07:00
Patrick Ohly
5caf7bca15 DRA allocator: refactor code
The goal is to maintain different version of the allocator logic. We already
had one incidence where adding an alpha feature caused a regression also when
it was disabled. Not everything can be implemented within obviously correct if
branches.

This also opens the door for implementing different alternatives.

The code just gets moved around for now.
2025-07-10 17:34:21 +02:00
Kensei Nakada
ebae419337 feat: add PreBindPreFlight and implement in in-tree plugins 2025-07-05 17:14:21 -07:00
Ania Borowiec
ee8c265d35 Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503 Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes

apply review comment and fix linter warning

* update-vendor.sh

* update doc comments

* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Davanum Srinivas
03afe6471b Add a replacement for cmp.Diff using json+go-difflib
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-06-16 17:10:42 -04:00
Ania Borowiec
d75af825fb Extract interface CycleState and move is to staging repo. CycleState implementation remains in k/k/pkg/scheduler/framework 2025-05-29 16:18:36 +00:00
Kubernetes Prow Robot
8a6b916765 Merge pull request #130720 from saintube/scheduler-expose-nodeinfo-in-prefilter
Expose NodeInfo to PreFilter plugins
2025-04-23 13:31:29 -07:00
saintube
8dc6806d26 Expose NodeInfo to PreFilter plugins and Framework
Co-authored-by: Zhan Sheng <49895476+AxeZhan@users.noreply.github.com>
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
Signed-off-by: saintube <saintube@foxmail.com>
2025-03-21 14:55:25 +08:00
Cici Huang
f04cfdf6e7 Update gofmt. 2025-03-19 23:21:30 +00:00
Cici Huang
6d7f11689d Complete feature impl, fix issues, add perDeviceNodeSelection support, add tests, address comments, etc. 2025-03-19 22:10:48 +00:00
Morten Torkildsen
ecba6cde1d Allocator updates 2025-03-19 22:10:48 +00:00
Kubernetes Prow Robot
ab3cec0701 Merge pull request #130447 from pohly/dra-device-taints
device taints and tolerations (KEP 5055)
2025-03-19 13:00:32 -07:00
Jon Huhn
5760a4f282 DRA scheduler: device taints and tolerations
Thanks to the tracker, the plugin sees all taints directly in the device
definition and can compare it against the tolerations of a request while
trying to find a device for the request.

When the feature is turnedd off, taints are ignored during scheduling.
2025-03-19 09:18:38 +01:00
Patrick Ohly
a027b439e5 DRA: add device taint eviction controller
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
2025-03-19 09:18:38 +01:00
Patrick Ohly
d95d6ba526 DRA scheduler: fix potential panic during unit test verification
If there was an unexpected status, the code extracting the expected error
message crashed with a panic. Happened once so far, for unknown reasons
because the unexpected status then didn't get logged.
2025-03-18 15:07:51 +01:00
Patrick Ohly
dfb8ab6521 DRA scheduler: fail in PreFilter when DRAPrioritizedList is disabled and used
This was previously caught during Filter by the allocator check. Doing it
sooner avoids wasting resources on a pod which ultimately cannot get scheduled.

While at it, be a bit more clear about which feature is disabled. The user
might not know that.
2025-03-07 08:45:32 +01:00
Morten Torkildsen
2229a78dfe DRA: Update allocator for Prioritized Alternatives in Device Requests 2025-02-28 19:30:10 +00:00
Kubernetes Prow Robot
fc268ecd09 Merge pull request #129823 from googs1025/chore/log_improve
fix(dra plugin): when there is no resourceclaim, return directly
2025-02-02 16:28:56 -08:00
googs1025
ed826dddfe fix(dra plugin): when there is no resourceclaim, return directly 2025-01-29 08:47:52 +08:00
Davanum Srinivas
4e05bc20db Linter to ensure go-cmp/cmp is used ONLY in tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-01-24 20:49:14 -05:00
Paco Xu
2653caa248 fix dra test lint 2025-01-09 10:42:40 +08:00
googs1025
77eae7c34f feature(scheduler): remove dra plugin resourceslice QueueingHintFn 2025-01-08 16:24:28 +08:00
Patrick Ohly
33ea278c51 DRA: use v1beta1 API
No code is left which depends on the v1alpha3, except of course the code
implementing that version.
2024-11-06 13:03:19 +01:00
Kuba Tużnik
8d489425aa scheduler/dynamicresources: extract obtaining and tracking in-memory modifications of DRA objects
All logic related to obtaining DRA objects and tracking modifications
to ResourceClaims in-memory is extracted to DefaultDRAManager, which
implements framework.SharedDRAManager.

This is intended to be a no-op in terms of the DRA plugin behavior.
2024-11-05 14:11:04 +01:00
Patrick Ohly
7863d9a381 DRA scheduler: refactor CEL compilation cache
A better place is the cel package because a) the name can become shorter
and b) it is tightly coupled with the compiler there.

Moving the compilation into the cache simplifies the callers.
2024-11-05 08:34:42 +01:00
Patrick Ohly
6f07fa3a5e DRA scheduler: update some stale comments 2024-11-01 13:23:42 +01:00
Patrick Ohly
ae6b5522ea DRA scheduler: rename variable
"Allocated devices" are the ones which can be observed from the informer. "All
allocated devices" also includes those which are in flight and haven't been
written back to the apiserver.
2024-11-01 13:23:42 +01:00
Patrick Ohly
0130ebba1d DRA scheduler: refactor "allocated devices" lookup
The logic for skipping "admin access" was repeated in three different places. A
single foreachAllocatedDevices with a callback puts it into one function.
2024-11-01 13:23:28 +01:00
Patrick Ohly
bd7ff9c4c7 DRA scheduler: update some log strings 2024-11-01 13:23:11 +01:00
Patrick Ohly
bc55e82621 DRA scheduler: maintain a set of allocated device IDs
Reacting to events from the informer cache (indirectly, through the assume
cache) is more efficient than repeatedly listing it's content and then
converting to IDs with unique strings.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      54.70 ± 6%                     76.81 ± 6%  +40.42% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     106.4 ± 4%                     105.6 ± 2%        ~ (p=0.413 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     120.0 ± 4%                     118.9 ± 7%        ~ (p=0.117 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      112.5 ± 4%                     105.9 ± 4%   -5.87% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      87.13 ± 4%                    123.55 ± 4%  +41.80% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      113.4 ± 2%                     103.3 ± 2%   -8.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      65.55 ± 3%                    121.30 ± 3%  +85.05% (p=0.002 n=6)
    geomean                                                                                                90.81                          106.8       +17.57%
2024-11-01 13:23:06 +01:00
Patrick Ohly
814c9428fd DRA scheduler: cache compiled CEL expressions
DeviceClasses and different requests are very likely to contain the same
expression string. We don't need to compile that over and over again.

To avoid hanging onto that cache longer than necessary, it's currently tied to
each PreFilter/Filter combination. It might make sense to move this up into the
scheduler plugin and thus reuse compiled expressions for different pods.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base               │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.95 ± 4%                     36.65 ± 2%   +7.95% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.8 ± 2%                     106.7 ± 3%        ~ (p=0.177 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.7 ± 1%                     119.7 ± 3%  +18.82% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.78 ± 1%                    121.10 ± 4%  +33.40% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      50.51 ± 7%                     63.72 ± 3%  +26.17% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.7 ± 5%                     110.2 ± 2%   +6.32% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      28.50 ± 2%                     28.16 ± 5%        ~ (p=0.102 n=6)
    geomean                                                                                                64.99                          73.15       +12.56%
2024-11-01 13:20:06 +01:00
Patrick Ohly
941d17b3b8 DRA scheduler: code cleanups
Looking up the slice can be avoided by storing it when allocating a device.
The AllocationResult struct is small enough that it can be copied by value.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                       after                        │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base              │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                      33.30 ± 2%                     33.95 ± 4%       ~ (p=0.288 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                     105.3 ± 2%                     105.8 ± 2%       ~ (p=0.524 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                     100.8 ± 1%                     100.7 ± 1%       ~ (p=0.738 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                      90.96 ± 2%                     90.78 ± 1%       ~ (p=0.952 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                      49.84 ± 4%                     50.51 ± 7%       ~ (p=0.485 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                      103.8 ± 1%                     103.7 ± 5%       ~ (p=0.582 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                      27.21 ± 7%                     28.50 ± 2%       ~ (p=0.065 n=6)
    geomean                                                                                                64.26                          64.99       +1.14%
2024-11-01 13:19:51 +01:00
Patrick Ohly
1246898315 DRA scheduler: ResourceSlice with unique strings
Using unique strings instead of normal strings speeds up allocation with
structured parameters because maps that use those strings as key no longer need
to build hashes of the string content. However, care must be taken to call
unique.Make as little as possible because it is costly.

Pre-allocating the map of allocated devices reduces the need to grow the map
when adding devices.

    goos: linux
    goarch: amd64
    pkg: k8s.io/kubernetes/test/integration/scheduler_perf
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                                       │            before            │                        after                         │
                                                                                       │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base                │
    PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36                     18.06 ±  2%                     33.30 ± 2%   +84.31% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36                    104.7 ±  2%                     105.3 ± 2%         ~ (p=0.818 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36                    96.62 ±  1%                    100.75 ± 1%    +4.28% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36                     83.00 ±  2%                     90.96 ± 2%    +9.59% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36                     32.45 ±  7%                     49.84 ± 4%   +53.60% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36                     95.22 ±  7%                    103.80 ± 1%    +9.00% (p=0.002 n=6)
    PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36                     9.111 ± 10%                    27.215 ± 7%  +198.69% (p=0.002 n=6)
    geomean                                                                                               45.86                           64.26        +40.12%
2024-11-01 13:19:48 +01:00