1863 Commits

Author SHA1 Message Date
Abu Kashem
747a295cac fix flake in dra test 'TestPlugin'
TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue
flakes due to the assumed cache not having been synced with the initial
store. The test waits until the registered handler used by the
assumed cache has synced before proceeding with the test
2025-08-18 15:54:03 -04:00
Abu Kashem
c8ab780edb dra plugin: assume claim after api call in bindClaim 2025-08-13 16:35:35 -04:00
yliao
2a026f6d65 1/ added retries to AssumeClaimAfterAPICall for the object which is not present in the cache (dynamicresources.go)
2/ modified the assume cache verification to not error out as long as
the expected claim is in the cache, regardless of whether its latest and
API objects differ (dynamicresources_test.go).
3/ fixed nil panic as seen from https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133321/pull-kubernetes-integration/1952472629470302208
2025-08-06 07:08:58 +00:00
Maciej Skoczeń
9eda4789c0 Fix potential race in PodStatusPatchCall implementation 2025-07-31 09:27:40 +00:00
yliao
0a12f00e9d fix nil panic in hasBindingConditions: it cannot assume the claim has allocations 2025-07-30 14:44:41 +09:00
Sunyanan Choochotkaew
7f052afaef KEP 5075: implement scheduler
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:52:49 +09:00
Sunyanan Choochotkaew
5ad969588d KEP-5075: API updates
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:26:40 +09:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Kubernetes Prow Robot
e2ab840708 Merge pull request #130160 from KobayashiD27/dra-device-binding-conditions
Implement DRA Device Binding Conditions (KEP-5007)
2025-07-29 07:34:26 -07:00
Kobayashi,Daisuke
e8c3af1f5c KEP-5007 DRA Device Binding Conditions: Implement scheduler logic 2025-07-29 11:34:30 +00:00
Kensei Nakada
ac9fad6030 feat: trigger PreFilterPreBind in the binding cycle 2025-07-29 19:01:02 +09:00
Kensei Nakada
f3466f8adc fix: flake integration test 2025-07-28 23:12:58 +09:00
Kensei Nakada
ed74d4cd52 Revert "Revert "fix: handle corner cases in the async preemption""
This reverts commit 006d7620a8.
2025-07-28 20:22:27 +09:00
Maciej Skoczeń
17d733e243 KEP-5229: Send API calls through dispatcher and cache 2025-07-25 15:35:36 +00:00
Paco Xu
006d7620a8 Revert "fix: handle corner cases in the async preemption" 2025-07-25 10:38:34 +08:00
Kubernetes Prow Robot
5be5fd0229 Merge pull request #133167 from sanposhiho/preemption-conor-case
fix: handle corner cases in the async preemption
2025-07-24 13:05:18 -07:00
Kubernetes Prow Robot
8d28109a2b Merge pull request #133174 from sanposhiho/log-level-preemption
fix: adjust the log level in the preemption
2025-07-24 09:38:57 -07:00
Kubernetes Prow Robot
a11bc701e8 Merge pull request #132457 from ania-borowiec/depends_on_cluster_move_podinfo
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
2025-07-24 09:38:27 -07:00
Kensei Nakada
8a2db4da42 fix: adjust the log level in the preemption 2025-07-24 22:45:39 +09:00
Kensei Nakada
4c9bf4719b fix: handle corner cases in the async preemption 2025-07-24 22:44:39 +09:00
Ania Borowiec
aecd37e6fb Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Kubernetes Prow Robot
89a01ec72a Merge pull request #133019 from pohly/dra-scheduler-plugin-owners
DRA scheduler plugin: add pohly as approver
2025-07-24 03:42:33 -07:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
2025-07-24 08:33:45 +02:00
Ed Bartosh
c2a06e7912 DRA: skip flaky test case on Windows
Added a skipOnWindows flag to DynamicResources scheduler test case
to skip a test that relies on nanosecond timer precision.
Windows timer granularity is much coarser than Linux, which causes
the test to fail often.
2025-07-23 11:06:11 +03:00
Kensei Nakada
4b8dd9612f cleanup: remove example plugins 2025-07-19 13:08:34 +09:00
Patrick Ohly
bc338e7505 DRA scheduler: implement filter timeout and cancellation
The intent is to catch abnormal runtimes with the generously large default
timeout of 10 seconds.

We have to set up a context with the configured timeout (optional!), then
ensure that both CEL evaluation and the allocation logic itself properly
return the context error. The scheduler plugin then can convert that into
"unschedulable".

The allocator and thus Filter now also check for context cancellation by the
scheduler. This happens when enough nodes have been found.
2025-07-17 21:18:28 +02:00
Patrick Ohly
025c606e39 DRA scheduler: add plugin configuration
The only option is the filter timeout.
The implementation of it follows in a separate commit.
2025-07-17 16:47:47 +02:00
Patrick Ohly
ee38a00131 DRA scheduler: add DRASchedulerFilterTimeout feature gate
Initializing the scheduler Features struct will be needed in different places,
therefore NewSchedulerFeaturesFromGates gets introduced. Besides, having it
next to the struct makes it easier to add new features.

The DRASchedulerFilterTimeout feature gate simplifies disabling the timeout
because setting a feature gate is often easier than modifying the scheduler
configuration with a zero timeout value.

The timeout and feature gate are new. The gate starts as beta and enabled by
default, which is consistent with the "smaller changes with low enough risk
that still may need to be disabled..." guideline.
2025-07-17 16:47:47 +02:00
Patrick Ohly
837ef29f5a scheduler: enhance and document Filter cancellation
When using context.CancelCause in the scheduler and context.Cause in plugins,
the status returned by plugins is more informative than just "context
canceled".

Context cancellation itself is not new, but many plugin authors probably
weren't aware of it because it wasn't documented.
2025-07-17 16:47:47 +02:00
Patrick Ohly
a2a3839a8e DRA scheduler: add pohly as approver
This is meant for simple changes, like code cleanup or API changes of the
allocator code. For more complex changes and new features, SIG Scheduling
approvers will be required to approve, as before.
2025-07-17 09:43:44 +02:00
yliao
dd3691b169 refactor allocator, removed claimsToAllocate from NewAllocator(), instead, passed it through Allocate() 2025-07-16 15:11:11 +00:00
Kubernetes Prow Robot
ab685237f0 Merge pull request #132391 from sanposhiho/pre-bind-pre-flight
feat: add PreBindPreFlight and implement in in-tree plugins
2025-07-15 04:06:23 -07:00
Kubernetes Prow Robot
e3b20c07d6 Merge pull request #132870 from pohly/dra-allocator
DRA: refactor claim allocator
2025-07-15 01:28:29 -07:00
Patrick Ohly
5caf7bca15 DRA allocator: refactor code
The goal is to maintain different versions of the allocator logic. We already
had one incident where adding an alpha feature caused a regression even when
it was disabled. Not everything can be implemented within obviously correct if
branches.

This also opens the door for implementing different alternatives.

The code just gets moved around for now.
2025-07-10 17:34:21 +02:00
Pawel Mechlinski
f2b24b9849 Increase verbosity of frequently printed loglines in binder plugin 2025-07-09 12:10:10 +00:00
Junhao Zou
1b730abf8d cleanup: use HandleErrorWithXXX instead of logger.Error where errors are intentionally ignored 2025-07-08 09:34:49 +08:00
Kensei Nakada
ebae419337 feat: add PreBindPreFlight and implement in in-tree plugins 2025-07-05 17:14:21 -07:00
Ania Borowiec
ee8c265d35 Move Code and Status from pkg/scheduler/framework to k8s.io/kube-scheduler/framework 2025-06-30 10:06:22 +00:00
Ania Borowiec
00d3750503 Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes (#132190)
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes

apply review comment and fix linter warning

* update-vendor.sh

* update doc comments

* run update-vendor.sh
2025-06-26 08:06:29 -07:00
Davanum Srinivas
03afe6471b Add a replacement for cmp.Diff using json+go-difflib
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2025-06-16 17:10:42 -04:00
Kubernetes Prow Robot
2d1bb8dac1 Merge pull request #132040 from avrittrohwer/master
Make nodeports scheduling plugin restartable initContainer aware
2025-06-10 17:39:02 -07:00
carlory
0896693693 fix TestNodeAffinityPriority: calculate the priorities correctly even if PreScore is not called 2025-06-06 16:03:46 +08:00
Avritt Rohwer
087554448c Make nodeports scheduling plugin sidecar initContainer aware 2025-06-06 02:26:05 +00:00
Kubernetes Prow Robot
e0859f91b7 Merge pull request #131887 from ania-borowiec/extract_cyclestate_interface
Moving Scheduler interfaces to staging: split CycleState into interface and implementation, move interface to staging repo
2025-05-30 04:00:18 -07:00
Ania Borowiec
d75af825fb Extract interface CycleState and move it to staging repo. CycleState implementation remains in k/k/pkg/scheduler/framework 2025-05-29 16:18:36 +00:00
Kubernetes Prow Robot
86da819709 Merge pull request #131693 from ania-borowiec/staging_repo_refactoring_action_type
Remove package protected fields from ActionType
2025-05-23 05:30:35 -07:00
Ania Borowiec
151d9d79f4 Remove package protected field updatePodOther from ActionType. Make ActionType.None public 2025-05-23 09:51:35 +00:00
Kubernetes Prow Robot
0afe2b839d Merge pull request #129983 from nickbp/master
feature(scheduler): Customizable pod selection and ordering in DefaultPreemption plugin
2025-05-20 05:09:15 -07:00
Kubernetes Prow Robot
82db38a23c Merge pull request #128748 from sanposhiho/attempt-incre
feat: introduce pInfo.UnschedulableCount to make the backoff calculation more appropriate
2025-05-19 01:21:15 -07:00
Kensei Nakada
adc4916dfe feat: introduce pInfo.UnschedulableCount to make the backoff calculation more appropriate 2025-05-17 12:39:58 +02:00