3894 Commits

Author SHA1 Message Date
Abu Kashem
747a295cac fix flake in dra test 'TestPlugin'
TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue
flakes because the assumed cache has not been synced with the initial
store. The test now waits until the registered handler used by the
assumed cache has synced before proceeding
2025-08-18 15:54:03 -04:00
Abu Kashem
c8ab780edb dra plugin: assume claim after api call in bindClaim 2025-08-13 16:35:35 -04:00
yliao
2a026f6d65 1/ added retries to AssumeClaimAfterAPICall for the object which is not present in the cache (dynamicresources.go)
2/ modified the assume cache verification to not error out as long as
the expected claim is in the cache, regardless of whether its latest and
API objects differ (dynamicresources_test.go).
3/ fixed nil panic as seen from https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133321/pull-kubernetes-integration/1952472629470302208
2025-08-06 07:08:58 +00:00
Kubernetes Prow Robot
b37978f226 Merge pull request #133334 from macsko/fix_potential_race_in_patchpodstatus_api_call_implementation
Fix potential race in PodStatusPatchCall implementation
2025-08-01 02:47:45 -07:00
Kubernetes Prow Robot
ea81dd6d01 Merge pull request #133309 from macsko/fix_race_with_closing_api_dispatcher
Fix potential race in closing API dispatcher
2025-08-01 02:47:38 -07:00
Maciej Skoczeń
9eda4789c0 Fix potential race in PodStatusPatchCall implementation 2025-07-31 09:27:40 +00:00
Maciej Skoczeń
dbfeb9c351 Fix potential race in closing API dispatcher 2025-07-30 11:57:26 +00:00
Kensei Nakada
46b858aa13 fix: return false to apply the patch 2025-07-30 19:12:57 +09:00
yliao
0a12f00e9d fix nil panic in hasBindingConditions: it cannot assume the claim has allocations 2025-07-30 14:44:41 +09:00
Sunyanan Choochotkaew
7f052afaef KEP 5075: implement scheduler
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:52:49 +09:00
Sunyanan Choochotkaew
5ad969588d KEP-5075: API updates
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
2025-07-30 09:26:40 +09:00
Kubernetes Prow Robot
1b273b385e Merge pull request #130653 from yliaog/master
kubelet and scheduler for extended resource backed by DRA
2025-07-29 13:04:27 -07:00
yliao
34a64db2c7 extended resource backed by DRA: implementation 2025-07-29 18:55:21 +00:00
Kubernetes Prow Robot
74f7a44966 Merge pull request #133276 from macsko/stop_clearing_nnn_in_all_cases
KEP-5278 Stop clearing NominatedNodeName in all cases
2025-07-29 11:24:40 -07:00
Kubernetes Prow Robot
e2ab840708 Merge pull request #130160 from KobayashiD27/dra-device-binding-conditions
Implement DRA Device Binding Conditions (KEP-5007)
2025-07-29 07:34:26 -07:00
Maciej Skoczeń
aea0a3cca2 Run all relevant test cases with the feature gate enabled and disabled 2025-07-29 12:21:03 +00:00
utam0k
856e7d2383 scheduler: Stop clearing NominatedNodeName on all cases
Signed-off-by: utam0k <k0ma@utam0k.jp>
2025-07-29 12:21:03 +00:00
Kobayashi,Daisuke
e8c3af1f5c KEP-5007 DRA Device Binding Conditions: Implement scheduler logic 2025-07-29 11:34:30 +00:00
Kensei Nakada
ac9fad6030 feat: trigger PreFilterPreBind in the binding cycle 2025-07-29 19:01:02 +09:00
Kensei Nakada
f3466f8adc fix: flake integration test 2025-07-28 23:12:58 +09:00
Kensei Nakada
ed74d4cd52 Revert "Revert "fix: handle corner cases in the async preemption""
This reverts commit 006d7620a8.
2025-07-28 20:22:27 +09:00
Kubernetes Prow Robot
2a03dd1d5e Merge pull request #133120 from utam0k/kep-5229-metrics
KEP-5229: Add the metrics
2025-07-28 03:58:36 -07:00
Maciej Skoczeń
17d733e243 KEP-5229: Send API calls through dispatcher and cache 2025-07-25 15:35:36 +00:00
utam0k
b956484c25 KEP-5229: Add metrics for async API dispatcher
Signed-off-by: utam0k <k0ma@utam0k.jp>
2025-07-25 19:29:14 +09:00
Paco Xu
006d7620a8 Revert "fix: handle corner cases in the async preemption" 2025-07-25 10:38:34 +08:00
Kubernetes Prow Robot
5be5fd0229 Merge pull request #133167 from sanposhiho/preemption-conor-case
fix: handle corner cases in the async preemption
2025-07-24 13:05:18 -07:00
Kubernetes Prow Robot
8d28109a2b Merge pull request #133174 from sanposhiho/log-level-preemption
fix: adjust the log level in the preemption
2025-07-24 09:38:57 -07:00
Kubernetes Prow Robot
a11bc701e8 Merge pull request #132457 from ania-borowiec/depends_on_cluster_move_podinfo
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
2025-07-24 09:38:27 -07:00
Kensei Nakada
8a2db4da42 fix: adjust the log level in the preemption 2025-07-24 22:45:39 +09:00
Kensei Nakada
4c9bf4719b fix: handle corner cases in the async preemption 2025-07-24 22:44:39 +09:00
Ania Borowiec
aecd37e6fb Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler 2025-07-24 12:10:58 +00:00
Kubernetes Prow Robot
89a01ec72a Merge pull request #133019 from pohly/dra-scheduler-plugin-owners
DRA scheduler plugin: add pohly as approver
2025-07-24 03:42:33 -07:00
Patrick Ohly
24de875ceb DRA: graduate DynamicResourceAllocation feature to GA
It hasn't been on by default before, so it does not get locked to the
new default (on) yet. This has some impact on the scheduler configuration
because the plugin is now enabled by default.

Because the feature is now GA, it doesn't need to be a label on E2E tests,
which wouldn't be possible anyway once it gets removed entirely.
2025-07-24 08:33:56 +02:00
Patrick Ohly
5c4f81743c DRA: use v1 API
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.

However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled, without v1, won't work.
2025-07-24 08:33:45 +02:00
Kubernetes Prow Robot
6d99828b80 Merge pull request #133127 from bart0sh/PR184-DRA-skip-flaky-test-on-Windows
DRA: skip flaky test case on Windows
2025-07-23 11:42:27 -07:00
Ed Bartosh
c2a06e7912 DRA: skip flaky test case on Windows
Added a skipOnWindows flag to the DynamicResources scheduler test cases
to skip a test that relies on nanosecond timer precision.
Windows timer granularity is much coarser than Linux, which causes
the test to fail often.
2025-07-23 11:06:11 +03:00
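
A per-case skip flag like the one described in the commit above can be sketched as follows. This is only an illustration: the `testCase` type, the `shouldSkip` helper, and the case names are hypothetical and are not taken from the actual test file.

```go
package main

import (
	"fmt"
	"runtime"
)

// testCase is a hypothetical, trimmed-down table-driven test entry.
type testCase struct {
	name          string
	skipOnWindows bool // set for cases that need fine-grained timer precision
}

// shouldSkip reports whether a case must be skipped on the given OS.
// The OS is a parameter so the decision can be checked on any platform.
func shouldSkip(tc testCase, goos string) bool {
	return tc.skipOnWindows && goos == "windows"
}

func main() {
	cases := []testCase{
		{name: "plain-allocation"},
		{name: "timer-sensitive", skipOnWindows: true},
	}
	for _, tc := range cases {
		if shouldSkip(tc, runtime.GOOS) {
			fmt.Printf("skip %s on %s\n", tc.name, runtime.GOOS)
			continue
		}
		fmt.Printf("run  %s\n", tc.name)
	}
}
```

In a real `_test.go` file the skip branch would typically call `t.Skip` inside `t.Run` rather than print.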
Kubernetes Prow Robot
c1136abd3b Merge pull request #132354 from toVersus/test/plr-scheduler-metrics
[PodLevelResources] Verify scheduler resource metrics account for Pod Level Resources
2025-07-22 13:40:27 -07:00
Maciej Skoczeń
4fc9546e0e KEP-5229: Implement API dispatcher 2025-07-21 14:00:34 +00:00
Kubernetes Prow Robot
e6161070d4 Merge pull request #133071 from sanposhiho/remove-example-plug
cleanup: remove example plugins
2025-07-19 03:22:25 -07:00
Kubernetes Prow Robot
6535517f18 Merge pull request #132011 from togettoyou/fix-91633
cleanup: fix missed HandleError replacement with contextual logging
2025-07-18 21:38:25 -07:00
Kensei Nakada
4b8dd9612f cleanup: remove example plugins 2025-07-19 13:08:34 +09:00
Junhao Zou
a2e9e9f667 fix pass ctx into a revised HandleError function 2025-07-18 17:02:54 +08:00
Patrick Ohly
5cea72d564 DRA integration: add test case for FilterTimeout
This covers disabling the feature via the configuration, failing to schedule
because of timeouts for all nodes, and retrying after ResourceSlice changes with
partial success (timeout for one node, success for the other).

While at it, some helper code gets improved.
2025-07-17 21:18:28 +02:00
Patrick Ohly
bc338e7505 DRA scheduler: implement filter timeout and cancellation
The intent is to catch abnormal runtimes with the generously large default
timeout of 10 seconds.

We have to set up a context with the configured timeout (optional!), then
ensure that both CEL evaluation and the allocation logic itself properly
return the context error. The scheduler plugin can then convert that into
"unschedulable".

The allocator and thus Filter now also check for context cancellation by the
scheduler. This happens when enough nodes have been found.
2025-07-17 21:18:28 +02:00
Patrick Ohly
025c606e39 DRA scheduler: add plugin configuration
The only option is the filter timeout.
The implementation of it follows in a separate commit.
2025-07-17 16:47:47 +02:00
Patrick Ohly
ee38a00131 DRA scheduler: add DRASchedulerFilterTimeout feature gate
Initializing the scheduler Features struct will be needed in different places,
therefore NewSchedulerFeaturesFromGates gets introduced. Besides, having it
next to the struct makes it easier to add new features.

The DRASchedulerFilterTimeout feature gate simplifies disabling the timeout
because setting a feature gate is often easier than modifying the scheduler
configuration with a zero timeout value.

The timeout and feature gate are new. The gate starts as beta and enabled by
default, which is consistent with the "smaller changes with low enough risk
that still may need to be disabled..." guideline.
2025-07-17 16:47:47 +02:00
Patrick Ohly
837ef29f5a scheduler: enhance and document Filter cancellation
When using context.CancelCause in the scheduler and context.Cause in plugins,
the status returned by plugins is more informative than just "context
canceled".

Context cancellation itself is not new, but many plugin authors probably
weren't aware of it because it wasn't documented.
2025-07-17 16:47:47 +02:00
Patrick Ohly
a2a3839a8e DRA scheduler: add pohly as approver
This is meant for simple changes, like code cleanup or API changes of the
allocator code. For more complex changes and new features, SIG Scheduling
approvers will be required to approve, as before.
2025-07-17 09:43:44 +02:00
yliao
dd3691b169 refactor allocator: remove claimsToAllocate from NewAllocator() and pass it through Allocate() instead 2025-07-16 15:11:11 +00:00
Kubernetes Prow Robot
ab685237f0 Merge pull request #132391 from sanposhiho/pre-bind-pre-flight
feat: add PreBindPreFlight and implement in in-tree plugins
2025-07-15 04:06:23 -07:00