180 Commits

Author SHA1 Message Date
Maciej Skoczeń
3eefd62f94 Make update and delete active queue methods 2024-08-22 09:26:05 +00:00
Maciej Skoczeń
9773a39b28 Don't expose lock outside activeQueue in scheduling queue 2024-08-22 09:21:35 +00:00
Maciej Skoczeń
a7ad94f93b Unexport podRef in scheduling queue's nominator 2024-08-21 07:25:57 +00:00
Maciej Skoczeń
e303808896 Move scheduling queue's nominator to a separate file 2024-08-21 07:25:55 +00:00
Maciej Skoczeń
33815db3c1 Move NominatedPodsForNode to scheduling queue directly 2024-08-21 07:24:52 +00:00
Maciej Skoczeń
8e630a9f68 Move activeQ related fields to separate struct in scheduling queue 2024-08-19 07:35:31 +00:00
Maciej Skoczeń
6b33e2e632 Use generics in scheduling queue's heap 2024-07-24 06:55:47 +00:00
Kubernetes Prow Robot
39a80796b6 Merge pull request #122628 from sanposhiho/pod-smaller-events
add(scheduler/framework): implement smaller Pod update events
2024-07-23 18:01:46 -07:00
Kubernetes Prow Robot
43691598da Merge pull request #126227 from sanposhiho/queueing_hint_execution_duration_seconds
feature: support queueing_hint_execution_duration_seconds metric
2024-07-23 02:12:29 -07:00
Kensei Nakada
2a51bd81fa fix: async metric recording 2024-07-22 21:32:19 +09:00
Kensei Nakada
82a54e8cc8 cleanup: remove duplicated addNominatedPodUnlocked 2024-07-21 16:04:25 +09:00
Kensei Nakada
fa8092f838 support UpdatePodScaleDown instead of UpdatePodRequest 2024-07-20 19:20:38 +09:00
Kensei Nakada
0dee497876 fix: make updatePodOther private 2024-07-20 17:49:46 +09:00
Kensei Nakada
0cd1ee4259 add(scheduler/framework): implement smaller Pod update events 2024-07-20 17:44:23 +09:00
Kensei Nakada
7ef3cf5d07 feature: support queueing_hint_execution_duration_seconds metric 2024-07-19 23:13:07 +09:00
Maciej Skoczeń
7421ded6f9 Don't lock activeQ twice when activating pod in scheduling queue 2024-07-19 09:18:42 +00:00
Kubernetes Prow Robot
d879103c28 Merge pull request #125820 from macsko/add_separate_lock_for_pod_nominator_scheduling_queue
Add a separate lock for pod nominator in scheduling queue
2024-07-17 12:06:10 -07:00
Maciej Skoczeń
5def93b10a Add a separate lock for pod nominator in scheduling queue 2024-07-17 07:58:59 +00:00
Kubernetes Prow Robot
ae1caa40a2 Merge pull request #125961 from Jerry-yz/master
Chore: fix scheduler code comment typos
2024-07-15 19:27:30 -07:00
Maciej Skoczeń
31e89b1f4d Add activeQLock to scheduling queue to improve Pop() throughput 2024-07-09 11:37:19 +00:00
Kubernetes Prow Robot
e48d42d81d Merge pull request #122627 from sanposhiho/remove-AssignedPodUpdated
take PodTopologySpread into consideration when requeueing Pods based on Pod related events
2024-07-08 16:21:11 -07:00
Jerry-yz
bd90e99b2a fix: schedule code comment typo
Signed-off-by: Jerry-yz <yz386071268@gmail.com>
2024-07-09 00:04:18 +08:00
Kensei Nakada
41f7607c04 cleanup: remove non-necessary ifs 2024-07-06 13:19:24 +00:00
Kensei Nakada
e16aa35865 address review suggestions 2024-07-06 13:17:17 +00:00
Kensei Nakada
533140f065 take PodTopologySpread into consideration when requeueing Pods based on Pod related events 2024-07-06 13:17:14 +00:00
Kubernetes Prow Robot
59673f0f37 Merge pull request #125578 from nayihz/fix_sche_queue_update
skip update pod that exist in scheduling cycle
2024-06-25 14:18:19 -07:00
nayihz
26dcab1146 skip update pod that exist in scheduling cycle 2024-06-24 17:11:09 +08:00
Kensei Nakada
98a3182398 correct comment 2024-06-20 23:48:42 +00:00
Kensei Nakada
2304806cbe elaborate comment more 2024-06-20 23:43:41 +00:00
Kensei Nakada
2c4dc6b65b elaborate comments 2024-06-20 23:36:05 +00:00
Kensei Nakada
dd3af9a85b fix: skip isPodWorthRequeuing only when SchedulingGates gates the pod 2024-06-17 01:14:34 +00:00
AxeZhan
d66f8f9413 schedulingQueue update pod by queueHint 2024-06-12 21:26:09 +08:00
Gabe
4e99ada05f Filter gated pods before calling isPodWorthRequeueing 2024-04-29 16:54:40 +00:00
Kensei Nakada
2b56de43e5 register Node/UpdateNodeTaint event to plugins which has Node/Add only, doesn't have Node/UpdateNodeTaint 2024-03-16 14:13:06 +00:00
Kensei Nakada
18ba3b388e fix(scheduling queue): ignore events that interest no registered plugin 2024-02-24 06:42:19 +00:00
kerthcet
d81023db30 When matching clusterEvent, we should consider the "*" additionally
Signed-off-by: kerthcet <kerthcet@gmail.com>
2024-02-04 14:59:26 +08:00
Kubernetes Prow Robot
ce80b7752a Merge pull request #122081 from colin404/fix/fix-incorrect-comment
fix incorrect function comment
2023-12-14 06:17:13 +01:00
孔令飞
917027b42e fix incorrect function comment
Change-Id: I7d5e908f979026faa467fdd77049b6aa3087fd7c
2023-12-12 17:38:03 +08:00
kerthcet
fade7463cd Add String() to framework status
Signed-off-by: kerthcet <kerthcet@gmail.com>
2023-11-01 17:01:36 +08:00
Kubernetes Prow Robot
fd5c406112 Merge pull request #120933 from mengjiao-liu/contextual-logging-scheduler-remaining-part
kube-scheduler: convert the remaining part to use contextual logging
2023-10-27 10:30:58 +02:00
Kensei Nakada
27bb66fd7b cleanup: rename failedPlugin to plugin in framework.Status 2023-10-25 12:03:56 +00:00
Mengjiao Liu
b0a73213d6 kube-scheduler: convert the remaining part to use contextual logging 2023-10-24 17:56:48 +08:00
Kensei Nakada
4f5bc7e8d7 fix based on reviews 2023-10-20 02:53:06 +00:00
Kensei Nakada
cb5dc46edf feature(scheduler): simplify QueueingHint by introducing new statuses 2023-10-19 11:02:11 +00:00
carlory
0105a002bc when the hint fn returns error, the scheduling queue logs the error and treats it as QueueAfterBackoff.
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

Co-authored-by: Kante Yin <kerthcet@gmail.com>

Co-authored-by: XsWack <xushiwei5@huawei.com>
2023-09-21 09:40:44 +08:00
Kensei Nakada
0d3eafdfa3 fix(scheduling_queue): always put Pods with no unschedulable plugins into activeQ/backoffQ (#119105)
* always put Pods with no unschedulable plugins into activeQ/backoffQ

* address review comments
2023-09-11 09:30:11 -07:00
Patrick Ohly
4e73634b53 scheduler: start scheduling attempt with clean UnschedulablePlugins
When some plugin was registered as "unschedulable" in some previous scheduling
attempt, it kept that attribute for a pod forever. When that plugin then later
failed with an error that requires backoff, the pod was incorrectly moved to the
"unschedulable" queue where it got stuck until the periodic flushing because
there was no event that the plugin was waiting for.

Here's an example where that happened:

    framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
    schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
    ...
    scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"

Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
2023-09-08 16:52:36 +02:00
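The reset described above (use the previous attempt's unschedulable plugins for the metric in Pop, then clear them) can be sketched as follows. This is a simplified illustration under stated assumptions: `queuedPodInfo`, `queue`, `pop`, and `requeueTarget` are hypothetical, the UnschedulableReason metric is modeled as a plain per-plugin counter, and the routing rule ignores every input except the plugin set.

```go
package main

import "fmt"

// queuedPodInfo is a hypothetical, trimmed-down stand-in for the scheduler's
// QueuedPodInfo; only the field relevant to this fix is kept.
type queuedPodInfo struct {
	name                 string
	unschedulablePlugins map[string]bool
}

type queue struct {
	active []*queuedPodInfo
	// unschedulableReason is a hypothetical stand-in for the UnschedulableReason
	// metric: how many pending pods each plugin has rejected.
	unschedulableReason map[string]int
}

// pop hands out the next pod for a scheduling attempt.
func (q *queue) pop() *queuedPodInfo {
	if len(q.active) == 0 {
		return nil
	}
	p := q.active[0]
	q.active = q.active[1:]
	// The previous attempt's plugins are still needed to update the metric...
	for plugin := range p.unschedulablePlugins {
		q.unschedulableReason[plugin]--
	}
	// ...but the new attempt starts clean, so a stale entry cannot send a later
	// error that requires backoff into the unschedulable queue.
	p.unschedulablePlugins = map[string]bool{}
	return p
}

// requeueTarget is a hypothetical, simplified routing rule: a failure with no
// rejecting plugin goes to backoff instead of waiting for a matching event.
func requeueTarget(p *queuedPodInfo) string {
	if len(p.unschedulablePlugins) == 0 {
		return "backoffQ"
	}
	return "unschedulablePods"
}

func main() {
	q := &queue{unschedulableReason: map[string]int{"DynamicResources": 1}}
	q.active = append(q.active, &queuedPodInfo{
		name:                 "test/test-dragxd5c",
		unschedulablePlugins: map[string]bool{"DynamicResources": true},
	})
	p := q.pop()
	// Suppose Reserve now fails with a conflict error and no plugin marks the pod unschedulable.
	fmt.Println(requeueTarget(p), q.unschedulableReason["DynamicResources"]) // backoffQ 0
}
```

Without the reset in `pop`, the stale "DynamicResources" entry would route the conflict error to the unschedulable queue, which is the stuck-pod scenario shown in the log excerpt.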
Patrick Ohly
cd943dd95e scheduler: fix tracking of concurrent events
The previous approach was based on the assumption that an in-flight pod can use
the head of the received event list as marker for identifying all events that
occur while the pod is in flight. That assumption is incorrect: when that
existing element gets removed from the list because all pods that were
in-flight when it was received are done, that marker's Next method returns nil
and the code which should have seen several concurrent events (if there were
any) missed all of those.

As a result, a pod with concurrent events could incorrectly get moved to the
unschedulable queue where it could get stuck until the next periodic purging
after 5 minutes if there was no other event for it.

The approach with maintaining a single list of concurrent events can be fixed
by inserting each in-flight pod into the list and using that element to
identify "more recent" events for the pod.
2023-09-05 19:58:38 +02:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins 2023-08-12 07:18:01 +00:00
Kensei Nakada
050c0437e6 fix: broadcast when pod is pushed back to activeQ directly in AddUnschedulableIfNotPresent 2023-08-09 03:32:14 +00:00