98 Commits

Author SHA1 Message Date
Kensei Nakada
bfea4890c5 cleanup: remove pod_scheduling_duration_seconds 2024-11-21 17:54:31 +09:00
Kensei Nakada
83f9e4b6df cleanup: remove event list 2024-10-18 11:10:10 +10:00
Kensei Nakada
0b71f256a8 fix(scheduler): fix a possible memory leak for QueueingHint 2024-09-05 12:13:05 +09:00
Kensei Nakada
baf69640d3 fix(scheduler_one): call Done() as soon as possible 2024-08-27 09:30:47 +09:00
Kubernetes Prow Robot
5e2cead785 Merge pull request #126534 from googs1025/scheduler_cleanup
scheduler: use logger instead of new klog.FromContext(ctx)
2024-08-13 22:10:56 -07:00
Kubernetes Prow Robot
ea1143efc7 Merge pull request #126022 from macsko/new_node_to_status_map_structure
Change structure of NodeToStatus map in scheduler
2024-08-13 21:02:55 -07:00
googs1025
6427243676 use logger instead of new klog.FromContext(ctx) 2024-08-04 21:09:02 +08:00
Maciej Skoczeń
98be7dfc5d Change structure of NodeToStatus map in scheduler 2024-07-25 07:48:35 +00:00
Kensei Nakada
0dee497876 fix: make updatePodOther private 2024-07-20 17:49:46 +09:00
olderTaoist
b478621596 register unscheduable plugin when prefileter with NodeNames 2024-07-02 13:02:45 +08:00
Patrick Ohly
c339eafb76 scheduler: allow PreBind to return "Pending" and "Unschedulable"
Any error result from PreBind was treated as a pod scheduling failure. This was
overlooked when moving blocking API calls in the DRA plugin into a PreBind
implementation, leading to:

    E0604 15:45:50.980929  306340 schedule_one.go:1048] "Error scheduling pod; retrying" err="waiting for resource driver" pod="test/test-draqld28"

That's because DRA's PreBind does some updates in the apiserver, then returns
Pending to wait for the outcome.

The fix is to allow PreBind to return the same special status codes as other
extension points.
2024-06-06 15:28:08 +02:00
AxeZhan
cf73c9d93c remove EvaluatedNodes field in Diagnosis struct 2024-06-04 14:20:55 +08:00
Gabe
c8f0ea1a54 Don't fill in NodeToStatusMap with UnschedulableAndUnresolvable 2024-05-31 15:52:16 +00:00
Gabe
7ea3bf4db4 Revert "scheduler: preallocation for NodeToStatusMap"
This reverts commit 9fcd791c01.
2024-05-29 14:09:58 +00:00
AxeZhan
d6d1e6ad8a base on allNodes when calculating nextStartNodeIndex 2024-05-18 00:30:38 +08:00
Kensei Nakada
9cd62186e8 cleanup: eliminate unncessary NodeToStatusMap creation 2024-05-11 12:14:22 +00:00
AxeZhan
bcf1c55837 evaluated nodes only consider filter stage 2024-05-10 12:40:12 +08:00
Kensei Nakada
9fcd791c01 scheduler: preallocation for NodeToStatusMap 2024-05-07 00:01:24 +00:00
Kensei Nakada
2b56de43e5 register Node/UpdateNodeTaint event to plugins which has Node/Add only, doesn't have Node/UpdateNodeTaint 2024-03-16 14:13:06 +00:00
Oleg Guba
ba525460e0 change result size to numAllNodes 2024-03-01 02:06:17 -08:00
Oleg Guba
e6dd36759f [kubernetes/scheduler] use lockless diagnosis collection in findNodesThatPassFilters 2024-02-29 20:43:50 -08:00
Aleksandra Malinowska
dd1e617ba0 Scheduler first fit (#123384)
* Don't evaluate extra nodes if there's no score plugin defined

* Fix existing unit test (add no op scoring plugin)

* Add unit tests for no score plugin scenario

* address review comments

* add a test with non-filter, non-scoring extender
2024-02-26 11:07:19 -08:00
AxeZhan
630ff96f9d Revert "Scheduler first fit" 2024-02-14 20:43:59 +08:00
Kubernetes Prow Robot
919d4624a0 Merge pull request #122503 from sunbinnnnn/scheduler-extender-support-ignore-bind
Support ignore scheduler extender error when binding
2024-01-08 17:30:44 +01:00
Neil Sun
87816ffb2c Support ignore scheduler extender error when binding
Signed-off-by: sunbinnnnn <sunbinnnnn@hotmail.com>
2024-01-08 21:06:25 +08:00
Kensei Nakada
09abd6be5a address reviews 2024-01-02 02:10:41 +00:00
Kensei Nakada
041efcd1d4 scheduler: update an old comment 2023-12-22 02:01:13 +00:00
Aleksandra Malinowska
f89c744b7b Only run Prioritize() for extenders with prioritizeVerb configured 2023-12-21 13:42:27 +01:00
Aleksandra Malinowska
e19be41f58 Don't evaluate extra nodes if there's no score plugin defined 2023-12-21 13:29:46 +01:00
AxeZhan
be48c93689 Sched framework: expose NodeInfo in all functions of PluginsRunner interface 2023-12-15 11:30:06 +08:00
Paco Xu
1160521a4f Revert "Scheduler first fit" 2023-12-14 17:27:25 +08:00
Kubernetes Prow Robot
517091cdc5 Merge pull request #122058 from aleksandra-malinowska/scheduler-first-fit
Scheduler first fit
2023-12-14 05:10:19 +01:00
Kubernetes Prow Robot
5322af7f9e Merge pull request #122022 from sanposhiho/extender-fix
fix: requeue pods rejected by Extenders properly
2023-12-14 05:10:01 +01:00
Kubernetes Prow Robot
6bd8f96f35 Merge pull request #122001 from olderTaoist/scheduler-metric
report scheduling_algorithm_duration_seconds metric when pods is unschedulable
2023-12-14 05:09:25 +01:00
Toru Komatsu
01916625da Remove unnecessary error catch in scheduling failure (#121981)
* Deleted from the cache in the handling of scheduling failures due to missing Node

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Support only `nodes`

* Remove unnecessary error catch

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Fix a build error

Signed-off-by: utam0k <k0ma@utam0k.jp>

* Fix a build error

Signed-off-by: utam0k <k0ma@utam0k.jp>

---------

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-12-14 05:09:08 +01:00
olderTaoist
78b4ab11d5 also report scheduling_algorithm_duration_seconds metric when the pods is unschedulable 2023-12-06 19:17:03 +08:00
Aleksandra Malinowska
3df00d1bdd Only run Prioritize() for extenders with prioritizeVerb configured 2023-11-28 17:13:13 +01:00
Aleksandra Malinowska
199dc03bdd Don't evaluate extra nodes if there's no score plugin defined 2023-11-28 10:39:49 +01:00
Kensei Nakada
468e2dac81 fix: requeue pods rejected by Extenders properly 2023-11-23 13:20:02 +00:00
Patrick Ohly
2a23061f6c scheduler: fix performance regression at -v3 + contextual logging
The logging instrumentation for contextual logging that was added for 1.29
slowed down the scheduler (i.e. logging verbosity <= 3) by a significant
percentage (-28.66% for SchedulingBasic/5000Nodes at -v3) if (and only if!)
contextual logging was enabled.

Retrieving the logger from the context causes no measurable slowdown, it's only
the various WithName/WithValues calls which cause this.

By being more careful about when to use those, the performance impact can be
avoided:
- At -v3 or lower, only `WithValues("pod")` is used once per scheduling cycle.
  This has the intended effect that all log messages for the cycle include the
  pod information. Once contextual logging is GA, "pod" key/value pairs can
  be removed from all log calls.
- At -v4 or higher, richer log entries get produced where `WithValues` is also
  used for the node (when applicable) and `WithName` is used for the current
  operation and plugin.

With these changes, enabling contextual logging causes no measurable slowdown
at -v3 or lower. At -v4, the slowdown depends on the test case (-30.51%
throughput for SchedulingBasic/5000Nodes, no change for
SchedulingCSIPVs/5000Nodes). For some unknown reason (measuring bias?),
SchedulingCSIPVs/500Nodes has a ~3& *higher* throughput with contextual
logging.
2023-11-03 17:28:55 +01:00
Kubernetes Prow Robot
fd5c406112 Merge pull request #120933 from mengjiao-liu/contextual-logging-scheduler-remaining-part
kube-scheduler: convert the remaining part to use contextual logging
2023-10-27 10:30:58 +02:00
Kensei Nakada
27bb66fd7b cleanup: rename failedPlugin to plugin in framework.Status 2023-10-25 12:03:56 +00:00
Mengjiao Liu
b0a73213d6 kube-scheduler: convert the remaining part to use contextual logging 2023-10-24 17:56:48 +08:00
Kensei Nakada
4f5bc7e8d7 fix based on reviews 2023-10-20 02:53:06 +00:00
Kensei Nakada
cb5dc46edf feature(scheduler): simplify QueueingHint by introducing new statuses 2023-10-19 11:02:11 +00:00
Kubernetes Prow Robot
130a5a423f Merge pull request #119785 from sanposhiho/waitonpermit-fiterror
fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins
2023-08-15 23:13:04 -07:00
Kubernetes Prow Robot
719d1a84f7 Merge pull request #119778 from sanposhiho/bugfix-unschedulableandunresolvable
fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap
2023-08-15 23:12:57 -07:00
Heba Elayoty
224087abfa Add Pod Scheduling SLI Duration metric (#119049)
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
2023-08-15 15:17:41 -07:00
Kensei Nakada
cf3f0bd778 fix: register the plugin rejects Pods in WaitOnPermit to UnschedulablePlugins 2023-08-12 07:18:01 +00:00
Kensei Nakada
b008223705 fix: when PreFilter returns UnschedulableAndUnresolvable, copy the state in all nodes in statusmap 2023-08-12 06:58:49 +00:00