1154 Commits

Author SHA1 Message Date
HirazawaUi
f9693e0c0a Implement QueueingHintFn for pod deleted event 2024-06-17 22:42:04 +08:00
Hiroyuki Moriya
4ccae88114 fix 2024-06-16 00:03:13 +00:00
Hiroyuki Moriya
9f7843bde4 remove lister from test 2024-06-15 22:55:46 +00:00
moriya
e93016b68c fix_review_comments 2024-06-14 09:41:38 +09:00
Patrick Ohly
e0fce54d02 DRA: fix indexing of generated parameters
The claim parameter key didn't include the namespace of the claim. In the case
where two namespaces used the exact same parameter reference, the "too many
generated parameters" case got triggered incorrectly and lookup could have
returned an object from the wrong namespace.

Found while running the E2E tests in parallel:

              message: 'running PreFilter plugin "DynamicResources": multiple generated claim
                parameters for ConfigMap. dra-8794/parameters-3 found: [dra-4729/parameters-4
                dra-7328/parameters-4 dra-8794/parameters-4 dra-3402/parameters-4 dra-6156/parameters-4
                dra-1839/parameters-4 dra-7434/parameters-4 dra-6504/parameters-4]'
2024-06-13 17:27:04 +02:00
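
The indexing problem described in e0fce54d02 can be illustrated with a minimal sketch. The types and function names below (`paramRef`, `generatedParams`, `keyWithoutNamespace`, `keyWithNamespace`, `buildIndex`) are hypothetical stand-ins and not the scheduler's actual code; the sketch only shows why an index key without the namespace makes objects from different namespaces collide.

```go
package main

import "fmt"

// paramRef is a hypothetical reference to the object that parameters
// were generated from (for example a ConfigMap).
type paramRef struct {
	Kind string
	Name string
}

// generatedParams is a hypothetical stand-in for a generated parameter object.
type generatedParams struct {
	Namespace     string
	Name          string
	GeneratedFrom paramRef
}

// keyWithoutNamespace mimics the problem: the key ignores the namespace,
// so identical references in different namespaces share one index entry.
func keyWithoutNamespace(p generatedParams) string {
	return p.GeneratedFrom.Kind + "/" + p.GeneratedFrom.Name
}

// keyWithNamespace includes the namespace, so each namespace gets its own entry.
func keyWithNamespace(p generatedParams) string {
	return p.GeneratedFrom.Kind + "/" + p.Namespace + "/" + p.GeneratedFrom.Name
}

// buildIndex groups objects by the key returned from keyFn.
func buildIndex(objs []generatedParams, keyFn func(generatedParams) string) map[string][]generatedParams {
	index := make(map[string][]generatedParams)
	for _, obj := range objs {
		key := keyFn(obj)
		index[key] = append(index[key], obj)
	}
	return index
}

func main() {
	ref := paramRef{Kind: "ConfigMap", Name: "parameters-3"}
	objs := []generatedParams{
		{Namespace: "dra-4729", Name: "parameters-4", GeneratedFrom: ref},
		{Namespace: "dra-8794", Name: "parameters-4", GeneratedFrom: ref},
	}

	// Without the namespace, both objects land under the same key.
	fmt.Println(len(buildIndex(objs, keyWithoutNamespace)["ConfigMap/parameters-3"]))
	// With the namespace, a lookup for dra-8794 finds exactly one object.
	fmt.Println(len(buildIndex(objs, keyWithNamespace)["ConfigMap/dra-8794/parameters-3"]))
}
```

With the namespace-free key, a lookup can return several candidates (the "multiple generated claim parameters" symptom) or an object from the wrong namespace.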
Kubernetes Prow Robot
9c8c61aee4 Merge pull request #122234 from AxeZhan/podUpdateEvent
[Scheduler]Put pod into the correct queue during podUpdate
2024-06-12 12:28:17 -07:00
AxeZhan
d66f8f9413 schedulingQueue update pod by queueHint 2024-06-12 21:26:09 +08:00
YamasouA
f409dedb5d Implement QHint for CSINode 2024-06-04 23:04:52 +09:00
Kubernetes Prow Robot
cfe5a7d03a Merge pull request #125213 from carlory/fix-dra-flaky
fix dra flaky test on TestPlugin
2024-06-03 13:32:10 -07:00
Kubernetes Prow Robot
8bd36c60bd Merge pull request #125197 from gabesaba/prefilter_perf
[scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable
2024-06-03 07:35:41 -07:00
moriya
3f3ce8659f update_comment 2024-06-02 23:58:06 +09:00
moriya
a3e6fd724c remove_comment 2024-06-02 23:56:45 +09:00
moriya
e2632d0ed8 simplify_test 2024-06-02 23:55:41 +09:00
moriya
657bba80de simplify_test 2024-06-02 23:54:39 +09:00
moriya
105f9396b8 review_comment 2024-06-02 23:34:01 +09:00
moriya
a4b3ce8876 simplify 2024-06-02 22:24:35 +09:00
Gabe
c8f0ea1a54 Don't fill in NodeToStatusMap with UnschedulableAndUnresolvable 2024-05-31 15:52:16 +00:00
carlory
2794baf4c0 fix dra flaky test on TestPlugin 2024-05-30 23:22:37 +08:00
Kubernetes Prow Robot
ee2c1ffa80 Merge pull request #124630 from carlory/fix-123731
DRA: scheduler: index claim and class parameters to simplify lookup
2024-05-29 14:38:14 -07:00
moriya
116665da4d fix_review_comment 2024-05-28 23:24:33 +09:00
carlory
3072987fcc DRA: scheduler: index claim and class parameters to simplify lookup 2024-05-27 15:57:10 +08:00
moriya
1b8fb3a838 pvc 2024-05-20 23:13:56 +09:00
moriya
58143ff3eb volumezone: scheduler queueing hints 2024-05-18 23:34:39 +09:00
NoicFank
31a4b13238 enhancement(scheduler): share waitingPods among profiles 2024-05-17 17:07:27 +08:00
Toru Komatsu
5722db7aa3 QueueingHint for CSILimit when deleting pods (#121508)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-05-14 11:07:11 -07:00
carlory
c8e91b9bc2 CephRBD volume plugin and its csi migration support were removed in this release 2024-05-09 22:55:34 +08:00
Kubernetes Prow Robot
b27608875c Merge pull request #124287 from sanposhiho/tainttoleration
implement QueueingHint in TaintToleration
2024-05-01 00:06:16 -07:00
carlory
06d3cd33b2 use slices library instead 2024-04-29 16:50:53 +08:00
wackxu
a4bfaae8a4 implement QueueingHint in TaintToleration 2024-04-29 07:18:35 +00:00
Kubernetes Prow Robot
cffc2c0b40 Merge pull request #124102 from pohly/dra-scheduler-assume-cache
scheduler: move assume cache to utils
2024-04-26 08:49:12 -07:00
Patrick Ohly
7f54c5dfec scheduler: remove AssumeCache interface
There's no reason for having the interface because there is only one
implementation. Makes the implementation of the test functions a bit
simpler (no casting). They are still stand-alone functions instead of methods
because they should not be considered part of the "normal" API.
2024-04-25 11:46:58 +02:00
Patrick Ohly
26e0409c36 scheduler: move assume cache to utils, part 2
This is now used by both the volumebinding and dynamicresources plugin, so
promoting it to a common helper package is better.

In terms of functionality, nothing was changed. Documentation got
updated (warns about storing locally modified objects, clarifies what the Get
parameters are). Code coverage should be a bit better than before (tested with
and without indexer, exercises event handlers, more error paths).

Checking for specific errors can now be done via errors.Is.
2024-04-25 11:45:43 +02:00
Patrick Ohly
910b90fca3 scheduler: move assume cache to utils, part 1
This is a verbatim move resp. copy of the files. They don't build in their new
location yet.
2024-04-25 10:49:41 +02:00
Patrick Ohly
a66d2163f9 dra scheduler: fix data race in unit test
Clearing some irrelevant fields in objects caused a flaky data race alert
because in some cases, the objects were pointers into a shared cache. A better
solution is to treat the objects as read-only and ignore the irrelevant fields.
2024-04-19 17:14:13 +02:00
Kubernetes Prow Robot
846e282d05 Merge pull request #124055 from yangjunmyfm192085/optklogprint
Optimize klog output(Use klog.KObj(pod) instead of pod)
2024-04-18 02:11:47 -07:00
Kubernetes Prow Robot
d2ce87eb94 Merge pull request #123938 from pohly/dra-structured-parameters-tests
DRA: test for structured parameters
2024-04-18 02:10:08 -07:00
Kubernetes Prow Robot
2c6d5fae7a Merge pull request #122471 from nayihz/feat_podaffinity_qhint
interpodaffinity: scheduler queueing hints
2024-04-18 00:00:21 -07:00
nayihz
1b3d10aafa fix: node added with matched pod anti-affinity topologyKey
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
2024-04-12 11:08:44 +08:00
Patrick Ohly
6f5696b537 dra scheduler: simplify unit tests
The guideline in
https://github.com/kubernetes/community/blob/master/sig-scheduling/CONTRIBUTING.md#technical-and-style-guidelines
is to not compare error strings. This makes the tests less precise. In return,
unit tests don't need to be updated when error strings change.
2024-03-27 10:27:01 +01:00
杨军10092085
ba76a624f9 Optimize klog output 2024-03-26 18:53:29 +08:00
Patrick Ohly
458e227de0 dra scheduler: unit tests
Coverage was checked with a cover profile. The biggest remaining gap is for
isSchedulableAfterClaimParametersChange and
isSchedulableAfterClassParametersChange which will get handled when refactoring
the foreachPodResourceClaim (https://github.com/kubernetes/kubernetes/issues/123697).
2024-03-22 10:03:22 +01:00
Patrick Ohly
607261e4c5 dra scheduler: spelling fix 2024-03-22 10:03:22 +01:00
Patrick Ohly
95136db063 dra scheduler: fix re-allocation of claim with structured parameters
The code was incorrectly checking for a controller, but only the boolean
is set for allocated claims. As a result, deallocation was requested from
a non-existent control plane controller.

While at it, let's also clear the driver name. It's not needed when the
claim is deallocated.
2024-03-22 10:03:22 +01:00
nayihz
0cfe4438e9 interpodaffinity: scheduler queueing hints 2024-03-20 21:44:24 +08:00
kerthcet
84750fe52e Revert "enhancement(scheduler): share waitingPods among profiles"
This reverts commit 227c1915db.
2024-03-19 22:52:59 +01:00
Kubernetes Prow Robot
aa73f3163a Merge pull request #122292 from sanposhiho/nodeupdate
register Node/UpdateTaint event to plugins which has Node/Add only and doesn't have Node/UpdateTaint
2024-03-18 08:33:54 -07:00
Kensei Nakada
2b56de43e5 register Node/UpdateNodeTaint event to plugins which has Node/Add only, doesn't have Node/UpdateNodeTaint 2024-03-16 14:13:06 +00:00
Kevin Klues
21a0dd1d70 dra scheduler: create default claim/class parameters instead of nil
Without this, the scheduler was crashing in newClaimController() in
pkg/scheduler/framework/plugins/dynamicresources/structuredparameters.go

The code in newClaimController() assumes that the parameters are not nil.
Furthermore it assumes that there is at least one DriverRequest populated in
order to allocate any resources to a claim.

This PR adds logic to define default claim/class parameters that will allow
allocation to proceed even if an end user doesn't provide any class or claim
parameters themselves.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-11 13:57:16 +00:00
Patrick Ohly
251b3859b0 dra scheduler: consider in-flight allocation for resource calculation
Storing a modified claim with allocation and the original resource version in
the assume cache was not reliable: if an update was received, it replaced the
modified claim and the resource that was reserved for the claim might have been
used for some other claim.

To fix this, the in-flight claims are now stored in the map instead of just a
boolean and the status stored there overrides whatever is in the assume cache.

Logging got extended to diagnose this problem better. It started to occur in
E2E tests after splitting the claim update so that first the finalizer is set
and then the status, because setting the finalizer triggered an update.
2024-03-07 22:26:16 +01:00
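
The idea behind 251b3859b0 can be sketched as follows, under stated assumptions: `claim`, `claimStore` and its methods are hypothetical and much simpler than the plugin's real data structures. The point illustrated is only that an in-flight map holding the full modified claim takes precedence over a cache entry that a later update may have replaced.

```go
package main

import (
	"fmt"
	"sync"
)

// claim is a hypothetical, reduced stand-in for a resource claim.
type claim struct {
	Name      string
	Allocated bool
}

// claimStore is a hypothetical combination of a cache that updates can
// overwrite and a map of claims whose allocation is still in flight.
type claimStore struct {
	mu       sync.Mutex
	cached   map[string]claim
	inFlight map[string]claim
}

func newClaimStore() *claimStore {
	return &claimStore{cached: map[string]claim{}, inFlight: map[string]claim{}}
}

// startAllocation records the modified claim while its status update is pending.
func (s *claimStore) startAllocation(c claim) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inFlight[c.Name] = c
}

// onUpdate simulates an incoming update replacing the cached object.
func (s *claimStore) onUpdate(c claim) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cached[c.Name] = c
}

// finishAllocation drops the in-flight entry once the status was written.
func (s *claimStore) finishAllocation(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.inFlight, name)
}

// get prefers the in-flight claim over whatever the cache currently holds.
func (s *claimStore) get(name string) (claim, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if c, ok := s.inFlight[name]; ok {
		return c, true
	}
	c, ok := s.cached[name]
	return c, ok
}

func main() {
	s := newClaimStore()
	s.startAllocation(claim{Name: "claim-a", Allocated: true})
	// An update (for example after setting the finalizer) arrives without the allocation.
	s.onUpdate(claim{Name: "claim-a", Allocated: false})

	c, _ := s.get("claim-a")
	fmt.Println(c.Allocated) // the in-flight status still wins
}
```

With only a boolean per in-flight claim, the allocation itself would have to live in the cache, where such an update could overwrite it.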
Patrick Ohly
0b6a0d686a dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.

The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00