1174 Commits

Author SHA1 Message Date
carlory
850bc09e9b clean up code after PodDisruptionConditions was promoted to GA and locked to default 2024-07-11 10:40:21 +08:00
Kubernetes Prow Robot
90615231a6 Merge pull request #125097 from YamasouA/ft/queuehit-csinode
volumebinding: scheduler queueing hints - CSINode
2024-07-09 17:53:05 -07:00
Kubernetes Prow Robot
4a214f6ad9 Merge pull request #125461 from mimowo/pod-disruption-conditions-ga
Graduate PodDisruptionConditions to stable
2024-07-09 11:08:13 -07:00
Kubernetes Prow Robot
b6899c5e08 Merge pull request #122251 from olderTaoist/unschedulable-plugin
register unschedulable plugin for those plugins whose PreFilterResult filters out some nodes
2024-07-05 05:44:26 -07:00
olderTaoist
b478621596 register unschedulable plugin when PreFilter returns NodeNames 2024-07-02 13:02:45 +08:00
Kubernetes Prow Robot
ac9aec9f9b Merge pull request #125116 from pohly/dra-one-of-source
DRA: remove "source" indirection from v1 Pod API
2024-06-28 12:46:45 -07:00
Michal Wozniak
780191bea6 review remarks for graduating PodDisruptionConditions 2024-06-28 17:32:27 +02:00
Michal Wozniak
bf0c9885a4 Graduate PodDisruptionConditions to stable 2024-06-28 16:36:51 +02:00
Kubernetes Prow Robot
eb66365bc4 Merge pull request #124931 from pohly/dra-scheduler-prebind-fix
DRA: fix scheduler/resource claim controller race
2024-06-28 05:57:24 -07:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
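Note on bde9b64cdf: the flattening described in this commit can be pictured with simplified Go types. This is a sketch only; the type names OldClaimSource, OldPodResourceClaim and the helper Flatten are hypothetical stand-ins written for illustration, not the actual Kubernetes API types or conversion code.

```go
package main

import "fmt"

// OldClaimSource is a simplified stand-in for the removed "source" wrapper.
type OldClaimSource struct {
	ResourceClaimName         *string
	ResourceClaimTemplateName *string
}

// OldPodResourceClaim models the previous, nested shape (hypothetical type).
type OldPodResourceClaim struct {
	Name   string
	Source OldClaimSource
}

// PodResourceClaim models the new, flat shape shown in the commit message.
type PodResourceClaim struct {
	Name                      string
	ResourceClaimName         *string
	ResourceClaimTemplateName *string
}

// Flatten lifts the two fields out of the "source" wrapper.
func Flatten(old OldPodResourceClaim) PodResourceClaim {
	return PodResourceClaim{
		Name:                      old.Name,
		ResourceClaimName:         old.Source.ResourceClaimName,
		ResourceClaimTemplateName: old.Source.ResourceClaimTemplateName,
	}
}

func main() {
	tmpl := "test-inline-claim-template"
	old := OldPodResourceClaim{
		Name:   "with-template",
		Source: OldClaimSource{ResourceClaimTemplateName: &tmpl},
	}
	flat := Flatten(old)
	fmt.Println(flat.Name, *flat.ResourceClaimTemplateName)
}
```

Keeping both fields as optional pointers on the entry itself leaves room for further alternatives later without a wrapper, which is the long-term benefit the message mentions.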
Patrick Ohly
4bddebc48e DRA: fix scheduler/resource claim controller race with retry
The JSON patch approach works, but it is complex. A retry loop is easier to
understand (detect conflict, get new claim, try again). There is one additional
API call (the get), but in practice this scenario is unlikely.
2024-06-27 15:03:56 +02:00
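Note on 4bddebc48e: the retry loop (detect conflict, get new claim, try again) might look roughly like the sketch below. It runs against a hypothetical in-memory Store that rejects updates carrying a stale resource version; Store, Claim, UpdateWithRetry and the finalizer name are assumptions made for illustration and are not taken from the scheduler code.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizer is a made-up name used only in this sketch.
const finalizer = "example.com/delete-protection"

// ErrConflict is returned when an update is based on a stale version.
var ErrConflict = errors.New("conflict: object was modified")

type Claim struct {
	ResourceVersion int
	Finalizers      []string
	Allocated       bool
}

// Store is a hypothetical in-memory stand-in for the API server.
type Store struct{ claim Claim }

// Get returns a copy of the stored claim.
func (s *Store) Get() Claim {
	c := s.claim
	c.Finalizers = append([]string(nil), s.claim.Finalizers...)
	return c
}

// Update succeeds only if the caller saw the current version.
func (s *Store) Update(c Claim) error {
	if c.ResourceVersion != s.claim.ResourceVersion {
		return ErrConflict
	}
	c.ResourceVersion++
	c.Finalizers = append([]string(nil), c.Finalizers...)
	s.claim = c
	return nil
}

func hasFinalizer(fs []string) bool {
	for _, f := range fs {
		if f == finalizer {
			return true
		}
	}
	return false
}

// addAllocation sets the allocation and makes sure the finalizer is present.
func addAllocation(c *Claim) {
	if !hasFinalizer(c.Finalizers) {
		c.Finalizers = append(c.Finalizers, finalizer)
	}
	c.Allocated = true
}

// UpdateWithRetry applies mutate and, on conflict, re-reads the claim and
// tries again. It returns the number of attempts that were made.
func UpdateWithRetry(s *Store, claim Claim, mutate func(*Claim), maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		mutate(&claim)
		err := s.Update(claim)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrConflict) {
			return attempt, err
		}
		claim = s.Get() // the one additional API call mentioned in the commit
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, ErrConflict)
}

func main() {
	s := &Store{claim: Claim{ResourceVersion: 1, Finalizers: []string{finalizer}}}
	stale := s.Get() // the scheduler's view of the claim

	// Meanwhile the controller removes the finalizer.
	fresh := s.Get()
	fresh.Finalizers = nil
	if err := s.Update(fresh); err != nil {
		panic(err)
	}

	attempts, err := UpdateWithRetry(s, stale, addAllocation, 3)
	fmt.Println(attempts, err, s.Get().Finalizers)
}
```

Because the second attempt starts from the freshly read claim, the missing finalizer is noticed and re-added together with the allocation.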
Patrick Ohly
ecbafb8de5 DRA: fix scheduler/resource claim controller race
There was a race caused by having to update claim finalizer and status in two
different operations:
- Resource claim controller removes allocation, does not yet
  get to remove the finalizer.
- Scheduler prepares an allocation, without adding the finalizer
  because it's there.
- Controller removes finalizer.
- Scheduler adds allocation.

This is an invalid state. Automatic checking found this during the execution of
the "with translated parameters on single node.*supports sharing a claim
sequentially" E2E test, but only when run stand-alone. When running in
parallel (as in the CI), the bad outcome of the race did not occur.

The fix is to check that the finalizer is still set when adding the
allocation. The apiserver doesn't check that because it doesn't know which
finalizer goes with the allocation result. It could check for "some finalizer",
but that is not guaranteed to be correct (could be some unrelated one).

Checking the finalizer can only be done with a JSON patch. Despite the
complications, having the ability to add multiple pods concurrently to
ReservedFor seems worth it (avoids expensive rescheduling or a local retry
loop).

The resource claim controller doesn't need this, it can do a normal update
which implicitly checks ResourceVersion.
2024-06-27 15:03:06 +02:00
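Note on ecbafb8de5: a JSON patch (RFC 6902) can carry a "test" operation that makes the whole patch fail when a value no longer matches, which is how a finalizer check can be combined with adding the allocation. The sketch below only builds such a patch document. The paths, the finalizer name and the allocation payload are assumptions chosen for illustration; they are not copied from the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PatchOp is one operation of an RFC 6902 JSON patch.
type PatchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value,omitempty"`
}

// BuildAllocationPatch returns a patch that first tests that the finalizer
// list is unchanged and only then adds the allocation. If the "test" fails,
// the server rejects the whole patch and nothing is written.
func BuildAllocationPatch(expectedFinalizers []string, allocation map[string]string) ([]byte, error) {
	ops := []PatchOp{
		{Op: "test", Path: "/metadata/finalizers", Value: expectedFinalizers}, // hypothetical path
		{Op: "add", Path: "/status/allocation", Value: allocation},            // hypothetical path
	}
	return json.Marshal(ops)
}

func main() {
	patch, err := BuildAllocationPatch(
		[]string{"example.com/delete-protection"}, // made-up finalizer name
		map[string]string{"node": "node-1"},       // made-up allocation payload
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Unlike a full update, such a patch is not tied to one ResourceVersion, which fits the commit's remark about adding multiple pods concurrently to ReservedFor.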
googs1025
8ce056df84 add DefaultSelector method ut
Signed-off-by: googs1025 <googs1025@gmail.com>
2024-06-27 11:23:48 +08:00
Kubernetes Prow Robot
8c478a06d8 Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers
DRA: scheduler event handlers via assume cache
2024-06-25 11:56:28 -07:00
Patrick Ohly
1b63639d31 DRA scheduler: use assume cache to list claims
This finishes the transition to the assume cache as source of truth for the
current set of claims.

The tests have to be adapted. It's not enough anymore to directly put objects
into the informer store because that doesn't change the assume cache
content. Instead, normal Create/Update calls and waiting for the cache update
are needed.
2024-06-25 14:00:25 +02:00
Patrick Ohly
9a6f3b9388 scheduler: central ResourceClaim assume cache
This enables connecting the event handler for ResourceClaim to the assume
cache, which addresses a theoretical race condition.

It may also be useful for implementing the autoscaler support, because now
the autoscaler can modify the content of the cache.
2024-06-25 14:00:25 +02:00
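Note on 9a6f3b9388: the idea of an assume cache can be sketched as a store that holds either the latest object seen from the informer or a locally "assumed" version that has not been confirmed yet. The code below is a simplified illustration under assumed names (Object, AssumeCache, OnInformerUpdate); it is not the scheduler's implementation and leaves out event handlers and indexing.

```go
package main

import (
	"fmt"
	"sync"
)

// Object is a hypothetical, minimal stand-in for a ResourceClaim.
type Object struct {
	Name            string
	ResourceVersion int
	Data            string
}

// AssumeCache keeps the last informer object and the latest (possibly
// assumed) object per name.
type AssumeCache struct {
	mu     sync.Mutex
	api    map[string]Object
	latest map[string]Object
}

func NewAssumeCache() *AssumeCache {
	return &AssumeCache{api: map[string]Object{}, latest: map[string]Object{}}
}

// OnInformerUpdate accepts only strictly newer versions, so a resync of an
// old object does not overwrite an assumed one.
func (c *AssumeCache) OnInformerUpdate(obj Object) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cur, ok := c.latest[obj.Name]; ok && obj.ResourceVersion <= cur.ResourceVersion {
		return
	}
	c.api[obj.Name] = obj
	c.latest[obj.Name] = obj
}

// Assume records a local modification until the informer catches up.
func (c *AssumeCache) Assume(obj Object) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	cur, ok := c.latest[obj.Name]
	if !ok {
		return fmt.Errorf("object %q not found in cache", obj.Name)
	}
	if obj.ResourceVersion < cur.ResourceVersion {
		return fmt.Errorf("object %q is out of sync", obj.Name)
	}
	c.latest[obj.Name] = obj
	return nil
}

// Restore drops the assumed version and goes back to the informer object.
func (c *AssumeCache) Restore(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if obj, ok := c.api[name]; ok {
		c.latest[name] = obj
	}
}

// Get returns the latest object, assumed or not.
func (c *AssumeCache) Get(name string) (Object, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	obj, ok := c.latest[name]
	return obj, ok
}

func main() {
	c := NewAssumeCache()
	c.OnInformerUpdate(Object{Name: "claim", ResourceVersion: 1})
	if err := c.Assume(Object{Name: "claim", ResourceVersion: 1, Data: "allocated"}); err != nil {
		panic(err)
	}
	obj, _ := c.Get("claim")
	fmt.Println(obj.Data)
}
```

Readers of the cache see the assumed state immediately, which is why tests that only write into the informer store no longer influence what the scheduler observes.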
Kubernetes Prow Robot
a008776ec9 Merge pull request #125279 from HirazawaUi/add-poddeleted-queueinghintfn
Add QueueingHintFn for pod events in VolumeRestriction plugin
2024-06-19 12:22:41 -07:00
Kubernetes Prow Robot
64355780d9 Merge pull request #125495 from pohly/dra-scheduler-fix-parameter-indexing
DRA: fix indexing of generated parameters
2024-06-18 04:10:38 -07:00
Kubernetes Prow Robot
ab8ad49b47 Merge pull request #125533 from kaisoz/sched-test-disruption-target-cond
scheduler: Test that the DisruptionTarget condition is added at preemption time
2024-06-18 01:14:28 -07:00
Tomas Tormo
8d7c113434 Test that the DisruptionTarget condition is added at preemption 2024-06-17 16:59:52 +00:00
HirazawaUi
f9693e0c0a Implement QueueingHintFn for pod deleted event 2024-06-17 22:42:04 +08:00
Hiroyuki Moriya
4ccae88114 fix 2024-06-16 00:03:13 +00:00
Hiroyuki Moriya
9f7843bde4 remove lister from test 2024-06-15 22:55:46 +00:00
moriya
e93016b68c fix_review_comments 2024-06-14 09:41:38 +09:00
Patrick Ohly
e0fce54d02 DRA: fix indexing of generated parameters
The claim parameter key didn't include the namespace of the claim. In the case
where two namespaces used the exact same parameter reference, the "too many
generated parameters" case got triggered incorrectly and lookup could have
returned an object from the wrong namespace.

Found while running the E2E tests in parallel:

              message: 'running PreFilter plugin "DynamicResources": multiple generated claim
                parameters for ConfigMap. dra-8794/parameters-3 found: [dra-4729/parameters-4
                dra-7328/parameters-4 dra-8794/parameters-4 dra-3402/parameters-4 dra-6156/parameters-4
                dra-1839/parameters-4 dra-7434/parameters-4 dra-6504/parameters-4]'
2024-06-13 17:27:04 +02:00
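Note on e0fce54d02: the effect of leaving the namespace out of an index key can be reproduced with a small sketch. All names here (ParamRef, Generated, BuildIndex and the two key functions) are hypothetical and only illustrate the collision; they are not the scheduler's indexer code.

```go
package main

import "fmt"

// ParamRef is a hypothetical reference to a parameter object.
type ParamRef struct {
	Kind string
	Name string
}

// Generated is a hypothetical generated parameter object that records
// which reference it was generated from.
type Generated struct {
	Namespace     string
	Name          string
	GeneratedFrom ParamRef
}

// keyWithoutNamespace reproduces the bug: equal references in different
// namespaces map to the same key.
func keyWithoutNamespace(g Generated) string {
	return g.GeneratedFrom.Kind + "/" + g.GeneratedFrom.Name
}

// keyWithNamespace includes the namespace, so keys stay distinct.
func keyWithNamespace(g Generated) string {
	return g.Namespace + "/" + g.GeneratedFrom.Kind + "/" + g.GeneratedFrom.Name
}

// BuildIndex groups objects by the key returned from keyFn.
func BuildIndex(objs []Generated, keyFn func(Generated) string) map[string][]Generated {
	index := make(map[string][]Generated)
	for _, o := range objs {
		k := keyFn(o)
		index[k] = append(index[k], o)
	}
	return index
}

func main() {
	ref := ParamRef{Kind: "ConfigMap", Name: "parameters-3"}
	objs := []Generated{
		{Namespace: "dra-8794", Name: "parameters-4", GeneratedFrom: ref},
		{Namespace: "dra-4729", Name: "parameters-4", GeneratedFrom: ref},
	}

	bad := BuildIndex(objs, keyWithoutNamespace)
	good := BuildIndex(objs, keyWithNamespace)

	// Two hits under one key is the "multiple generated claim parameters" case.
	fmt.Println(len(bad["ConfigMap/parameters-3"]))
	fmt.Println(len(good["dra-8794/ConfigMap/parameters-3"]))
}
```

With the namespace in the key, each lookup returns only the object from the claim's own namespace, which rules out both the false "multiple" error and a cross-namespace match.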
Kubernetes Prow Robot
9c8c61aee4 Merge pull request #122234 from AxeZhan/podUpdateEvent
[Scheduler]Put pod into the correct queue during podUpdate
2024-06-12 12:28:17 -07:00
AxeZhan
d66f8f9413 schedulingQueue update pod by queueHint 2024-06-12 21:26:09 +08:00
YamasouA
f409dedb5d Implement QHint for CSINode 2024-06-04 23:04:52 +09:00
Kubernetes Prow Robot
cfe5a7d03a Merge pull request #125213 from carlory/fix-dra-flaky
fix dra flaky test on TestPlugin
2024-06-03 13:32:10 -07:00
Kubernetes Prow Robot
8bd36c60bd Merge pull request #125197 from gabesaba/prefilter_perf
[scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable
2024-06-03 07:35:41 -07:00
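Note on #125197: treating an absent key as UnschedulableAndUnresolvable means that a lookup needs a default instead of the map's zero value. The sketch below shows that convention with made-up types (Code, NodeToStatus, PreemptionCandidates); it is an illustration of the idea, not the scheduler framework's actual type.

```go
package main

import "fmt"

// Code is a simplified status code (hypothetical, not the framework type).
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// NodeToStatus stores a status only for nodes that were actually evaluated.
type NodeToStatus map[string]Code

// Get returns the recorded status, or UnschedulableAndUnresolvable when the
// node has no entry. Nodes filtered out early need no entry at all.
func (m NodeToStatus) Get(node string) Code {
	if code, ok := m[node]; ok {
		return code
	}
	return UnschedulableAndUnresolvable
}

// PreemptionCandidates keeps the nodes where preemption might help, that is
// nodes whose status is Unschedulable but still resolvable.
func PreemptionCandidates(allNodes []string, m NodeToStatus) []string {
	var out []string
	for _, n := range allNodes {
		if m.Get(n) == Unschedulable {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	all := []string{"node-1", "node-2", "node-3"}
	// Only node-2 was evaluated; the others were excluded before filtering.
	statuses := NodeToStatus{"node-2": Unschedulable}
	fmt.Println(PreemptionCandidates(all, statuses))
}
```

The map then grows with the number of evaluated nodes rather than with the cluster size, which is presumably the performance motivation suggested by the branch name prefilter_perf.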
moriya
3f3ce8659f update_comment 2024-06-02 23:58:06 +09:00
moriya
a3e6fd724c remove_comment 2024-06-02 23:56:45 +09:00
moriya
e2632d0ed8 simplify_test 2024-06-02 23:55:41 +09:00
moriya
657bba80de simplify_test 2024-06-02 23:54:39 +09:00
moriya
105f9396b8 review_comment 2024-06-02 23:34:01 +09:00
moriya
a4b3ce8876 simplify 2024-06-02 22:24:35 +09:00
Gabe
c8f0ea1a54 Don't fill in NodeToStatusMap with UnschedulableAndUnresolvable 2024-05-31 15:52:16 +00:00
carlory
2794baf4c0 fix dra flaky test on TestPlugin 2024-05-30 23:22:37 +08:00
Kubernetes Prow Robot
ee2c1ffa80 Merge pull request #124630 from carlory/fix-123731
DRA: scheduler: index claim and class parameters to simplify lookup
2024-05-29 14:38:14 -07:00
moriya
116665da4d fix_review_comment 2024-05-28 23:24:33 +09:00
carlory
3072987fcc DRA: scheduler: index claim and class parameters to simplify lookup 2024-05-27 15:57:10 +08:00
moriya
1b8fb3a838 pvc 2024-05-20 23:13:56 +09:00
moriya
58143ff3eb volumezone: scheduler queueing hints 2024-05-18 23:34:39 +09:00
NoicFank
31a4b13238 enhancement(scheduler): share waitingPods among profiles 2024-05-17 17:07:27 +08:00
Toru Komatsu
5722db7aa3 QueueingHint for CSILimit when deleting pods (#121508)
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-05-14 11:07:11 -07:00
carlory
c8e91b9bc2 CephRBD volume plugin ( ) and its csi migration support were removed in this release 2024-05-09 22:55:34 +08:00
Kubernetes Prow Robot
b27608875c Merge pull request #124287 from sanposhiho/tainttoleration
implement QueueingHint in TaintToleration
2024-05-01 00:06:16 -07:00
carlory
06d3cd33b2 use slices library instead 2024-04-29 16:50:53 +08:00
wackxu
a4bfaae8a4 implement QueueingHint in TaintToleration 2024-04-29 07:18:35 +00:00
Kubernetes Prow Robot
cffc2c0b40 Merge pull request #124102 from pohly/dra-scheduler-assume-cache
scheduler: move assume cache to utils
2024-04-26 08:49:12 -07:00