Commit Graph

1628 Commits

Author SHA1 Message Date
bells17
e1aa8197ed volumebinding: scheduler queueing hints - CSIStorageCapacity (#124961)
* volumebinding: scheduler queueing hints - CSIStorageCapacity

* Fixed points mentioned in the review

* Fixed points mentioned in the review

* Update pkg/scheduler/framework/plugins/volumebinding/volume_binding.go

Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

* Update pkg/scheduler/framework/plugins/volumebinding/volume_binding_test.go

Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

* Fixed points mentioned in the review

* volume_binding.go を更新

Co-authored-by: Kensei Nakada <handbomusic@gmail.com>

---------

Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
2024-07-19 07:53:52 -07:00
Kubernetes Prow Robot
24fbb13eaf Merge pull request #126113 from googs1025/enqueueExtensions_refactor
scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister()
2024-07-18 00:53:25 -07:00
bells17
1298c8a5fe csi-translation-lib: Support structured and contextual logging 2024-07-18 14:01:27 +09:00
googs1025
a3978e8315 scheduler: Add ctx param and error return to EnqueueExtensions.EventsToRegister() 2024-07-18 12:22:17 +08:00
Kubernetes Prow Robot
5d40866fae Merge pull request #125994 from carlory/fix-job-api
clean up codes after PodDisruptionConditions was promoted to GA
2024-07-17 14:37:09 -07:00
Kubernetes Prow Robot
d879103c28 Merge pull request #125820 from macsko/add_separate_lock_for_pod_nominator_scheduling_queue
Add a separate lock for pod nominator in scheduling queue
2024-07-17 12:06:10 -07:00
Maciej Skoczeń
5def93b10a Add a separate lock for pod nominator in scheduling queue 2024-07-17 07:58:59 +00:00
bells17
4c3c4128af volumebinding: scheduler queueing hints - StorageClass 2024-07-17 15:03:17 +09:00
bells17
aceb4468b6 volumebinding: scheduler queueing hints - PersistentVolumeClaim 2024-07-16 12:48:50 +09:00
Kubernetes Prow Robot
ae1caa40a2 Merge pull request #125961 from Jerry-yz/master
Chore: fix scheduler code comment typos
2024-07-15 19:27:30 -07:00
Kubernetes Prow Robot
c7dab2a507 Merge pull request #125280 from HirazawaUi/add-pvc-events-queueinghintfn
Add QueueingHintFn for pvc events in VolumeRestriction plugin
2024-07-12 11:47:30 -07:00
HirazawaUi
cd13be8654 Add QueueingHintFn for pvc events in VolumeRestriction plugin 2024-07-13 00:25:39 +08:00
Kubernetes Prow Robot
31062790a1 Merge pull request #125855 from googs1025/refactor_scheduler_ut
chore: call close framework when finishing
2024-07-12 05:14:35 -07:00
Hiroyuki Moriya
52a622ad6d volumezone: scheduler queueing hints: pv (#125001)
* volumezone: scheduler queueing hints

* add_comment
2024-07-12 05:14:27 -07:00
googs1025
d4627f16a5 chore: call close framework when finishing
Signed-off-by: googs1025 <googs1025@gmail.com>
2024-07-12 18:11:04 +08:00
moriya
ac15717653 volumezone: scheduler queueing hints 2024-07-11 14:24:49 +00:00
Kubernetes Prow Robot
5b49afa66b Merge pull request #125000 from Gekko0114/vz_pvc
volumezone: scheduler queueing hints: pvc
2024-07-11 06:29:09 -07:00
carlory
850bc09e9b clean up codes after PodDisruptionConditions was promoted to GA and locked to default 2024-07-11 10:40:21 +08:00
Kubernetes Prow Robot
90615231a6 Merge pull request #125097 from YamasouA/ft/queuehit-csinode
volumebinding: scheduler queueing hints - CSINode
2024-07-09 17:53:05 -07:00
Kubernetes Prow Robot
4a214f6ad9 Merge pull request #125461 from mimowo/pod-disruption-conditions-ga
Graduate PodDisruptionConditions to stable
2024-07-09 11:08:13 -07:00
Jerry-yz
bd90e99b2a fix: schedule code comment typo
Signed-off-by: Jerry-yz <yz386071268@gmail.com>
2024-07-09 00:04:18 +08:00
Kubernetes Prow Robot
b6899c5e08 Merge pull request #122251 from olderTaoist/unschedulable-plugin
register unschedulable plugin  for those plugins that PreFilter's PreFilterResult filter out some nodes
2024-07-05 05:44:26 -07:00
olderTaoist
b478621596 register unscheduable plugin when prefileter with NodeNames 2024-07-02 13:02:45 +08:00
Kubernetes Prow Robot
ac9aec9f9b Merge pull request #125116 from pohly/dra-one-of-source
DRA: remove "source" indirection from v1 Pod API
2024-06-28 12:46:45 -07:00
Michal Wozniak
780191bea6 review remarks for graduating PodDisruptionConditions 2024-06-28 17:32:27 +02:00
Michal Wozniak
bf0c9885a4 Graduate PodDisruptionConditions to stable 2024-06-28 16:36:51 +02:00
Kubernetes Prow Robot
eb66365bc4 Merge pull request #124931 from pohly/dra-scheduler-prebind-fix
DRA: fix scheduler/resource claim controller race
2024-06-28 05:57:24 -07:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Patrick Ohly
4bddebc48e DRA: fix scheduler/resource claim controller race with retry
The JSON patch approach works, but it is complex. A retry loop is easier to
understand (detect conflict, get new claim, try again). There is one additional
API call (the get), but in practice this scenario is unlikely.
2024-06-27 15:03:56 +02:00
Patrick Ohly
ecbafb8de5 DRA: fix scheduler/resource claim controller race
There was a race caused by having to update claim finalizer and status in two
different operations:
- Resource claim controller removes allocation, does not yet
  get to remove the finalizer.
- Scheduler prepares an allocation, without adding the finalizer
  because it's there.
- Controller removes finalizer.
- Scheduler adds allocation.

This is an invalid state. Automatic checking found this during the execution of
the "with translated parameters on single node.*supports sharing a claim
sequentially" E2E test, but only when run stand-alone. When running in
parallel (as in the CI), the bad outcome of the race did not occur.

The fix is to check that the finalizer is still set when adding the
allocation. The apiserver doesn't check that because it doesn't know which
finalizer goes with the allocation result. It could check for "some finalizer",
but that is not guaranteed to be correct (could be some unrelated one).

Checking the finalizer can only be done with a JSON patch. Despite the
complications, having the ability to add multiple pods concurrently to
ReservedFor seems worth it (avoids expensive rescheduling or a local retry
loop).

The resource claim controller doesn't need this, it can do a normal update
which implicitly checks ResourceVersion.
2024-06-27 15:03:06 +02:00
googs1025
8ce056df84 add DefaultSelector method ut
Signed-off-by: googs1025 <googs1025@gmail.com>
2024-06-27 11:23:48 +08:00
Kubernetes Prow Robot
084d6c4968 Merge pull request #125699 from pohly/scheduler-framework-logging
scheduler: fix klog.KObjSlice when applied to []*NodeInfo
2024-06-26 01:50:23 -07:00
Patrick Ohly
719a49cc13 scheduler: fix klog.KObjSlice when applied to []*NodeInfo
The DRA plugin does that. It didn't actually work and only printed an error
message about NodeInfo not implementing klog.KMetata. That's not a compile-time
check due to limitations with Go generics and had been missed earlier.
2024-06-26 08:11:31 +02:00
Kubernetes Prow Robot
8c478a06d8 Merge pull request #124595 from pohly/dra-scheduler-assume-cache-eventhandlers
DRA: scheduler event handlers via assume cache
2024-06-25 11:56:28 -07:00
Patrick Ohly
1b63639d31 DRA scheduler: use assume cache to list claims
This finishes the transition to the assume cache as source of truth for the
current set of claims.

The tests have to be adapted. It's not enough anymore to directly put objects
into the informer store because that doesn't change the assume cache
content. Instead, normal Create/Update calls and waiting for the cache update
are needed.
2024-06-25 14:00:25 +02:00
Patrick Ohly
9a6f3b9388 scheduler: central ResourceClaim assume cache
This enables connecting the event handler for ResourceClaim to the assume
cache, which addresses a theoretic race condition.

It may also be useful for implementing the autoscaler support, because now
the autoscaler can modify the content of the cache.
2024-06-25 14:00:25 +02:00
Kubernetes Prow Robot
a008776ec9 Merge pull request #125279 from HirazawaUi/add-poddeleted-queueinghintfn
Add QueueingHintFn for pod events in VolumeRestriction plugin
2024-06-19 12:22:41 -07:00
Kubernetes Prow Robot
64355780d9 Merge pull request #125495 from pohly/dra-scheduler-fix-parameter-indexing
DRA: fix indexing of generated parameters
2024-06-18 04:10:38 -07:00
Kubernetes Prow Robot
ab8ad49b47 Merge pull request #125533 from kaisoz/sched-test-disruption-target-cond
scheduler: Test that the DisruptionTarget condition is added at preemption time
2024-06-18 01:14:28 -07:00
Tomas Tormo
8d7c113434 Test that the DisruptionTarget condition is added at preemption 2024-06-17 16:59:52 +00:00
HirazawaUi
f9693e0c0a Implement QueueingHintFn for pod deleted event 2024-06-17 22:42:04 +08:00
Hiroyuki Moriya
4ccae88114 fix 2024-06-16 00:03:13 +00:00
Hiroyuki Moriya
9f7843bde4 remove lister from test 2024-06-15 22:55:46 +00:00
moriya
e93016b68c fix_review_comments 2024-06-14 09:41:38 +09:00
Patrick Ohly
e0fce54d02 DRA: fix indexing of generated parameters
The claim parameter key didn't include the namespace of the claim. In the case
where two namespaces used the exact same parameter reference, the "too many
generated parameters" case got triggered incorrectly and lookup could have
returned an object from the wrong namespace.

Found while running the E2E tests in parallel:

              message: 'running PreFilter plugin "DynamicResources": multiple generated claim
                parameters for ConfigMap. dra-8794/parameters-3 found: [dra-4729/parameters-4
                dra-7328/parameters-4 dra-8794/parameters-4 dra-3402/parameters-4 dra-6156/parameters-4
                dra-1839/parameters-4 dra-7434/parameters-4 dra-6504/parameters-4]'
2024-06-13 17:27:04 +02:00
Kubernetes Prow Robot
9c8c61aee4 Merge pull request #122234 from AxeZhan/podUpdateEvent
[Scheduler]Put pod into the correct queue during podUpdate
2024-06-12 12:28:17 -07:00
AxeZhan
d66f8f9413 schedulingQueue update pod by queueHint 2024-06-12 21:26:09 +08:00
Patrick Ohly
c339eafb76 scheduler: allow PreBind to return "Pending" and "Unschedulable"
Any error result from PreBind was treated as a pod scheduling failure. This was
overlooked when moving blocking API calls in the DRA plugin into a PreBind
implementation, leading to:

    E0604 15:45:50.980929  306340 schedule_one.go:1048] "Error scheduling pod; retrying" err="waiting for resource driver" pod="test/test-draqld28"

That's because DRA's PreBind does some updates in the apiserver, then returns
Pending to wait for the outcome.

The fix is to allow PreBind to return the same special status codes as other
extension points.
2024-06-06 15:28:08 +02:00
YamasouA
f409dedb5d Implement QHint for CSINode 2024-06-04 23:04:52 +09:00
AxeZhan
cf73c9d93c remove EvaluatedNodes field in Diagnosis struct 2024-06-04 14:20:55 +08:00