TestPlugin/multi-claims-binding-conditions-all-success/PreEnqueue
flakes due to the assumed cache not been synced with the initial
store. The test waits until the registered handler used by the
assumed cache has synced before proceeding with the test
Moving Scheduler interfaces to staging: Move PodInfo and NodeInfo interfaces (together with related types) to staging repo, leaving internal implementation in kubernetes/kubernetes/pkg/scheduler
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes release >= 1.32. The helper code picks whatever API version is
enabled from v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 are enabled without the v1 won't work.
Added a skipOnWindows flag to DynamicResources scheduler test case
to skip test that relies on nanosecond timer precision.
Windows timer granularity is much coarser than Linux, which causes
the test to fail often.
The intent is to catch abnormal runtimes with the generously large default
timeout of 10 seconds.
We have to set up a context with the configured timeout (optional!), then
ensure that both CEL evaluation and the allocation logic itself properly
returns the context error. The scheduler plugin then can convert that into
"unschedulable".
The allocator and thus Filter now also check for context cancellation by the
scheduler. This happens when enough nodes have been found.
This is meant for simple changes, like code cleanup or API changes of the
allocator code. For more complex changes and new features, SIG Scheduling
approvers will be required to approve, as before.
The goal is to maintain different version of the allocator logic. We already
had one incidence where adding an alpha feature caused a regression also when
it was disabled. Not everything can be implemented within obviously correct if
branches.
This also opens the door for implementing different alternatives.
The code just gets moved around for now.
* Move ClusterEvent type to staging repo, leaving some functions (that contain logic internal to scheduler) in kubernetes/kubernetes
apply review comment and fix linter warning
* update-vendor.sh
* update doc comments
* run update-vendor.sh
Thanks to the tracker, the plugin sees all taints directly in the device
definition and can compare it against the tolerations of a request while
trying to find a device for the request.
When the feature is turnedd off, taints are ignored during scheduling.
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
If there was an unexpected status, the code extracting the expected error
message crashed with a panic. Happened once so far, for unknown reasons
because the unexpected status then didn't get logged.
This was previously caught during Filter by the allocator check. Doing it
sooner avoids wasting resources on a pod which ultimately cannot get scheduled.
While at it, be a bit more clear about which feature is disabled. The user
might not know that.
All logic related to obtaining DRA objects and tracking modifications
to ResourceClaims in-memory is extracted to DefaultDRAManager, which
implements framework.SharedDRAManager.
This is intended to be a no-op in terms of the DRA plugin behavior.
A better place is the cel package because a) the name can become shorter
and b) it is tightly coupled with the compiler there.
Moving the compilation into the cache simplifies the callers.
"Allocated devices" are the ones which can be observed from the informer. "All
allocated devices" also includes those which are in flight and haven't been
written back to the apiserver.
The logic for skipping "admin access" was repeated in three different places. A
single foreachAllocatedDevices with a callback puts it into one function.
DeviceClasses and different requests are very likely to contain the same
expression string. We don't need to compile that over and over again.
To avoid hanging onto that cache longer than necessary, it's currently tied to
each PreFilter/Filter combination. It might make sense to move this up into the
scheduler plugin and thus reuse compiled expressions for different pods.
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
│ before │ after │
│ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │
PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 33.95 ± 4% 36.65 ± 2% +7.95% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 105.8 ± 2% 106.7 ± 3% ~ (p=0.177 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 100.7 ± 1% 119.7 ± 3% +18.82% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 90.78 ± 1% 121.10 ± 4% +33.40% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 50.51 ± 7% 63.72 ± 3% +26.17% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 103.7 ± 5% 110.2 ± 2% +6.32% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 28.50 ± 2% 28.16 ± 5% ~ (p=0.102 n=6)
geomean 64.99 73.15 +12.56%
Using unique strings instead of normal strings speeds up allocation with
structured parameters because maps that use those strings as key no longer need
to build hashes of the string content. However, care must be taken to call
unique.Make as little as possible because it is costly.
Pre-allocating the map of allocated devices reduces the need to grow the map
when adding devices.
goos: linux
goarch: amd64
pkg: k8s.io/kubernetes/test/integration/scheduler_perf
cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
│ before │ after │
│ SchedulingThroughput/Average │ SchedulingThroughput/Average vs base │
PerfScheduling/SchedulingWithResourceClaimTemplateStructured/5000pods_500nodes-36 18.06 ± 2% 33.30 ± 2% +84.31% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_100nodes-36 104.7 ± 2% 105.3 ± 2% ~ (p=0.818 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/empty_500nodes-36 96.62 ± 1% 100.75 ± 1% +4.28% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_100nodes-36 83.00 ± 2% 90.96 ± 2% +9.59% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/half_500nodes-36 32.45 ± 7% 49.84 ± 4% +53.60% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_100nodes-36 95.22 ± 7% 103.80 ± 1% +9.00% (p=0.002 n=6)
PerfScheduling/SteadyStateClusterResourceClaimTemplateStructured/full_500nodes-36 9.111 ± 10% 27.215 ± 7% +198.69% (p=0.002 n=6)
geomean 45.86 64.26 +40.12%