Automatic merge from submit-queue (batch tested with PRs 67042, 66480, 67053). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

ensure MatchNodeSelectorTerms() runs statelessly

**What this PR does**:

Fix sorting behavior in selector.go:

- move sorting from NewRequirement() out to String()
- add related unit tests
- add unit tests in one of the outer callers (pkg/apis/core/v1/helper)

**Why we need it**:

- Without this fix, scheduling and the daemonset controller do not work correctly in some (corner) cases

**Which issue(s) this PR fixes**:

Fixes #66298

**Special notes for your reviewer**:

Parameter `nodeSelectorTerms` in method MatchNodeSelectorTerms() is a slice, which is fundamentally a {*elements, len, cap} tuple - i.e. it's passing in a pointer. In that method, NodeSelectorRequirementsAsSelector() -> NewRequirement() is invoked, and `matchExpressions[*].values` is passed in and **modified** via `sort.Strings(vals)`.

This causes the pod of the following daemonset to fall into an infinite create/delete loop:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: problem
spec:
  selector:
    matchLabels:
      app: sleeper
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - 127.0.0.2
                - 127.0.0.1
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sleep", "7200"]
```

(the problem can be stably reproduced on a local cluster started by `hack/local-up-cluster.sh`)

The first time, the daemonset yaml is handled by the apiserver and persisted in etcd in its original format (the original order of values is kept - 127.0.0.2, 127.0.0.1). After that, the daemonset controller tries to schedule the pod, and it reuses the predicates logic of the scheduler component - where the values are **sorted** deeply.
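The slice aliasing described in the reviewer notes can be reproduced in isolation. The sketch below is not the code in selector.go: `requirement`, `newRequirementSorting` and `newRequirement` are hypothetical stand-ins, assumed only to illustrate the difference between sorting the caller's slice at construction time and sorting a private copy in `String()`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// requirement is a hypothetical, trimmed-down stand-in for a selector requirement.
type requirement struct {
	key  string
	vals []string
}

// newRequirementSorting mimics the pre-fix behavior: it sorts vals in place,
// so the caller's backing array is reordered as a side effect.
func newRequirementSorting(key string, vals []string) requirement {
	sort.Strings(vals)
	return requirement{key: key, vals: vals}
}

// newRequirement mimics the post-fix behavior: it stores vals untouched.
func newRequirement(key string, vals []string) requirement {
	return requirement{key: key, vals: vals}
}

// String sorts a private copy, so the output is deterministic
// while the caller's slice keeps its original order.
func (r requirement) String() string {
	sorted := append([]string(nil), r.vals...)
	sort.Strings(sorted)
	return fmt.Sprintf("%s in (%s)", r.key, strings.Join(sorted, ","))
}

func main() {
	before := []string{"127.0.0.2", "127.0.0.1"}
	newRequirementSorting("kubernetes.io/hostname", before)
	fmt.Println(before) // [127.0.0.1 127.0.0.2] (caller's data was reordered)

	after := []string{"127.0.0.2", "127.0.0.1"}
	r := newRequirement("kubernetes.io/hostname", after)
	fmt.Println(r.String()) // kubernetes.io/hostname in (127.0.0.1,127.0.0.2)
	fmt.Println(after)      // [127.0.0.2 127.0.0.1] (caller's data is intact)
}
```

Copying before sorting costs one allocation per `String()` call in this sketch; the point is only that the constructor no longer has a visible side effect on its argument.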
The in-place sort not only causes the pod to be created with the values in sorted order (127.0.0.1, 127.0.0.2), but also introduces a bug when updating the daemonset: internally, the ds controller uses a "rawMessage" (the bytes of an object) to calculate a hash that acts as the "controller-revision-hash" controlling revision rollingUpdate/rollBack. As a result, it keeps killing the "old" pod and spawning a "new" pod back and forth, and falls into an infinite loop.

The issue exists in `master`, `release-1.11` and `release-1.10`.

**Release note**:

```release-note
NONE
```
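To see why a reordered `values` list is enough to flip a hash computed over serialized bytes, here is a minimal sketch. It assumes JSON serialization and SHA-256 purely for illustration; `matchExpression` and `hashOf` are hypothetical names, and the hash function and encoding the ds controller actually uses are not shown here.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// matchExpression is a hypothetical stand-in for one nodeSelectorTerms entry.
type matchExpression struct {
	Key      string   `json:"key"`
	Operator string   `json:"operator"`
	Values   []string `json:"values"`
}

// hashOf serializes the expression and hashes the raw bytes.
// SHA-256 is an assumption made for this sketch only.
func hashOf(e matchExpression) [32]byte {
	raw, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	return sha256.Sum256(raw)
}

func main() {
	// Order as persisted in etcd.
	stored := matchExpression{
		Key:      "kubernetes.io/hostname",
		Operator: "In",
		Values:   []string{"127.0.0.2", "127.0.0.1"},
	}
	// Order after the in-place sort.
	sorted := matchExpression{
		Key:      "kubernetes.io/hostname",
		Operator: "In",
		Values:   []string{"127.0.0.1", "127.0.0.2"},
	}
	// Same set of values, different bytes, so the hashes differ.
	fmt.Println(hashOf(stored) == hashOf(sorted)) // false
}
```

The two objects are semantically equivalent as selectors, but any revision check based on the serialized form treats them as different revisions.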