kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-11-12 17:46:14 +00:00

Files

Kubernetes Submit Queue 00bf292cdc Merge pull request #66480 from Huang-Wei/stateless-MatchNodeSelectorTerms

Automatic merge from submit-queue (batch tested with PRs 67042, 66480, 67053). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

ensure MatchNodeSelectorTerms() runs statelessly

**What this PR does**:

Fix sorting behavior in selector.go:

- move sorting from NewRequirement() out to String()
- add related unit tests
- add unit tests in one of outer callers (pkg/apis/core/v1/helper)

**Why we need it**:
- Without this fix, scheduling and daemonset controller doesn't work well in some (corner) cases

**Which issue(s) this PR fixes**:
Fixes #66298

**Special notes for your reviewer**:
Parameter `nodeSelectorTerms` in method MatchNodeSelectorTerms() is a slice, which is fundamentally a {*elements, len, cap} tuple - i.e. it's passing in a pointer. In that method, NodeSelectorRequirementsAsSelector() -> NewRequirement() is invoked, and the `matchExpressions[*].values` is passed into and **modified** via `sort.Strings(vals)`.

This will cause following daemonset pod fall into an infinite create/delete loop:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: problem
spec:
  selector:
    matchLabels:
      app: sleeper
  template:
    metadata:
      labels:
        app: sleeper
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - 127.0.0.2
                - 127.0.0.1
      containers:
      - name: busybox
        image: busybox
        command: ["/bin/sleep", "7200"]
```

(the problem can be stably reproduced on a local cluster started by `hack/local-up-cluster.sh`)

The first time daemonset yaml is handled by apiserver and persisted in etcd with original format (original order of values was kept - 127.0.0.2, 127.0.0.1). After that, daemonset controller tries to schedule pod, and it reuses the predicates logic in scheduler component - where the values are **sorted** deeply. This not only causes the pod to be created in sorted order (127.0.0.1, 127.0.0.2), but also introduced a bug when updating daemonset - internally ds controller use a "rawMessage" (bytes of an object) to calculate hash acting as a "controller-revision-hash" to control revision rollingUpdate/rollBack, so it keeps killing "old" pod and spawning "new" pod back and forth, and fall into an infinite loop.

The issue exists in `master`, `release-1.11` and `release-1.10`.

**Release note**:
```release-note
NONE
```

2018-08-07 14:27:59 -07:00