Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh. Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
Automatic merge from submit-queue (batch tested with PRs 43726, 43643)
Make a smaller redis image for testing, based on Alpine.
**What this PR does / why we need it**:
This shrinks gcr.io/google_containers/redis from 400MB to 5MB, which should reduce flakes.
**Which issue this PR fixes**:
fixes #43631
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Moves dns-horizontal-autoscaler to a separate service account
Similar to #38816.
As one of the cluster add-ons, dns-horizontal-autoscaler currently uses the default service account in the kube-system namespace, which was introduced by https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/e2e-rbac-bindings/random-addon-grabbag.yaml to ease the transition. This default service account will be removed in the future.
This PR moves dns-horizontal-autoscaler to a separate service account and sets up the necessary permissions.
@bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43518, 42467)
install/kube-up: fix some errors when installing k8s through kube-up/down.sh
What this PR does / why we need it:
These scripts install etcd 2.3.1, but k8s uses etcd3 as the default storage backend, so the following error always appears:
API server: rpc error: code = 13 desc = transport is closing
So I think we should change the etcd version.
thank you!
Automatic merge from submit-queue
Centos provider: generate SSL certificates for etcd cluster.
**What this PR does / why we need it**:
Support a secure etcd cluster for the centos provider by generating SSL certificates for etcd by default. Running etcd without SSL exposes cluster data to everyone and is not recommended. [#39462](https://github.com/kubernetes/kubernetes/pull/39462#issuecomment-271601547)
/cc @jszczepkowski @zmerlynn
**Release note**:
```release-note
Support secure etcd cluster for centos provider.
```
Automatic merge from submit-queue
added prompt warning if etcd3 media type isn't set during upgrade
**What this PR does / why we need it**:
This adds a prompt confirming the upgrade when `STORAGE_MEDIA_TYPE` is not explicitly set. This is to prevent users from accidentally upgrading to protobuf.
**Which issue this PR fixes**:
Along with docs, addresses #43669
**Special notes for your reviewer**:
Should be cherrypicked onto `release-1.6`
**Release note**:
```release-note
NONE
```
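The confirmation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `confirm_storage_media_type` and the prompt wording are assumptions; only the `STORAGE_MEDIA_TYPE` variable comes from the description.

```shell
# Hypothetical helper (name and messages are assumptions, not from the PR).
# Returns 0 if the upgrade may proceed, non-zero otherwise.
confirm_storage_media_type() {
  # An explicit setting means the operator already chose; no prompt needed.
  if [ -n "${STORAGE_MEDIA_TYPE:-}" ]; then
    return 0
  fi
  echo "STORAGE_MEDIA_TYPE is not set explicitly." >&2
  printf 'Continue with the upgrade? [y/N] ' >&2
  # Treat EOF or anything other than y/Y as "no".
  read -r reply || return 1
  case "${reply}" in
    y|Y) return 0 ;;
    *)   return 1 ;;
  esac
}
```

An upgrade script could call `confirm_storage_media_type || exit 1` before touching the cluster, so that an unattended run with no input aborts rather than proceeding.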
Automatic merge from submit-queue (batch tested with PRs 41297, 42638, 42666, 43039, 42567)
Allow minion floating IPs to be optional
**What this PR does / why we need it**:
Makes the generation of floating IPs for worker nodes optional, based on an env var. To quote the original issue:
> Currently, the OpenStack installation method assigns a floating IP to every single worker node. While this is fine for smaller clusters with a good sized IP pool, it can cause issues in environments with high node counts or less IPs available.
**Which issue this PR fixes**:
https://github.com/kubernetes/kubernetes/issues/40737
**Special notes for your reviewer**:
I used the conditions section of the Heat spec: https://docs.openstack.org/developer/heat/template_guide/hot_spec.html#conditions-section
**Release note**:
```release-note
OpenStack clusters can now specify whether worker nodes are assigned a floating IP
```
Automatic merge from submit-queue (batch tested with PRs 43048, 43624, 43649)
Remove E2E_UPGRADE_TEST check in config-test.sh
Once https://github.com/kubernetes/test-infra/pull/2330 merges, the upgrade tests will drive the exact behavior they want, and we can remove the check for envvars leaked from the job env.
Automatic merge from submit-queue
Update NPD rbac.
I recently enabled NPD in gke.
However, I found that in the gke e2e test (https://k8s-testgrid.appspot.com/google-gke#gci-gke), npd on the node could not talk to the apiserver and repeatedly reported the following errors:
```
E0324 05:08:26.745545 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:37.719423 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:47.719694 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
```
I created a GKE cluster (v1.7.0-alpha.0.1483+1e879c69ecf09e) myself, and found that addon manager could not create npd binding with the following error:
```
error: error validating "/etc/kubernetes/addons/node-problem-detector/standalone/npd-binding.yaml": error validating data: couldn't find type: v1alpha1.ClusterRoleBinding; if you choose to ignore these errors, turn validation off with --validate=false
```
I found that rbac was updated to beta, but npd was missed because it was merged after 9e6a3496b4 (diff-b05c70853d9a772b310db71a61297841).
I updated rbac to beta in the master manifest and npd on the node could talk with apiserver immediately.
We must get this into 1.6 to make NPD work. @dchen1107
@dchen1107 @fabioy @liggitt
Automatic merge from submit-queue (batch tested with PRs 43546, 43544)
Default to enabling legacy ABAC policy in non-test kube-up.sh environments
Fixes https://github.com/kubernetes/kubernetes/issues/43541
In 1.5, we unconditionally stomped the abac policy file if KUBE_USER was set, and unconditionally used ABAC mode pointing to that file.
In 1.6, unless the user opts out (via `ENABLE_LEGACY_ABAC=false`), we want the same legacy policy included as a fallback to RBAC.
This PR:
* defaults legacy ABAC **on** in normal deployments
* defaults legacy ABAC **on** in upgrade E2Es (ensures combination of ABAC and RBAC works properly for upgraded clusters)
* defaults legacy ABAC **off** in non-upgrade E2Es (ensures e2e tests 1.6+ run with tightened permissions, and that default RBAC roles cover the required core components)
GKE changes to drive the `ENABLE_LEGACY_ABAC` envvar were made by @cjcullen out of band
```release-note
`kube-up.sh` using the `gce` provider enables both RBAC authorization and the permissive legacy ABAC policy that makes all service accounts superusers. To opt out of the permissive ABAC policy, export the environment variable `ENABLE_LEGACY_ABAC=false` before running `cluster/kube-up.sh`.
```
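The three defaults listed above could be expressed along these lines. This is only a sketch: the helper `legacy_abac_default` and the inputs `IS_E2E` and `IS_UPGRADE_TEST` are hypothetical names chosen for illustration; only `ENABLE_LEGACY_ABAC` appears in the description.

```shell
# Hypothetical helper: prints the default value for ENABLE_LEGACY_ABAC.
#   $1 = "true" if this is an E2E test environment
#   $2 = "true" if this is an upgrade test
legacy_abac_default() {
  if [ "$1" = "true" ] && [ "$2" != "true" ]; then
    echo "false"   # non-upgrade E2E: run with tightened permissions
  else
    echo "true"    # normal deployments and upgrade E2Es keep the fallback
  fi
}

# IS_E2E and IS_UPGRADE_TEST are assumed variable names, not from the PR.
default_value="$(legacy_abac_default "${IS_E2E:-false}" "${IS_UPGRADE_TEST:-false}")"

# An explicit value from the caller always wins over the computed default.
ENABLE_LEGACY_ABAC="${ENABLE_LEGACY_ABAC:-${default_value}}"
```

Using `${VAR:-default}` keeps the opt-out path simple: exporting `ENABLE_LEGACY_ABAC=false` before the script runs overrides whatever the environment type would have selected.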
Automatic merge from submit-queue
Bump CNI consumers to v0.5.1
**What this PR does / why we need it**:
- vendored CNI plugins properly handle `DEL` on missing resources
- update CNI version refs
**Which issue this PR fixes**:
fixes #43488
**Release note**:
`bumps CNI to version v0.5.1 where plugins properly handle DEL on nonexistent resources`
layer-nvidia-cuda does the hardware detection and sets a state that the
worker can react to.
When gpu is available, worker updates config and restarts kubelet to
enable gpu mode. Worker then notifies master that it's in gpu mode via
the kube-control relation.
When master sees that a worker is in gpu mode, it updates to privileged
mode and restarts kube-apiserver.
The kube-control interface has subsumed the kube-dns interface
functionality.
An 'allow-privileged' config option has been added to both worker and
master charms. The gpu enablement respects the value of this option;
i.e., we can't enable gpu mode if the operator has set
allow-privileged="false".
Automatic merge from submit-queue
Add an env KUBE_ENABLE_MASTER_NOSCHEDULE_TAINT and disable it by default
This PR changed master `NoSchedule` taint to opt-in.
As discussed with @bgrant0607 @janetkuo, the `NoSchedule` master taint breaks existing user workloads, so we should not enable it by default.
Previously, NPD required the taint because it can only support one OS distro with a specific configuration. If master and node are using different OS distros, NPD will not work either on master or node. However, we've already fixed this in https://github.com/kubernetes/kubernetes/pull/40206, so for NPD it's fine to disable the taint.
This should work, but I'll still try it in my cluster to confirm.
@kubernetes/sig-scheduling-misc @dchen1107 @mikedanese
Automatic merge from submit-queue (batch tested with PRs 43422, 43458)
Bump Cluster Autoscaler version to 0.5.0
**What this PR does / why we need it**:
This PR bumps Cluster Autoscaler version to 0.5.0. The version is the same as 0.5.0-beta2 (from the code perspective). We are just removing the -beta2 tag from the image.
**Release note**:
None.
cc: @MaciekPytel @fgrzadkowski @wojtek-t
Automatic merge from submit-queue
Keep ResourceQuota admission at the end of the chain
Fixes #43426
Moves DefaultTolerationSeconds admission ahead of ResourceQuota so that ResourceQuota stays at the end of the chain.
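To illustrate the ordering constraint, here is a small sketch that takes a comma-separated list of admission plugin names and moves `ResourceQuota` to the end. The function `resource_quota_last` is a hypothetical helper for illustration only; the PR itself reorders the list in the manifests rather than computing it.

```shell
# Hypothetical helper: print the plugin list with ResourceQuota moved last.
#   $1 = comma-separated admission plugin names
resource_quota_last() {
  plugins="$1"
  # If ResourceQuota is absent, return the list unchanged.
  case ",${plugins}," in
    *,ResourceQuota,*) ;;
    *) printf '%s\n' "${plugins}"; return 0 ;;
  esac
  # Drop ResourceQuota, keeping the relative order of everything else.
  rest="$(printf '%s\n' "${plugins}" | tr ',' '\n' \
    | { grep -v -x 'ResourceQuota' || true; } | paste -s -d ',' -)"
  if [ -n "${rest}" ]; then
    printf '%s,ResourceQuota\n' "${rest}"
  else
    printf 'ResourceQuota\n'
  fi
}
```

For example, `resource_quota_last "NamespaceLifecycle,ResourceQuota,DefaultTolerationSeconds"` prints `NamespaceLifecycle,DefaultTolerationSeconds,ResourceQuota`.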
Automatic merge from submit-queue
Increase memory limit for fluentd-gcp
This PR increases the fluentd memory limit in the fluentd-gcp addon to avoid OOMs. The request is left intact.
Automatic merge from submit-queue
Export KUBE_VERSION for consumption by get-kube-binaries.sh
/assign @ixdy
https://github.com/kubernetes/kubernetes/pull/43331 will not have any effect until we update get-kube.sh to export KUBE_VERSION