Commit Graph

1470 Commits

Author SHA1 Message Date
Ryan Hallisey
dbb92f9836 Use ensure-temp-dir in the common.sh script
Instead of having an ensure-temp-dir function in multiple
places, add it to the common.sh script which is sourced by
all the providers.
2017-01-19 09:30:50 -05:00
Maisem Ali
52b6c9bb41 Adding cos as an alias for gci. 2017-01-18 15:14:25 -08:00
Kubernetes Submit Queue
b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. the nodes then use the `$binary.docker_tag` file to find the right image.

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
Kubernetes Submit Queue
76d023ca90 Merge pull request #40094 from zmerlynn/cvm-v20170117
Automatic merge from submit-queue (batch tested with PRs 36467, 36528, 39568, 40094, 39042)

Bump GCE to container-vm-v20170117

Base image update only, no kubelet or Docker updates.

```release-note
Update GCE ContainerVM deployment to container-vm-v20170117 to pick up CVE fixes in base image.
```
2017-01-18 13:37:12 -08:00
Zach Loafman
a0b8fd618f Bump GCE to container-vm-v20170117
Base image update only, no kubelet or Docker updates.
2017-01-18 10:50:17 -08:00
Kubernetes Submit Queue
6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to GO so this dependency to the script could be removed. After some search, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is, that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told to not lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance on this highly depends on free memory and also uses up all this free memory for write caching, reducing performance of everything else in the system. 

As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's however not exactly known what it is but I'd assume it has something to do with cgroups throttling IO or memory. 

I checked the kernel code for lazy inode table initialization. The nice thing is, that the kernel also does the guaranteed zeroing on discard check. If it is guaranteed, the kernel uses discard for the lazy initialization, which should finish in a just few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the use of the write cache. The result is, that free memory is not required and not touched, thus performance is maxed and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization and the kernel already does this optimization by default (and in a much better way), I'd suggest to completely remove these flags and rely on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
Jeff Grafton
bc4b6ac397 Build release tarballs in bazel and add make bazel-release rule 2017-01-13 16:17:44 -08:00
Jordan Liggitt
d94bb26776 Conditionally write token file entries 2017-01-13 17:59:46 -05:00
Kubernetes Submit Queue
d50c027d0c Merge pull request #39537 from liggitt/legacy-policy
Automatic merge from submit-queue (batch tested with PRs 39803, 39698, 39537, 39478)

include bootstrap admin in super-user group, ensure tokens file is correct on upgrades

Fixes https://github.com/kubernetes/kubernetes/issues/39532

Possible issues with cluster bring-up scripts:

- [x] known_tokens.csv and basic_auth.csv is not rewritten if the file already exists
  * new users (like the controller manager) are not available on upgrade
  * changed users (like the kubelet username change) are not reflected
  * group additions (like the addition of admin to the superuser group) don't take effect on upgrade
  * this PR updates the token and basicauth files line-by-line to preserve user additions, but also ensure new data is persisted
- [x] existing 1.5 clusters may depend on more permissive ABAC permissions (or customized ABAC policies). This PR adds an option to enable existing ABAC policy files for clusters that are upgrading

Follow-ups:
- [ ] both scripts are loading e2e role-bindings, which only be loaded in e2e tests, not in normal kube-up scenarios
- [ ] when upgrading, set the option to use existing ABAC policy files
- [ ] update bootstrap superuser client certs to add superuser group? ("We also have a certificate that "used to be" a super-user. On GCE, it has CN "kubecfg", on GKE it's "client"")
- [ ] define (but do not load by default) a relaxed set of RBAC roles/rolebindings matching legacy ABAC, and document how to load that for new clusters that do not want to isolate user permissions
2017-01-12 15:06:31 -08:00
Jordan Liggitt
968b0b30cf Update token users if needed 2017-01-11 17:21:12 -05:00
Jordan Liggitt
21b422fccc Allow enabling ABAC authz 2017-01-11 17:20:51 -05:00
Jordan Liggitt
1fe517e96a Include admin in super-user group 2017-01-11 17:20:42 -05:00
Euan Kemp
eeef293ee2 container-linux: restart rkt-api on failure
This works around a flake I saw which had the same root cause as
https://github.com/coreos/rkt/issues/3513.

This will potentially help reduce the impact of such future problems as
well.
2017-01-11 00:25:14 -08:00
Kubernetes Submit Queue
ebc8e40694 Merge pull request #39691 from yujuhong/bump_timeout
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

Bump container-linux and gci timeout for docker health check

The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.

This addresses #38588
2017-01-10 21:25:16 -08:00
Kubernetes Submit Queue
addc6cae4a Merge pull request #38212 from mikedanese/kubeletauth
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)

Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.

cc @cjcullen
2017-01-10 19:48:09 -08:00
Yu-Ju Hong
4e87973a9b Bump container-linux and gci timeout for docker health check
The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.
2017-01-10 13:07:21 -08:00
Kubernetes Submit Queue
8ef6902516 Merge pull request #39451 from euank/remove-abac
Automatic merge from submit-queue

cluster/cl: move abac to rbac

See #39092

We based off of GCI in the brief time where it was using abac.

fixes #39395

cc @yifan-gu 

**Release note**:
```release-note
NONE
```
2017-01-05 12:31:17 -08:00
Euan Kemp
c1afc4a3d8 cluster/cl: move abac to rbac
See #39092

We based off of GCI in the brief time where it was using abac.
2017-01-04 16:10:59 -08:00
Mike Danese
3ab0e37cc6 implement upgrades 2017-01-04 11:45:57 -08:00
CJ Cullen
d0997a3d1f Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.
Plumb through to kubelet/kube-apiserver on gci & cvm.
2017-01-03 14:30:45 -08:00
Zach Loafman
a3b363000d Fix AWS break injected by kubernetes/kubernetes#39020 2017-01-03 13:52:02 -08:00
deads2k
ecd23a0217 remove abac authorizer from e2e 2017-01-03 07:53:03 -05:00
Yifan Gu
dd59aa1c3b cluster/gce: Rename coreos to container-linux. 2016-12-30 15:32:02 -08:00
Kubernetes Submit Queue
274a9f0f70 Merge pull request #38927 from luxas/remove_maintainer
Automatic merge from submit-queue

Remove all MAINTAINER statements in the codebase as they are deprecated

**What this PR does / why we need it**:
ref: https://github.com/docker/docker/pull/25466

**Release note**:

```release-note
Remove all MAINTAINER statements in Dockerfiles in the codebase as they are deprecated by docker
```
@ixdy @thockin (who else should be notified?)
2016-12-29 16:41:24 -08:00
deads2k
19391164b9 add additional e2e rbac bindings to match existing users 2016-12-21 16:24:45 -05:00
deads2k
2e2a2e4b94 update gce for RBAC, controllers, proxy, kubelet (p1) 2016-12-21 13:51:49 -05:00
deads2k
8360bc1a9f create kubelet client cert with correct group 2016-12-20 14:18:17 -05:00
Alexander Block
13a2bc8afb Enable lazy initialization of ext3/ext4 filesystems 2016-12-18 11:08:51 +01:00
Euan Kemp
028a0140d0 cluster/coreos: delete mounter
We don't use this bit of gci currently.
2016-12-17 21:36:32 -08:00
Euan Kemp
13afe18ab4 cluster/coreos: update to gci based implementation
This update includes significant refactoring. It moves almost all of the
logic into bash scripts, modeled after the `gci` cluster scripts.

The primary differences between the two are the following:
1. Use of the `/opt/kubernetes` directory over `/home/kubernetes`
2. Support for rkt as a runtime
3. No use of logrotate
4. No use of `/etc/default/`
5. No logic related to noexec mounts or gci-specific firewall-stuff
2016-12-17 21:36:31 -08:00
Euan Kemp
e2644bb442 cluster/gce: copy gci -> coreos
This is for reviewing ease as the following commits introduce changes
to make the coreos kube-up deployment share significant code with the
gci code.
2016-12-17 21:36:30 -08:00
Lucas Käldström
3c5b5f5963 Remove all MAINTAINER statements in the codebase as they aren't very useful and now deprecated 2016-12-17 20:34:10 +02:00
Kubernetes Submit Queue
a4577e70ab Merge pull request #38808 from du2016/change-heapster-version
Automatic merge from submit-queue (batch tested with PRs 38906, 38808)

change the version in the yaml file

change the version in heapster-controller.yaml with image version
2016-12-17 00:41:24 -08:00
Euan Kemp
9a8c6ac41e cluster/gce/coreos: add OWNERS 2016-12-16 14:08:54 -08:00
Piotr Szczesniak
a52637f09f Migrated fluentd to daemon set 2016-12-15 13:48:32 +01:00
du2016
90e2c31fa7 change the version in the yaml file 2016-12-15 07:14:19 -05:00
Jeff Grafton
27d096d27d Rename build-tools/ back to build/ 2016-12-14 13:42:15 -08:00
Kubernetes Submit Queue
911d10654c Merge pull request #38638 from madhusudancs/fed-bootstrap-e2e-logs-firewall
Automatic merge from submit-queue

Use the cluster name in the names of the firewall rules that allow cluster-internal traffic to disambiguate the rules belonging to different clusters.

Also dropping the network name from these firewall rule names.

Network name was used to disambiguate firewall rules in a given network.
However, since two clusters cannot share a name in a GCE project, this
sufficiently disambiguates the firewall rule names. A potential confusion
arises when someone tries to create a firewall rule with the same name
in a different network, but that's also an indication that they shouldn't
be doing that.


@jszczepkowski due to PR #33094
@ixdy for test-infra

cc @kubernetes/sig-federation @nikhiljindal
2016-12-13 22:07:04 -08:00
Amey Deshpande
5ec42e6a25 Ensure the GCI metadata files do not have whitespace at the end
Fixes #36708
2016-12-13 13:41:54 -08:00
Madhusudan.C.S
174856509e Dropping the network name from the internal master and node firewall rules.
Network name was used to disambiguate firewall rules in a given network.
However, since two clusters cannot share a name in a GCE project, this
sufficiently disambiguates the firewall rule names. A potential confusion
arises when someone tries to create a firewall rule with the same name
in a different network, but that's also an indication that they shouldn't
be doing that.
2016-12-13 11:21:14 -08:00
Kubernetes Submit Queue
18d05c7d56 Merge pull request #38640 from mtaufen/gci-version-env
Automatic merge from submit-queue

Allow GCI_VERSION to come from env

This is to facilitate GCI tip vs. K8s tip testing; we need to
dynamically set the version of GCI to stay current with their
latest canary (latest of the "gci-base" prefixed images).
2016-12-13 09:54:45 -08:00
Michael Taufen
fe4552057e Allow GCI_VERSION to come from env
This is to facilitate GCI tip vs. K8s tip testing; we need to
dynamically set the version of GCI to stay current with their
latest canary (latest of the "gci-base" prefixed images).
2016-12-12 11:19:56 -08:00
Madhusudan.C.S
d92cf4df5e Use the cluster name in the names of the firewall rules that allow cluster-internal traffic to disambiguate the rules belonging to different clusters. 2016-12-12 10:58:53 -08:00
Jerzy Szczepkowski
b01e3c1e17 Fixed detection of master during creation of multizone nodes.
Fixed detection of master during creation of multizone nodes.
2016-12-12 15:46:39 +01:00
Kubernetes Submit Queue
37cd01dc8c Merge pull request #38438 from MrHohn/addon-manager-coreos
Automatic merge from submit-queue

Keeps addon manager yamls in sync

From #38437.

We should have kept all addon manager YAML files in sync. This does not fix the release scripts issue, but we should still have this.

@mikedanese @ixdy
2016-12-11 11:41:35 -08:00
Kubernetes Submit Queue
d8c925319a Merge pull request #38523 from MrHohn/kube-dns-rename
Automatic merge from submit-queue (batch tested with PRs 38058, 38523)

Renames kube-dns configure files from skydns* to kubedns*

`skydns-` prefix and `-rc` suffix are confusing and misleading. Renaming it to `kubedns` in existing yaml files and scripts.

@bowei @thockin
2016-12-10 17:04:53 -08:00
Zihong Zheng
4ad06df18f Renames kube-dns configure files from skydns* to kubedns* 2016-12-08 20:01:19 -08:00
Zihong Zheng
95910cc40b Keeps addon manager yamls in sync 2016-12-08 19:54:14 -08:00
Tim St. Clair
759e9f5370 Bump Container VM to latest version
- Enables kernel softlockup detection
- Removes iSCSI support
2016-12-08 18:25:18 -08:00
Kubernetes Submit Queue
cafba0b94e Merge pull request #38291 from justinsb/fix_38920
Automatic merge from submit-queue (batch tested with PRs 36543, 38189, 38289, 38291, 36724)

kube-up: Only specify ETCD_QUORUM_READ if non-empty
2016-12-07 11:40:19 -08:00