Commit Graph

4613 Commits

Author SHA1 Message Date
Yu-Ju Hong
1095652cb8 Add more logs to help debugging 2017-03-08 12:27:49 -08:00
xiangpengzhao
7fed242d55 Only create the symlink when container log path exists 2017-03-08 01:36:48 -05:00
Kubernetes Submit Queue
5bc7387b3c Merge pull request #42169 from ncdc/pprof-trace
Automatic merge from submit-queue (batch tested with PRs 42692, 42169, 42173)

Add pprof trace support

Add support for `/debug/pprof/trace`

Can wait for master to reopen for 1.7.

cc @smarterclayton @wojtek-t @gmarek @timothysc @jeremyeder @kubernetes/sig-scalability-pr-reviews
2017-03-07 20:10:26 -08:00
Dawn Chen
ab790b6a3a Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to
1.22
2017-03-07 17:07:07 -08:00
Kubernetes Submit Queue
1ed3aa6750 Merge pull request #42264 from yujuhong/kubemark_cri
Automatic merge from submit-queue

kubemark: enable CRI for the hollow nodes

This fixes #41488
2017-03-07 13:04:39 -08:00
Matthew Wong
1dabce9815 Print dereferenced pod status fields when logging status update 2017-03-07 15:00:54 -05:00
Yu-Ju Hong
a0f90e1490 Use FakeDockerPuller to bypass auth/keyring logic in tests 2017-03-07 10:11:49 -08:00
Yu-Ju Hong
516848c37d Various fixes for the fake docker client
* Properly return ImageNotFoundError
 * Support inject "Images" or "ImageInspects" and keep both in sync.
 * Remove the FakeDockerPuller and let FakeDockerClient subsumes its
   functinality. This reduces the overhead to maintain both objects.
 * Various small fixes and refactoring of the testing utils.
2017-03-07 10:11:49 -08:00
Kubernetes Submit Queue
5cc6a4e269 Merge pull request #42609 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).

#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.

We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).

cc @timothysc
2017-03-07 08:10:46 -08:00
Andy Goldstein
b011529d8a Add pprof trace support
Add pprof trace support and --enable-contention-profiling to those
components that don't already have it.
2017-03-07 10:10:42 -05:00
Kubernetes Submit Queue
a1c5d1b80f Merge pull request #42585 from derekwaynecarr/cgroup-flake
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)

provide active pods to cgroup cleanup

**What this PR does / why we need it**:
This PR provides more information for when a pod cgroup is considered orphaned.  The running pods cache is based on the runtime's view of the world.  we create pod cgroups before containers so we should just be looking at activePods.

**Which issue this PR fixes**
Fixes https://github.com/kubernetes/kubernetes/issues/42431
2017-03-06 22:20:11 -08:00
Kubernetes Submit Queue
31db570a00 Merge pull request #42497 from derekwaynecarr/lower_cgroup_names
Automatic merge from submit-queue

cgroup names created by kubelet should be lowercased

**What this PR does / why we need it**:
This PR modifies the kubelet to create cgroupfs names that are lowercased.  This better aligns us with the naming convention for cgroups v2 and other cgroup managers in ecosystem (docker, systemd, etc.)

See: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
"2-6-2. Avoid Name Collisions"

**Special notes for your reviewer**:
none

**Release note**:
```release-note
kubelet created cgroups follow lowercase naming conventions
```
2017-03-06 20:43:03 -08:00
Connor Doyle
364dbc0ca5 Revert "Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.""
- This reverts commit 60758f3fff.
- Disabled opaque integer resource end-to-end tests.
2017-03-06 17:48:09 -08:00
Derek Carr
5ce298c9aa provide active pods to cgroup cleanup 2017-03-06 17:37:26 -05:00
Dawn Chen
60758f3fff Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available." 2017-03-06 14:27:17 -08:00
Kubernetes Submit Queue
0fad9ce5e2 Merge pull request #41870 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.

This appears to be a regression since v1.5.0 in scheduler behavior for opaque integer resources, reported in https://github.com/kubernetes/kubernetes/issues/41861.

- [X] Add failing e2e test to trigger the regression
- [x] Restore previous behavior (pods pending due to insufficient OIR get scheduled once sufficient OIR becomes available.)
2017-03-06 11:30:24 -08:00
Derek Carr
48d822eafe cgroup names created by kubelet should be lowercased 2017-03-06 11:19:21 -05:00
Seth Jennings
ccd87fca3f kubelet: add cgroup manager metrics 2017-03-06 08:53:47 -06:00
Harry Zhang
bc644f9e04 Use pod sandbox id in checkpoint 2017-03-06 10:46:26 +08:00
Kubernetes Submit Queue
4bbf98850f Merge pull request #42500 from vishh/fix-gpu-init
Automatic merge from submit-queue

[Bug] Fix gpu initialization in Kubelet

Kubelet incorrectly fails if `AllAlpha=true` feature gate is enabled with container runtimes that are not `docker`.

Replaces #42407
2017-03-04 20:28:08 -08:00
Connor Doyle
8a42189690 Fix unbounded growth of cached OIRs in sched cache
- Added schedulercache.Resource.SetOpaque helper.
- Amend kubelet allocatable sync so that when OIRs are removed from capacity
  they are also removed from allocatable.
- Fixes #41861.
2017-03-04 09:26:22 -08:00
Kubernetes Submit Queue
f9ccee7714 Merge pull request #42435 from dashpole/timestamps_for_fsstats
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

[Bug Fix]: Avoid evicting more pods than necessary by adding Timestamps for fsstats and ignoring stale stats

Continuation of #33121.  Credit for most of this goes to @sjenning.  I added volume fs timestamps.

**why is this a bug** 
This PR attempts to fix part of https://github.com/kubernetes/kubernetes/issues/31362 which results in multiple pods getting evicted unnecessarily whenever the node runs into resource pressure. This PR reduces the chances of such disruptions by avoiding reacting to old/stale metrics.
Without this PR, kubernetes nodes under resource pressure will cause unnecessary disruptions to user workloads. 
This PR will also help deflake a node e2e test suite.

The eviction manager currently avoids evicting pods if metrics are old.  However, timestamp data is not available for filesystem data, and this causes lots of extra evictions.
See the [inode eviction test flakes](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e) for examples.
This should probably be treated as a bugfix, as it should help mitigate extra evictions.

cc: @kubernetes/sig-storage-pr-reviews  @kubernetes/sig-node-pr-reviews @vishh @derekwaynecarr @sjenning
2017-03-03 23:21:48 -08:00
Kubernetes Submit Queue
51a3d7b663 Merge pull request #42397 from feiskyer/fix-42396
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

Kubelet: return container runtime's version instead of CRI's one

**What this PR does / why we need it**:

With CRI enabled by default, kubelet reports the version of CRI instead of container runtime version. This PR fixes this problem.

**Which issue this PR fixes** 

Fixes #42396.

**Special notes for your reviewer**:

Should also cherry-pick to 1.6 branch.

**Release note**:

```release-note
NONE
```

cc @yujuhong  @kubernetes/sig-node-bugs
2017-03-03 23:21:46 -08:00
Kubernetes Submit Queue
2d319bd406 Merge pull request #42204 from dashpole/allocatable_eviction
Automatic merge from submit-queue

Eviction Manager Enforces Allocatable Thresholds

This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234. 

cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh 

** Why is this a bug/regression**

Kubelet uses `oom_score_adj` to enforce QoS policies. But the `oom_score_adj` is based on overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills for Guaranteed pods, which violates QoS. Even worse, we have observed system daemons like kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and regressions in an existing GA feature `out of Resource` handling.
2017-03-03 20:20:12 -08:00
Kubernetes Submit Queue
9cc5480918 Merge pull request #41149 from sjenning/qos-memory-limits
Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285)

kubelet: enable qos-level memory limits

```release-note
Experimental support to reserve a pod's memory request from being utilized by pods in lower QoS tiers.
```

Enables the QoS-level memory cgroup limits described in https://github.com/kubernetes/community/pull/314

**Note: QoS level cgroups have to be enabled for any of this to take effect.**

Adds a new `--experimental-qos-reserved` flag that can be used to set the percentage of a resource to be reserved at the QoS level for pod resource requests.

For example, `--experimental-qos-reserved="memory=50%`, means that if a Guaranteed pod sets a memory request of 2Gi, the Burstable and BestEffort QoS memory cgroups will have their `memory.limit_in_bytes` set to `NodeAllocatable - (2Gi*50%)` to reserve 50% of the guaranteed pod's request from being used by the lower QoS tiers.

If a Burstable pod sets a request, its reserve will be deducted from the BestEffort memory limit.

The result is that:
- Guaranteed limit matches root cgroup at is not set by this code
- Burstable limit is `NodeAllocatable - Guaranteed reserve`
- BestEffort limit is `NodeAllocatable - Guaranteed reserve - Burstable reserve`

The only resource currently supported is `memory`; however, the code is generic enough that other resources can be added in the future.

@derekwaynecarr @vishh
2017-03-03 16:44:39 -08:00
Vishnu kannan
038585626d fix gpu initialization
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-03 12:13:01 -08:00
Klaus Ma
41c4426a30 Removed un-necessary empty line. 2017-03-03 19:43:48 +08:00
David Ashpole
a90c7951d4 add volume timestamps 2017-03-02 15:01:59 -08:00
Seth Jennings
cc50aa9dfb kubelet: enable qos-level memory request reservation 2017-03-02 15:04:13 -06:00
Seth Jennings
c5faf1c156 kubelet: eviction: add timestamp to FsStats 2017-03-02 11:20:24 -08:00
David Ashpole
ac612eab8e eviction manager changes for allocatable 2017-03-02 07:36:24 -08:00
Kubernetes Submit Queue
00c0c8332f Merge pull request #42273 from smarterclayton/evaluate_probes
Automatic merge from submit-queue (batch tested with PRs 41672, 42084, 42233, 42165, 42273)

ExecProbes should be able to do simple env var substitution

For containers that don't have bash, we should support env substitution
like we do on command and args. However, without major refactoring
valueFrom is not supportable from inside the prober. For now, implement
substitution based on hardcoded env and leave TODOs for future work.

Improves the state of #40846, will spawn a follow up issue for future refactoring after CRI settles down
2017-03-02 03:20:29 -08:00
Kubernetes Submit Queue
5ee6ba2f59 Merge pull request #42223 from Random-Liu/dockershim-better-implement-cri
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)

CRI: Make dockershim better implements CRI.

When thinking about CRI Validation test, I found that `PodSandboxStatus.Linux.Namespaces.Options.HostPid` and `PodSandboxStatus.Linux.Namespaces.Options.HostIpc` are not populated. Although they are not used by kuberuntime now, we should populate them to conform to CRI.

/cc @yujuhong @feiskyer
2017-03-02 00:59:19 -08:00
Pengfei Ni
1986b78e0e Version(): return runtime version instead of CRI 2017-03-02 14:42:37 +08:00
Kubernetes Submit Queue
fa0387c9fe Merge pull request #42195 from Random-Liu/cri-support-non-json-logging
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)

Use `docker logs` directly if the docker logging driver is not `json-file`

Fixes https://github.com/kubernetes/kubernetes/issues/41996.

Post the PR first, I still need to manually test this, because we don't have test coverage for journald logging pluggin.

@yujuhong @dchen1107 
/cc @kubernetes/sig-node-pr-reviews
2017-03-01 20:08:08 -08:00
Kubernetes Submit Queue
dfe05e0512 Merge pull request #41753 from derekwaynecarr/burstable-cpu-shares
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

Burstable QoS cgroup has cpu shares assigned

**What this PR does / why we need it**:
This PR sets the Burstable QoS cgroup cpu shares value to the sum of the pods cpu requests in that tier.  We need it for proper evaluation of CPU shares in the new QoS hierarchy.

**Special notes for your reviewer**:
It builds against the framework proposed for https://github.com/kubernetes/kubernetes/pull/41833
2017-03-01 15:30:34 -08:00
Kubernetes Submit Queue
ddd8b5c1cf Merge pull request #41644 from derekwaynecarr/ensure-pod-cgroup-deleted
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

Ensure pod cgroup is deleted prior to deletion of pod

**What this PR does / why we need it**:
This PR ensures that the kubelet removes the pod cgroup sandbox prior to deletion of a pod from the apiserver.   We need this to ensure that the default behavior in the kubelet is to not leak resources.
2017-03-01 15:30:30 -08:00
Kubernetes Submit Queue
d5ff69468e Merge pull request #29378 from vefimova/docker_resolv
Automatic merge from submit-queue

Re-writing of the resolv.conf file generated by docker

Fixes #17406 

Docker 1.12 will contain feature "The option --dns and --net=host should not be mutually exclusive" (docker/docker#22408)
This patch adds optional support for this ability in kubelet (for now in case of "hostNetwork: true" set all dns settings are ignored if any).
To enable feature use newly added kubelet flag: --allow-dns-for-hostnet=true
2017-03-01 14:19:08 -08:00
Derek Carr
21a899cf85 Ensure pod cgroup is deleted prior to deletion of pod 2017-03-01 15:29:36 -05:00
Derek Carr
1947e76e91 Set Burstable QOS Cgroup cpu.shares 2017-03-01 14:51:34 -05:00
Random-Liu
7c261bfed7 Use docker logs directly if the docker logging driver is not
supported.
2017-03-01 10:50:11 -08:00
Yu-Ju Hong
1759b87ffe Generate valid container id in fake docker client. 2017-03-01 10:33:08 -08:00
vefimova
fc8a37ec86 Added ability for Docker containers to set usage of dns settings along with hostNetwork is true
Introduced chages:
   1. Re-writing of the resolv.conf file generated by docker.
      Cluster dns settings aren't passed anymore to docker api in all cases, not only for pods with host network:
      the resolver conf will be overwritten after infra-container creation to override docker's behaviour.

   2. Added new one dnsPolicy - 'ClusterFirstWithHostNet', so now there are:
      - ClusterFirstWithHostNet - use dns settings in all cases, i.e. with hostNet=true as well
      - ClusterFirst - use dns settings unless hostNetwork is true
      - Default

Fixes #17406
2017-03-01 17:10:00 +00:00
Hemant Kumar
2d3008fc56 Implement support for mount options in PVs
Add support for mount options via annotations on PVs
2017-03-01 11:50:40 -05:00
Kubernetes Submit Queue
ed479163fa Merge pull request #42116 from vishh/gpu-experimental-support
Automatic merge from submit-queue

Extend experimental support to multiple Nvidia GPUs

Extended from #28216

```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with  support for multiple Nvidia GPUs. 
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```

1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```

TODO:

- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
2017-03-01 04:52:50 -08:00
Kubernetes Submit Queue
f68c824f95 Merge pull request #42139 from Random-Liu/unify-fake-runtime-helper
Automatic merge from submit-queue (batch tested with PRs 41921, 41695, 42139, 42090, 41949)

Unify fake runtime helper in kuberuntime, rkt and dockertools.

Addresses https://github.com/kubernetes/kubernetes/pull/42081#issuecomment-282429775.

Add `pkg/kubelet/container/testing/fake_runtime_helper.go`, and change `kuberuntime`, `rkt` and `dockertools` to use it.

@yujuhong This is a small unit test refactoring PR. Could you help me review it?
2017-03-01 04:10:04 -08:00
Kubernetes Submit Queue
1351324bed Merge pull request #41833 from sjenning/qos-refactor
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)

kubelet: cm: refactor QoS logic into seperate interface

This commit has no functional change.  It refactors the QoS cgroup logic into a new `QOSContainerManager` interface to allow for better isolation for QoS cgroup features coming down the pike.

This is a breakout of the refactoring component of my QoS memory limits PR https://github.com/kubernetes/kubernetes/pull/41149 which will need to be rebased on top of this.

@vishh @derekwaynecarr
2017-03-01 01:44:10 -08:00
Kubernetes Submit Queue
9f3343df40 Merge pull request #42015 from dashpole/min_timeout_eviction
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)

Increase Min Timeout for kill pod

Should mitigate #41347, which describes flakes in the inode eviction test due to "GracePeriodExceeded" errors.

When we use gracePeriod == 0, as we do in eviction, the pod worker currently sets a timeout of 2 seconds to kill a pod.
We are hitting this timeout fairly often during eviction tests, causing extra pods to be evicted (since the eviction manager "fails" to evict that pod, and kills the next one).

This PR increases the timeout from 2 seconds to 4, although we could increase it even more if we think that would be appropriate.

cc @yujuhong @vishh @derekwaynecarr
2017-02-28 22:06:01 -08:00
Kubernetes Submit Queue
91e1933f9f Merge pull request #42149 from Random-Liu/check-infra-container-image-existence
Automatic merge from submit-queue (batch tested with PRs 42216, 42136, 42183, 42149, 36828)

Check infra container image existence before pulling.

Fixes https://github.com/kubernetes/kubernetes/issues/42040.

This PR:
* Fixes https://github.com/kubernetes/kubernetes/issues/42040 by checking image existence before pulling.
* Add unit test for it.
* Fix a potential panic at https://github.com/kubernetes/kubernetes/compare/master...Random-Liu:check-infra-container-image-existence?expand=1#diff-e2eefa11d78ba95197ce406772c18c30R421.

@yujuhong
2017-02-28 21:17:02 -08:00
Clayton Coleman
ce62f3d4a0 ExecProbes should be able to do simple env var substitution
For containers that don't have bash, we should support env substitution
like we do on command and args. However, without major refactoring
valueFrom is not supportable from inside the prober. For now, implement
substitution based on hardcoded env and leave TODOs for future work.
2017-02-28 22:46:04 -05:00