Commit Graph

4870 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
5fd0566ce7 Merge pull request #43652 from Random-Liu/avoid-kubelet-panic
Automatic merge from submit-queue (batch tested with PRs 43653, 43654, 43652)

CRI: Check nil pointer to avoid kubelet panic.

When working on the containerd kubernetes integration, I casually returns an empty `sandboxStatus.Linux{}`, but it cause kubelet to panic.

This won't happen when runtime returns valid data, but we should not make the assumption here.

/cc @yujuhong @feiskyer
2017-03-24 22:16:21 -07:00
NickrenREN
2f89a6bda6 optimize getPullSecretsForPod() and syncPod()
Since getPullSecretsForPod() will never return err,we do not need the second return value,and modify syncPod() function.
2017-03-25 11:05:13 +08:00
Random-Liu
9186d1568e Check nil pointer to avoid kubelet panic. 2017-03-24 17:27:15 -07:00
Kubernetes Submit Queue
a4986e38e6 Merge pull request #42556 from resouer/fix-id
Automatic merge from submit-queue (batch tested with PRs 42522, 42545, 42556, 42006, 42631)

Use pod sandbox id in checkpoint

**What this PR does / why we need it**: we should log out sandbox id when checkpoint error

**Release note**:

```NONE
```
2017-03-24 15:10:32 -07:00
Kubernetes Submit Queue
d14854fd5c Merge pull request #37698 from jsafrane/remove-all-filesystems
Automatic merge from submit-queue (batch tested with PRs 41139, 41186, 38882, 37698, 42034)

Make kubelet never delete files on mounted filesystems

With bug #27653, kubelet could remove mounted volumes and delete user data.
The bug itself is fixed, however our trust in kubelet is significantly lower.
Let's add an extra version of RemoveAll that does not cross mount boundary
(rm -rf --one-file-system).

It calls lstat(path) three times for each removed directory - once in
RemoveAllOneFilesystem and twice in IsLikelyNotMountPoint, however this way
it's platform independent and the directory that is being removed by kubelet
should be almost empty.
2017-03-24 12:33:27 -07:00
Kubernetes Submit Queue
6eaa8610a1 Merge pull request #42226 from timchenxiaoyu/reconciletypo
Automatic merge from submit-queue

fix reconcile typo
2017-03-24 10:25:27 -07:00
Klaus Ma
7c91274df2 Fix comments typo in rkt. 2017-03-24 11:31:15 +08:00
Kubernetes Submit Queue
7c24d1a665 Merge pull request #43539 from yujuhong/hostnet_ip
Automatic merge from submit-queue (batch tested with PRs 43533, 43539)

kuberuntime: don't override the pod IP for pods using host network

This fixes the issue of not passing pod IP via downward API for host network pods.
2017-03-22 15:07:18 -07:00
Yu-Ju Hong
ea868d6f7b kuberuntime: don't override the pod IP for pods using host network 2017-03-22 13:28:17 -07:00
Kubernetes Submit Queue
fb890dee06 Merge pull request #43474 from dcbw/cni-network-status
Automatic merge from submit-queue (batch tested with PRs 43465, 43529, 43474, 43521)

kubelet/cni: hook network plugin Status() up to CNI network discovery

Ensure that the plugin returns NotReady status until there is a
CNI network available which can be used to set up pods.

Fixes: https://github.com/kubernetes/kubernetes/issues/43014

I think the only reason it wasn't done like this in the first place was that the dynamic "reread /etc/cni/net.d every 10s forever" was added long after the Status() hook was.  What do you think?

@freehan @caseydavenport @luxas @jbeda
2017-03-22 12:35:11 -07:00
Dan Williams
193abffdbe kubelet/cni: hook network plugin Status() up to CNI network discovery
Ensure that the plugin returns NotReady status until there is a
CNI network available which can be used to set up pods.

Fixes: https://github.com/kubernetes/kubernetes/issues/43014
2017-03-21 15:50:39 -05:00
NickrenREN
14feb9aba8 Change AddPodToVolume() arg to volumeGidValue instead of devicePath 2017-03-21 19:07:15 +08:00
NickrenREN
a451daca0d cleanup: remove TODO(resolved) and var(unused) 2017-03-21 15:40:32 +08:00
Pengfei Ni
a16758396c Fix tiny typo 2017-03-21 14:22:33 +08:00
Random-Liu
fbc320af28 Use uid in config.go instead of pod full name. 2017-03-20 15:52:29 -07:00
Kubernetes Submit Queue
948e3754f8 Merge pull request #43368 from feiskyer/dns-policy
Automatic merge from submit-queue (batch tested with PRs 43398, 43368)

CRI: add support for dns cluster first policy

**What this PR does / why we need it**:

PR #29378 introduces ClusterFirstWithHostNet policy but only dockertools was updated to support the feature. 

This PR updates kuberuntime to support it for all runtimes.


**Which issue this PR fixes** 

fixes #43352

**Special notes for your reviewer**:

Candidate for v1.6.

**Release note**:

```release-note
NONE
```

cc @thockin @luxas @vefimova @Random-Liu
2017-03-20 13:54:33 -07:00
Pengfei Ni
95c3782043 Rewrite resolv.conf for dockershim
PR #29378 introduces ClusterFirstWithHostNet, but docker doesn't support
setting dns options togather with hostnetwork. This commit rewrites
resolv.conf same as dockertools.
2017-03-20 18:45:39 +08:00
Pengfei Ni
079158fa08 CRI: add support for dns cluster first policy
PR #29378 introduces ClusterFirstWithHostNet policy but only dockertools
was updated to support the feature. This PR updates kuberuntime to
support it for all runtimes.

Also fixes #43352.
2017-03-20 17:50:38 +08:00
Pengfei Ni
99ed3202f3 Run hack/update-bazel.sh 2017-03-20 17:48:36 +08:00
Pengfei Ni
53b5f2df48 Add unit test for MakePortsAndBindings 2017-03-20 17:47:38 +08:00
Antonio Murdaca
caa6dd2599 pkg/kubelet/remote: fix typo
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-03-20 10:12:03 +01:00
Pengfei Ni
2ddaaec199 dockershim: process protocol correctly for port mapping 2017-03-20 16:52:24 +08:00
Kubernetes Submit Queue
7bc86d84c1 Merge pull request #43116 from dchen1107/master
Automatic merge from submit-queue (batch tested with PRs 42828, 43116)

Apply taint tolerations for NoExecute for all static pods.

Fixed https://github.com/kubernetes/kubernetes/issues/42753


**Release note**:
```
Apply taint tolerations for NoExecute for all static pods.
```

cc/ @davidopp
2017-03-17 18:14:29 -07:00
Dawn Chen
d419efbe71 Fix unittest reflecting the default taint tolerations change for static
pods.
2017-03-17 14:06:34 -07:00
Dawn Chen
d26e906191 Apply taint tolerations for NoExecute for all static pods. 2017-03-17 09:50:27 -07:00
Julien Balestra
cd7c480f86 Kubelet:rkt Create any missing hostPath Volumes 2017-03-17 10:47:02 +01:00
Yu-Ju Hong
b1e6e7f774 Use the assert/require package in kubelet unit tests
This reduce the lines of code and improve readability.
2017-03-16 10:21:44 -07:00
Piotr Szczesniak
9bd05bdee4 Setup fluentd-ds-ready label in startup script not in kubelet 2017-03-16 13:18:31 +01:00
Kubernetes Submit Queue
ba25afd278 Merge pull request #40964 from tanshanshan/kubelet-unit-test
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)

Improve code coverage for pkg/kubelet/status/generate.go

**What this PR does / why we need it**:

Improve code coverage for pkg/kubelet/status/generate.go  from #39559

Thanks.

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-03-15 16:08:23 -07:00
Kubernetes Submit Queue
222f69cf3c Merge pull request #43030 from yujuhong/rm_corrupted_checkpoint
Automatic merge from submit-queue (batch tested with PRs 42747, 43030)

dockershim: remove corrupted sandbox checkpoints

This is a workaround to ensure that kubelet doesn't block forever when
the checkpoint is corrupted.

This is a workaround for #43021
2017-03-14 22:56:20 -07:00
Yu-Ju Hong
48afc7d4e0 dockershim: call sync() after writing the checkpoint
This ensures the checkpoint files are persisted.
2017-03-14 18:36:51 -07:00
Pengfei Ni
91616f666a kubelet: check and enforce minimum docker api version 2017-03-15 09:28:06 +08:00
Kubernetes Submit Queue
6de28fab7d Merge pull request #42942 from vishh/gpu-cont-fix
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs

Fixes #42412

**Background**
Support for multiple GPUs is an experimental feature in v1.6. 
Container restarts were handled incorrectly which resulted in stranding of GPUs
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in same GPU being assigned to multiple pods.

**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.

**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.

**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever`  in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.

cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews 

Replaces #42560
2017-03-14 10:19:17 -07:00
Lou Yihua
63f1b077dc Add Host field to TCPSocketAction
Currently, TCPSocketAction always uses Pod's IP in connection. But when a
pod uses the host network, sometimes firewall rules may prevent kubelet
from connecting through the Pod's IP. This PR introduces the 'Host' field
for TCPSocketAction, and if it is set to non-empty string, the probe will
be performed on the configured host rather than the Pod's IP. This gives
users an opportunity to explicitly specify 'localhost' as the target for
the above situations.
2017-03-14 23:48:28 +08:00
Kubernetes Submit Queue
f1e9004da9 Merge pull request #42927 from Random-Liu/fix-kubelet-panic
Automatic merge from submit-queue (batch tested with PRs 42802, 42927, 42669, 42988, 43012)

Fix kubelet panic in cgroup manager.

Fixes https://github.com/kubernetes/kubernetes/issues/42920.
Fixes https://github.com/kubernetes/kubernetes/issues/42875
Fixes #42927 
Fixes #43059

Check the error in walk function, so that we don't use info when there is an error.

@yujuhong @dchen1107 @derekwaynecarr @vishh /cc @kubernetes/sig-node-bugs
2017-03-14 07:31:31 -07:00
Yu-Ju Hong
035afab901 dockershim: remove corrupted sandbox checkpoints
This is a workaround to ensure that kubelet doesn't block forever when
the checkpoint is corrupted.
2017-03-13 15:41:01 -07:00
Random-Liu
e6341cc3c7 Fix kubelet panic in cgroup manager. 2017-03-13 12:06:08 -07:00
Vishnu kannan
ad743a922a remove dead code in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu kannan
ff158090b3 use active pods instead of runtime pods in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu Kannan
8ed9bff073 handle container restarts for GPUs
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
tanshanshan
26ab52a3cb fix 2017-03-13 10:00:19 +08:00
Kubernetes Submit Queue
59aa924a9b Merge pull request #42642 from fraenkel/envfrom
Automatic merge from submit-queue

Invalid environment var names are reported and pod starts

When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.

fixes #42583
2017-03-10 17:37:31 -08:00
timchenxiaoyu
c295514443 accurate hint 2017-03-10 16:41:51 +08:00
Kubernetes Submit Queue
d790851c8f Merge pull request #42694 from dchen1107/master
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to

1.22

cc/ @Random-Liu @yujuhong 

We talked about dropping docker 1.9.x support for a while. I just realized that we haven't really done it yet. 

```release-note
Dropped the support for docker 1.9.x and the belows. 
```
2017-03-09 15:07:00 -08:00
Dawn Chen
69eaea2fcc Merge pull request #42779 from dashpole/fix_status
[Bug Fix] Allow Status Updates for Pods that can be deleted
2017-03-09 13:23:00 -08:00
David Ashpole
e3e0bc6ce0 do not skip pods that can be deleted 2017-03-09 09:35:50 -08:00
Kubernetes Submit Queue
9cfc4f1a10 Merge pull request #42739 from yujuhong/created_time
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)

FakeDockerClient: add creation timestamp

This fixes #42736
2017-03-09 02:51:38 -08:00
Kubernetes Submit Queue
4cf553f78e Merge pull request #42767 from Random-Liu/cleanup-infra-container-on-error
Automatic merge from submit-queue (batch tested with PRs 42768, 42760, 42771, 42767)

Stop sandbox container when hit network error.

Fixes https://github.com/kubernetes/kubernetes/issues/42698.

This PR stops the sandbox container when hitting a network error.
This PR also adds a unit test for it.

I'm not sure whether we should try teardown pod network after `SetUpPod` failure. We don't do that in dockertools https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/docker_manager.go#L2276.

@yujuhong @freehan
2017-03-09 00:08:01 -08:00
Michael Fraenkel
c4d07466e8 Invalid environment var names are reported and pod starts
When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.
2017-03-09 07:21:53 +00:00
Kubernetes Submit Queue
6fac75c80a Merge pull request #42768 from yujuhong/fix_sandbox_listing
Automatic merge from submit-queue

dockershim: Fix the race condition in ListPodSandbox

In ListPodSandbox(), we
 1. List all sandbox docker containers
 2. List all sandbox checkpoints. If the checkpoint does not have a
    corresponding container in (1), we return partial result based on
    the checkpoint.

The problem is that new PodSandboxes can be created between step (1) and
(2). In those cases, we will see the checkpoints, but not the sandbox
containers. This leads to strange behavior because the partial result
from the checkpoint does not include some critical information. For
example, the creation timestamp'd be zero, and that would cause kubelet's
garbage collector to immediately remove the sandbox.

This change fixes that by getting the list of checkpoints before listing
all the containers (since in RunPodSandbox we create them in the reverse
order).
2017-03-08 21:33:31 -08:00