Commit Graph

8604 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
dfb6993947 Merge pull request #89182 from dims/just-use-runtime-numcpu
Just use runtime.NumCPU on windows
2020-03-19 06:05:51 -07:00
Odin Ugedal
2830827442 Add support for removing unsupported huge page sizes
When kubelet is restarted, it will now remove the resources for huge
page sizes no longer supported. This is required when:
- node disables huge pages
- changing the default huge page size in older versions of linux
(because it will then only support the newly set default).
- Software updates that change what sizes are supported (eg. by changing
boot parameters).
2020-03-19 13:08:08 +01:00
Kubernetes Prow Robot
34ad7d1984 Merge pull request #88450 from shikanon/fix/golintTypo
fix typos error in handlers_test.go file
2020-03-18 14:24:44 -07:00
Kubernetes Prow Robot
0c8ac83e04 Merge pull request #88871 from dashpole/fix_oom
Use the container whose limit is hit for system OOMs
2020-03-17 19:27:54 -07:00
Davanum Srinivas
825f99c396 run update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-03-17 21:26:07 -04:00
Davanum Srinivas
0c52ffe08f make local copy of JSONLog
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-03-17 21:25:55 -04:00
Kubernetes Prow Robot
761c72f691 Merge pull request #88348 from tedyu/image-not-nil
Check that ImageInspect pointer is not nil
2020-03-17 16:21:01 -07:00
Kubernetes Prow Robot
ffc87f2d0c Merge pull request #88266 from mattjmcnaughton/mattjmcnaughton/delete-pluginwatcher-DOS-TODO
Delete TODO around implementing rate limiting to protect against DOS
2020-03-17 16:20:34 -07:00
Davanum Srinivas
25c3ddf22e Just use runtime.NumCPU on windows
docker folks added NumCPU implementation for windows that
supported hot-plugging of CPUs. The implementation used the
GetProcessAffinityMask to be able to check which CPUs are
active as well.
3707a76921

The golang "runtime" package has also bene using GetProcessAffinityMask
since 1.6 beta1:
6410e67a1e

So we don't seem to need the sysinfo.NumCPU from docker/docker.

(Note that this is PR is an effort to get away from dependencies from
docker/docker)

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-03-17 15:53:52 -04:00
Eric Mountain
22e0ee768b Removes container RefManager 2020-03-16 14:30:57 +01:00
Byonggon Chun
a3047672d0 move pkg/kubelet/cm/cpumanager/containermap to pkg/kubelet/cm/containermap for reusing
containerMap is used in CPU Manager to store all containers information in the node.
containerMap provides a mapping from (pod, container) -> containerID for all containers a pod
It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node.

Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
2020-03-14 02:38:51 +09:00
Giuseppe Scrivano
bb5ed1b797 kubelet: add initial support for cgroupv2
do a conversion from the cgroups v1 limits to cgroups v2.

e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the
equivalent on cgroups v2 is cpu.weight that uses a range [1-10000].

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-03-12 08:50:19 +01:00
Kubernetes Prow Robot
562a420d86 Merge pull request #88915 from roycaihw/fix/image-manager-data-race
Fix a data race in kubelet image manager
2020-03-11 15:04:37 -07:00
Kubernetes Prow Robot
a37d68ec05 Merge pull request #88917 from adelina-t/fix_pod_admit_handler
Implement noopWindowsResourceAllocator
2020-03-11 07:45:37 -07:00
Kubernetes Prow Robot
7989ca4324 Merge pull request #88734 from joelsmith/master
Work-around for missing memory metrics on CRI-O exited containers
2020-03-10 16:21:36 -07:00
Haowei Cai
462b75388f let image cache do sort on write instead of on read to avoid data
race and improve efficienty
2020-03-10 15:33:34 -07:00
Adelina Tuvenie
a9f834d17d Implement noopWindowsResourceAllocator
On Windows, the podAdmitHandler returned by the GetAllocateResourcesPodAdmitHandler() func
and registered by the Kubelet is nil.

We implement a noopWindowsResourceAllocator that would admit any pod for Windows in order
to be consistent with the original implementation.
2020-03-10 21:32:23 +01:00
Savitha Raghunathan
3234d34714 moving volume plugin dir to kubelet config - part 1 2020-03-10 16:22:29 -04:00
Clayton Coleman
c26653ced9 kubelet: Also set PodIPs when assign a host network PodIP
When we clobber PodIP we should also overwrite PodIPs and not rely
on the apiserver to fix it for us - this caused the Kubelet status
manager to report a large string of the following warnings when
it tried to reconcile a host network pod:

```
 I0309 19:41:05.283623    1326 status_manager.go:846] Pod status is inconsistent with cached status for pod "machine-config-daemon-jvwz4_openshift-machine-config-operator(61176279-f752-4e1c-ac8a-b48f0a68d54a)", a reconciliation should be triggered:
   &v1.PodStatus{
           ... // 5 identical fields
           HostIP:                "10.0.32.2",
           PodIP:                 "10.0.32.2",
 -         PodIPs:                []v1.PodIP{{IP: "10.0.32.2"}},
 +         PodIPs:                []v1.PodIP{},
           StartTime:             s"2020-03-09 19:41:05 +0000 UTC",
           InitContainerStatuses: nil,
           ... // 3 identical fields
   }
```

With the changes to the apiserver, this only happens once, but it is
still a bug.
2020-03-09 18:15:32 -04:00
zyu
78e2668539 Delay sorting of evictUnits slice in kuberuntime_gc
Signed-off-by: zyu <yuzhihong@gmail.com>
2020-03-09 12:24:42 -07:00
Kubernetes Prow Robot
ef672c1c2d Merge pull request #88678 from verult/slow-rxm-attach
Parallelize attach operations across different nodes for volumes that allow multi-attach
2020-03-06 13:17:21 -08:00
David Ashpole
fc6b4719fd Use the container whose limit is hit for system OOMs 2020-03-06 11:06:16 -08:00
Christian Huffman
c6fd25d100 Updated CSIDriver references 2020-03-06 08:21:26 -05:00
Kubernetes Prow Robot
5708511499 Merge pull request #88708 from mikedanese/deleteopts
Migrate clientset metav1.DeleteOpts to pass-by-value
2020-03-05 23:09:23 -08:00
Cheng Xing
ef3d66b98b Parallelize attach operations across different nodes for volumes that allow multi-attach 2020-03-05 22:22:05 -08:00
Kubernetes Prow Robot
cd0057c16a Merge pull request #88876 from nolancon/none-policy-fix
Topology Manager none policy bug fix
2020-03-05 21:40:33 -08:00
Kubernetes Prow Robot
e90c908f64 Merge pull request #88141 from tedyu/pvc-being-del
Don't try to create VolumeSpec immediately after underlying PVC is being deleted
2020-03-05 21:39:23 -08:00
Kubernetes Prow Robot
ce01a9bad0 Merge pull request #88857 from nolancon/test-fix
Check for nil cpuManager in container manager
2020-03-05 20:05:14 -08:00
Kubernetes Prow Robot
48541a0b16 Merge pull request #87650 from nolancon/beta-feature-gate
Update TopologyManager Feature Gate
2020-03-05 20:03:04 -08:00
Ted Yu
723761aa88 Don't try to create VolumeSpec immediately after underlying PVC is being deleted
Signed-off-by: Ted Yu <yuzhihong@gmail.com>
2020-03-05 16:45:50 -08:00
Mike Danese
76f8594378 more artisanal fixes
Most of these could have been refactored automatically but it wouldn't
have been uglier. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
2020-03-05 14:59:47 -08:00
Mike Danese
c58e69ec79 automated refactor 2020-03-05 14:59:46 -08:00
Joel Smith
da988294ec Work-around for missing metrics on CRI-O exited containers
HPA needs metrics for exited init containers before it will
take action. By setting memory and CPU usage to zero for any
containers that cAdvisor didn't provide statistics for, we
are assured that HPA will be able to correctly calculate
pod resource usage.
2020-03-05 13:20:43 -07:00
nolancon
0551d408ac Bug fix for TM none policy 2020-03-05 14:25:48 +00:00
nolancon
4baa1d967d Check for nil cpuManager 2020-03-05 07:54:33 +00:00
Kubernetes Prow Robot
7a513b575a Merge pull request #88440 from smarterclayton/container_success_fix
Ensure Kubelet always reports terminating pod container status
2020-03-04 20:13:04 -08:00
Kubernetes Prow Robot
ac32644d6e Merge pull request #87759 from klueska/upstream-move-cpu-allocation-to-pod-admit
Guarantee aligned resources across containers
2020-03-04 20:12:37 -08:00
Clayton Coleman
8bc5cb01a9 kubelet: Clear the podStatusChannel before invoking syncBatch
The status manager syncBatch() method processes the current state
of the cache, which should include all entries in the channel. Flush
the channel before we call a batch to avoid unnecessary work and
to unblock pod workers when the node is congested.

Discovered while investigating long shutdown intervals on the node
where the status channel stayed full for tens of seconds.

Add a for loop around the select statement to avoid unnecessary
invocations of the wait.Forever closure each time.
2020-03-04 13:34:25 -05:00
Clayton Coleman
8722c834e5 kubelet: Never restart containers in deleting pods
When constructing the API status of a pod, if the pod is marked for
deletion no containers should be started. Previously, if a container
inside of a terminating pod failed to start due to a container
runtime error (that populates reasonCache) the reasonCache would
remain populated (it is only updated by syncPod for non-terminating
pods) and the delete action on the pod would be delayed until the
reasonCache entry expired due to other pods.

This dramatically reduces the amount of time the Kubelet waits to
delete pods that are terminating and encountered a container runtime
error.
2020-03-04 13:34:25 -05:00
Yu-Ju Hong
2364c10e2e kubelet: Don't delete pod until all container status is available
After a pod reaches a terminal state and all containers are complete
we can delete the pod from the API server. The dispatchWork method
needs to wait for all container status to be available before invoking
delete. Even after the worker stops, status updates will continue to
be delivered and the sync handler will continue to sync the pods, so
dispatchWork gets multiple opportunities to see status.

The previous code assumed that a pod in Failed or Succeeded had no
running containers, but eviction or deletion of running pods could
still have running containers whose status needed to be reported.

This modifies earlier test to guarantee that the "fallback" exit
code 137 is never reported to match the expectation that all pods
exit with valid status for all containers (unless some exceptional
failure like eviction were to occur while the test is running).
2020-03-04 13:34:25 -05:00
Clayton Coleman
ad3d8949f0 kubelet: Preserve existing container status when pod terminated
The kubelet must not allow a container that was reported failed in a
restartPolicy=Never pod to be reported to the apiserver as success.
If a client deletes a restartPolicy=Never pod, the dispatchWork and
status manager race to update the container status. When dispatchWork
(specifically podIsTerminated) returns true, it means all containers
are stopped, which means status in the container is accurate. However,
the TerminatePod method then clears this status. This results in a
pod that has been reported with status.phase=Failed getting reset to
status.phase.Succeeded, which is a violation of the guarantees around
terminal phase.

Ensure the Kubelet never reports that a container succeeded when it
hasn't run or been executed by guarding the terminate pod loop from
ever reporting 0 in the absence of container status.
2020-03-04 13:34:24 -05:00
Kubernetes Prow Robot
9d0cbb7503 Merge pull request #88673 from jsafrane/block-feature-ga
Promote block volumes to GA
2020-03-03 12:17:12 -08:00
Kubernetes Prow Robot
06b798781a Merge pull request #88591 from smarterclayton/status_update
kubelet: Avoid sending no-op patches
2020-03-03 09:43:38 -08:00
Jan Safranek
3af671011a Generated API 2020-03-02 22:21:42 +01:00
Jan Safranek
8536787133 Add unit tests 2020-03-02 12:54:02 +01:00
nolancon
e8538d9b76 Add mutex to Topology Manager Add/RemoveContainer
This was exposed as a potential bug during e2e test debugging of this
PR.
2020-03-02 04:07:21 +00:00
nolancon
1e613e5a4c Update TopologyManager Feature Gate:
- Alpha to Beta.
- True by default.
- Remove redundant validation checks.
2020-03-02 03:32:05 +00:00
Rob Scott
132d2afca0 Adding IngressClass to networking/v1beta1
Co-authored-by: Christopher M. Luciano <cmluciano@us.ibm.com>
2020-03-01 18:17:09 -08:00
Jan Safranek
2c1b743766 Promote block volume features to GA 2020-02-28 20:48:38 +01:00
James Munnelly
d5dae04898 certificates: update controllers to understand signerName field
Signed-off-by: James Munnelly <james.munnelly@jetstack.io>
2020-02-27 15:54:31 +00:00