8070 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
792fe793a1 Merge pull request #86946 from cchord/fix_typo
fix typo
2020-01-08 14:46:24 -08:00
Kubernetes Prow Robot
fd0358fd21 Merge pull request #86689 from klueska/upstream-fix-cpumanager-v1-state-checksum
Lock checksum calculation for v1 CPUManager state to pre 1.18 logic
2020-01-08 02:57:40 -08:00
Jiahao Zhu
680df17f39 fix typo 2020-01-08 15:48:58 +08:00
Kubernetes Prow Robot
8dca390262 Merge pull request #84927 from mattjmcnaughton/mattjmcnaughton/fix-kubelet-config-common
Fix golint failures for pkg/kubelet/config/...
2020-01-07 21:09:40 -08:00
Kubernetes Prow Robot
49e24adf3e Merge pull request #86832 from mattjmcnaughton/mattjmcnaughton/remove-dead-code-in-fake-docker-client
Remove dead code in fake docker client
2020-01-07 07:36:18 -08:00
Kubernetes Prow Robot
f3df7a2fdb Merge pull request #86727 from mattjmcnaughton/mattjmcnaughton/remove-recorder-PastEventf
Remove `recorder.PastEventf` method
2020-01-07 04:38:49 -08:00
Kubernetes Prow Robot
dd5272b76f Merge pull request #86575 from gongguan/nkubemark
kubemark use remote cri
2020-01-07 03:20:46 -08:00
Kubernetes Prow Robot
195e8e3ad9 Merge pull request #86844 from mattjmcnaughton/mattjmcnaughton/update-cadvisor-stats-provider-comment
Correct comment around which integrations require cadvisor_stats
2020-01-07 01:13:14 -08:00
Kubernetes Prow Robot
8b8f2aa4a5 Merge pull request #85431 from irbull/api-doc
Add public documentation for kubelet/apis/config
2020-01-06 23:12:18 -08:00
louisgong
324e5ce7e3 hollow-node use remote CRI 2020-01-07 11:00:45 +08:00
Kubernetes Prow Robot
59b4933fb8 Merge pull request #86724 from gongguan/fix-fake-CRI
fix fake remote CRI
2020-01-06 18:06:57 -08:00
Kubernetes Prow Robot
49bc696614 Merge pull request #86251 from bboreham/pleg-last-seen-metric
Kubelet: add a metric to observe time since PLEG last seen
2020-01-06 18:06:18 -08:00
Kubernetes Prow Robot
d6412b856f Merge pull request #84345 from danielqsj/withdialer
replace grpc.WithDialer which is deprecated
2020-01-06 15:56:17 -08:00
Kubernetes Prow Robot
19ecd690fa Merge pull request #86646 from yutedz/client-protocol
Require client / server protocols
2020-01-06 13:34:18 -08:00
Kubernetes Prow Robot
b112ad4f0b Merge pull request #86845 from mattjmcnaughton/mattjmcnaughton/remove-rkt-from-runtime-options
Remove `rkt` from container runtime options
2020-01-06 11:12:29 -08:00
Kubernetes Prow Robot
9acf7d11fe Merge pull request #86344 from klueska/upstream-cm-approver
Add klueska as an approver in pkg/kubelet/cm/OWNERS
2020-01-06 09:54:16 -08:00
Ted Yu
906adbdfcd Require client / server protocols 2020-01-06 08:50:04 -08:00
mattjmcnaughton
794d0d9b4d Remove rkt from container runtime options
Part of efforts to clean up mentions of rkt in kubelet.

rkt was removed entirely in 1.11, in favor of using `rktlet` and CRI
instead. It should no longer be listed at all as a runtime.
2020-01-05 09:27:38 -05:00
mattjmcnaughton
06b44c76fd Correct comment around which integrations require cadvisor_stats
This commit is part of a larger effort to clean up references to `rkt`
in the kubelet.

Previously, this comment hard-coded which integrations required
the cadvisor stats provider. The comment has grown stale
(i.e. referenced rkt and did not reference cri-o).

Update the comment to instead point to the code which determines which
integrations need the cadvisor stats provider.
2020-01-05 09:23:09 -05:00
mattjmcnaughton
f2cb1f35fe Remove dead code in fake docker client
The `FakeDockerClient` had a number of methods defined on it which were
not being called anywhere. The majority were of the form `Assert...`.

In the spirit of removing dead code, remove the methods which aren't
being called.
2020-01-05 08:31:59 -05:00
louisgong
e8eb5c656b fix fake remote CRI 2020-01-04 08:43:17 +08:00
Bryan Boreham
cc0b3e82eb Kubelet: add a metric to observe time since PLEG last seen
Expose the measurement that kubelet uses to judge that "PLEG is
unhealthy". If we can observe the measurement growing then we can
alert before the node goes unhealthy.

Note that the existing metrics PLEGRelistInterval and
PLEGRelistDuration are poor for this, because when relist() gets
stuck they are never updated.

Signed-off-by: Bryan Boreham <bryan@weave.works>
2020-01-03 10:01:27 +00:00
mattjmcnaughton
92940fa80d Remove recorder.PastEventf method
The `recorder.PastEventf` method wasn't actually working as advertised.
It was supposed to accept a timestamp, which would be used when
generating the event. However, as the
[source code](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/record/event.go#L316)
shows, this `timestamp` was never actually used.

In other words, `PastEventf` is identical to `Eventf`.

We have two options: one would be to fix `PastEventf` so that it works
as advertised. The other would be to delete `PastEventf` and only
support `Eventf`.

Ultimately, I could only find one use of `PastEventf` in the code base,
so I propose we just delete `PastEventf` and convert all uses to
`Eventf`.
2019-12-30 12:00:23 -05:00
Kevin Klues
b373121a14 Make CPUManagerCheckpointV2 type an alias of CPUManagerCheckpoint
This change is to prevent problems when we remove the V1->V2 migration
code in the future. Without this, the checksums of all checkpoints would
be hashed with the name CPUManagerCheckpointV2 embedded inside of them,
which is undesirable. We want the checkpoints to be hashed with the name
CPUManagerCheckpoint instead.
2019-12-28 19:29:13 +01:00
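Why an alias helps here can be shown with a short Go sketch. It assumes, for illustration only, that the type name reaches the hash through a formatted rendering such as `fmt`'s `%T`; the struct fields, the `definedCheckpointV2` type, and the `typeName` helper are hypothetical and not taken from the kubelet source.

```go
package main

import "fmt"

// CPUManagerCheckpoint stands in for the real checkpoint struct.
// The fields are hypothetical.
type CPUManagerCheckpoint struct {
	PolicyName    string
	DefaultCPUSet string
}

// CPUManagerCheckpointV2 is a type alias: it is the same type as
// CPUManagerCheckpoint and carries no name of its own at run time.
type CPUManagerCheckpointV2 = CPUManagerCheckpoint

// definedCheckpointV2 is a distinct defined type, shown for contrast.
type definedCheckpointV2 struct {
	PolicyName    string
	DefaultCPUSet string
}

// typeName returns the name fmt prints for the dynamic type of v.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	fmt.Println(typeName(CPUManagerCheckpointV2{})) // main.CPUManagerCheckpoint
	fmt.Println(typeName(definedCheckpointV2{}))    // main.definedCheckpointV2
}
```

Because the alias resolves to `CPUManagerCheckpoint`, any name-sensitive rendering of a V2 value sees the original name, whereas a separately defined V2 type would embed its own name.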
Kevin Klues
5faf8f4c52 Lock checksum calculation for v1 CPUManager state to pre 1.18 logic
The updated CPUManager from PR #84462 implements logic to migrate the
CPUManager checkpoint file from an old format to a new one. To do so, it
defines the following types:

```
type CPUManagerCheckpoint = CPUManagerCheckpointV2
type CPUManagerCheckpointV1 struct {  ...  }
type CPUManagerCheckpointV2 struct {  ...  }
```

This replaces the old definition of just:

```
type CPUManagerCheckpoint struct {  ...  }
```

Code was put in place to ensure proper migration from checkpoints in V1
format to checkpoints in V2 format. However (and this is a big however),
all of the unit tests were performed on V1 checkpoints that were
generated using the type name `CPUManagerCheckpointV1` and not the
original type name of `CPUManagerCheckpoint`. As such, the checksum in
the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate
its checksum and not the original type name of `CPUManagerCheckpoint`.

This causes problems in the real world since all pre-1.18 checkpoint
files will have been generated with the original type name of
`CPUManagerCheckpoint`. When verifying the checksum of the checkpoint
file across an upgrade to 1.18, the checksum is calculated assuming
a type name of `CPUManagerCheckpointV1` (which is incorrect) and the
file is seen to be corrupt.

This patch ensures that all V1 checksums are verified against a type
name of `CPUManagerCheckpoint` instead of `CPUManagerCheckpointV1`.
It also locks the algorithm used to calculate the checksum in place,
since it will never change in the future (for pre-1.18 checkpoint
files at least).
2019-12-28 14:17:55 +01:00
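The failure mode and the fix described in 5faf8f4c52 can be sketched in Go as follows. This is a simplified illustration, not the patch itself: it assumes the checksum is a hash over a `%#v`-style rendering that includes the type name, uses FNV-1a from the standard library, and invents the struct fields and the helper names `hashString`, `naiveChecksum`, and `lockedV1Checksum`.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// CPUManagerCheckpoint stands in for the pre-1.18 type (hypothetical fields).
type CPUManagerCheckpoint struct {
	PolicyName    string
	DefaultCPUSet string
}

// CPUManagerCheckpointV1 has the same shape under the new name.
type CPUManagerCheckpointV1 struct {
	PolicyName    string
	DefaultCPUSet string
}

// hashString returns the 32-bit FNV-1a hash of s.
func hashString(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// naiveChecksum hashes the %#v rendering, which embeds the type name.
func naiveChecksum(v interface{}) uint32 {
	return hashString(fmt.Sprintf("%#v", v))
}

// lockedV1Checksum renders a V1 value, substitutes the original type
// name, and only then hashes the result.
func lockedV1Checksum(cp CPUManagerCheckpointV1) uint32 {
	s := fmt.Sprintf("%#v", cp)
	s = strings.Replace(s, "CPUManagerCheckpointV1", "CPUManagerCheckpoint", 1)
	return hashString(s)
}

func main() {
	old := CPUManagerCheckpoint{PolicyName: "static", DefaultCPUSet: "0-3"}
	v1 := CPUManagerCheckpointV1{PolicyName: "static", DefaultCPUSet: "0-3"}

	stored := naiveChecksum(old) // what a pre-1.18 file would carry in this model
	fmt.Println("naive match: ", naiveChecksum(v1) == stored)
	fmt.Println("locked match:", lockedV1Checksum(v1) == stored)
}
```

In this model the naive comparison hashes two different strings and would be expected to fail, while the locked variant hashes exactly the string the old type produced, so the stored checksum verifies.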
danielqsj
19fe9f8d94 replace grpc.WithDialer which is deprecated 2019-12-26 17:46:59 +08:00
SataQiu
2497a1209b bump k8s.io/utils version 2019-12-21 14:54:44 +08:00
Kubernetes Prow Robot
03e90b80ce Merge pull request #86167 from yiyang5055/change-CounterVec-to-Counter
change CounterVec to use Counter in the Kubelet's Pod Lifecycle Event…
2019-12-19 11:33:56 -08:00
Jacek Kaniuk
4303be3d9f Revert pull request #85879 "hollow-node use remote CRI" 2019-12-19 10:52:35 +01:00
Kubernetes Prow Robot
814fc34cde Merge pull request #85879 from gongguan/cri-kubemark
hollow-node use remote CRI
2019-12-18 06:01:57 -08:00
louisgong
e8e1cc9ee0 extract PreInitRuntimeService from NewMainKubelet 2019-12-18 11:48:29 +08:00
Kubernetes Prow Robot
40df9f82d0 Merge pull request #82492 from gnufied/fix-uncertain-mounts
Fix uncertain mounts
2019-12-17 14:49:57 -08:00
Kubernetes Prow Robot
a1fc96f41e Merge pull request #84462 from klueska/upstream-cpu-manager-update-state-semantics
Update CPUManager stored state semantics
2019-12-17 12:00:12 -08:00
Kevin Klues
9818b4522e Add klueska as an approver in pkg/kubelet/cm/OWNERS 2019-12-17 10:40:23 +01:00
Kubernetes Prow Robot
0633e2cd34 Merge pull request #86303 from langyenan/misspell
fix misspelling in comment
2019-12-16 21:39:49 -08:00
Kubernetes Prow Robot
a931227952 Merge pull request #83504 from lyft/remove-all-terminated-containers
cri_stats_provider: do not consider exited containers when calculating cpu usage
2019-12-16 17:53:39 -08:00
Jordan Liggitt
a65d8aeb76 Add UID precondition to kubelet pod status patch updates 2019-12-16 14:27:32 -05:00
ianlang
c9418412d1 fix misspelling in comment 2019-12-16 17:27:08 +08:00
Kubernetes Prow Robot
69410eca4b Merge pull request #86256 from liggitt/testapi
Remove use of testapi package
2019-12-13 12:55:50 -08:00
Jordan Liggitt
5d5b444c4d Remove use of testapi codecs, selflink, resourcepath functions 2019-12-13 11:56:29 -05:00
Kubernetes Prow Robot
7e01fe12bf Merge pull request #86228 from ahg-g/ahg-r1
Deprecate scheduler's FailureReason
2019-12-12 18:33:32 -08:00
Kubernetes Prow Robot
6db550d1db Merge pull request #85789 from ZP-AlwaysWin/dev-1202
Remove unnecessary nil check in if statement in nodelease controller
2019-12-12 18:32:54 -08:00
Abdullah Gharaibeh
70a2bccfd6 deprecate scheduler's FailureReason 2019-12-12 18:54:52 -05:00
Kubernetes Prow Robot
010291d4dc Merge pull request #84951 from yutedz/status-mgr-sync-static
Sync the status of static Pods
2019-12-11 19:40:32 -08:00
Hemant Kumar
ca532c6fb2 Ensure that error is returned on NodePublish 2019-12-11 22:10:09 -05:00
Kevin Klues
f553286156 Pass initial set of runtime containers to the CPUManager at startup
The information associated with these containers is used to migrate
the CPUManager state from its old format to its new one (i.e. keyed off
of podUID and containerName instead of containerID).
2019-12-11 23:02:51 +01:00
Kevin Klues
6441e1ef43 Move CPUManager Checkpoint restoration to Start() instead of New() 2019-12-11 23:02:51 +01:00
Kevin Klues
69f8053850 Update top-level CPUManager to adhere to new state semantics
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
we will pull this information from higher layers so that we can pass it
down at this stage properly.
2019-12-11 23:02:51 +01:00
Kevin Klues
185e790f71 Update CPUManager policies to adhere to new state semantics 2019-12-11 23:02:51 +01:00
Kevin Klues
7c760fea38 Change CPUManager state to key off of podUID and containerName
Previously, the state was keyed off of containerID instead of podUID and
containerName. Unfortunately, this is no longer possible as we move to a
model where we allocate CPUs to containers at pod admit time rather
than container start time.

This patch is the first step towards full migration to the new
semantics. Only the unit tests in cpumanager/state are passing. In
subsequent commits we will update the CPUManager itself to use these new
semantics.

This patch also includes code to do migration from the old checkpoint format
to the new one, assuming the existence of a ContainerMap with the proper
mapping of (containerID)->(podUID, containerName). A subsequent commit
will update code in higher layers to make sure that this ContainerMap is
made available to this state logic.
2019-12-11 23:02:51 +01:00
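The re-keying described in 7c760fea38 can be sketched in Go under stated assumptions. The types below are hypothetical simplifications (CPU sets are plain strings, and `containerRef`, `oldAssignments`, `newAssignments`, and `migrate` are illustrative names); the only thing taken from the commit message is the idea of translating containerID keys into (podUID, containerName) keys through a container map.

```go
package main

import "fmt"

// containerRef identifies a container by pod UID and container name.
type containerRef struct {
	podUID        string
	containerName string
}

// oldAssignments is the old layout: containerID -> CPU set.
type oldAssignments map[string]string

// newAssignments is the new layout: podUID -> containerName -> CPU set.
type newAssignments map[string]map[string]string

// migrate re-keys old assignments using containerMap (containerID -> ref).
// It fails if any container ID has no known mapping.
func migrate(old oldAssignments, containerMap map[string]containerRef) (newAssignments, error) {
	out := make(newAssignments)
	for id, cpus := range old {
		ref, ok := containerMap[id]
		if !ok {
			return nil, fmt.Errorf("no (podUID, containerName) mapping for container ID %q", id)
		}
		if out[ref.podUID] == nil {
			out[ref.podUID] = make(map[string]string)
		}
		out[ref.podUID][ref.containerName] = cpus
	}
	return out, nil
}

func main() {
	old := oldAssignments{"c1": "0-1", "c2": "2-3"}
	containerMap := map[string]containerRef{
		"c1": {podUID: "pod-a", containerName: "app"},
		"c2": {podUID: "pod-a", containerName: "sidecar"},
	}
	migrated, err := migrate(old, containerMap)
	if err != nil {
		fmt.Println("migration failed:", err)
		return
	}
	fmt.Println(migrated["pod-a"]["app"], migrated["pod-a"]["sidecar"]) // 0-1 2-3
}
```

Returning an error for an unmapped container ID is a design choice of this sketch; the commit message does not say how the real migration treats that case.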