Commit Graph

7694 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
7d6f8d8f69 Merge pull request #80570 from klueska/upstream-add-topology-manager-to-devicemanager
Add support for Topology Manager to Device Manager
2019-08-29 21:21:44 -07:00
Kubernetes Prow Robot
3ebe6a6a5f Merge pull request #77807 from matthyx/startupProbe
Add startupProbe to health checks
2019-08-29 21:21:30 -07:00
Kubernetes Prow Robot
7da563f0f8 Merge pull request #81573 from irajdeep/irajdeep/change_runningPod_runningContainer_metrics
Convert kubelet metrics(running_pod_count and running_container_count) from non-standard prometheus collectors to standard gauges
2019-08-29 18:08:42 -07:00
Matthias Bertschy
a042a4b0ee startupProbe: make update 2019-08-30 00:42:43 +02:00
Matthias Bertschy
1a08ea5984 startupProbe: Test changes 2019-08-30 00:40:26 +02:00
Matthias Bertschy
323f99ea8c startupProbe: Kubelet changes 2019-08-30 00:40:26 +02:00
Kubernetes Prow Robot
a9e5c4d6e4 Merge pull request #81968 from mtaufen/node-csr-hash
derive node CSR hashes from public keys
2019-08-29 13:31:41 -07:00
Kubernetes Prow Robot
da986c56ab Merge pull request #73944 from xiaoanyunfei/cleanup/rm_unuse_judge
rm unnecessary judgement
2019-08-29 13:30:57 -07:00
Kevin Klues
eb0216e54e Update semantics to set Preferred field in TopologyHint generation
We now only set Preferred to true if resources can be allocated with a
size equal to the minimum _possible_ mask when all resources are
available.
2019-08-29 14:32:10 -05:00
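The Preferred rule above can be illustrated with a minimal Go sketch. It assumes a simplified capacity model (a count of devices per NUMA node), and the names minPossibleMaskSize and isPreferred are hypothetical, not the devicemanager's actual API. The minimum possible mask is the fewest NUMA nodes whose total capacity could satisfy the request if nothing were allocated. A hint is preferred only when its mask has exactly that many bits set.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// minPossibleMaskSize (hypothetical) returns the fewest NUMA nodes whose
// combined capacity could satisfy request if every resource were free.
// It returns -1 when even all nodes together are too small.
func minPossibleMaskSize(request int, capacityPerNode []int) int {
	caps := append([]int(nil), capacityPerNode...) // do not mutate the caller's slice
	// Taking the largest nodes first minimizes the number of nodes needed.
	sort.Sort(sort.Reverse(sort.IntSlice(caps)))
	total := 0
	for i, c := range caps {
		total += c
		if total >= request {
			return i + 1
		}
	}
	return -1
}

// isPreferred (hypothetical) marks a hint as preferred only if its NUMA
// affinity mask is exactly as narrow as the minimum possible mask.
func isPreferred(mask uint64, minSize int) bool {
	return bits.OnesCount64(mask) == minSize
}

func main() {
	caps := []int{4, 4} // two NUMA nodes with four devices each
	minSize := minPossibleMaskSize(2, caps)
	fmt.Println(minSize, isPreferred(0b01, minSize), isPreferred(0b11, minSize))
	minSize = minPossibleMaskSize(6, caps)
	fmt.Println(minSize, isPreferred(0b11, minSize))
}
```

With two 4-device nodes, a 2-device request can fit on one node, so only single-node masks are preferred. A 6-device request needs both nodes, so the two-node mask becomes preferred even though it spans NUMA nodes.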
Kevin Klues
e0e8b3e4fd Update CPUManager topology helpers to accept multiple ids 2019-08-29 13:22:54 -05:00
Rajdeep Das
c02d49d775 Update running_pod_count and running_container_count metric
As already mentioned in issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics, such as
"running_pod_count" and "running_container_count", use non-standard prometheus collectors; this change converts them to
standard prometheus gauges.

Minor refactor in kubelet/pleg/generic.go and added some tests for running container and running pod metrics

Fixed issues related to github CI pipeline failure

* Updated bazel for new deps
* Add comment for exported metrics variables RunningContainerCount and RunningPodCount
* Specify keys explicitly in Gauge metric instantiation

Fix go lint errors

Replace "+=1" with "++", as reported by go lint

Set container state as a label for the metrics "running_container_count"

As per the metric name "running_container_count", it should "ideally" show
the number of containers in the "running" state, but it was showing the count of all containers, irrespective of their state.
This commit adds a new label "container_running_state" to the metric "running_container_count", which doesn't change the base metric but adds the
option to query the metric by "container_state", such as "running"/"unknown"/...

remove unused methods reported by staticcheck

Remove fields that default to nil when instantiating the gauge(vec)

Convert kubelet metrics (running_pod_count and running_container_count) to standard gauges and add a label to the running_container_count metric.

Currently, kubelet metrics (running_pod_count and running_container_count) use non-standard prometheus collectors; this change
converts them to standard prometheus gauges. It also adds a new label (container_state) to running_container_count, which breaks down
the containers tracked by the kubelet by state (running/unknown/created/exited).

Set stability explicitly for running_pod_count and running_container_count and reformat test

Register metrics explicitly in tests so that they don't become no-ops
2019-08-29 17:23:04 +02:00
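The per-state breakdown described above comes down to counting containers by state. The minimal Go sketch below shows only that counting step, using hypothetical pod and container types rather than the kubelet's runtime types. With the Prometheus Go client, each resulting entry would typically be published through a GaugeVec as WithLabelValues(state).Set(float64(n)). The sketch stays on the standard library and does not show the kubelet's actual metric wiring.

```go
package main

import "fmt"

// container and pod are hypothetical stand-ins for the kubelet's runtime types.
type container struct {
	ID    string
	State string // e.g. "running", "created", "exited", "unknown"
}

type pod struct {
	Name       string
	Containers []container
}

// countContainersByState returns how many containers are in each state.
// Each key corresponds to one value of a container_state label on a
// running_container_count gauge.
func countContainersByState(pods []pod) map[string]int {
	counts := make(map[string]int)
	for _, p := range pods {
		for _, c := range p.Containers {
			counts[c.State]++
		}
	}
	return counts
}

func main() {
	pods := []pod{
		{Name: "a", Containers: []container{{ID: "c1", State: "running"}, {ID: "c2", State: "exited"}}},
		{Name: "b", Containers: []container{{ID: "c3", State: "running"}}},
	}
	counts := countContainersByState(pods)
	fmt.Println("running:", counts["running"], "exited:", counts["exited"], "unknown:", counts["unknown"])
}
```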
Kevin Klues
dcc9f66311 Add devicemanager tests for TopologyHint consumption 2019-08-29 08:22:50 -05:00
Kevin Klues
cc567afaf0 Consume TopologyHints in the devicemanager 2019-08-29 08:22:50 -05:00
Kevin Klues
a3320f80d9 Add devicemanager tests for TopologyHint generation 2019-08-29 07:45:43 -05:00
Kevin Klues
d3d7a8f5d4 Generate TopologyHints from the devicemanager 2019-08-29 07:45:43 -05:00
Louise Daly
9a118ceac4 Added stub support for Topology Manager to Device Manager
Co-authored-by: Conor Nolan <conor.nolan@intel.com>
Co-authored-by: Sreemanti Ghosh <sreemanti.ghosh@intel.com>
Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-29 07:45:43 -05:00
Kevin Klues
1c1f19c61c Change Topology.NUMANode in device plugin interface to a repeated field 2019-08-29 07:45:43 -05:00
Kubernetes Prow Robot
7d4d17583b Merge pull request #81722 from klueska/upstream-add-socket-awreness-to-topologymanager
Add NUMA Node awareness to the TopologyManager
2019-08-29 05:30:58 -07:00
Kubernetes Prow Robot
ca5babc1da Merge pull request #81534 from logicalhan/kubelet-migration
migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework
2019-08-28 18:26:45 -07:00
Kubernetes Prow Robot
c6a506bb8c Merge pull request #78174 from gaorong/oom-event
enrich kubelet system oom event message info
2019-08-28 12:01:13 -07:00
Han Kang
3a50917795 migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework 2019-08-28 11:16:38 -07:00
Kevin Klues
df1b54fc09 Fail fast with TopologyManager on machines with more than 8 NUMA Nodes 2019-08-28 11:04:52 -05:00
Kevin Klues
5660cd3cfb Add NUMA Node awareness to the TopologyManager 2019-08-28 11:04:52 -05:00
Kubernetes Prow Robot
35867b160a Merge pull request #81951 from klueska/upstream-update-cpu-amanger-numa-mapping
Update the CPUManager to include NUMANodeID in its topology information
2019-08-28 08:55:40 -07:00
Kubernetes Prow Robot
879418a714 Merge pull request #81828 from mars1024/bugfix/delete_lo_network
delete lo network when TearDownPod to avoid CNI cache leak
2019-08-28 03:09:11 -07:00
Kubernetes Prow Robot
de1cfa9bc1 Merge pull request #81787 from lmdaly/topology-manager-rename-strict-policy
Renaming strict policy to restricted policy
2019-08-28 01:38:04 -07:00
Kubernetes Prow Robot
08b67378d3 Merge pull request #81397 from ddebroy/win-socket
Support Kubelet PluginWatcher in Windows
2019-08-28 01:37:12 -07:00
Kubernetes Prow Robot
6cf7f3c342 Merge pull request #80320 from wk8/wk8/gmsa_cleanup
Make container removal fail if platform-specific containers fail
2019-08-27 22:41:29 -07:00
Deep Debroy
1321c9115b Support PluginWatcher in Windows
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2019-08-27 16:24:38 -07:00
Kevin Klues
f4dbd29cdb Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
2019-08-27 16:51:05 -05:00
Kevin Klues
ecc14fe661 Update CPUManager to include NUMANodeID in CPUTopology
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll our own logic to discover it. In the
future, we should remove this custom code and use the information
provided by cadvisor once it is made available.
2019-08-27 16:51:05 -05:00
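On Linux, one common way to discover the CPU-to-NUMA mapping by hand is to read /sys/devices/system/node/node<N>/cpulist, which uses the kernel's list format (for example "0-3,8-11"). The Go sketch below assumes that source and parses already-read file contents, so it runs without sysfs. The function names parseCPUList and cpuToNUMANode are hypothetical, and the kubelet's own discovery code may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList parses the Linux sysfs list format (e.g. "0-3,8-11")
// into individual CPU IDs.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	s = strings.TrimSpace(s)
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad cpu %q: %w", lo, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad cpu %q: %w", hi, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("bad range %q", part)
		}
		for cpu := start; cpu <= end; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

// cpuToNUMANode inverts per-node cpulist contents (keyed by NUMA node ID)
// into a CPU ID -> NUMA node ID map.
func cpuToNUMANode(cpulists map[int]string) (map[int]int, error) {
	m := make(map[int]int)
	for node, list := range cpulists {
		cpus, err := parseCPUList(list)
		if err != nil {
			return nil, err
		}
		for _, cpu := range cpus {
			m[cpu] = node
		}
	}
	return m, nil
}

func main() {
	// Contents as they might appear in node0/cpulist and node1/cpulist.
	m, err := cpuToNUMANode(map[int]string{0: "0-1,4-5\n", 1: "2-3,6-7\n"})
	if err != nil {
		panic(err)
	}
	fmt.Println("cpu 4 -> node", m[4], "; cpu 6 -> node", m[6])
}
```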
Kevin Klues
869962fa48 Cache the discovered topology in the CPUManager instead of MachineInfo 2019-08-27 16:23:07 -05:00
Michael Taufen
9dcf4d4ae2 derive node CSR hashes from public keys
These hashes were previously derived from the private key.
This is not a best practice. After this PR they are derived from public
keys.
2019-08-27 09:41:41 -07:00
Bruce Ma
ec342ec98f delete lo network when TearDownPod to avoid CNI cache leak
Signed-off-by: Bruce Ma <brucema19901024@gmail.com>
2019-08-27 19:26:23 +08:00
Kubernetes Prow Robot
f105fef3d5 Merge pull request #81429 from huffmanca/resize_block_volume
Enables resizing of block volumes.
2019-08-23 17:59:05 -07:00
Kubernetes Prow Robot
d5f9a81d0f Merge pull request #79873 from tedyu/kube-runtime
Set runtimeState when RuntimeReady is not set or false
2019-08-23 17:58:37 -07:00
Kubernetes Prow Robot
0e1bad3764 Merge pull request #81747 from Random-Liu/fix-windows-log-follow
Fix Windows kubectl logs -f.
2019-08-23 06:53:24 -07:00
Kubernetes Prow Robot
10bd85a127 Merge pull request #81663 from jfbai/update-existing-node-lease-with-retry
Update existing node lease with retry.
2019-08-22 22:03:30 -07:00
Hemant Kumar
9dbe0b3ad8 Fix devicePath for raw block expansion
Fix tests
2019-08-22 22:48:46 -04:00
Jean Rouge
4d4edcb27b Make container removal fail if platform-specific containers fail
https://github.com/kubernetes/kubernetes/pull/74737 introduced a new in-memory
map in the dockershim that could, in pathological cases, leak memory
for containers that use GMSA cred specs, are created successfully, but
are then never started or removed.

This patch addresses this issue by making container removal fail altogether
when platform-specific cleanups fail: this allows the cleanups to be retried
later, when the kubelet attempts to remove the container again.

Resolves issue https://github.com/kubernetes/kubernetes/issues/74843.

Signed-off-by: Jean Rouge <rougej+github@gmail.com>
2019-08-22 18:03:48 -07:00
Kubernetes Prow Robot
a3488b4cee Merge pull request #81206 from tallclair/staticcheck-kubelet-push
Cleanup Kubelet static analysis issues
2019-08-22 15:09:43 -07:00
Kubernetes Prow Robot
37651f1cef Merge pull request #80368 from danwinship/iptables-checks
iptables feature detection improvements
2019-08-22 13:31:20 -07:00
Christian Huffman
7a4cdf5ab2 Included resizing for CSI-based block volumes.
Perform a no-op when volume is of type raw block

Fix bug with checking volume mounts for readonly
2019-08-22 15:45:57 -04:00
Kubernetes Prow Robot
6b47754740 Merge pull request #81627 from tallclair/copy
Delete duplicate resource.Quantity.Copy()
2019-08-22 11:13:13 -07:00
Kubernetes Prow Robot
2af52db689 Merge pull request #81529 from tedyu/asoww-get-pod
Drop GetPods from ActualStateOfWorld
2019-08-22 08:43:24 -07:00
Kubernetes Prow Robot
53f4e0fa26 Merge pull request #57741 from dixudx/container_hash
Compute container hash based on API content, not go type
2019-08-22 08:42:46 -07:00
Louise Daly
2fb94231d0 Renaming strict policy to restricted policy
Restricted policy will fail admission of Guaranteed pods whose
requested resources cannot all be allocated from a single NUMA Node
2019-08-22 07:57:55 +01:00
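The admission rule above can be reduced to a bitmask intersection. This Go sketch rests on a simplifying assumption: each requested resource is summarized as a mask of the NUMA nodes that could satisfy it on their own. The real Topology Manager merges richer hints from multiple providers. The function admitRestricted is hypothetical. It admits only when some single node appears in every mask.

```go
package main

import (
	"fmt"
	"math/bits"
)

// admitRestricted (hypothetical) admits a Guaranteed pod only if one NUMA node
// can satisfy every requested resource. Each entry of perResource is a bitmask
// of the NUMA nodes able to satisfy that resource alone (bit i means node i).
func admitRestricted(perResource []uint64) (admit bool, node int) {
	if len(perResource) == 0 {
		return true, -1 // nothing to align
	}
	common := ^uint64(0) // start with every node and narrow down
	for _, mask := range perResource {
		common &= mask
	}
	if common == 0 {
		return false, -1 // no single node satisfies all resources
	}
	// Pick the lowest-numbered node that satisfies everything.
	return true, bits.TrailingZeros64(common)
}

func main() {
	// CPUs available on nodes 0 and 1, device only on node 1: admitted on node 1.
	fmt.Println(admitRestricted([]uint64{0b11, 0b10}))
	// CPUs only on node 0, device only on node 1: rejected.
	fmt.Println(admitRestricted([]uint64{0b01, 0b10}))
}
```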
Di Xu
739cdc8a8c Omit nil or empty field when calculating hash value 2019-08-22 13:46:52 +08:00
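Hashing the serialized API content and omitting unset fields appears intended to let a newer container type that only adds optional fields keep producing the same hash for unchanged specs. The Go sketch below illustrates the idea with encoding/json and omitempty tags, hashing the bytes with FNV-1a. The containerV1/containerV2 types and hashContainer are hypothetical. The kubelet's actual encoding and hash function may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// containerV1 and containerV2 are hypothetical versions of a container spec;
// V2 adds optional fields, as a newer API release might.
type containerV1 struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type containerV2 struct {
	Name   string   `json:"name"`
	Image  string   `json:"image"`
	NewOpt *string  `json:"newOpt,omitempty"`
	Extra  []string `json:"extra,omitempty"`
}

// hashContainer hashes the JSON form of v. Because nil or empty optional
// fields are omitted, adding such a field to the type does not change the hash.
func hashContainer(v interface{}) (uint32, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return 0, err
	}
	h := fnv.New32a()
	h.Write(data) // hash.Hash writes never return an error
	return h.Sum32(), nil
}

func main() {
	a, _ := hashContainer(containerV1{Name: "app", Image: "nginx:1.17"})
	b, _ := hashContainer(containerV2{Name: "app", Image: "nginx:1.17"})
	fmt.Println("hashes equal:", a == b)
}
```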
Kubernetes Prow Robot
ebf15029da Merge pull request #80003 from wongma7/cloudprovider-authoritative-hostname
Fix cloud reported hostname being overridden if nodeIP set
2019-08-21 20:50:32 -07:00
Matthew Wong
fc28045220 Fix cloud reported hostname being overridden if nodeIP set 2019-08-21 19:07:12 -07:00