Commit Graph

7711 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
e972912fe4 Merge pull request #74881 from qingsenLi/k8s190304-fix-syntactic
fix syntactic error in kuberuntime_manager.go
2019-09-10 14:28:48 -07:00
Bruce Ma
f9169d29cb skip recording inputs & outputs in fake script plugin when CNI_COMMAND=VERSION
Signed-off-by: Bruce Ma <brucema19901024@gmail.com>
2019-09-04 22:50:13 +08:00
Kubernetes Prow Robot
542f3c65a0 Merge pull request #78547 from MikeSpreitzer/fix-76699
Make iptables and ipvs modes of kube-proxy MASQUERADE --random-fully if possible
2019-09-03 14:34:58 -07:00
Mike Spreitzer
d86d1defa1 Made IPVS and iptables modes of kube-proxy fully randomize masquerading if possible
Work around Linux kernel bug that sometimes causes multiple flows to
get mapped to the same IP:PORT and consequently some suffer packet
drops.

Also made the same update in kubelet.

Also added cross-pointers between the two bodies of code, in comments.

Some day we should eliminate the duplicate code.  But today is not
that day.
2019-09-01 22:07:30 -04:00
Kubernetes Prow Robot
7d40536c81 Merge pull request #82024 from codenrhoden/mv-hostutil
Move HostUtil to pkg/volume/util/hostutil
2019-08-30 19:21:49 -07:00
Kubernetes Prow Robot
c86da8e2c1 Merge pull request #82048 from cheftako/kas-np4
Add support for konnectivity service to the etcd3 client.
2019-08-30 16:15:28 -07:00
Kubernetes Prow Robot
887edd2273 Merge pull request #82099 from lmdaly/single-numa-node-policy
Topology Manager Policy: single-numa-node
2019-08-30 11:21:26 -07:00
Walter Fender
edbb0fa2fe Add support for konnectivity service to the etcd3 client.
If konnectivity service is enabled, the etcd client will now use it.
This did require moving a few methods to break circular dependencies.

Factored in feedback from lavalamp and wenjiaswe.
2019-08-30 10:31:53 -07:00
Travis Rhoden
935c23f2ad Move HostUtil to pkg/volume/util/hostutil
This patch moves the HostUtil functionality from the util/mount package
to the volume/util/hostutil package.

All `*NewHostUtil*` calls are changed to return concrete types instead
of interfaces.

All callers are changed to use the `*NewHostUtil*` methods instead of
directly instantiating the concrete types.
2019-08-30 10:14:42 -06:00
Kubernetes Prow Robot
9165f7bf56 Merge pull request #82104 from klueska/upstream-fix-cpu-manager-topology-bug
Fix bug in CPUManager with setting topology for policies
2019-08-30 08:00:44 -07:00
Kubernetes Prow Robot
f442b6ef32 Merge pull request #82090 from liggitt/webhook-http2
Use http/1.1 for apiserver->webhook clients
2019-08-30 06:26:54 -07:00
Louise Daly
8ad1b5ba3b Single-numa-node Topology Manager bug fix
Added one off fix for single-numa-node policy to correctly
reject pod admission on a resource allocation that spans
NUMA nodes

Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-30 07:17:56 +01:00
Louise Daly
f6c085f60e Added Single NUMA Node Policy which ensure resource are
aligned on a single NUMA node

Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-30 07:17:17 +01:00
Kevin Klues
5ed80dadcf Update CanAdmitPodResult() in TopologyManager to take a TopologyHint
Previously it only took a bool, which limited the logic it could perform
to determine if a pod should be admitted or not based on the merged hint
from the policy.
2019-08-30 07:17:17 +01:00
Kubernetes Prow Robot
7d6f8d8f69 Merge pull request #80570 from klueska/upstream-add-topology-manager-to-devicemanager
Add support for Topology Manager to Device Manager
2019-08-29 21:21:44 -07:00
Kubernetes Prow Robot
3ebe6a6a5f Merge pull request #77807 from matthyx/startupProbe
Add startupProbe to health checks
2019-08-29 21:21:30 -07:00
Kubernetes Prow Robot
7da563f0f8 Merge pull request #81573 from irajdeep/irajdeep/change_runningPod_runningContainer_metrics
Convert kubelet metrics(running_pod_count and running_container_count) from non-standard prometheus collectors to standard gauges
2019-08-29 18:08:42 -07:00
Matthias Bertschy
a042a4b0ee startupProbe: make update 2019-08-30 00:42:43 +02:00
Matthias Bertschy
1a08ea5984 startupProbe: Test changes 2019-08-30 00:40:26 +02:00
Matthias Bertschy
323f99ea8c startupProbe: Kubelet changes 2019-08-30 00:40:26 +02:00
Kubernetes Prow Robot
a9e5c4d6e4 Merge pull request #81968 from mtaufen/node-csr-hash
derive node CSR hashes from public keys
2019-08-29 13:31:41 -07:00
Kubernetes Prow Robot
da986c56ab Merge pull request #73944 from xiaoanyunfei/cleanup/rm_unuse_judge
rm unnecessary judgement
2019-08-29 13:30:57 -07:00
Kevin Klues
eb0216e54e Update semantics to set Preferred field in TopologyHint generation
We now only set Preferred to true if resources can be allocated with a
size equal to the minimimum _possible_ mask when all resources are
available.
2019-08-29 14:32:10 -05:00
Kevin Klues
e0e8b3e4fd Update CPUManager topology helpers to accept multiple ids 2019-08-29 13:22:54 -05:00
Rajdeep Das
c02d49d775 Update running_pod_count and running_container_count metric
As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" uses non-standard prometheus metrics, this change converts them to be
standard prometheus gauges

Minor refactor in kubelet/pleg/generic.go and added some test for ruuning container and running pod metrics

Fixed issues related to github CI pipeline failure

* Updated bazel for new deps
* Add comment for exported metrics variables,RuuningContainerCount and RunningPodCount
* Specify keys explicitly in Guage metric instantation

Fix go lint errors

Replace "+=1" with "++", as reported by go lint

Set container state as a label for the metrics "running_container_count"

As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...

remove unused methods reported by staticcheck

Remove variables while instantiating gauge(vec) which are default set to nil

Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.

Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).

Set statbility explicitly for running_pod_count and running_container_count and reformat test

register metrics explicitly in test , so that they don't become no-op
2019-08-29 17:23:04 +02:00
Kevin Klues
dcc9f66311 Add devicemanager tests for TopologyHint consumption 2019-08-29 08:22:50 -05:00
Kevin Klues
cc567afaf0 Consume TopologyHints in the devicemanager 2019-08-29 08:22:50 -05:00
Kevin Klues
a3320f80d9 Add devicemanager tests for TopologyHint generation 2019-08-29 07:45:43 -05:00
Kevin Klues
d3d7a8f5d4 Generate TopologyHints from the devicemanager 2019-08-29 07:45:43 -05:00
Louise Daly
9a118ceac4 Added stub support for Topology Manager to Device Manager
Co-authored-by: Conor Nolan <conor.nolan@intel.com>
Co-authored-by: Sreemanti Ghosh <sreemanti.ghosh@intel.com>
Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-29 07:45:43 -05:00
Kevin Klues
1c1f19c61c Change Topology.NUMANode in device plugin interface to a repeated field 2019-08-29 07:45:43 -05:00
Kubernetes Prow Robot
7d4d17583b Merge pull request #81722 from klueska/upstream-add-socket-awreness-to-topologymanager
Add NUMA Node awareness to the TopologyManager
2019-08-29 05:30:58 -07:00
Kubernetes Prow Robot
ca5babc1da Merge pull request #81534 from logicalhan/kubelet-migration
migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework
2019-08-28 18:26:45 -07:00
Kevin Klues
ddfd9ac0ca Fix bug in CPUManager with setting topology for policies
Also add a check in the unit tests to avoid regressions
2019-08-28 17:32:25 -05:00
Jordan Liggitt
aef05c8dca Plumb NextProtos to TLS client config, honor http/2 client preference 2019-08-28 16:51:56 -04:00
Kubernetes Prow Robot
c6a506bb8c Merge pull request #78174 from gaorong/oom-event
enrich kubelet system oom event message info
2019-08-28 12:01:13 -07:00
Han Kang
3a50917795 migrate kubelet's metrics/probes & metrics endpoint to metrics stability framework 2019-08-28 11:16:38 -07:00
Kevin Klues
df1b54fc09 Fail fast with TopologyManager on machines with more than 8 NUMA Nodes 2019-08-28 11:04:52 -05:00
Kevin Klues
5660cd3cfb Add NUMA Node awareness to the TopologyManager 2019-08-28 11:04:52 -05:00
Kubernetes Prow Robot
35867b160a Merge pull request #81951 from klueska/upstream-update-cpu-amanger-numa-mapping
Update the CPUManager to include NUMANodeID in its topology information
2019-08-28 08:55:40 -07:00
Kubernetes Prow Robot
879418a714 Merge pull request #81828 from mars1024/bugfix/delete_lo_network
delete lo network when TearDownPod to avoid CNI cache leak
2019-08-28 03:09:11 -07:00
Kubernetes Prow Robot
de1cfa9bc1 Merge pull request #81787 from lmdaly/topology-manager-rename-strict-policy
Renaming strict policy to restricted policy
2019-08-28 01:38:04 -07:00
Kubernetes Prow Robot
08b67378d3 Merge pull request #81397 from ddebroy/win-socket
Support Kubelet PluginWatcher in Windows
2019-08-28 01:37:12 -07:00
Kubernetes Prow Robot
6cf7f3c342 Merge pull request #80320 from wk8/wk8/gmsa_cleanup
Make container removal fail if platform-specific containers fail
2019-08-27 22:41:29 -07:00
Deep Debroy
1321c9115b Support PluginWatcher in Windows
Signed-off-by: Deep Debroy <ddebroy@docker.com>
2019-08-27 16:24:38 -07:00
Kevin Klues
f4dbd29cdb Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
2019-08-27 16:51:05 -05:00
Kevin Klues
ecc14fe661 Update CPUManager to include NUMANodeID in CPUTopology
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custiom code to use the information
provided by cadvisor once it is made available.
2019-08-27 16:51:05 -05:00
Kevin Klues
869962fa48 Cache the discovered topology in the CPUManager instead of MachineInfo 2019-08-27 16:23:07 -05:00
Michael Taufen
9dcf4d4ae2 derive node CSR hashes from public keys
These hashes were previously derived from the private key.
This is not a best practice. After this PR they are derived from public
keys.
2019-08-27 09:41:41 -07:00
Bruce Ma
ec342ec98f delete lo network when TearDownPod to avoid CNI cache leak
Signed-off-by: Bruce Ma <brucema19901024@gmail.com>
2019-08-27 19:26:23 +08:00