As already mentioned in this issue https://github.com/kubernetes/kubernetes/issues/79286, some metrics like
"running_pod_count" and "running_container_count" uses non-standard prometheus metrics, this change converts them to be
standard prometheus gauges
Minor refactor in kubelet/pleg/generic.go and added some test for ruuning container and running pod metrics
Fixed issues related to github CI pipeline failure
* Updated bazel for new deps
* Add comment for exported metrics variables,RuuningContainerCount and RunningPodCount
* Specify keys explicitly in Guage metric instantation
Fix go lint errors
Replace "+=1" with "++", as reported by go lint
Set container state as a label for the metrics "running_container_count"
As per the metrics name "running_container_count" it should "ideally" be showing
the number of containers in "running" state , but it was showing all the container count, irrespective of the state it is in.
This commit adds a new label "container_running_state" to the metrics "running_container_count", which doesn't change the base metrics but adds the
option to query the metrics with "container_state" such as "running"/"unknown/...
remove unused methods reported by staticcheck
Remove variables while instantiating gauge(vec) which are default set to nil
Convert kubelet metrics(running_pod_count and running_container_count) to standard gauges and added label to running_container_count metrics.
Currently kubelet metrics(running_pod_count and running_container_count) use non-standard prometheus collectors , this change
converts them to standard prometheus gauges. Also this adds a new label(container_state) to running_container_count which does a breakdown of
containers tracked by kubelet based on the containers' state(running/unknown/created/exited).
Set statbility explicitly for running_pod_count and running_container_count and reformat test
register metrics explicitly in test , so that they don't become no-op
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custiom code to use the information
provided by cadvisor once it is made available.
https://github.com/kubernetes/kubernetes/pull/74737 introduced a new in-memory
map for the dockershim, that could potentially (in pathological cases) cause
memory leaks - for containers that use GMSA cred specs, get created
successfully, but then never get started nor removed.
This patch addresses this issue by making container removal fail altogether
when platform-specific clean ups fail: this allows clean ups to be retried
later, when the kubelet attempts to remove the container again.
Resolves issue https://github.com/kubernetes/kubernetes/issues/74843.
Signed-off-by: Jean Rouge <rougej+github@gmail.com>