Commit Graph

16 Commits

Author SHA1 Message Date
Francesco Romani
04129d1dc8 node: metrics for alignment failures
Add metrics to report alignment allocation failures
See: https://github.com/kubernetes/enhancements/pull/5108

Signed-off-by: Francesco Romani <fromani@redhat.com>
2025-03-04 19:50:08 +01:00
Francesco Romani
14ec0edd10 node: metrics: add metrics about cpu pool sizes
Add metrics about the sizing of the cpu pools.
Currently the cpumanager maintains 2 cpu pools:
- shared pool: this is where all pods with non-exclusive
  cpu allocation run
- exclusive pool: this is the union of the set of exclusive
  cpus allocated to containers, if any (requires static policy in use).

By reporting the size of the pools, the users (humans or machines)
can get better insights and more feedback about how the resources
actually allocated to the workload and how the node resources are used.
2024-10-24 15:35:51 +02:00
Francesco Romani
c025861e0c node: metrics: add resource alignment metrics
In order to improve the observability of the resource management
in kubelet, cpu allocation and NUMA alignment, we add more metrics
to report if resource alignment is in effect.

The more precise reporting would probably be using pod status,
but this would require more invasive and riskier changes,
and possibly extra interactions to the APIServer.

We start adding metrics to report if containers got their
compute resources aligned.
If metrics are growing, the assingment is working as expected;
If metrics stay consistent, perhaps at zero, no resource
alignment is done.

Extra fixes brought by this work
- retroactively add labels for existing tests
- running metrics test demands precision accounting to avoid flakes;
  ensure the node state is restored pristine between each test, to
  minimize the aforementioned risk of flakes.
- The test pod command line was wrong, with this the pod could not
  reach Running state. That gone unnoticed so far because
  no test using this utility function actually needed a pod
  in running state.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-10-23 08:05:38 +02:00
Patrick Ohly
f2cfbf44b1 e2e: use framework labels
This changes the text registration so that tags for which the framework has a
dedicated API (features, feature gates, slow, serial, etc.) those APIs are
used.

Arbitrary, custom tags are still left in place for now.
2023-11-01 15:17:34 +01:00
Kubernetes Prow Robot
2190775b69 Merge pull request #118280 from stlaz/e2e_psa_labels
Set all PSa labels in tests
2023-06-28 11:14:43 -07:00
Davanum Srinivas
a75b00ea39 Better URL for scraping metrics from kubelet
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-06-27 16:14:59 -04:00
Stanislav Laznicka
7f532891c9 e2e tests: set all PSa labels instead of just enforcing 2023-06-21 15:05:13 +02:00
Paco Xu
420fbd11e4 ignore Histogram for prometheus client v1.16.0 2023-06-21 09:38:05 +08:00
Ian K. Coolidge
cede96336a Depend on k8s.io/utils cpuset
Steps performed:

$ find . -name '*.go' -exec sed -i
's|k8s.io/kubernetes/pkg/kubelet/cm/cpuset|k8s.io/utils/cpuset|g' {} \
$ ./hack/update-vendor.sh
$ ./hack/update-gofmt.sh
$ git rm -r pkg/kubelet/cm/cpuset/
2023-05-03 16:26:09 +00:00
Swati Sehgal
51c6a1fbe7 node: e2e: cpumgr: Rename: s/getCPUManagerMetrics/getKubeletMetrics
Since we need to gather kubelet metrics for CPU Manager and Topology
Manager, renaming this function to a more generic name.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-01-19 14:18:05 +00:00
Ian K. Coolidge
f3829c4be3 cpuset: Rename 'NewCPUSet' to 'New' 2023-01-06 23:32:51 +00:00
Antonio Ojea
e0a23577d2 pass context to gomega
Change-Id: Ibef02a52d8922984a09efa48361b9876fc91287c
2022-12-19 13:14:02 +00:00
Patrick Ohly
2f6c4f5eab e2e: use Ginkgo context
All code must use the context from Ginkgo when doing API calls or polling for a
change, otherwise the code would not return immediately when the test gets
aborted.
2022-12-16 20:14:04 +01:00
Patrick Ohly
df5d84ae81 e2e: accept context from Ginkgo
Every ginkgo callback should return immediately when a timeout occurs or the
test run manually gets aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.

This is a first automated step towards that: the additional parameter got added
with

    sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
        $(git grep -l -e framework.ConformanceIt -e ginkgo.It )
    $GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')

log_test.go was left unchanged.
2022-12-10 19:50:18 +01:00
Francesco Romani
ff44dc1932 cpumanager: the FG is locked to default (ON)
hence we can remove the if() guards, the feature
is always available.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-11-02 18:41:41 +01:00
Francesco Romani
bdc08eaa4b e2e: node: add tests for cpumanager metrics
Add tests to verify the cpumanager metrics are populated.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-10-27 14:40:56 +02:00