Commit Graph

13 Commits

Author SHA1 Message Date
Francesco Romani
c025861e0c node: metrics: add resource alignment metrics
In order to improve the observability of the resource management
in kubelet, cpu allocation and NUMA alignment, we add more metrics
to report if resource alignment is in effect.

More precise reporting would probably use pod status,
but this would require more invasive and riskier changes,
and possibly extra interactions with the APIServer.

We start adding metrics to report if containers got their
compute resources aligned.
If the metrics are growing, the assignment is working as expected;
if the metrics stay constant, perhaps at zero, no resource
alignment is done.

Extra fixes brought by this work
- retroactively add labels for existing tests
- running the metrics tests demands precise accounting to avoid flakes;
  ensure the node state is restored to pristine between tests, to
  minimize that risk.
- The test pod command line was wrong, so the pod could not
  reach the Running state. That went unnoticed so far because
  no test using this utility function actually needed a pod
  in the Running state.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-10-23 08:05:38 +02:00
cyclinder
87129c350a kubelet: Add a TopologyManager policy option: "max-allowable-numa-nodes"
Signed-off-by: cyclidner <kuocyclinder@gmail.com>
2024-07-09 22:26:24 +08:00
Patrick Ohly
f2cfbf44b1 e2e: use framework labels
This changes the test registration so that, for tags for which the framework
has a dedicated API (features, feature gates, slow, serial, etc.), those APIs
are used.

Arbitrary, custom tags are still left in place for now.
2023-11-01 15:17:34 +01:00
Swati Sehgal
fa83d5fef1 node: e2e: topology-mgr: Disambiguate cores from cpus
Currently in the tests there is ambiguity in terms of host setup
when it comes to cpus or cores. This commit disambiguates that.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-23 13:01:17 +01:00
Swati Sehgal
e1f5eb3f14 node: e2e: topology-mgr: Determine threads per core
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-23 12:58:50 +01:00
Swati Sehgal
f5d915b594 topology-mgr: metrics: Deflake Topology Manager metrics e2e tests
In local runs of the Topology Manager metrics tests, the pass rate was 100%.
Yet, we can see that the Topology Manager metrics tests are failing in upstream
CI consistently: https://testgrid.k8s.io/sig-node-presubmits#pr-kubelet-serial-gce-e2e-topology-manager.

From the logs, it was identified that these failures are because of timeouts,
so we are increasing the default timeout as well as polling interval frequency
of obtaining KubeletMetrics to deflake this test.

We have noticed a similar flake in case of CPU manager metrics tests as well:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-cpu-manager/1701615009836044288.
Once it is confirmed that the issue is resolved for the Topology Manager tests,
we will fix this for CPU Manager as well in a follow-up PR.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-09-20 13:37:27 +01:00
RuquanZhao
bfc3c2110f e2e-node: fix TopologyManager test jobs.
Signed-off-by: Ruquan Zhao <ruquan.zhao@arm.com>
2023-09-01 17:53:16 +08:00
Kubernetes Prow Robot
2190775b69 Merge pull request #118280 from stlaz/e2e_psa_labels
Set all PSa labels in tests
2023-06-28 11:14:43 -07:00
Stanislav Laznicka
7f532891c9 e2e tests: set all PSa labels instead of just enforcing
2023-06-21 15:05:13 +02:00
Paco Xu
420fbd11e4 ignore Histogram for prometheus client v1.16.0
2023-06-21 09:38:05 +08:00
Swati Sehgal
e5ad3cbf6a node: topologymgr: update node e2e test tags
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-07 09:52:07 +00:00
Swati Sehgal
cf21dcef51 node: topology-mgr: e2e: changes to validate admission latency metrics
The component name was previously incorrect. This patch updates it
to the correct one.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-02-15 13:59:56 +00:00
Swati Sehgal
340db7109d node: e2e: topologymgr: add tests for topology manager metrics
Add node e2e tests to verify population of topology metrics.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-01-19 14:40:37 +00:00