a) add namespacing to metrics: fixes interference between the `should scale up when one metric is missing (Pod and External metrics)` and `should not scale down when one metric is missing (Container Resource and External Metrics)` specs, which was the cause of flakiness.
b) replace deployments containing unused exporters (whose metrics were ignored) with deployments without any exporters: a potential fix for frequently hitting the rate limit on creating metric descriptors (429 errors); this also adds clarity.
c) fix metric types: some external metrics tests used a non-average type while expecting the value to stay constant regardless of the number of pods. However, the queries resulting from the metric specs don't filter by pod, so the fetched metric value is the sum of the metric over all pods (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects). Switching back to averaging by the number of pods fixes a couple of specs that were passing for the wrong reason (they were meant to test different conditions).
The nightly containerd binary no longer works in the current kind base images:
May 15 16:32:31 kind-worker containerd[222]: /usr/local/bin/containerd:
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by
/usr/local/bin/containerd)
kind now builds containerd directly into the base images. The official base
images still use containerd 1.6, so we have to use a special base image that
was prepared for this purpose.
Because the containerd config can be patched through kind, we don't need to
modify the generated node image anymore.
The goal is to label as "performance" only those workloads which actually run
long enough to provide useful metrics. The throughput collector samples once
per second, so a workload should run for at least 5 (better 10) seconds to
collect a minimal number of samples for the percentile calculation.
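To see why a handful of one-per-second samples gives only coarse percentiles, here is a minimal nearest-rank percentile sketch. It assumes nearest-rank is a reasonable stand-in for whatever the collector actually uses, and the minSamples threshold of 5 is a hypothetical value mirroring "at least 5 seconds" above.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

// minSamples is a hypothetical threshold: 5 seconds at one sample per second.
const minSamples = 5

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
func percentile(samples []float64, p float64) (float64, error) {
	if p <= 0 || p > 100 {
		return 0, errors.New("percentile must be in (0, 100]")
	}
	if len(samples) < minSamples {
		return 0, errors.New("too few samples for a meaningful percentile")
	}
	sorted := append([]float64(nil), samples...) // copy, don't mutate input
	sort.Float64s(sorted)
	// Nearest rank: ceil(p/100 * N), computed as p*N/100 to limit rounding.
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1], nil
}

func main() {
	samples := []float64{120, 95, 130, 110, 101} // five one-second throughput samples
	for _, p := range []float64{50, 90, 99} {
		v, err := percentile(samples, p)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("p%.0f = %.0f\n", p, v)
	}
}
```

With only five samples, p90 and p99 both collapse onto the largest sample; ten samples at least separate them.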
For benchstat analysis of runs with enough repetitions to get statistically
meaningful results, each workload shouldn't run for more than one minute;
otherwise, before/after analysis becomes too slow.
The labels were chosen based on benchmark runs on a reasonably fast desktop. To
record how long each workload takes, a new "runtime_seconds" benchmark result
is added.
This PR updates the related references to the legacy
release bucket, excluding CHANGELOG updates.
Signed-off-by: Ricky Sadowski <richard.j.sadowski@gmail.com>
When status conditions do not match expectations, we need to see
the nested objects, but %#v doesn't handle pointers well: nested
pointers are printed as addresses rather than values. Output them
as simple encoded JSON instead.
Add two new metrics to monitor the client-go logic that
generates http.Transports for the clients:
- rest_client_transport_cache_entries is a gauge metric
with the number of existing entries in the internal cache
- rest_client_transport_create_calls_total is a counter
that increments each time a new transport is created, recording
the result of the operation needed to obtain it: hit, miss
or uncacheable
Change-Id: I2d8bde25281153d8f8e8faa249385edde3c1cb39
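The semantics of the two metrics can be illustrated with a small transport cache that keeps plain Go counters instead of Prometheus collectors. This is a sketch under assumptions: transportKey and its cacheable flag are hypothetical simplifications (client-go derives its cache key from the full transport configuration), and the counters only mirror what the gauge and the per-result counter would report.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// transportKey is a hypothetical, simplified cache key.
type transportKey struct {
	host      string
	cacheable bool // hypothetical: false when the config cannot be used as a key
}

// transportCache memoizes transports and tracks the two metric values.
type transportCache struct {
	mu          sync.Mutex
	entries     map[string]*http.Transport
	createCalls map[string]int // result label -> count: "hit", "miss", "uncacheable"
}

func newTransportCache() *transportCache {
	return &transportCache{
		entries:     make(map[string]*http.Transport),
		createCalls: make(map[string]int),
	}
}

// get returns a cached transport when possible and records the outcome.
func (c *transportCache) get(k transportKey) *http.Transport {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !k.cacheable {
		c.createCalls["uncacheable"]++
		return &http.Transport{} // never stored, so the entry count is unchanged
	}
	if t, ok := c.entries[k.host]; ok {
		c.createCalls["hit"]++
		return t
	}
	c.createCalls["miss"]++
	t := &http.Transport{}
	c.entries[k.host] = t
	return t
}

// cacheEntries is the value a cache-entries gauge would report.
func (c *transportCache) cacheEntries() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.entries)
}

// calls returns the count recorded for one result label.
func (c *transportCache) calls(result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.createCalls[result]
}

func main() {
	c := newTransportCache()
	c.get(transportKey{host: "a", cacheable: true})  // miss
	c.get(transportKey{host: "a", cacheable: true})  // hit
	c.get(transportKey{host: "b", cacheable: false}) // uncacheable
	fmt.Println("entries:", c.cacheEntries(), "hit:", c.calls("hit"),
		"miss:", c.calls("miss"), "uncacheable:", c.calls("uncacheable"))
}
```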
This touches cases where FromInt() is used on numeric constants, or
values which are already int32s, or int variables which are defined
close by and can be changed to int32s with little impact.
Signed-off-by: Stephen Kitt <skitt@redhat.com>
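One plausible reason to prefer passing int32 values directly, rather than converting from int, is shown below in plain Go. It does not use the apimachinery intstr package. A non-constant integer conversion to int32 wraps silently when the value is out of range, while an untyped constant assigned to an int32 is checked at compile time. toInt32Checked is a hypothetical helper, not a Kubernetes API.

```go
package main

import (
	"fmt"
	"math"
)

// toInt32Checked converts v to int32 only if it fits (hypothetical helper).
func toInt32Checked(v int64) (int32, bool) {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, false
	}
	return int32(v), true
}

func main() {
	var big int64 = math.MaxInt32 + 1
	fmt.Println(int32(big)) // -2147483648: a plain conversion wraps silently

	if _, ok := toInt32Checked(big); !ok {
		fmt.Println("out of range for int32")
	}

	// An untyped constant needs no conversion; an out-of-range constant
	// here would be a compile-time error instead of a silent wrap.
	var port int32 = 8080
	fmt.Println(port)
}
```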
* update the serial number in the CA certificate to a valid non-zero number
* fix the existing problem (SerialNumber 0 in all certificates) as part of this PR, in a separate commit
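A common way to get a valid non-zero serial number is to draw a random positive integer. The sketch below uses only the Go standard library; the 128-bit range, the ECDSA P-256 key and the "example-ca" subject are illustrative choices, not necessarily what this PR does. RFC 5280 requires serial numbers to be positive and at most 20 octets, which a value below 2^128 satisfies.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newSerialNumber returns a random serial in [1, 2^128-1], never zero.
func newSerialNumber() (*big.Int, error) {
	limit := new(big.Int).Lsh(big.NewInt(1), 128)
	max := new(big.Int).Sub(limit, big.NewInt(1))
	n, err := rand.Int(rand.Reader, max) // uniform in [0, 2^128-1)
	if err != nil {
		return nil, err
	}
	return n.Add(n, big.NewInt(1)), nil // shift into [1, 2^128-1]
}

// newSelfSignedCA creates and parses a self-signed CA certificate.
func newSelfSignedCA() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	serial, err := newSerialNumber()
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: "example-ca"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newSelfSignedCA()
	if err != nil {
		panic(err)
	}
	fmt.Println("serial:", cert.SerialNumber, "isCA:", cert.IsCA)
}
```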