As part of PR 132028 we added more e2e test coverage to validate
the fix and to check, as much as possible, that there are no regressions.
The issue and the fix become most evident when inspecting
memory allocation with the Memory Manager static policy enabled.
Quoting the commit message of bc56d0e45a
```
The podresources API List implementation uses the internal data of the
resource managers as source of truth.
Looking at the implementation here:
https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/apis/podresources/server_v1.go#L60
we take care of syncing the device allocation data before querying the
device manager to return its pod->devices assignment.
This is needed because otherwise the device manager (and all the other
resource managers) would do the cleanup asynchronously, so the `List` call
will return incorrect data.
But we don't do this syncing for either CPUs or memory,
so when we report these we will get stale data, as issue #132020 demonstrates.
For the CPU manager, however, we have the reconcile loop which cleans up the stale data periodically.
It turns out this timing interplay was actually the reason the existing issue #119423 seemed fixed
(see: #119423 (comment)). But it's really just timing. If in the reproducer we set the `cpuManagerReconcilePeriod` to a
very high value (>= 5 minutes), then the issue still reproduces against the current master branch
(https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/test/e2e_node/podresources_test.go#L983).
```
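For concreteness, a minimal sketch of the reproducer tweak mentioned in the quote, assuming the usual e2e_node pattern of mutating the kubelet configuration before the test runs; only the `CPUManagerReconcilePeriod` and `CPUManagerPolicy` fields come from the kubelet configuration API, the helper shape is illustrative.
```go
import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kubeletconfig "k8s.io/kubernetes/pkg/kubelet/apis/config"
)

// Sketch (not the committed test code): stretch the CPU manager reconcile
// period so its periodic cleanup cannot mask stale podresources data while
// the reproducer queries the List endpoint.
func stretchCPUManagerReconcile(cfg *kubeletconfig.KubeletConfiguration) {
	cfg.CPUManagerPolicy = "static"
	// >= 5 minutes: far longer than any single test, so no reconcile runs
	// in the window where stale data would otherwise be cleaned up.
	cfg.CPUManagerReconcilePeriod = metav1.Duration{Duration: 5 * time.Minute}
}
```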
The missing actor here is the memory manager. The memory manager has no
reconcile loop (which would implicitly fix the stale data problem) and no explicit
synchronization, so it is the unlucky one that reported stale data,
leading to the eventual understanding of the problem.
For this reason it was (and still is) important to exercise it during
the test.
It turns out, however, that the test is wrong, likely because of a hidden dependency
between the test expectations and the lane configuration (notably
the machine specs), so we disable the memory manager activation for the time
being, until we figure out a safe way to enable it.
Note this significantly weakens the signal for this specific test.
Signed-off-by: Francesco Romani <fromani@redhat.com>
fix the utilities to enable multi-app-container tests,
which were previously quite hard to implement.
Add a consumer of the new utility to demonstrate the usage
and to initiate the basic coverage.
Signed-off-by: Francesco Romani <fromani@redhat.com>
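A sketch of the kind of pod such a utility produces, built from plain core/v1 types; `makeMultiAppContainerPod` and the image are illustrative names, not the helper actually added in this commit.
```go
import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makeMultiAppContainerPod is a hypothetical helper: one pod, N app
// containers, each one individually addressable by the test assertions.
func makeMultiAppContainerPod(podName string, containerNames ...string) *v1.Pod {
	containers := make([]v1.Container, 0, len(containerNames))
	for _, name := range containerNames {
		containers = append(containers, v1.Container{
			Name:    name,
			Image:   "registry.k8s.io/pause:3.10", // any long-running image works
			Command: []string{"/pause"},
		})
	}
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: podName},
		Spec: v1.PodSpec{
			RestartPolicy: v1.RestartPolicyNever,
			Containers:    containers,
		},
	}
}
```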
add an e2e test to ensure that if the Get endpoint is asked
about a non-existing pod, it returns an error.
Likewise, add an e2e test for terminated pods, which should
not be returned because they neither consume nor hold resources,
much like `List` does.
The expected usage pattern is to iterate over the list of
pods returned by `List`, but nevertheless the endpoint must
handle this case.
Signed-off-by: Francesco Romani <fromani@redhat.com>
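A rough sketch of the negative-path check described above, assuming a connected `podresourcesapi.PodResourcesListerClient`; the pod name and assertion style are illustrative, not the committed test code.
```go
import (
	"context"

	"github.com/onsi/gomega"
	podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
)

// expectGetFailsForMissingPod is a sketch: asking Get about a pod that was
// never created must surface an error rather than empty or stale data.
func expectGetFailsForMissingPod(ctx context.Context, cli podresourcesapi.PodResourcesListerClient, namespace string) {
	resp, err := cli.Get(ctx, &podresourcesapi.GetPodResourcesRequest{
		PodName:      "no-such-pod", // intentionally never created by the suite
		PodNamespace: namespace,
	})
	gomega.Expect(err).To(gomega.HaveOccurred(), "Get must fail for a non-existing pod")
	gomega.Expect(resp.GetPodResources()).To(gomega.BeNil())
}
```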
We want to fix and enhance the lanes which exercise
the podresources API tests. The first step is to clarify
the label and make it specific to the podresources API,
minimizing the clash and the ambiguity with the "PodLevelResources"
feature.
Note we change the label name, but the new label is backward
compatible (filtering for "Feature:PodResources" will still
match the tests). This turns out not to be a problem because
these tests are no longer called out explicitly in the lane
definitions; we want to change this ASAP.
The new name is more specific and allows us to clearly
call out tests for this feature in the lane definitions.
Signed-off-by: Francesco Romani <fromani@redhat.com>
add more e2e tests to cover the interaction with
core resource managers (cpu, memory) and to ensure
proper reporting.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add metrics about the sizing of the cpu pools.
Currently the cpumanager maintains 2 cpu pools:
- shared pool: this is where all pods with non-exclusive
cpu allocation run
- exclusive pool: this is the union of the sets of exclusive
cpus allocated to containers, if any (requires the static policy to be in use).
By reporting the size of the pools, users (humans or machines)
can get better insight and more feedback about how resources are
actually allocated to the workload and how the node resources are used.
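A minimal sketch of what such gauges could look like with the component-base metrics machinery the kubelet already uses; the metric names, units, and help strings below are illustrative, not necessarily the ones merged.
```go
import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

var (
	// Size of the shared pool: where every container without an exclusive
	// CPU allocation runs.
	cpuManagerSharedPoolSize = metrics.NewGauge(&metrics.GaugeOpts{
		Subsystem:      "kubelet",
		Name:           "cpu_manager_shared_pool_size_millicores",
		Help:           "CPU capacity of the shared pool, in millicores.",
		StabilityLevel: metrics.ALPHA,
	})
	// Size of the exclusive pool: the union of all exclusively allocated
	// CPUs (only non-empty with the static policy).
	cpuManagerExclusiveCPUsCount = metrics.NewGauge(&metrics.GaugeOpts{
		Subsystem:      "kubelet",
		Name:           "cpu_manager_exclusive_cpu_allocation_count",
		Help:           "Number of CPUs exclusively allocated to containers.",
		StabilityLevel: metrics.ALPHA,
	})
)

func registerCPUPoolMetrics() {
	legacyregistry.MustRegister(cpuManagerSharedPoolSize)
	legacyregistry.MustRegister(cpuManagerExclusiveCPUsCount)
}
```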
Fixed `The phase of Pod e2e-test-pod is Succeeded which is unexpected`
error. `e2epod.NewPodClient(f).CreateSync` is unable to catch the 'Running'
status of the pod because the pod finishes too fast.
Using the `Create` API should solve the issue as it doesn't query the pod
status.
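The direction of the fix, sketched under the assumption that the test uses the standard e2e pod client; only the two calls shown differ.
```go
// Before (sketch): CreateSync waits for the pod to reach Running, which
// races with a pod that completes almost immediately and is already
// Succeeded by the time the check runs.
//   pod = e2epod.NewPodClient(f).CreateSync(ctx, pod)

// After (sketch): Create only submits the pod and does not assert on the
// phase, so a fast-finishing pod no longer trips the phase check.
pod = e2epod.NewPodClient(f).Create(ctx, pod)
```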
This changes the test registration so that, for tags for which the framework has a
dedicated API (features, feature gates, slow, serial, etc.), those APIs are
used.
Arbitrary, custom tags are still left in place for now.
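A sketch of what the registration looks like after the change, assuming the suite-local `SIGDescribe` wrapper accepts the framework tag arguments; the "PodResources" feature name is a placeholder for whatever label the suite actually uses.
```go
// Before (sketch): everything encoded in the test text.
//   var _ = SIGDescribe("POD Resources [Serial] [Feature:PodResources]", func() { ... })

// After (sketch): tags with a dedicated framework API go through that API;
// only free-form custom tags remain in the text for now.
var _ = SIGDescribe("POD Resources",
	framework.WithSerial(),
	framework.WithFeature(framework.ValidFeatures.Add("PodResources")),
	func() {
		// test bodies unchanged; only the registration changes
	})
```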
The rate limiter.go file is a generic file implementing the
gRPC Limiter interface. This file can be reused by other gRPC
APIs, not only by podresources.
Change-Id: I905a46b5b605fbb175eb9ad6c15019ffdc7f2563
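A sketch of what such a reusable limiter can look like, kept independent of podresources: a tiny `Limit() bool` interface backed by golang.org/x/time/rate. The type and constructor names are illustrative, not the ones in the tree.
```go
package ratelimit

import "golang.org/x/time/rate"

// Limiter is the one decision a gRPC interceptor needs from us:
// report true when the incoming call should be rejected.
type Limiter interface {
	Limit() bool
}

type tokenBucket struct {
	limiter *rate.Limiter
}

// NewTokenBucket allows roughly qps calls per second with the given burst.
func NewTokenBucket(qps float64, burst int) Limiter {
	return &tokenBucket{limiter: rate.NewLimiter(rate.Limit(qps), burst)}
}

func (tb *tokenBucket) Limit() bool {
	// Allow consumes a token when one is available; an empty bucket
	// means "limit this call".
	return !tb.limiter.Allow()
}
```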
Add more facilities to the *internal* podresources client, and use them.
Checking e2e test runs, we see quite a few errors like
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
```
This is likely caused by kubelet restarts, which we do plenty of in e2e tests,
combined with the fact that gRPC connects lazily AND we don't really
check the errors in client code - we just bubble them up.
While it's arguably bad that we don't properly check error codes, it's also
true that in the main use case, e2e tests, these functions should just never
fail besides a few well-known cases; we're connecting over a
super-reliable unix domain socket after all.
So we centralize the fix by adding a function (alongside minor
cleanups) which triggers the connection and ensures it actually happens,
localizing the changes just here. The main advantage of this approach
is that it is opt-in, composable, and doesn't leak gRPC details into the client
code.
Signed-off-by: Francesco Romani <fromani@redhat.com>
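A sketch of the opt-in "ensure the connection is actually up" helper described above, using only standard gRPC connectivity primitives; the function name and error wording are assumptions.
```go
import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// ensureConnected kicks gRPC out of its lazy-dial behavior and blocks until
// the podresources connection is Ready (or the timeout expires), so later
// calls don't surface spurious "connection refused" errors right after a
// kubelet restart.
func ensureConnected(ctx context.Context, conn *grpc.ClientConn, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	conn.Connect() // trigger the dial now instead of on the first RPC

	for {
		state := conn.GetState()
		if state == connectivity.Ready {
			return nil
		}
		// Wait for the next state transition; false means the context expired.
		if !conn.WaitForStateChange(ctx, state) {
			return fmt.Errorf("connection not ready, last state %v: %w", state, ctx.Err())
		}
	}
}
```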
The PodResources API reports, in the List call, resources of pods in terminal phase.
The internal managers reassign resources assigned to pods in terminal phase, so podresources should ignore them.
Whether this behavior is intended or not (the docs are not unequivocal),
this e2e test demonstrates and verifies the behavior described above.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
We have an e2e test which wants to hit a rate limit error. To do so, we
send an abnormally high number of calls in a tight loop.
The relevant test per se is reportedly fine, but we need to play nicer
with *other* tests which may run just after and which need to query the API.
If the test suite runs "too fast", it's possible that an innocent test falls in the
same rate limit window which was saturated by the rate limit test,
so the innocent test can still be throttled because the throttling period
is not exhausted yet, yielding false negatives and leading to flakes.
We can't reset the period for the rate limiter, so we just wait "long enough" to
make sure we absorb the burst and other legit queries are not rejected.
Signed-off-by: Francesco Romani <fromani@redhat.com>
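The "long enough" here is simple token-bucket arithmetic: a bucket of burst B refilled at Q tokens per second is full again roughly B/Q seconds after being drained. A sketch with placeholder numbers, not the kubelet's hardcoded values:
```go
import "time"

const (
	podresourcesQPS   = 100.0 // refill rate, tokens per second (placeholder)
	podresourcesBurst = 10    // bucket size (placeholder)
)

// cooldownAfterRateLimitTest idles long enough for a fully drained bucket to
// refill, plus a margin, so the next test starts with fresh quota.
func cooldownAfterRateLimitTest() {
	refill := time.Duration(float64(podresourcesBurst) / podresourcesQPS * float64(time.Second))
	time.Sleep(refill + time.Second)
}
```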
Implement DoS prevention by wiring a global rate limit into the podresources
API. The goal here is not to introduce a general rate-limiting solution
for the kubelet (we need more research and discussion to get there),
but rather to prevent misuse of the API.
Known limitations:
- the rate limit values (QPS, BurstTokens) are hardcoded to
"high enough" values.
Enabling user configuration would require more discussion
and sweeping changes to the other kubelet endpoints, so it
is postponed for now.
- the rate limiting is global. Malicious clients can starve other
clients by consuming the QPS quota.
Add an e2e test to exercise the flow, because the wiring itself
is mostly boilerplate and API adaptation.
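A hand-rolled sketch of that wiring, kept self-contained rather than reproducing the actual middleware used: one shared token bucket guards every unary call and a drained bucket becomes `codes.ResourceExhausted`. The QPS/burst values and names are illustrative.
```go
import (
	"context"

	"golang.org/x/time/rate"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// unaryRateLimitInterceptor returns an interceptor enforcing one global
// limit across all podresources calls, matching the "global, hardcoded"
// shape described above.
func unaryRateLimitInterceptor(qps float64, burst int) grpc.UnaryServerInterceptor {
	limiter := rate.NewLimiter(rate.Limit(qps), burst)
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
		if !limiter.Allow() {
			return nil, status.Errorf(codes.ResourceExhausted,
				"rate limit exceeded for %s", info.FullMethod)
		}
		return handler(ctx, req)
	}
}

// Usage sketch: plug it in when building the server.
//   srv := grpc.NewServer(grpc.ChainUnaryInterceptor(unaryRateLimitInterceptor(100, 10)))
```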
We have quite a few podresources e2e tests and, as the feature
progresses to GA, we should consider moving them to NodeConformance.
Unfortunately most of them require linux-specific features, not in the
tests themselves but in the test prelude (fixture), to check or create the
node conditions (e.g. presence or absence of devices, online CPUs...) to be
verified in the test proper.
For this reason we promote only a single test for starters.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Fix the waiting logic in the e2e test loop to wait
for resources to be reported again instead of reasoning about the
timestamp. The idea is that waiting for resource availability
is the canonical way clients should observe the desired state,
and it should also be more robust than comparing timestamps,
especially in CI environments.
Signed-off-by: Francesco Romani <fromani@redhat.com>
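A sketch of the waiting style this change moves toward, assuming a connected `podresourcesapi.PodResourcesListerClient` and gomega polling; the pod name, timeouts, and matcher details are illustrative, not the committed code.
```go
import (
	"context"
	"fmt"
	"time"

	"github.com/onsi/gomega"
	podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
)

// waitForPodResourcesReported polls List until the pod shows up again,
// instead of reasoning about timestamps.
func waitForPodResourcesReported(ctx context.Context, cli podresourcesapi.PodResourcesListerClient, namespace, podName string) {
	gomega.Eventually(ctx, func(ctx context.Context) error {
		resp, err := cli.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
		if err != nil {
			return err
		}
		for _, podRes := range resp.GetPodResources() {
			if podRes.GetNamespace() == namespace && podRes.GetName() == podName {
				return nil // resources are being reported again
			}
		}
		return fmt.Errorf("pod %s/%s not yet reported by List", namespace, podName)
	}).WithTimeout(2*time.Minute).WithPolling(2*time.Second).Should(gomega.Succeed())
}
```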
Start to consolidate the sample device plugin utility
and constants in a central place, because we need
to use it in different e2e tests.
Having a central dependency is better than a maze of
entangled e2e tests depending on each other's helpers.
Signed-off-by: Francesco Romani <fromani@redhat.com>