Commit Graph

23 Commits

Author SHA1 Message Date
Patrick Ohly
33ea278c51 DRA: use v1beta1 API
No code is left which depends on the v1alpha3, except of course the code
implementing that version.
2024-11-06 13:03:19 +01:00
Patrick Ohly
f3fef01e79 DRA API: AdminAccess in DeviceRequestAllocationResult
Drivers need to know that because admin access may also grant additional
permissions. The allocator needs to ignore such results when determining which
devices are considered as allocated.

In both cases it is conceptually cleaner to not rely on the content of the
ClaimSpec.
2024-10-29 09:50:07 +01:00
Ed Bartosh
c5842ca4ad DRA: e2e_node: improve readability 2024-07-29 21:57:44 +03:00
Patrick Ohly
d11b58efe6 DRA kubelet: refactor gRPC call timeouts
Some of the E2E node tests were flaky. Their timeout apparently was chosen
under the assumption that kubelet would retry immediately after a failed gRPC
call, with a factor of 2 as safety margin. But according to
0449cef8fd,
kubelet has a different, higher retry period of 90 seconds, which was exactly
the test timeout. The test timeout has to be higher than that.

As the tests don't use the gRPC call timeout anymore, it can be made
private. While at it, the name and documentation gets updated.
2024-07-22 18:09:34 +02:00
Patrick Ohly
0b62bfb690 DRA e2e: adapt to v1alpha3 API 2024-07-22 18:09:34 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Patrick Ohly
616a014347 DRA: move ResourceSlice publishing into DRA drivers
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).

The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.

As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.

While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2024-07-18 09:09:19 +02:00
Patrick Ohly
3d4bc44a2f dra e2e node: addd test case for ResourceSlice handling during kubelet startup
Any redundant object must get deleted, but not the ones of other names.
2024-07-18 09:09:19 +02:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Ed Bartosh
ee0340a828 e2e_node: add tests for 2 Kubelet plugins 2024-06-07 22:53:35 +03:00
Ed Bartosh
ce6faef8d8 e2e_node: change DRA test APIs to work with multiple plugins 2024-06-07 22:53:31 +03:00
Ed Bartosh
118158d8df e2e_node: DRA: test plugin failures 2024-06-07 22:51:53 +03:00
Ed Bartosh
ffc407b4dd e2e_node: DRA: reimplement call blocking 2024-06-07 22:47:20 +03:00
Ed Bartosh
d6c78f853a e2e_node: add deferPodDeletion parameter 2024-05-25 01:02:31 +03:00
Ed Bartosh
f609aa8310 e2e: test-driver: add new matchers 2024-05-25 01:02:25 +03:00
Patrick Ohly
d59676a545 dra kubelet: publish NodeResourceSlices
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins, their gRPC server will return "not implemented" and that can be
handled by kubelet. Therefore no API break is needed.

However, DRA drivers need to be updated because the Go API changed. They can
return
    status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.

The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
2024-03-07 22:22:13 +01:00
Patrick Ohly
f2cfbf44b1 e2e: use framework labels
This changes the text registration so that tags for which the framework has a
dedicated API (features, feature gates, slow, serial, etc.) those APIs are
used.

Arbitrary, custom tags are still left in place for now.
2023-11-01 15:17:34 +01:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Stanislav Laznicka
7f532891c9 e2e tests: set all PSa labels instead of just enforcing 2023-06-21 15:05:13 +02:00
Ed Bartosh
a1e0aa0e50 DRA Node E2E: add NodeAlphaFeature to fix CI
Added NodeAlphaFeature:DynamicResourceAllocation to the Node DRA test
to fix failing containerd serial jobs. Those jobs skip tests labeled
with NodeAlphaFeature.
2023-06-17 13:12:40 +03:00
Ed Bartosh
a83edd35c4 DRA Node E2E: relabel test suite to fix CI
Removed NodeFeature:DynamicResourceAllocation label from the
tests to fix cos-cgroupv1/v2-containerd-node-e2e-serial CI jobs.

It turned out that labeling DRA Node tests as NodeFeature was
a mistake. Re-labeling with NodeAlphaFeature would not work either.
It would fail certain containerd jobs as DRA requires containerd >= 1.7
2023-06-14 20:46:24 +03:00
Ed Bartosh
4960207b31 DRA Node E2E: test NodePrepareResource timeout 2023-06-13 12:42:05 +03:00
Ed Bartosh
58162ffd63 DRA: add node tests
- Setup overall test structure
- Tested Kubelet plugin re-registration on plugin and Kubelet restarts
- Tested pod processing on Kubelet start
2023-06-06 23:03:50 +03:00