Commit Graph

1440 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
bcadbfcc55 Merge pull request #125496 from harche/cgroup_imp
KEP-4569: Separate cgroup v1 and v2 manager implementations
2024-06-28 09:54:02 -07:00
Harshal Patil
79495a21a8 Separate cgroup v1 and v2 manager implementations
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2024-06-28 07:49:43 -04:00
Patrick Ohly
bde9b64cdf DRA: remove "source" indirection from v1 Pod API
This makes the API nicer:

    resourceClaims:
    - name: with-template
      resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      resourceClaimName: test-shared-claim

Previously, this was:

    resourceClaims:
    - name: with-template
      source:
        resourceClaimTemplateName: test-inline-claim-template
    - name: with-claim
      source:
        resourceClaimName: test-shared-claim

A more long-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.

This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
2024-06-27 17:53:24 +02:00
Ed Bartosh
f53991d111 kube_features: DevicePluginCDIDevices: LockToDefault 2024-06-25 16:14:48 +03:00
Sergey Kanzhelev
e8e2fda5c3 improve logging of pod admission denied 2024-06-21 17:46:49 +00:00
Stephen Kitt
3f36c83c68 Switch to stretchr/testify / mockery for mocks
testify is used throughout the codebase; this switches mocks from
gomock to testify with the help of mockery for code generation.

Handlers and mocks in test/utils/oidc are moved to a new package:
mockery operates package by package, and requires packages to build
correctly; test/utils/oidc/testserver.go relies on the mocks and fails
to build when they are removed. Moving the interface and mocks to a
different package allows mockery to process that package without
having to build testserver.go.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-06-20 19:42:53 +02:00
Kubernetes Prow Robot
e6616033cb Merge pull request #120844 from bzsuni/cleanup/sets/kubelet
[kubelet] Use a generic Set instead of a specified Set
2024-06-14 09:09:17 -07:00
Harshal Patil
966d304704 Report correct error after validating the root container
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2024-06-11 16:42:59 -04:00
Kubernetes Prow Robot
a8d51f4f05 Use a generic Set instead of a specified Set in kubelet
Signed-off-by: bzsuni <bingzhe.sun@daocloud.io>
2024-06-04 14:25:43 +08:00
Kubernetes Prow Robot
fad52aedfc Merge pull request #125086 from oxxenix/exponential-backoff
add exponential backoff in NodeResourceSlices controller
2024-05-28 02:46:43 -07:00
Oksana Baranova
c4ec24890e nodeResourceSlicesController: add exponential backoff 2024-05-27 23:12:53 +03:00
zhanluxianshen
e5c229fafa clean typos logs in kubelet. 2024-05-22 16:56:06 +08:00
Itamar Holder
a6b971f14b Use kubelet owned directories for mounting rather than /tmp
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-05-21 13:18:16 +03:00
Itamar Holder
74f29880bd Replace log entry by a warning event
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-05-21 13:18:16 +03:00
Itamar Holder
29535c0463 Warn of swap is enabled on the OS and tmpfs noswap is not supported
When --fail-swap-on=false kubelet CLI argument
is provided, but tmpfs noswap is not supported
by the kernel, warn about the risks of memory-backed
volumes being swapped into disk

Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-05-21 13:18:16 +03:00
Kubernetes Prow Robot
8352c09592 Merge pull request #124323 from bart0sh/PR142-dra-fix-cache-integrity
kubelet: DRA: fix cache integrity
2024-05-13 09:54:02 -07:00
Alvaro Aleman
6d0ac8c561 Use the generic/typed workqueue throughout
This change makes us use the generic workqueue throughout the project in
order to improve type safety and readability of the code.
2024-05-04 14:33:12 -04:00
Ed Bartosh
f24134d7b2 kubelet: DRA: add unit test for ClaimInfo and claimInfoCache 2024-05-03 13:30:31 +00:00
Ed Bartosh
6ce294558a kubelet: DRA: add stress test
The tests calls PrepareResources and UnprepareResources API in
parallel to help discover race conditions.
2024-05-03 13:30:29 +00:00
Kevin Klues
86a18d5333 kubelet: DRA: update manager test to adhere to new claiminfo cache APIs
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:28:37 +00:00
Kevin Klues
805e7c3434 kubelet: DRA: remove check to set pluginName to DriverName if not in ResourceHandle
It has always been validated that a ResourceHandle MUST have DriverName set, so
this check is unnecessary.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:29 +00:00
Kevin Klues
f80be2728e kubelet: DRA: change key of claimInfo cache to "namespace/claimname"
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:29 +00:00
Kevin Klues
639e887631 kubelet: DRA: add a reconcile loop to unprepare claims for deleted pods
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:29 +00:00
Kevin Klues
a8931c6c25 kubelet: DRA: update locking/checkpoint semantics of the claimInfo cache
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-05-03 13:23:27 +00:00
Kubernetes Prow Robot
1fd835ce59 Merge pull request #123398 from ffromani/remove-legacy-checkpoint
node: devicemgr: remove obsolete pre-1.20 checkpoint file support
2024-04-29 14:46:53 -07:00
Marek Siarkowicz
3ee8178768 Cleanup defer from SetFeatureGateDuringTest function call 2024-04-24 20:25:29 +02:00
Patrick Ohly
77341f7595 DRA: remove support for v1alpha2 kubelet API
The v1alpha2 API is several releases old. No current drivers should still
depend on it.
2024-04-19 18:27:05 +02:00
Kubernetes Prow Robot
bbfd2145de Merge pull request #124091 from bitoku/dra-nil-check
kubelet: add nil check for Node(Un)PrepareResources.
2024-04-18 10:46:05 -07:00
Kubernetes Prow Robot
528cff12f6 Merge pull request #120969 from skitt/uber-go-mock
Switch from golang/mock to uber-go/mock
2024-04-17 23:59:24 -07:00
Francesco Romani
181fb0da51 node: devicemgr: remove obsolete pre-1.20 checkpoint file support
In commit 2f426fdba6 we added
compatibility (and tests) to deal with pre-1.20 checkpoint files.
We are now well past the end of support for pre-1.20 kubelets,
so we can get rid of this code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-04-15 14:01:56 +02:00
SataQiu
06d5ee274d remove unused code in container manager 2024-04-09 22:30:32 +08:00
Ayato Tokubi
d04f87abde add nil check for Node(Un)PrepareResources.
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
2024-04-04 23:24:25 +00:00
HirazawaUi
10b6319e64 fix slow dra unit test 2024-03-16 22:21:15 +08:00
Ed Bartosh
26881132bd kubelet: assign Node as an owner for the ResourceSlice
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2024-03-15 09:46:13 +02:00
Patrick Ohly
a0add8d2c7 dra api: NodeResourceModel -> ResourceModel
When renaming NodeResourceSlice to ResourceSlice, the embedded
[Node]ResourceModel also should have been renamed.
2024-03-14 18:07:36 +01:00
Kevin Klues
fc2134c84c dra kubelet: fix error log
Previously we were returning the error string from 'err' (which is nil), when
we should have been returning it from result.Error. Without this it is hard to
debug issues with NodeUnprepareResources.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-11 13:51:29 +00:00
Kevin Klues
13a6dcc21c dra kubelet: add StructuredResourceModel to UnprepareResources call
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2024-03-09 18:08:14 +00:00
fakecore
4060ee60c1 fix: handle socket file detection on Windows
Update socket file detection logic to use os.Stat as per upstream
Go fix for https://github.com/golang/go/issues/33357. This resolves
the issue where socket files could not be properly identified on
Windows systems.
2024-03-08 18:16:10 +08:00
Patrick Ohly
0b6a0d686a dra api: rename NodeResourceSlice -> ResourceSlice
While currently those objects only get published by the kubelet for node-local
resources, this could change once we also support network-attached
resources. Dropping the "Node" prefix enables such a future extension.

The NodeName in ResourceSlice and StructuredResourceHandle then becomes
optional. The kubelet still needs to provide one and it must match its own node
name, otherwise it doesn't have permission to access ResourceSlice objects.
2024-03-07 22:22:55 +01:00
Patrick Ohly
d59676a545 dra kubelet: publish NodeResourceSlices
The information is received from the DRA driver plugin through a new gRPC
streaming interface. This is backwards compatible with old DRA driver kubelet
plugins, their gRPC server will return "not implemented" and that can be
handled by kubelet. Therefore no API break is needed.

However, DRA drivers need to be updated because the Go API changed. They can
return
    status.New(codes.Unimplemented, "no node resource support").Err()
if they don't support the new ListAndWatchResources method and
structured parameters.

The controller in kubelet then synchronizes this information from the driver
with NodeResourceSlice objects, creating, updating and deleting them as needed.
2024-03-07 22:22:13 +01:00
Patrick Ohly
6f1ddfcd2e kubelet: support structured parameters for preparing resources
If the resource handle has data from a structured parameter model, then we need
to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
gogo/protobuf, we cannot use "optional" for that new optional field and have to
resort to "repeated" with a single repetition if present.

This is a new, backwards-compatible field.

That extending the resource.k8s.io changes the checksum of a kubelet checkpoint
is unfortunate. Updating the test cases is a stop-gap measure, the actual
solution will have to be something else before beta.
2024-03-07 22:22:13 +01:00
Stephen Kitt
6bf667af06 Switch from golang/mock to uber-go/mock
See https://github.com/golang/mock#gomock: golang/mock is no longer
maintained, and should be replaced by go.uber.org/mock.

This allows golang/mock to be dropped from the status and vendored
fields in unwanted-dependencies.json.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-03-07 09:12:16 +01:00
Kubernetes Prow Robot
70383f3701 Merge pull request #119561 from payall4u/fix-kubelet-panic-when-allocate-device
Fix kubelet panic when allocate resource for pod.
2024-02-29 03:06:54 -08:00
Kubernetes Prow Robot
0f7cc6fcaa Merge pull request #121778 from Tal-or/mm_metrics
kubelet: memorymanager: metrics:  add metrics about static allocation
2024-02-20 09:41:50 -08:00
Kubernetes Prow Robot
79e11fe563 Merge pull request #122703 from TommyStarK/fix/dra-manager-should-timeout
dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests to flake
2024-02-13 09:33:17 -08:00
Kubernetes Prow Robot
244fbf94fd Merge pull request #122698 from daniel-hutao/feat-1
Code Cleanup: Redundant String Conversions and Spelling/Grammar Corrections
2024-02-05 16:57:07 -08:00
Daniel Hu
1baf7d4586 Corrected some spelling and grammatical errors
Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>
2024-01-27 10:10:25 +08:00
Kubernetes Prow Robot
3da22db11c Merge pull request #121499 from matte21/add-comments-to-cpu-accumulator
Improve understandability of kubelet's cpu accumulator code
2024-01-26 00:56:21 +01:00
Daniel Hu
d652596e42 Remove redundant string conversions in print statements
Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>
2024-01-15 09:57:35 +08:00
TommyStarK
6f021e99cf dra: increase timeout in setupFakeDRADriverGRPCServer to prevent tests to flake.
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2024-01-11 09:20:04 +01:00