Commit Graph

11839 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
847be85000 Merge pull request #128657 from ffromani/unshare-containermap-among-managers
node: cm: don't share containerMap instances between managers
2024-11-07 19:45:20 +00:00
Kubernetes Prow Robot
aee1a91896 Merge pull request #128644 from huww98/multi-volume-part-1
kubelet: don't check for mounted before update dsw PV size
2024-11-07 19:45:11 +00:00
Kubernetes Prow Robot
25101d33bc Merge pull request #128518 from tallclair/pleg-watch-conditions
[FG:InPlacePodVerticalScaling] PLEG watch conditions: rapid polling for expected changes
2024-11-07 19:45:01 +00:00
Kubernetes Prow Robot
9660e5c4cd Merge pull request #127360 from knight42/feat/split-stdout-stderr-server-side
API: add a new `Stream` field to `PodLogOptions`
2024-11-07 19:44:45 +00:00
Francesco Romani
2a99bfc3d1 node: cm: don't share containerMap instances between managers
Since the GA graduation of memory manager in https://github.com/kubernetes/kubernetes/pull/128517
we are sharing the initial container map across managers.

The intention of this sharing was not to actually share a data
structure, but
1. save the relatively expensive relisting from runtime
2. have all the managers share a consistent view - even though the
   chance of misalignment tends to be tiny.

The unwanted side effect, though, is that all the managers now race
to modify a shared data structure that is not thread-safe.

The fix is to clone (deepcopy) the computed map when passing it
to each manager. This restores the old semantics of the code.

This issue raises the question of managers possibly going out of sync,
since each of them maintains a private view of the world.
This risk is real, yet this is how the code has worked for
most of its lifetime, so the plan is to look at this and evaluate
possible improvements later on.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2024-11-07 16:02:55 +01:00
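
The fix described in the commit above (handing each manager its own copy of the initial map) can be sketched roughly as follows. This is a minimal illustration only: `containerRef`, `containerMap`, `manager`, and `newManager` are hypothetical stand-ins, not the kubelet's real types.

```go
package main

import "fmt"

// containerRef and containerMap are hypothetical stand-ins for the
// kubelet's container map; they are not the real types.
type containerRef struct {
	PodUID        string
	ContainerName string
}

type containerMap map[string]containerRef

// Clone returns an independent copy of the map. The values are plain
// structs, so copying each entry is already a deep copy here.
func (m containerMap) Clone() containerMap {
	out := make(containerMap, len(m))
	for id, ref := range m {
		out[id] = ref
	}
	return out
}

// manager is a hypothetical resource manager that keeps a private view.
type manager struct {
	name       string
	containers containerMap
}

func newManager(name string, initial containerMap) *manager {
	return &manager{name: name, containers: initial}
}

func main() {
	// The map is computed once (saving the expensive relist) ...
	initial := containerMap{"c1": {PodUID: "pod-a", ContainerName: "app"}}

	// ... but every manager receives its own clone.
	cpu := newManager("cpu", initial.Clone())
	memory := newManager("memory", initial.Clone())

	// A change made by one manager does not affect the others.
	delete(cpu.containers, "c1")
	fmt.Println(len(cpu.containers), len(memory.containers), len(initial)) // prints "0 1 1"
}
```

Cloning keeps the single relist while removing the shared mutable state. It does not address the drift between private views that the commit message mentions.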
Kubernetes Prow Robot
33c64b380a Merge pull request #128646 from pohly/dra-kubelet-separate-beta-api
DRA kubelet: separate beta and alpha gRPC APIs
2024-11-07 14:57:45 +00:00
Kubernetes Prow Robot
c9024e7ae6 Merge pull request #128640 from mengqiy/spreadkubeletlaod
Add random interval to nodeStatusReport interval every time after an actual node status change
2024-11-07 13:48:03 +00:00
Kubernetes Prow Robot
ef37cb503b Merge pull request #128634 from thockin/remove_PodHostIPs_gate_for_1.32
Remove PodHostIPs feature gates
2024-11-07 13:47:54 +00:00
huweiwen
fd2dbe0d68 kubelet: don't check for mounted before update dsw PV size
We are still only calling NodeExpand after the volume is mounted.

Avoid depending on ASW from dswp.findAndAddNewPods(). It is odd to determine the desired state based on the actual state.
2024-11-07 20:59:54 +08:00
Sergey Kanzhelev
631c5f9c82 call cancel on plugin that is replaced by another plugin with the same name
2024-11-07 07:36:25 +00:00
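
The behaviour named in the commit above (cancelling a plugin when another one registers under the same name) might look roughly like this. It is a sketch under assumptions: `plugin`, `registry`, and `add` are hypothetical names, and the real kubelet code may be structured quite differently.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// plugin is a hypothetical registered plugin. Its context is cancelled
// once the plugin is no longer the active one for its name.
type plugin struct {
	endpoint string
	ctx      context.Context
	cancel   context.CancelFunc
}

// registry is a hypothetical name-to-plugin store.
type registry struct {
	mu      sync.Mutex
	plugins map[string]*plugin
}

func newRegistry() *registry {
	return &registry{plugins: make(map[string]*plugin)}
}

// add stores a plugin under name. If another plugin already holds that
// name, the old one is replaced and its context is cancelled so that any
// background work tied to it can stop.
func (r *registry) add(name, endpoint string) *plugin {
	ctx, cancel := context.WithCancel(context.Background())
	p := &plugin{endpoint: endpoint, ctx: ctx, cancel: cancel}

	r.mu.Lock()
	old := r.plugins[name]
	r.plugins[name] = p
	r.mu.Unlock()

	if old != nil {
		old.cancel()
	}
	return p
}

func main() {
	r := newRegistry()
	first := r.add("example-driver", "/tmp/first.sock")
	second := r.add("example-driver", "/tmp/second.sock")

	fmt.Println("first cancelled:", first.ctx.Err() != nil)
	fmt.Println("second cancelled:", second.ctx.Err() != nil)
}
```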
Lan Liang
6e5a3cde50 Remove PodHostIPs feature gates.
Signed-off-by: Lan Liang <gcslyp@gmail.com>
2024-11-06 23:10:36 -08:00
Patrick Ohly
9261a182bb DRA kubelet: separate beta and alpha gRPC APIs
Reusing types from the alpha in the beta made it possible to provide and use
both versions without conversion. The downside was that removal of the alpha
would have been harder, if not impossible. DRA drivers could continue to
use the alpha types and would provide the beta interface automatically.

Now the two versions are completely separate gRPC APIs, although in practice
there are no differences besides the name. Support for the alpha API in kubelet
is provided via automatically generated conversion and manually written
interface wrappers.

Those are provided as part of the v1alpha4 package. The advantage of having all
of that in a central place is that it'll be easier to remove when no longer
needed.
2024-11-07 07:42:40 +01:00
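
The combination of conversion and interface wrappers described in the commit above can be illustrated with a small adapter. Every name here (`alphaRequest`, `betaClient`, `alphaAsBeta`, `fakeAlphaDriver`, and so on) is hypothetical; the real gRPC types and the generated conversion code are not shown in this log.

```go
package main

import "fmt"

// Two separate but structurally identical API versions (hypothetical types).
type alphaRequest struct{ ClaimUID string }
type alphaResponse struct{ Devices []string }

type betaRequest struct{ ClaimUID string }
type betaResponse struct{ Devices []string }

// The caller is written against the beta interface only.
type betaClient interface {
	Prepare(betaRequest) (betaResponse, error)
}

type alphaClient interface {
	Prepare(alphaRequest) (alphaResponse, error)
}

// alphaAsBeta is a hand-written wrapper that lets an alpha-only driver be
// used wherever a betaClient is expected. The type conversions stand in
// for generated conversion functions; they compile because the struct
// layouts are identical.
type alphaAsBeta struct{ inner alphaClient }

func (w alphaAsBeta) Prepare(req betaRequest) (betaResponse, error) {
	resp, err := w.inner.Prepare(alphaRequest(req))
	if err != nil {
		return betaResponse{}, err
	}
	return betaResponse(resp), nil
}

// fakeAlphaDriver is a stand-in for a driver that only speaks alpha.
type fakeAlphaDriver struct{}

func (fakeAlphaDriver) Prepare(req alphaRequest) (alphaResponse, error) {
	return alphaResponse{Devices: []string{req.ClaimUID + "/device-0"}}, nil
}

func main() {
	var c betaClient = alphaAsBeta{inner: fakeAlphaDriver{}}
	resp, err := c.Prepare(betaRequest{ClaimUID: "claim-1"})
	fmt.Println(resp.Devices, err)
}
```

Keeping the adapter and conversions together in one package mirrors the rationale given in the commit message: the alpha support can later be deleted in one place.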
Kubernetes Prow Robot
c462d4c8e5 Merge pull request #126096 from utam0k/support-disabling-oom-group-kill
kubelet: new kubelet config option for disabling group oom kill
2024-11-07 06:29:36 +00:00
Jian Zeng
94cd0a0892 feat(kubelet): only returns logs that match the given stream
Signed-off-by: Jian Zeng <anonymousknight96@gmail.com>
2024-11-07 13:52:16 +08:00
Mengqi (David) Yu
1003d36870 Add random interval to nodeStatusReport interval every time after an actual node status change
update TestUpdateNodeStatusWithLease this time to avoid flakiness
2024-11-07 04:33:59 +00:00
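
The idea behind the commit above (randomising the node status report interval so that many kubelets do not report in lockstep) can be sketched as below. The jitter range and the names `baseInterval` and `jitteredInterval` are assumptions for illustration; the log does not state the formula the kubelet actually uses.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// baseInterval is a hypothetical report frequency used for the example.
const baseInterval = 5 * time.Minute

// jitteredInterval returns a duration chosen uniformly from
// [base*(1-factor), base*(1+factor)). A non-positive base or factor
// disables the jitter and returns base unchanged.
func jitteredInterval(base time.Duration, factor float64) time.Duration {
	if base <= 0 || factor <= 0 {
		return base
	}
	spread := float64(base) * factor
	offset := (rand.Float64()*2 - 1) * spread // in [-spread, spread)
	return base + time.Duration(offset)
}

func main() {
	// After an actual status change, schedule the next report at a
	// randomised distance instead of exactly baseInterval later.
	for i := 0; i < 3; i++ {
		fmt.Println(jitteredInterval(baseInterval, 0.5))
	}
}
```

Any test that asserts on exact report times needs to account for the randomness, which is presumably why the commit also touches TestUpdateNodeStatusWithLease.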
Kubernetes Prow Robot
3184eb3d1b Merge pull request #128629 from liggitt/revert-spreadkubeletload
Revert "Add random interval to nodeStatusReport interval every time after an actual node status change"
2024-11-07 03:53:42 +00:00
Kubernetes Prow Robot
f3498df864 Merge pull request #128522 from huww98/multi-volume-part-0
Cleanups about kubelet/volumemanager
2024-11-07 03:53:28 +00:00
utam0k
4f909c14a0 kubelet: new kubelet config option for disabling group oom kill
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-11-07 12:03:04 +09:00
Tim Allclair
24443b67cb Expand PLEG SetWatchCondition unit test coverage
2024-11-06 17:01:15 -08:00
Kubernetes Prow Robot
4c487b00af Merge pull request #128627 from kannon92/revert-128046-ga3960
Revert "Graduate PodLifecycleSleepAction to GA"
2024-11-07 00:25:51 +00:00
Jordan Liggitt
4850b31bda Revert "Add random interval to nodeStatusReport interval every time after an actual node status change"
This reverts commit d6e17ad808.
2024-11-06 17:12:13 -05:00
Kevin Hannon
350b0d2b93 Revert "Graduate PodLifecycleSleepAction to GA"
2024-11-06 16:29:19 -05:00
Anish Shah
207842d3e0 drop InPlacePodVerticalScaling support in windows
2024-11-06 12:57:55 -08:00
Kubernetes Prow Robot
099449954e Merge pull request #128556 from AnishShah/kubelet-reject-metric
Introduce a metric to track kubelet admission failure.
2024-11-06 20:10:33 +00:00
Kubernetes Prow Robot
0edef5aa91 Merge pull request #128447 from bart0sh/PR164-migrate-cadvisor-to-contextual-logging
kubelet: Migrate CAdvisor to contextual logging
2024-11-06 20:10:10 +00:00
Kubernetes Prow Robot
198ec57f86 Merge pull request #128394 from mengqiy/spreadkubeletlaod
add randomness to nodeStatusReportFrequency for kubelet
2024-11-06 20:10:02 +00:00
Kubernetes Prow Robot
dfba334a33 Merge pull request #128242 from jsafrane/selinux-controller
1710: Add SELinux warning controller
2024-11-06 20:09:44 +00:00
Kubernetes Prow Robot
96250d4411 Merge pull request #124918 from SergeyKanzhelev/commentIgnoringBadStatuses
added a comment that statuses lists are not being validated
2024-11-06 20:09:29 +00:00
Tim Allclair
7fce6f2317 More comments around PLEG WatchConditions
2024-11-06 11:05:24 -08:00
Tim Allclair
35bd1e6831 Emit a pod event when WatchConditions are completed
2024-11-06 11:05:24 -08:00
Tim Allclair
da9c2c553b Set pod watch conditions for resize
2024-11-06 11:05:24 -08:00
Tim Allclair
f4d36dd402 Add WatchCondition concept to the PLEG
2024-11-06 11:05:23 -08:00
Tim Allclair
07a9ab87bc Simplify PLEG relist loops
2024-11-06 11:05:23 -08:00
Kubernetes Prow Robot
e273349f3a Merge pull request #127511 from pohly/dra-1.32-api
DRA 1.32 API: promotion to beta
2024-11-06 13:13:29 +00:00
Patrick Ohly
a1b8e9d3a7 DRA kubelet: increase plugin test coverage
Deleting slices was not covered to begin with and the recent registration
changes also could have been covered better. Now coverage is at 91%.
2024-11-06 13:03:20 +01:00
Patrick Ohly
2c23fe1b82 DRA kubelet: list supported gRPC services during registration
Listing supported gRPC services (e.g. drav1alpha3.Node, drav1beta1.DRAPlugin)
during registration enables the kubelet to determine in advance which methods
it can call.

Versioning by Kubernetes release makes less sense because it doesn't say
anything about which gRPC service is supported. New ones might get added and
obsolete ones removed. Some services might be optional.

In the past, this versioning support wasn't really used. At least one version
had to be provided and kubelet tried to use the plugin with the highest
version. This version comparison gets dropped. In the unlikely situation
that different plugins register under the same name, the most recent one is
used.

Because advertising gRPC services is a new convention, plugins only reporting
some version are treated as providing the old alpha gRPC service.
2024-11-06 13:03:20 +01:00
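
The selection logic described in the commit above can be sketched as follows: prefer the newest advertised gRPC service, and treat plugins that only report a version string as providing the old alpha service. The service name strings and `chooseService` are assumptions for illustration, not the kubelet's actual identifiers.

```go
package main

import "fmt"

// Hypothetical service identifiers; the exact strings that plugins
// advertise are an assumption here.
const (
	alphaService = "v1alpha3.Node"
	betaService  = "v1beta1.DRAPlugin"
)

// chooseService picks the gRPC service to use for a plugin based on what
// it advertised during registration. The beta service is preferred when
// present. Anything else, including legacy plugins that only report
// version strings such as "1.0.0", falls back to the old alpha service.
func chooseService(advertised []string) string {
	for _, s := range advertised {
		if s == betaService {
			return betaService
		}
	}
	return alphaService
}

func main() {
	fmt.Println(chooseService([]string{alphaService, betaService}))
	fmt.Println(chooseService([]string{"1.0.0"}))
}
```

Deciding from the advertised services lets the caller know in advance which methods it can call, without comparing release versions.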
Patrick Ohly
437be1e651 DRA kubelet: rename gRPC server from Node to DRAPlugin in v1beta1
The version bump is an opportunity to pick a name that is a bit more
descriptive. It matches the "DevicePlugin" service name.
2024-11-06 13:03:20 +01:00
Patrick Ohly
33ea278c51 DRA: use v1beta1 API
No code is left which depends on v1alpha3, except of course the code
implementing that version.
2024-11-06 13:03:19 +01:00
Kubernetes Prow Robot
50d0f920c0 Merge pull request #126750 from AMDEPYC/uncore_v1
Split L3 Cache Topology Awareness in CPU Manager
2024-11-06 11:13:29 +00:00
Jan Safranek
0d71dc677e Refactor CreateVolumeSpec
Rename the old CreateVolumeSpec to CreateVolumeSpecWithNodeMigration, which
extracts volume.Spec with node-specific CSI migration.

Add CreateVolumeSpec that does the same, only without evaluating node CSI
migration.
2024-11-06 11:15:31 +01:00
Patrick Ohly
7b3a9afca3 DRA kubelet: add v1beta1 gRPC API
The v1beta1 API is identical to the previous v1alpha4, which erroneously was
still called "v1alpha3" in a few places, including the gRPC interface
definition itself.

The only reason for v1beta1 is to document the increased maturity of this API.

To simplify the transition, kubelet supports both v1alpha4 and v1beta1, picking
the more recent one automatically. All that DRA driver authors need to do to
implement v1beta1 is to update to the latest
k8s.io/dynamic-resource-allocation/kubeletplugin: it will automatically
register both API versions unless explicitly configured otherwise, which is
mostly just for testing.

DRA driver authors may replace their package import of v1alpha4 with v1beta1,
but they don't have to because the types in both packages are the same.
2024-11-06 11:05:05 +01:00
Anish Shah
d4f05fdda5 Introduce a metric to track kubelet admission failure.
2024-11-06 00:07:17 -08:00
Kubernetes Prow Robot
aafcf4e932 Merge pull request #128453 from tallclair/cacheless-pleg
Cleanup unused cacheless PLEG code
2024-11-06 06:59:35 +00:00
Mengqi (David) Yu
d6e17ad808 Add random interval to nodeStatusReport interval every time after an actual node status change
2024-11-06 06:11:05 +00:00
Kubernetes Prow Robot
5e0b818ff9 Merge pull request #128551 from tallclair/allocated-checkpoint
[FG:InPlacePodVerticalScaling] Don't checkpoint ResizeStatus
2024-11-06 04:19:36 +00:00
Kubernetes Prow Robot
bf75546494 Merge pull request #128432 from zhifei92/integrating-health-check
Integrate device plugin registration gRPC server health checks.
2024-11-06 04:19:29 +00:00
huweiwen
b3fe7a6410 fix ExistingPodExistingVolume test case
The previous code was identical to NewPodNewVolume.
2024-11-06 11:11:14 +08:00
huweiwen
b8777bc3b5 test: check for error returned by dsw.AddPodToVolume
2024-11-06 11:11:14 +08:00
huweiwen
f9a9b6f660 rename Gid => GID
according to stylecheck
2024-11-06 11:11:13 +08:00
Kubernetes Prow Robot
8c5472ce66 Merge pull request #128189 from zylxjtu/bug
Fix the incorrect metrics setting/naming in nodeshutdown manager
2024-11-06 02:29:29 +00:00