2331 Commits

Author SHA1 Message Date
Mengqi (David) Yu
d6e17ad808 Add random interval to nodeStatusReport interval every time after an actual node status change 2024-11-06 06:11:05 +00:00
Kubernetes Prow Robot
5e0b818ff9 Merge pull request #128551 from tallclair/allocated-checkpoint
[FG:InPlacePodVerticalScaling] Don't checkpoint ResizeStatus
2024-11-06 04:19:36 +00:00
Kubernetes Prow Robot
bf75546494 Merge pull request #128432 from zhifei92/integrating-health-check
Integrate device plugin registration gRPC server health checks.
2024-11-06 04:19:29 +00:00
Tim Allclair
ea53083c14 Don't checkpoint ResizeStatus 2024-11-05 15:48:35 -08:00
Tim Allclair
4a4748d23c Determine resize status from state in handlePodResourcesResize 2024-11-05 15:41:49 -08:00
Kubernetes Prow Robot
f81a68f488 Merge pull request #128377 from tallclair/allocated-status-2
[FG:InPlacePodVerticalScaling] Implement AllocatedResources status changes for Beta
2024-11-05 23:21:49 +00:00
Kubernetes Prow Robot
e57618970e Merge pull request #126870 from AnishShah/outofcpu-fix
Ensure mirror pods are created as soon as node is registered
2024-11-05 19:15:29 +00:00
zhangzhifei16
1381e41f28 feat: Integrate device plugin registration gRPC server health checks. 2024-11-05 19:59:56 +08:00
Anish Shah
dcafd93b68 kubelet: try registering mirror pods as soon as node is registered.
Mirror pods for static pods may not be created immediately during node startup
because either the node is not registered or node informer is not synced.
They will be created eventually when static pods are resynced (every 1-1.5 minutes).

However, during this delay of 1-1.5 minutes, kube-scheduler might overcommit resources
to the node and eventually cause kubelet to reject pods with
OutOfCPU/OutOfMemory/OutOfPods errors.

To ensure kube-scheduler is aware of static pod resource usage faster,
mirror pods are created as soon as the node registers.
2024-11-05 00:56:21 -08:00
lauralorenz
4965a7a8a0 KEP-4603: Refactor various hardcoded backoffs into separate constants (#128369)
* Refactor various hardcoded backoffs into separate constants

Signed-off-by: Laura Lorenz <lauralorenz@google.com>

* Fix comment formatting

Signed-off-by: Laura Lorenz <lauralorenz@google.com>

---------

Signed-off-by: Laura Lorenz <lauralorenz@google.com>
2024-11-05 06:07:28 +00:00
Kubernetes Prow Robot
c6ea102f5f Merge pull request #128298 from SergeyKanzhelev/convergePluginRegistrationForDRAAndDP
converge DRA and Device Plugin plugins registration
2024-11-04 11:41:28 +00:00
Tim Allclair
84201658c3 Fix ResizeStatus state transitions 2024-11-01 14:02:59 -07:00
Sergey Kanzhelev
1297d0cdd1 converge DRA and Device Plugin plugins registration 2024-10-30 16:58:13 +00:00
Oksana Baranova
49b88f1d8a kubelet: migrate clustertrustbundle, token to contextual logging
Signed-off-by: Oksana Baranova <oksana.baranova@intel.com>
2024-10-30 17:31:11 +02:00
Kubernetes Prow Robot
b3cf9c6e5c Merge pull request #128269 from tallclair/allocated
[FG:InPlacePodVerticalScaling] Rework handling of allocated resources
2024-10-25 23:24:52 +01:00
Tim Allclair
b186c160ca Clarify eviction based on allocated pods 2024-10-25 13:53:11 -07:00
Tim Allclair
c75a3e717e More precise allocatedPod name usage 2024-10-25 13:32:36 -07:00
Tim Allclair
7166169c82 Tidy up handlePodResize 2024-10-24 16:35:28 -07:00
Tim Allclair
34cf754fe9 Pass allocatedPods to canAdmitPod 2024-10-24 16:31:49 -07:00
Tim Allclair
d1f1bf200c Add more comments 2024-10-24 15:51:19 -07:00
Tim Allclair
321eff34f7 Rework allocated resources handling 2024-10-24 09:27:40 -07:00
Kubernetes Prow Robot
e526a27118 Merge pull request #116388 from mxpv/shutdown
Clean/refactor node shutdown manager
2024-10-24 08:34:53 +01:00
Maksym Pavlenko
449f86b0ba Refactor node shutdown manager
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2024-10-23 17:36:22 -07:00
Kubernetes Prow Robot
d7e5ff87e0 Merge pull request #128083 from carlory/fix-126662-kubelet
kubelet: fix a bug where kubelet wrongly drops the QOSClass field of the Pod's status when it rejects a Pod
2024-10-23 21:00:53 +01:00
carlory
c7e384f9ff kubelet: fix a bug where kubelet drops the QOSClass field of the Pod's status when it rejects a Pod
Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
2024-10-24 01:01:04 +08:00
Kubernetes Prow Robot
cdc0cf5c10 Merge pull request #128237 from kolyshkin/userns
pkg/kubelet/userns/inuserns: use moby/sys/userns
2024-10-23 05:28:51 +01:00
Kir Kolyshkin
54d43ecaed pkg/kubelet/user/userns: remove, use moby/sys/userns
The code from github.com/opencontainers/runc/libcontainer/userns package
was moved into github.com/moby/sys/user and github.com/moby/sys/userns
(see [1]), and the runc package is now deprecated in favor of moby/sys
(see [2]).

In addition, moby/sys/userns now has a non-Linux implementation, so
pkg/kubelet/user/userns package (introduced in commit 2e999ff to make a
non-Linux implementation) is not really needed anymore.

Let's switch to moby/sys/userns, and remove the package.

[1]: https://github.com/moby/sys/releases/tag/userns%2Fv0.1.0
[2]: https://github.com/opencontainers/runc/pull/4350

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-10-22 14:36:14 -07:00
Tim Allclair
53aa727708 Checkpoint allocated requests and limits 2024-10-22 11:26:48 -07:00
zhifei92
dac7332ed2 integrate kubelet with the systemd watchdog
feat: add unit test

feat: add FeatureGate for SystemdWatchdog

fix: linter and failed tests

feat: add SystemdWatchdog to versioned feature list yaml
2024-10-21 10:46:14 +08:00
Kubernetes Prow Robot
e2c17c09a4 Merge pull request #125070 from torredil/kublet-vm-race
Ensure volumes are unmounted during graceful node shutdown
2024-10-02 00:33:48 +01:00
elmiko
38fe239ac4 factor out cloudprovider.DeprecationWarningForProvider
this change removes the deprecation warning function in favor of using
the `cloudprovider.DisableWarningForProvider`. it also fixes some of the
logic to ensure that non-external providers are properly detected and
warned about.
2024-09-30 12:20:25 -04:00
elmiko
d1d05d3eba remove IsDeprecatedInternal from cloudprovider.plugins
The internal cloud controller loops are disabled at this point, this
function should not be used as it does not return accurate information.
In its place we check for the presence of the external cloud provider as
that is the only acceptable value.
2024-09-26 14:55:25 -04:00
torredil
b4a21a3e43 Wait for volume teardown during graceful node shutdown
Signed-off-by: torredil <torredil@amazon.com>
2024-09-19 20:39:33 +00:00
Kubernetes Prow Robot
24a74f887a Merge pull request #126595 from pacoxu/kubelet-cgroup-v2-kernel-version
[1.32]kubelet: add log and event for cgroup v2 running on kernel < 5.8
2024-09-18 18:34:44 +01:00
Paco Xu
259671bd43 check root cpu.stat instead of kernel version for cgroup v2 2024-09-18 11:39:36 +08:00
Kubernetes Prow Robot
bf6a9da1c3 Merge pull request #126931 from lengrongfu/feat/add-unready
Add default status "not ready" when readiness probe sync
2024-09-18 03:28:44 +01:00
rongfu.leng
de4010939d add default status unready when readiness probe sync
Signed-off-by: rongfu.leng <lenronfu@gmail.com>
2024-09-14 08:56:32 +00:00
Oksana Baranova
2474369227 kubelet: migrate pleg to contextual logging
Signed-off-by: Oksana Baranova <oksana.baranova@intel.com>
2024-09-13 12:13:26 +03:00
Kubernetes Prow Robot
5ac315faf4 Merge pull request #126494 from bart0sh/PR153-migrate-DRA-Manager-to-contextual-logging
Migrate pkg/kubelet/cm/dra to contextual logging
2024-08-26 13:14:26 +01:00
Kubernetes Prow Robot
3f306ae140 Merge pull request #126343 from SergeyKanzhelev/succeededPodReadmitted
Terminated pod should not be re-admitted
2024-08-22 16:32:09 +01:00
Ed Bartosh
e1bc8defac kubelet: Migrate DRA Manager to contextual logging
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
2024-08-22 11:12:41 +03:00
Kubernetes Prow Robot
113b12c6fb Merge pull request #124439 from bells17/csi-translation-lib-structured-and-contextual-logging
Migrate k8s.io/csi-translation-lib/.* to structured logging
2024-08-19 18:13:54 -07:00
Paco Xu
69a67556c7 kubelet: add warning log and events for cgroup v2 running on kernel < 5.8 2024-08-12 14:06:56 +08:00
Sergey Kanzhelev
300128de65 succeeded pod is being re-admitted 2024-07-25 17:45:27 +00:00
Kubernetes Prow Robot
57d197fb89 Merge pull request #124430 from AllenXu93/fix-kubelet-restart-notReady
fix node notReady in first sync period after kubelet restart
2024-07-23 21:20:40 -07:00
Sergey Kanzhelev
62f96d2748 set AllocatedResourcesStatus in the Pod Status 2024-07-24 00:29:35 +00:00
Kubernetes Prow Robot
558c9536a1 Merge pull request #123678 from kinvolk/userns-use-kubelet-user-mappings
kubelet: Add logs for userns custom mappings parsing
2024-07-20 19:59:57 -07:00
bells17
1298c8a5fe csi-translation-lib: Support structured and contextual logging 2024-07-18 14:01:27 +09:00
Kubernetes Prow Robot
52c0ed4673 Merge pull request #124342 from zhifei92/fix-error-check
fix error checking in kl.killPod within SyncPod
2024-07-16 16:05:07 -07:00
Kubernetes Prow Robot
fc3abdaf2d Merge pull request #125470 from everpeace/kep-3619-SupplementalGroupsPolicy-e2e
KEP-3619: Add NodeStatus.Features.SupplementalGroupsPolicy API and e2e
2024-07-16 13:57:06 -07:00