kubernetes

mirror of https://github.com/optim-enterprises-bv/kubernetes.git synced 2025-12-15 20:37:39 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	45b96eae98	Merge pull request #113145 from smarterclayton/zombie_terminating_pods kubelet: Force deleted pods can fail to move out of terminating	2023-03-09 15:32:30 -08:00
Clayton Coleman	6b9a381185	kubelet: Force deleted pods can fail to move out of terminating If a CRI error occurs during the terminating phase after a pod is force deleted (API or static) then the housekeeping loop will not deliver updates to the pod worker which prevents the pod's state machine from progressing. The pod will remain in the terminating phase but no further attempts to terminate or clean up will occur until the kubelet is restarted. The pod worker now maintains a store of the pod's state that it is attempting to reconcile and uses that to resync unknown pods when SyncKnownPods() is invoked, so that failures in sync methods for unknown pods no longer hang forever. The pod worker's store tracks desired updates and the last update applied on podSyncStatuses. Each goroutine now synchronizes to acquire the next work item, context, and whether the pod can start. This synchronization moves the pending update to the stored last update, which will ensure third parties accessing pod worker state don't see updates before the pod worker begins synchronizing them. As a consequence, the update channel becomes a simple notifier (struct{}) so that SyncKnownPods can coordinate with the pod worker to create a synthetic pending update for unknown pods (i.e. no one besides the pod worker has data about those pods). Otherwise the pending update info would be hidden inside the channel. In order to properly track pending updates, we have to be very careful not to mix RunningPods (which are calculated from the container runtime and are missing all spec info) and config-sourced pods. Update the pod worker to avoid using ToAPIPod() and instead require the pod worker to directly use update.Options.Pod or update.Options.RunningPod for the correct methods. Add a new SyncTerminatingRuntimePod to prevent accidental invocations of runtime-only pod data. 
Finally, fix SyncKnownPods to replay the last valid update for undesired pods which drives the pod state machine towards termination, and alter HandlePodCleanups to: - terminate runtime pods that aren't known to the pod worker - launch admitted pods that aren't known to the pod worker Any started pods receive a replay until they reach the finished state, and then are removed from the pod worker. When a desired pod is detected as not being in the worker, the usual cause is that the pod was deleted and recreated with the same UID (almost always a static pod since API UID reuse is statistically unlikely). This simplifies the previous restartable pod support. We are careful to filter for active pods (those not already terminal or those which have been previously rejected by admission). We also force a refresh of the runtime cache to ensure we don't see an older version of the state. Future changes will allow other components that need to view the pod worker's actual state (not the desired state the podManager represents) to retrieve that info from the pod worker. Several bugs in pod lifecycle have been undetectable at runtime because the kubelet does not clearly describe the number of pods in use. To better report, add the following metrics: kubelet_desired_pods: Pods the pod manager sees kubelet_active_pods: "Admitted" pods that gate new pods kubelet_mirror_pods: Mirror pods the kubelet is tracking kubelet_working_pods: Breakdown of pods from the last sync in each phase, orphaned state, and static or not kubelet_restarted_pods_total: A counter for pods that saw a CREATE before the previous pod with the same UID was finished kubelet_orphaned_runtime_pods_total: A counter for pods detected at runtime that were not known to the kubelet. Will be populated at Kubelet startup and should never be incremented after. Add a metric check to our e2e tests that verifies the values are captured correctly during a serial test, and then verify them in detail in unit tests. 
Adds 23 series to the kubelet /metrics endpoint.	2023-03-08 22:03:51 -06:00
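The notifier pattern described in commit 6b9a381185 can be sketched roughly as follows. This is a minimal illustration, not the kubelet's code: the `update`, `podStatus`, and `worker` types and their methods are hypothetical names invented here. It shows a pending update held in a lock-protected store, a `chan struct{}` used only as a wake-up signal, and the pending update becoming visible as the active update only once the worker picks it up.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical stand-in for the options carried by a pod update.
type update struct {
	Generation int
}

// podStatus tracks the update waiting to be processed and the last update
// the worker actually started processing.
type podStatus struct {
	pending *update
	active  *update
}

// worker is a hypothetical, heavily simplified pod worker store.
type worker struct {
	mu     sync.Mutex
	status map[string]*podStatus
	notify map[string]chan struct{}
}

func newWorker() *worker {
	return &worker{
		status: map[string]*podStatus{},
		notify: map[string]chan struct{}{},
	}
}

// Update stores the desired update and sends a data-free wake-up.
// A newer update simply overwrites an older pending one.
func (w *worker) Update(uid string, u update) {
	w.mu.Lock()
	defer w.mu.Unlock()
	st, ok := w.status[uid]
	if !ok {
		st = &podStatus{}
		w.status[uid] = st
		w.notify[uid] = make(chan struct{}, 1)
	}
	st.pending = &u
	select {
	case w.notify[uid] <- struct{}{}:
	default: // a wake-up is already queued; the store holds the data
	}
}

// Next moves the pending update to the active slot and returns it.
func (w *worker) Next(uid string) (update, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	st, ok := w.status[uid]
	if !ok || st.pending == nil {
		return update{}, false
	}
	st.active, st.pending = st.pending, nil
	return *st.active, true
}

// Active reports the last update the worker began processing, if any.
func (w *worker) Active(uid string) (update, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	st, ok := w.status[uid]
	if !ok || st.active == nil {
		return update{}, false
	}
	return *st.active, true
}

func main() {
	w := newWorker()
	w.Update("pod-a", update{Generation: 1})
	w.Update("pod-a", update{Generation: 2})

	_, visible := w.Active("pod-a")
	fmt.Println("visible before sync:", visible)

	<-w.notify["pod-a"] // the channel carries no data, only the signal
	u, _ := w.Next("pod-a")
	fmt.Println("synced generation:", u.Generation)
}
```

Because the channel carries no payload, any code holding the lock can inspect or synthesize a pending update, which is the property the commit message relies on for SyncKnownPods.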
Kubernetes Prow Robot	625b8be09e	Merge pull request #115371 from pacoxu/cgroup-v2-memory-tuning default memoryThrottlingFactor to 0.9 and optimize the memory.high formulas	2023-03-08 18:46:00 -08:00
Kubernetes Prow Robot	b8aaaf380a	Merge pull request #116083 from SataQiu/clean-20230227 kubelet: remove unused DockerID type	2023-03-06 02:22:58 -08:00
Paco Xu	81c5a122c3	add pageSize to memory.high formula	2023-03-03 11:24:50 +08:00
Paco Xu	7dab6253e1	default memoryThrottlingFactor to 0.9 and optimize the memory.high calculation formulas	2023-03-03 11:24:40 +08:00
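The log does not spell out the memory.high formula that commits 7dab6253e1 and 81c5a122c3 adjust. As far as I understand the Memory QoS design, the value interpolates between the container's memory request and its limit using memoryThrottlingFactor, then rounds down to a multiple of the page size. The sketch below assumes that shape. The function name, the 4096-byte page size, and the exact rounding are assumptions for illustration, not the kubelet's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// pageSize is an assumed page size in bytes; real systems may differ.
const pageSize = 4096

// memoryHigh is a hypothetical helper: it interpolates between request and
// limit by factor and rounds the result down to a page boundary.
func memoryHigh(request, limit int64, factor float64) int64 {
	raw := float64(request) + factor*float64(limit-request)
	return int64(math.Floor(raw/pageSize)) * pageSize
}

func main() {
	// 1 GiB request, 2 GiB limit, throttling factor 0.9.
	fmt.Println(memoryHigh(1<<30, 2<<30, 0.9))
}
```

With these inputs the unrounded value is not page-aligned, so the rounding step is what keeps the result a whole number of pages.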
ruiwen-zhao	572e6e0ffb	Add MaxParallelImagePulls support Signed-off-by: ruiwen-zhao <ruiwen@google.com>	2023-03-02 03:57:59 +00:00
Kubernetes Prow Robot	53f3583c7f	Merge pull request #114785 from TommyStarK/kubelet/replace-deprecated-pointer-function kubelet: Replace deprecated pointer function	2023-03-01 18:04:55 -08:00
Ed Bartosh	5a86895070	DRA: pass CDI devices through CRI CDIDevice field	2023-02-28 19:21:20 +02:00
SataQiu	ed2caf17e0	kubelet: remove unused DockerID type	2023-02-27 16:02:59 +08:00
Chen Wang	7db339dba2	This commit contains the following: 1. Scheduler bug-fix + scheduler-focussed E2E tests 2. Add cgroup v2 support for in-place pod resize 3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes. Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>	2023-02-24 18:21:21 +00:00
Vinay Kulkarni	f2bd94a0de	In-place Pod Vertical Scaling - core implementation 1. Core Kubelet changes to implement In-place Pod Vertical Scaling. 2. E2E tests for In-place Pod Vertical Scaling. 3. Refactor kubelet code and add missing tests (Derek's kubelet review) 4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature. 5. Fix corner-case where resize A->B->A gets ignored 6. Add cgroup v2 support to pod resize E2E test. KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>	2023-02-24 18:21:21 +00:00
Ed Bartosh	4f88332ab4	kubelet: prepare DRA resources before CNI setup	2023-02-06 20:40:11 +02:00
Claudiu Belu	ec753fcb55	unittests: Fixes unit tests for Windows (part 6) Currently, there are some unit tests that are failing on Windows due to various reasons: - On Windows, consecutive time.Now() calls may return the same timestamp, which would cause the TestFreeSpaceRemoveByLeastRecentlyUsed test to flake. - tests in kuberuntime_container_windows_test.go fail on Nodes that have fewer than 3 CPUs, expecting the CPU max set to be more than 100% of available CPUs, which is not possible. - calls in summary_windows_test.go are missing context. - filterTerminatedContainerInfoAndAssembleByPodCgroupKey will filter and group container information by the Pod cgroup key, if it exists. However, we don't have cgroups on Windows, thus we can't make the same assertions.	2023-01-31 11:49:26 +00:00
Kubernetes Prow Robot	559014f13e	Merge pull request #115273 from SergeyKanzhelev/restartCountRegexFix use a proper regex looking for the restartCount	2023-01-30 17:36:49 -08:00
Sergey Kanzhelev	15b63c380e	use a proper regex looking for the restartCount	2023-01-25 23:55:27 +00:00
Patrick Ohly	bc6c7fa912	logging: fix names of keys The stricter checking with the upcoming logcheck v0.4.1 pointed out these names which don't comply with our recommendations in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments.	2023-01-23 14:24:29 +01:00
Kubernetes Prow Robot	c913e6ce62	Merge pull request #114542 from pacoxu/EphemeralContainers cleanup: EphemeralContainers feature gate related codes	2023-01-17 11:18:34 -08:00
Paco Xu	70e56fa71a	cleanup: EphemeralContainers feature gate related codes	2023-01-15 21:15:01 +08:00
TommyStarK	1fcc8fbf59	kubelet: Replace deprecated pointer function Signed-off-by: TommyStarK <thomasmilox@gmail.com>	2023-01-08 13:44:09 +01:00
Michael Weibel	8818c215c1	win: fix cpu count to calculate cpu_maximum take all processor groups into account when calculating cpu maximum. Signed-off-by: Michael Weibel <michael@helio.exchange>	2022-12-14 13:56:31 +01:00
Kubernetes Prow Robot	a668924cb6	Merge pull request #113255 from claudiubelu/path-filepath-update-kubelet Replaces path.Operation with filepath.Operation (kubelet)	2022-12-09 22:27:41 -08:00
Peter Hunt	6298ce68e2	kubelet: wire ListPodSandboxMetrics Signed-off-by: Peter Hunt <pehunt@redhat.com>	2022-11-08 14:47:08 -05:00
Daniel Ye	dcc7c2f660	Add fake runtimes and CRI changes for KEP-2371 Added new gRPC call 'ListPodSandboxMetrics' which would return additional container stats currently supported by cAdvisor, but outside the scope of /stats/summary api. Added new types to support exporting Prometheus metrics, including Metric and other subfields. Added fake runtime changes associated with the CRI changes.	2022-11-08 14:47:08 -05:00
Claudiu Belu	b9bf3e5c49	Replaces path.Operation with filepath.Operation (kubelet) The path module has a few different functions: Clean, Split, Join, Ext, Dir, Base, IsAbs. These functions do not take into account the OS-specific path separator, meaning that they won't behave as intended on Windows. For example, Dir is supposed to return all but the last element of the path. For the path "C:\some\dir\somewhere", it is supposed to return "C:\some\dir", however, it returns ".". The equivalent functions in filepath should be used instead.	2022-11-08 16:05:48 +00:00
Harshal Patil	86284d42f8	Add support for Evented PLEG Signed-off-by: Harshal Patil <harpatil@redhat.com> Co-authored-by: Swarup Ghosh <swghosh@redhat.com>	2022-11-08 20:06:16 +05:30
Kubernetes Prow Robot	2ef00038d3	Merge pull request #112961 from marosset/windows-hostnetwork-alpha Windows hostnetwork alpha	2022-11-07 12:42:16 -08:00
David Ashpole	64af1adace	Second attempt: Plumb context to Kubelet CRI calls (#113591) * plumb context from CRI calls through kubelet * clean up extra timeouts * try fixing incorrectly cancelled context	2022-11-05 06:02:13 -07:00
Mark Rossetti	f4305db4ee	populate namespace options in runtimeapi.WindowsSandboxSecurityContext + unit tests Signed-off-by: Mark Rossetti <marosset@microsoft.com>	2022-11-04 09:29:39 -07:00
Kubernetes Prow Robot	1bf4af4584	Merge pull request #111930 from azylinski/new-histogram-pod_start_sli_duration_seconds New histogram: Pod start SLI duration	2022-11-04 07:28:14 -07:00
Kubernetes Prow Robot	79014dd6da	Merge pull request #113216 from astraw99/ftr-add-backoff-container Add container name in the `BackOff` event message	2022-11-03 21:24:13 -07:00
Antonio Ojea	9c2b333925	Revert "plumb context from CRI calls through kubelet" This reverts commit `f43b4f1b95`.	2022-11-02 13:37:23 +00:00
astraw99	244598af80	Add back-off restarting failed container name	2022-11-02 20:46:32 +08:00
Kubernetes Prow Robot	5899432f92	Merge pull request #113481 from rphillips/fixes/77063 kubelet: fix pod log line corruption when using timestamps and long lines	2022-11-01 19:59:50 -07:00
Kubernetes Prow Robot	9bbd0fbdb2	Merge pull request #113476 from marosset/hpc-to-stable Promoting WindowsHostProcessContainers to stable	2022-11-01 19:59:43 -07:00
Mark Rossetti	498d065cc5	Promoting WindowsHostProcessContainers to stable Signed-off-by: Mark Rossetti <marosset@microsoft.com>	2022-11-01 14:06:25 -07:00
Ryan Phillips	ddae396ce3	kubelet: fix pod log line corruption when using timestamps and long lines	2022-11-01 09:22:30 -05:00
David Ashpole	f43b4f1b95	plumb context from CRI calls through kubelet	2022-10-28 02:55:28 +00:00
Artur Żyliński	b0fac15cd6	Make the interface local to each package	2022-10-26 11:28:18 +02:00
Artur Żyliński	9f31669a53	New histogram: Pod start SLI duration	2022-10-26 11:28:17 +02:00
Kubernetes Prow Robot	244c035b87	Merge pull request #110263 from claudiubelu/unittests unittests: Fixes unit tests for Windows	2022-10-25 14:50:34 -07:00
Claudiu Belu	6f2eeed2e8	unittests: Fixes unit tests for Windows Currently, there are some unit tests that are failing on Windows due to various reasons: - config options not supported on Windows. - files not closed, which means that they cannot be removed / renamed. - paths not properly joined (filepath.Join should be used). - time.Now() is not as precise on Windows, which means that 2 consecutive calls may return the same timestamp. - different error messages on Windows. - files have \r\n line endings on Windows. - /tmp directory being used, which might not exist on Windows. Instead, the OS-specific Temp directory should be used. - the default value for Kubelet's EvictionHard field was containing OS-specific fields. This is now moved, the field is now set during Kubelet's initialization, after the config file is read.	2022-10-25 23:46:56 +03:00
Jordan Liggitt	122b43037e	Record event for lifecycle fallback to http	2022-10-19 14:11:36 -04:00
Jason Simmons	5a6acf85fa	Align lifecycle handlers and probes Align the behavior of HTTP-based lifecycle handlers and HTTP-based probers, converging on the probers implementation. This fixes multiple deficiencies in the current implementation of lifecycle handlers surrounding what functionality is available. The functionality is gated by the features.ConsistentHTTPGetHandlers feature gate.	2022-10-19 09:51:52 -07:00
Kubernetes Prow Robot	2522420937	Merge pull request #111601 from claudiubelu/skip-unittests unit tests: Skip Windows-unrelated tests on Windows	2022-10-18 11:29:30 -07:00
Kubernetes Prow Robot	843ad71cac	Merge pull request #113041 from saschagrunert/kubelet-pods-creation-time Sort kubelet pods by their creation time	2022-10-18 09:17:19 -07:00
Claudiu Belu	af77381e01	unit tests: Skip Windows-unrelated tests on Windows Some of the unit tests cannot pass on Windows due to various reasons: - fsnotify does not have a Windows implementation. - Proxy Mode IPVS not supported on Windows. - Seccomp not supported on Windows. - VolumeMode=Block is not supported on Windows. - iSCSI volumes are mounted differently on Windows, and iscsiadm is a Linux utility.	2022-10-18 12:43:07 +03:00
Kubernetes Prow Robot	6f579d3ceb	Merge pull request #111616 from ndixita/credential-api-ga Move the Kubelet Credential Provider feature to GA and Update the Credential Provider API to GA	2022-10-15 07:53:09 -07:00
Sascha Grunert	b296f82c69	Sort kubelet pods by their creation time There is a corner case when blocking Pod termination via a lifecycle preStop hook, for example by using this StatefulSet: ```yaml apiVersion: apps/v1 kind: StatefulSet metadata: name: web spec: selector: matchLabels: app: ubi serviceName: "ubi" replicas: 1 template: metadata: labels: app: ubi spec: terminationGracePeriodSeconds: 1000 containers: - name: ubi image: ubuntu:22.04 command: ['sh', '-c', 'echo The app is running! && sleep 360000'] ports: - containerPort: 80 name: web lifecycle: preStop: exec: command: - /bin/sh - -c - 'echo aaa; trap : TERM INT; sleep infinity & wait' ``` After creation, downscaling, forced deletion and upscaling of the replica like this: ``` > kubectl apply -f sts.yml > kubectl scale sts web --replicas=0 > kubectl delete pod web-0 --grace-period=0 --force > kubectl scale sts web --replicas=1 ``` We will end up having two pods run by the container runtime, while the API only reports one: ``` > kubectl get pods NAME READY STATUS RESTARTS AGE web-0 1/1 Running 0 92s ``` ``` > sudo crictl pods POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME e05bb7dbb7e44 12 minutes ago Ready web-0 default 0 (default) d90088614c73b 12 minutes ago Ready web-0 default 0 (default) ``` When now running `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong container reporting the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`. This is caused by the container lookup via its name (and no podUID) at: `02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)` And more specifically by the conversion of the pod result map to a slice in `GetPods`: `02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)` We now solve that unexpected behavior by tracking the creation time of the pod and sorting the result based on that. As a result, the lookup always matches the most recently created pod. 
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2022-10-13 16:32:44 +02:00
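The fix described in commit b296f82c69 amounts to making a name-based lookup deterministic by ordering pods newest first. The sketch below illustrates that idea only. The `pod` type, its fields, the helper names, and the sandbox IDs are hypothetical and do not come from the kubelet source.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, minimal view of a runtime pod.
type pod struct {
	ID        string
	Name      string
	CreatedAt uint64 // creation timestamp, larger means newer
}

// newestFirst sorts pods by creation time, most recent first.
func newestFirst(pods []pod) {
	sort.SliceStable(pods, func(i, j int) bool {
		return pods[i].CreatedAt > pods[j].CreatedAt
	})
}

// findByName returns the first pod with the given name.
func findByName(pods []pod, name string) (pod, bool) {
	for _, p := range pods {
		if p.Name == name {
			return p, true
		}
	}
	return pod{}, false
}

func main() {
	// Two sandboxes share the name web-0: a stale one stuck in its preStop
	// hook and the freshly created replacement.
	pods := []pod{
		{ID: "stale-sandbox", Name: "web-0", CreatedAt: 100},
		{ID: "fresh-sandbox", Name: "web-0", CreatedAt: 200},
	}
	newestFirst(pods)
	p, _ := findByName(pods, "web-0")
	fmt.Println(p.ID)
}
```

Without the sort, a slice built by iterating over a Go map has no stable order, so the first match by name could be either sandbox.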
Dixita Narang	ff1f525511	Setting LockToDefault as true for KubeletCredentialProviders feature, and removing conditions that check if the feature is enabled since now the feature is enabled by default	2022-09-29 16:42:48 +00:00

715 Commits