This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.
The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
the last known health of each device and handle plugin disconnections
(a minimal sketch follows this list).
- An asynchronous update mechanism that triggers a pod sync when a
device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
expose the device health information to users via the Pod API.
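A minimal sketch of what such a health cache could look like; the type
and method names (`healthInfoCache`, `updateHealth`,
`markDriverDisconnected`) and the key structure are illustrative
assumptions, not the actual Kubelet code:
```go
package dra

import (
	"sync"
	"time"
)

// deviceKey identifies a device by driver, pool, and device name.
// The key structure here is an assumption for illustration.
type deviceKey struct {
	Driver string
	Pool   string
	Device string
}

// deviceHealth records the last known health of a device and when it was
// last reported.
type deviceHealth struct {
	Health      string // e.g. "Healthy", "Unhealthy", "Unknown"
	LastUpdated time.Time
}

// healthInfoCache is a hypothetical per-device health cache, guarded by a
// mutex because plugin health streams report concurrently.
type healthInfoCache struct {
	mu      sync.RWMutex
	devices map[deviceKey]deviceHealth
}

func newHealthInfoCache() *healthInfoCache {
	return &healthInfoCache{devices: map[deviceKey]deviceHealth{}}
}

// updateHealth stores the latest report for a device and returns true if
// the health state changed, so the caller can trigger a pod sync.
func (c *healthInfoCache) updateHealth(key deviceKey, health string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	prev, ok := c.devices[key]
	c.devices[key] = deviceHealth{Health: health, LastUpdated: time.Now()}
	return !ok || prev.Health != health
}

// markDriverDisconnected flips every device of a driver to "Unknown" when
// the plugin's health stream goes away.
func (c *healthInfoCache) markDriverDisconnected(driver string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key, dh := range c.devices {
		if key.Driver == driver {
			dh.Health = "Unknown"
			dh.LastUpdated = time.Now()
			c.devices[key] = dh
		}
	}
}
```
The boolean returned by `updateHealth` is what would let the caller
decide whether to trigger the asynchronous pod sync mentioned above.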
Update vendor
KEP-4680: Fix lint, boilerplate, and codegen issues
Add another e2e test, add TODO for KEP-4680, and update test infra helpers
Add Feature Gate e2e test
Fix presubmits
Fix var names, feature gating, and nits
Fix DRA Health gRPC API according to review feedback
Pod-level hugepage resources are not propagated to the containers; only the pod-level cgroup values are propagated to the containers when they do not specify hugepage resources of their own.
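A hypothetical helper restating that rule (function and parameter names
are made up for illustration, not the actual propagation code):
```go
package main

import "fmt"

// containerHugepageCgroupLimit restates the rule above with hypothetical
// names: the pod-level hugepage value is never copied into the container
// spec, but when a container declares no hugepage resources of its own,
// its effective cgroup limit falls back to the pod-level cgroup value.
func containerHugepageCgroupLimit(podCgroupLimit int64, containerSpecLimit *int64) int64 {
	if containerSpecLimit != nil {
		return *containerSpecLimit // the container's own hugepage limit wins
	}
	return podCgroupLimit // otherwise only the pod-level cgroup value applies
}

func main() {
	twoMi := int64(2 << 20)
	fourMi := int64(4 << 20)
	fmt.Println(containerHugepageCgroupLimit(twoMi, nil))     // 2097152, from the pod cgroup
	fmt.Println(containerHugepageCgroupLimit(twoMi, &fourMi)) // 4194304, the container's own value
}
```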
As part of PR 132028 we added more e2e test coverage to validate
the fix and to check, as much as possible, that there are no regressions.
The issue and the fix become most evident when inspecting
memory allocation with the Memory Manager static policy enabled.
Quoting the commit message of bc56d0e45a:
```
The podresources API List implementation uses the internal data of the
resource managers as source of truth.
Looking at the implementation here:
https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/apis/podresources/server_v1.go#L60
we take care of syncing the device allocation data before querying the
device manager to return its pod->devices assignment.
This is needed because otherwise the device manager (and all the other
resource managers) would do the cleanup asynchronously, so the `List` call
will return incorrect data.
But we don't do this syncing for either CPUs or memory,
so when we report these we will get stale data, as issue #132020 demonstrates.
For the CPU manager, however, we have the reconcile loop, which cleans up the stale data periodically.
It turns out this timing interplay was actually the reason the existing issue #119423 seemed fixed
(see: #119423 (comment)).
But it's only timing: if in the reproducer we set the `cpuManagerReconcilePeriod` to a
very high value (>= 5 minutes), then the issue still reproduces against the current master branch
(https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/test/e2e_node/podresources_test.go#L983).
```
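A simplified sketch of the sync-before-query pattern the quoted commit
describes, using illustrative interfaces rather than the real kubelet
types:
```go
package podresources

import "context"

// podsProvider and resourceManager are illustrative stand-ins; the real
// kubelet interfaces differ.
type podsProvider interface {
	GetActivePods() []string // pod UIDs
}

type resourceManager interface {
	// UpdateAllocatedResources drops allocations belonging to pods that
	// are no longer active, so subsequent queries return fresh data.
	UpdateAllocatedResources(activePods []string)
	// Assignments returns the current pod -> resources assignment.
	Assignments() map[string][]string
}

type server struct {
	pods    podsProvider
	devices resourceManager
	cpus    resourceManager
	memory  resourceManager
}

// List syncs every manager against the set of active pods before reading
// its data, not just the device manager.
func (s *server) List(ctx context.Context) map[string]map[string][]string {
	active := s.pods.GetActivePods()
	out := map[string]map[string][]string{}
	for name, mgr := range map[string]resourceManager{
		"devices": s.devices,
		"cpus":    s.cpus,
		"memory":  s.memory,
	} {
		mgr.UpdateAllocatedResources(active) // sync before querying
		for pod, res := range mgr.Assignments() {
			if out[pod] == nil {
				out[pod] = map[string][]string{}
			}
			out[pod][name] = res
		}
	}
	return out
}
```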
The missing actor here is the memory manager. The memory manager has no
reconcile loop (which would implicitly fix the stale data problem) and no
explicit synchronization, so it is the unlucky one that reported stale data,
leading to the eventual understanding of the problem.
For this reason it was (and still is) important to exercise it during
the test.
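For context, the periodic cleanup that masks the stale data for the CPU
manager is conceptually just a ticker-driven loop; a deliberately
simplified illustration, not the actual cpumanager code:
```go
package cpumanager

import "time"

// reconcileLoop is an illustrative stand-in for the CPU manager's
// periodic reconciliation: every reconcilePeriod it drops state for pods
// that are gone, which is why a short cpuManagerReconcilePeriod hides
// the stale data that a manager without such a loop keeps reporting.
func reconcileLoop(reconcilePeriod time.Duration, removeStaleState func(), stop <-chan struct{}) {
	ticker := time.NewTicker(reconcilePeriod)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			removeStaleState()
		case <-stop:
			return
		}
	}
}
```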
It turns out, however, that the test is wrong, likely because of a hidden
dependency between the test expectations and the lane configuration (notably
the machine specs), so we disable the memory manager activation for the time
being, until we figure out a safe way to enable it.
Note this significantly weakens the signal for this specific test.
Signed-off-by: Francesco Romani <fromani@redhat.com>
As before when adding v1beta2, DRA drivers built using the
k8s.io/dynamic-resource-allocation helper packages remain compatible with all
Kubernetes releases >= 1.32. The helper code picks whichever API version is
enabled among v1beta1/v1beta2/v1.
However, the control plane now depends on v1, so a cluster configuration where
only v1beta1 or v1beta2 is enabled without v1 won't work.
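Conceptually the helper's choice boils down to a preference list; an
illustrative sketch, not the actual k8s.io/dynamic-resource-allocation
code:
```go
package main

import "fmt"

// pickAPIVersion illustrates the negotiation described above: the driver
// helper uses the newest resource.k8s.io version that the cluster has
// enabled. Names and mechanism are simplified for illustration.
func pickAPIVersion(enabled map[string]bool) (string, error) {
	for _, v := range []string{"v1", "v1beta2", "v1beta1"} {
		if enabled[v] {
			return v, nil
		}
	}
	return "", fmt.Errorf("no supported resource.k8s.io API version is enabled")
}

func main() {
	// A cluster with only v1beta1 enabled still works for the driver
	// helper, but the control plane itself now requires v1.
	fmt.Println(pickAPIVersion(map[string]bool{"v1beta1": true}))
}
```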
Fix the utilities to enable multi-app-container tests,
which were previously quite hard to implement.
Add a consumer of the new utility to demonstrate its usage
and to establish basic coverage.
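An illustrative example of the kind of helper this enables;
`makeMultiContainerPod` is a hypothetical name, not necessarily the
utility added here:
```go
package e2enode

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makeMultiContainerPod builds a pod with several app containers sharing
// the same image and command, so tests can exercise per-container
// accounting without hand-writing each container.
func makeMultiContainerPod(name string, containerCount int) *v1.Pod {
	containers := make([]v1.Container, 0, containerCount)
	for i := 0; i < containerCount; i++ {
		containers = append(containers, v1.Container{
			Name:    fmt.Sprintf("%s-cnt-%d", name, i),
			Image:   "registry.k8s.io/pause:3.10",
			Command: []string{"/pause"},
		})
	}
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: v1.PodSpec{
			RestartPolicy: v1.RestartPolicyNever,
			Containers:    containers,
		},
	}
}
```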
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add an e2e test to ensure that if the Get endpoint is asked
about a non-existent pod, it returns an error.
Likewise, add an e2e test for terminated pods, which should
not be returned because they neither consume nor hold resources,
much like `List` does.
The expected usage pattern is to iterate over the list of
pods returned by `List`, but the endpoint must nevertheless
handle this case.
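A rough sketch of the new negative check, assuming a podresources v1
client (client construction elided; request fields follow
k8s.io/kubelet/pkg/apis/podresources/v1):
```go
package e2enode

import (
	"context"
	"fmt"

	kubeletpodresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

// expectGetFailsForMissingPod checks that asking the Get endpoint about
// a pod that does not exist (or has terminated) returns an error rather
// than an empty response.
func expectGetFailsForMissingPod(ctx context.Context, cli kubeletpodresourcesv1.PodResourcesListerClient) error {
	_, err := cli.Get(ctx, &kubeletpodresourcesv1.GetPodResourcesRequest{
		PodName:      "does-not-exist",
		PodNamespace: "default",
	})
	if err == nil {
		return fmt.Errorf("expected Get to fail for a non-existent pod, but it succeeded")
	}
	return nil
}
```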
Signed-off-by: Francesco Romani <fromani@redhat.com>
As long as we support cgroup v1, we want some test coverage.
This patch enables v1 coverage for most of the test cases.
We intentionally rule out the CFS quota tests because we
want to support this change only on cgroup v2.
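The usual way such tests branch on the cgroup version is to probe for
the unified hierarchy; a small sketch (the real test helpers may
differ):
```go
package e2enode

import "os"

// isCgroupV2 reports whether the node runs the unified cgroup v2
// hierarchy, by probing for cgroup.controllers at the cgroup root.
func isCgroupV2() bool {
	_, err := os.Stat("/sys/fs/cgroup/cgroup.controllers")
	return err == nil
}
```
Tests that depend on CFS quota behavior can then be skipped when
isCgroupV2() returns false.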
Signed-off-by: Francesco Romani <fromani@redhat.com>