Add an (admittedly pretty crude) CPU allocatable check.
A more incisive refactoring is needed, but we need
to unbreak CI first, so this seems the minimal, decently clean fix.
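
For illustration, a crude check of this kind can look roughly like the
sketch below; the helper name and the use of testing.T are assumptions
made for the example, not the actual test code.

    // Illustrative sketch only: skip when the node does not expose enough
    // allocatable CPU for the test workload.
    package e2esketch

    import (
        "context"
        "testing"

        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    func skipIfNotEnoughAllocatableCPU(t *testing.T, cs kubernetes.Interface, nodeName string, required resource.Quantity) {
        node, err := cs.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
        if err != nil {
            t.Fatalf("cannot get node %q: %v", nodeName, err)
        }
        alloc := node.Status.Allocatable[v1.ResourceCPU]
        if alloc.Cmp(required) < 0 {
            t.Skipf("node %q has %s allocatable CPU, the test needs %s",
                nodeName, alloc.String(), required.String())
        }
    }
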
Signed-off-by: Francesco Romani <fromani@redhat.com>
Our CI machines happen to have 1 fully allocatable CPU for test workloads.
This is really the bare minimum, but it should still be sufficient for the
tests to run. The CFS quota test, however, creates a series of pods (6 at
time of writing) and cleans them up only at the very end. This means pods
requiring resources accumulate on the CI machine node.
The fix implemented here is to just clean up after each subcase.
This way the CPU footprint of the test is equal to the highest single
requirement (say, 1000 millicores) rather than the sum of all the
subcases' requirements.
This doesn't change the test behavior, and makes it possible
to run it on very barebones machines.
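
Roughly, the per-subcase flow becomes something like the sketch below
(plain client-go, with a made-up helper name; the real test goes through
the e2e framework helpers instead):

    // Illustrative sketch: create a subcase's pod, run the checks, then
    // delete the pod and wait for it to be gone before starting the next
    // subcase, so CPU requests never pile up on a 1-CPU node.
    package e2esketch

    import (
        "context"
        "time"

        v1 "k8s.io/api/core/v1"
        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/client-go/kubernetes"
    )

    func runSubcase(ctx context.Context, cs kubernetes.Interface, ns string, pod *v1.Pod) error {
        created, err := cs.CoreV1().Pods(ns).Create(ctx, pod, metav1.CreateOptions{})
        if err != nil {
            return err
        }
        // ... subcase assertions against the running pod go here ...

        // Clean up right away instead of at the very end of the test.
        if err := cs.CoreV1().Pods(ns).Delete(ctx, created.Name, metav1.DeleteOptions{}); err != nil {
            return err
        }
        return wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
            func(ctx context.Context) (bool, error) {
                _, err := cs.CoreV1().Pods(ns).Get(ctx, created.Name, metav1.GetOptions{})
                if apierrors.IsNotFound(err) {
                    return true, nil // pod gone, its CPU request is released
                }
                return false, nil
            })
    }
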
For unknown reasons, hack/make-rules/test-e2e-node.sh adds -timeout instead of
--timeout. Therefore the fallback code in test/e2e_node/remote/remote.go doesn't
find it and adds its own --timeout=60m after it. This effectively limits E2E
node test runs to 60 minutes, regardless of what is specified in the job:
W0206 09:53:51.425532 7151 remote.go:158] ginkgo flags are missing explicit --timeout (ginkgo defaults to 60 minutes)
I0206 09:53:51.425565 7151 remote.go:165] updated ginkgo flags: -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m
...
I0206 09:53:57.767096 7151 ssh.go:146] Running the command ssh, with args: ... timeout -k 30s 3600.000000s ./ginkgo -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m ...
Note that the timeout for the test was 60m in this case (hence the "timeout -k
30s 3600.000000s") but it could also be something larger.
With the device plugin node reboot test fixed, we can see in testgrid
[node-kubelet-containerd-flaky](https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-flaky)
that the test is passing consistently and we can remove the flaky label.
With the test not flaky anymore, we can validate new PRs against it
and ensure we don't cause regressions.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
We have an e2e test which tries to ensure device plugin assignments to pods
are kept across node reboots. This test has been permafailing for many weeks
at the time of writing (xref: #128443).
Problem is: closer inspection reveals the test was well intentioned, but
puzzling:
The test runs a pod, then restarts the kubelet, then _expects the pod to
end up in admission failure_ and yet _ensures the device assignment is
kept_! https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/test/e2e_node/device_plugin_test.go#L97
A reader can legitimately wonder whether this means the device will be
kept busy forever.
Luckily, this is not the case. The test, however, embodied the kubelet
behavior at the time, in turn caused by #103979:
the device manager used to record the last admitted pod and forcibly add
it to the list of active pods. The retention logic had space for exactly
one pod, the last one which attempted admission.
This retention prevented the cleanup code
(see: https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549
compare to: https://github.com/kubernetes/kubernetes/blob/v1.31.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549)
from clearing the registration, so the device was still (mis)reported
allocated to the failed pod.
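
In grossly simplified form, the old retention worked along these lines
(type and field names are made up for illustration, this is not the real
device manager code):

    package devmgrsketch

    import (
        v1 "k8s.io/api/core/v1"
    )

    // Sketch of the old behavior: the last pod that attempted admission is
    // force-included in the active set, so the cleanup pass never reclaims
    // its devices even if admission failed.
    type manager struct {
        lastAdmittedPod *v1.Pod
        podDevices      map[string][]string // pod UID -> device IDs (simplified)
    }

    func (m *manager) cleanupStale(activePods []*v1.Pod) {
        active := make(map[string]bool)
        for _, p := range activePods {
            active[string(p.UID)] = true
        }
        if m.lastAdmittedPod != nil {
            // This is the retention: the failed pod still counts as active,
            // so its device assignment survives the cleanup.
            active[string(m.lastAdmittedPod.UID)] = true
        }
        for uid := range m.podDevices {
            if !active[uid] {
                delete(m.podDevices, uid) // devices reclaimed here
            }
        }
    }
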
This fact was in turn leveraged by the test in question:
the test uses the podresources API to learn about the device assignment,
and because of the chain of events above the pod failed admission yet
was still reported as owning the device.
What happened, however, was that the next pod trying admission would
replace the previous pod in the device manager data, so the previous
pod was no longer forced into the active list, and its assignment was
correctly cleared once the cleanup code ran.
The cleanup code runs, among other things, every time the device
manager is asked to allocate devices and every time the podresources
API queries the device assignment.
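
For context, this is roughly how a test talks to the podresources API to
get the assignment; the GetV1Client helper and its exact signature are
recalled from the kubelet tree and may differ, so treat the snippet as a
sketch rather than working code:

    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        kubeletpodresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
        "k8s.io/kubernetes/pkg/kubelet/apis/podresources"
    )

    func main() {
        // Default kubelet podresources socket on Linux nodes.
        const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock"
        client, conn, err := podresources.GetV1Client(socket, 10*time.Second, 1024*1024)
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        resp, err := client.List(context.TODO(), &kubeletpodresourcesv1.ListPodResourcesRequest{})
        if err != nil {
            log.Fatal(err)
        }
        for _, pod := range resp.GetPodResources() {
            for _, cnt := range pod.GetContainers() {
                for _, dev := range cnt.GetDevices() {
                    fmt.Printf("%s/%s container %s -> %s %v\n",
                        pod.GetNamespace(), pod.GetName(), cnt.GetName(),
                        dev.GetResourceName(), dev.GetDeviceIds())
                }
            }
        }
    }
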
Later, in PR https://github.com/kubernetes/kubernetes/pull/120661
the forced retention logic was removed from all the resource managers,
thus also from the device manager, and this is what caused the
permafailure.
Because of all the above, it should be evident that the e2e test was
actually enforcing a very specific, not-really-working-as-intended
behavior, which was also quite puzzling for users.
The best we can do is to fix the test to record and ensure that
pods which failed admission _do not_ retain their device assignment.
Unfortunately, we _cannot_ guarantee the desirable property that pods
which went running retain their device assignment across node reboots.
In the kubelet restart flow, all pods race to be admitted; there is no
order enforced between device plugin pods and application pods.
The property holds only if an application pod is lucky enough to _lose_
the race with both the device plugin (which must go running before the
app pod does) and _also_ with the kubelet (which needs to mark the
devices healthy before the pod tries admission).
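
The shape of the new assertion is roughly the following (the helper name
is illustrative): after the kubelet restart, a pod which failed admission
must not be reported by podresources as owning the device.

    package e2esketch

    import (
        "fmt"

        kubeletpodresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
    )

    // expectNoStaleAssignment fails if the podresources List response still
    // reports the given device resource as assigned to a pod which did not
    // pass admission.
    func expectNoStaleAssignment(resp *kubeletpodresourcesv1.ListPodResourcesResponse, failedPodName, resourceName string) error {
        for _, pod := range resp.GetPodResources() {
            if pod.GetName() != failedPodName {
                continue
            }
            for _, cnt := range pod.GetContainers() {
                for _, dev := range cnt.GetDevices() {
                    if dev.GetResourceName() == resourceName {
                        return fmt.Errorf("pod %q failed admission but is still reported as owning %q",
                            failedPodName, resourceName)
                    }
                }
            }
        }
        return nil
    }
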
Signed-off-by: Francesco Romani <fromani@redhat.com>
The "// import <path>" comment has been superseded by Go modules.
We don't have to remove them, but doing so has some advantages:
- They are used inconsistently, which is confusing.
- We can then also remove the (currently broken) hack/update-vanity-imports.sh.
- Last but not least, it would be a first step towards avoiding the k8s.io domain.
This commit was generated with
sed -i -e 's;^package \(.*\) // import.*;package \1;' $(git grep -l '^package.*// import' | grep -v 'vendor/')
Everything was included, except for
package labels // import k8s.io/kubernetes/pkg/util/labels
because that package is marked as "read-only".
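
As a concrete (made-up) example of what the sed rewrites, a package
clause like

    package devicemanager // import "k8s.io/kubernetes/pkg/kubelet/cm/devicemanager"

becomes

    package devicemanager

leaving the package name untouched and dropping only the vanity import
comment.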