kubernetes

mirror of https://github.com/outbackdingo/kubernetes.git synced 2026-01-27 18:19:28 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	491a23f079	Merge pull request #129999 from pohly/test-e2e-node-timeout E2E node: fix --timeout default	2025-02-06 03:59:55 -08:00
Patrick Ohly	46a17f60e4	E2E node: fix --timeout default For unknown reasons, hack/make-rules/test-e2e-node.sh adds -timeout instead of --timeout. Therefore the fallback code in test/e2e_node/remote/remote.go didn't find it and added its own --timeout=60m after it. This effectively limits E2E node test runs to 60 minutes, regardless of what is specified in the job: W0206 09:53:51.425532 7151 remote.go:158] ginkgo flags are missing explicit --timeout (ginkgo defaults to 60 minutes) I0206 09:53:51.425565 7151 remote.go:165] updated ginkgo flags: -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m ... I0206 09:53:57.767096 7151 ssh.go:146] Running the command ssh, with args: ... timeout -k 30s 3600.000000s ./ginkgo -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m ... Note that the timeout for the test was 60m in this case (hence the "timeout -k 30s 3600.000000s") but it could also be something larger.	2025-02-06 11:45:12 +01:00
Kubernetes Prow Robot	c4434c3161	Merge pull request #129910 from bitoku/fix-129836 Fix flaky test for container life cycle	2025-02-04 16:23:09 -08:00
Kubernetes Prow Robot	f82439f536	Merge pull request #129486 from iholder101/bugfix/swap-container-cri-stats [KEP-2400] [Bugfix]: Ensure container-level swap metrics are collected	2025-02-04 08:14:59 -08:00
Kubernetes Prow Robot	a376ae5dad	Merge pull request #128845 from SergeyKanzhelev/staticPodUpgrade static pod upgrade test with hostNetwork	2025-02-03 23:30:58 -08:00
Vinayak Goyal	81f09811ca	Fix kubelet_authz_test.go	2025-01-31 15:38:18 +00:00
Ayato Tokubi	da5a76bd39	Fix flaky test for container life cycle Signed-off-by: Ayato Tokubi <atokubi@redhat.com>	2025-01-30 16:23:51 +00:00
Vinayak Goyal	ce7d2130ad	Fix kubelet_authz_test.go	2025-01-29 23:06:56 +00:00
Swati Sehgal	82f0303f89	node: e2e: Remove flaky label as device plugin reboot test is deflaked With the device plugin node reboot test fixed, we can see in testgrid [node-kubelet-containerd-flaky](https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-flaky) that the test is passing consistently and we can remove the flaky label. With the test not flaky anymore, we can validate new PRs against it and ensure we don't cause regressions. Signed-off-by: Swati Sehgal <swsehgal@redhat.com>	2025-01-29 11:12:40 +00:00
Kubernetes Prow Robot	48dce2e9b3	Merge pull request #129776 from saschagrunert/cni-plugins-1.6.2 Update CNI plugins to v1.6.2 and avoid using k8s-artifacts-cni bucket	2025-01-28 07:29:26 -08:00
Kubernetes Prow Robot	2bda5dd8c7	Merge pull request #129656 from vinayakankugoyal/kep2862beta KEP-2862: Graduate to BETA.	2025-01-27 19:05:23 -08:00
Itamar Holder	617c094435	Add an e2e test Signed-off-by: Itamar Holder <iholder@redhat.com>	2025-01-27 15:44:18 +02:00
Vinayak Goyal	3a780a1c1b	KEP-2862: Graduate to BETA.	2025-01-24 21:36:00 +00:00
Kubernetes Prow Robot	29bf17b6cf	Merge pull request #129168 from kannon92/drop-node-features [KEP-3041] - remove nodefeatures from k/k repo	2025-01-23 12:07:29 -08:00
Kubernetes Prow Robot	4f979c9db8	Merge pull request #129010 from ffromani/e2e-fix-device-plugin-reboot-test node: e2e: fix device plugin reboot test	2025-01-23 12:07:22 -08:00
Sascha Grunert	da999fbc1b	Update CNI plugins to v1.6.2 and avoid using k8s-artifacts-cni bucket Updating the CNI plugins to the latest release and switch over to use GitHub releases instead of the `k8s-artifacts-cni` bucket. Follow-up on https://github.com/kubernetes/kubernetes/pull/129095 Signed-off-by: Sascha Grunert <sgrunert@redhat.com>	2025-01-23 10:50:58 +01:00
Kubernetes Prow Robot	a271299643	Merge pull request #129717 from esotsal/fix-128837 testing: Fix pod delete timeout failures after InPlacePodVerticalScaling Graduate to Beta commit	2025-01-21 15:50:47 -08:00
Kubernetes Prow Robot	0d988d7209	Merge pull request #129619 from ffromani/sig-node-approvers-ffromani Self-nominating ffromani as approver for sig-node container and resource managers	2025-01-21 15:50:36 -08:00
Kubernetes Prow Robot	3d2ee2fbb7	Merge pull request #129609 from carlory/cleanup-exec-utils Move some exec helper functions from framework/volume to framework/pod	2025-01-21 09:00:37 -08:00
Sotiris Salloumis	c5fc4193bb	Fix pod delete issues in podresize tests	2025-01-21 07:25:14 +01:00
Kevin Hannon	bae4122f56	deprecate nodefeature for feature labels	2025-01-20 17:02:59 -05:00
carlory	8b4eae24ab	Move some exec helper functions from framework/volume to framework/pod	2025-01-18 21:42:42 +08:00
Kubernetes Prow Robot	2d0a4f7556	Merge pull request #129166 from kannon92/move-node-features-to-features [KEP-3041]: deprecate nodefeature for feature labels	2025-01-14 20:02:33 -08:00
Francesco Romani	8221e28e4d	Add ffromani as approver for kubelet resource managers and their tests Signed-off-by: Francesco Romani <fromani@redhat.com>	2025-01-14 13:18:40 +01:00
Kevin Hannon	ca4529574e	remove node special feature typos	2024-12-20 16:33:45 -05:00
Kubernetes Prow Robot	4c466d8f98	Merge pull request #129095 from borg-land/cni-bucket-change fetch cni plugins from GitHub releases	2024-12-18 13:40:08 +01:00
Kevin Hannon	8495df64b2	deprecate nodefeature for feature labels	2024-12-17 13:58:12 -05:00
Kevin Hannon	6a608c3cdb	drop NodeSpecialFeature and NodeAlphaFeature from e2e-node	2024-12-16 09:29:04 -05:00
Kubernetes Prow Robot	5cc6f6633f	Merge pull request #129070 from zhifei92/fix-typo e2e_node: Simplify the code logic	2024-12-13 12:24:25 +01:00
Kubernetes Prow Robot	e8615e2712	Merge pull request #129054 from pohly/remove-import-name remove import doc comments	2024-12-12 09:58:35 +01:00
Kubernetes Prow Robot	c0862c3184	Merge pull request #129105 from carlory/sig-scheduling scheduling e2e tests: add feature-gate label when these tests depend feature-gate	2024-12-12 06:40:25 +00:00
carlory	060c653b53	scheduling e2e tests: add feature-gate label when these tests depend feature-gate	2024-12-06 17:22:43 +08:00
upodroid	dce863e5e6	fetch cni plugins from GitHub releases	2024-12-05 13:31:35 +03:00
Francesco Romani	29d26297a1	e2e: node: fix misleading device plugin test We have an e2e test which tries to ensure device plugin assignments to pods are kept across node reboots. This test has been permafailing for many weeks at the time of writing (xref: #128443). Problem is: closer inspection reveals the test was well intentioned, but puzzling: The test runs a pod, then restarts the kubelet, then _expects the pod to end up in admission failure_ and yet _ensures the device assignment is kept_! https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/test/e2e_node/device_plugin_test.go#L97 A reader can legitimately wonder if this means the device will be kept busy forever? This is not the case, luckily. The test however embodied the behavior of the kubelet at the time, in turn caused by #103979 Device manager used to record the last admitted pod and forcibly add it to the list of active pods. The retention logic had space for exactly one pod, the last which attempted admission. This retention prevented the cleanup code (see: https://github.com/kubernetes/kubernetes/blob/v1.32.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549 compare to: https://github.com/kubernetes/kubernetes/blob/v1.31.0-rc.0/pkg/kubelet/cm/devicemanager/manager.go#L549) from clearing the registration, so the device was still (mis)reported as allocated to the failed pod. This fact was in turn leveraged by the test in question: the test uses the podresources API to learn about the device assignment, and because of the chain of events above the pod failed admission yet was still reported as owning the device. 
What happened however was that the next pod trying admission would replace the previous pod in the device manager data, so the previous pod was no longer forcibly added to the active list, and its assignments were correctly cleared once the cleanup code ran. The cleanup code runs, among other things, every time device manager is asked to allocate devices and every time the podresources API queries the device assignment. Later, in PR https://github.com/kubernetes/kubernetes/pull/120661 the forced retention logic was removed from all the resource managers, thus also from device manager, and this is what caused the permafailure. Because of all of the above, it should be evident that the e2e test was actually enforcing a very specific and not really work-as-intended behavior, which was also overall quite puzzling for users. The best we can do is to fix the test to record and ensure that pods which did fail admission _do not_ retain their device assignment. Unfortunately, we _cannot_ guarantee the desirable property that pods that go running retain their device assignment across node reboots. In the kubelet restart flow, all pods race to be admitted. There's no order enforced between device plugin pods and application pods. Unless an application pod is lucky enough to _lose_ the race with both the device plugin (to go running before the app pod does) and _also_ with the kubelet (which needs to set devices healthy before the pod tries admission). Signed-off-by: Francesco Romani <fromani@redhat.com>	2024-12-04 17:06:27 +01:00
zhifei92	cb74323e07	refactor: Simplify the code logic.	2024-12-03 20:31:09 +08:00
Patrick Ohly	8a908e0c0b	remove import doc comments The "// import <path>" comment has been superseded by Go modules. We don't have to remove them, but doing so has some advantages: - They are used inconsistently, which is confusing. - We can then also remove the (currently broken) hack/update-vanity-imports.sh. - Last but not least, it would be a first step towards avoiding the k8s.io domain. This commit was generated with sed -i -e 's;^package \(.*\) // import.*;package \1;' $(git grep -l '^package.*// import' | grep -v 'vendor/') Everything was included, except for package labels // import k8s.io/kubernetes/pkg/util/labels because that package is marked as "read-only".	2024-12-02 16:59:34 +01:00
HirazawaUi	53e9f29d29	Fix kubelet e2e tests incorrect message	2024-12-01 22:45:29 +08:00
Paco Xu	59dfb0e779	skip if cri proxy is disabled/undefined	2024-11-19 11:17:07 +08:00
Sergey Kanzhelev	a9c311b96a	static pod upgrade test with hostNetwork	2024-11-19 00:27:01 +00:00
Laura Lorenz	9ab0d81d76	Now that sleep is shorter, only expect to reach 3 within 30s Focused too much on the container restart one in commit that fixed that Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-13 01:39:58 +00:00
Laura Lorenz	59f9858086	Move function specific to container restart test inline Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 23:59:30 +00:00
Laura Lorenz	529d5ba9d3	Don't overly indirect image name Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 23:34:57 +00:00
Laura Lorenz	8e7b2af712	Use a better util Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 23:30:03 +00:00
Laura Lorenz	285d433dea	Clearer image pull test and utils Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 23:30:00 +00:00
Laura Lorenz	e03d0f60ef	Orient tests to run faster, but tolerate infra slowdowns up to 5 minutes Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 21:48:28 +00:00
Laura Lorenz	d293c5088f	Fix spelling Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 21:12:20 +00:00
Laura Lorenz	1da8ca816e	Extract restart number properly Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 20:00:11 +00:00
Laura Lorenz	2732d57e33	Missed refactor of container name here Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 19:50:11 +00:00
Laura Lorenz	e6059d7386	Fix typecheck and verify Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 19:48:38 +00:00
Laura Lorenz	f032068ef7	Focus on restart numbers instead of timing Signed-off-by: Laura Lorenz <lauralorenz@google.com>	2024-11-12 07:12:24 +00:00


3027 Commits