This fixes a test flake:
[sig-node] DRA [Feature:DynamicResourceAllocation] multiple nodes reallocation [It] works
/nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:552
[FAILED] number of deallocations
Expected
<int64>: 2
to equal
<int64>: 1
In [It] at: /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/dra.go:651 @ 09/05/23 14:01:54.652
This can be reproduced locally with
stress -p 10 go test ./test/e2e -args -ginkgo.focus=DynamicResourceAllocation.*reallocation.works -ginkgo.no-color -v=4 -ginkgo.v
Log output showed that the sequence of events leading to this was:
- claim gets allocated because of selected node
- a different node has to be used, so PostFilter sets
claim.status.deallocationRequested
- the driver deallocates
- before the scheduler can react and select a different node,
the driver allocates *again* for the original node
- the scheduler asks for deallocation again
- the driver deallocates again (causing the test failure)
- eventually the pod runs
The fix is to disable further allocations first by removing the selected node and
only then start the deallocation.
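A minimal sketch of that ordering, using the v1alpha2 resource API and a hypothetical helper name (this is not the actual code of the change):
```
package example

import (
	"context"

	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// stopAllocationThenDeallocate shows only the ordering that matters.
func stopAllocationThenDeallocate(ctx context.Context, cs kubernetes.Interface,
	schedulingCtx *resourcev1alpha2.PodSchedulingContext,
	claim *resourcev1alpha2.ResourceClaim) error {
	// Step 1: remove the selected node, so the driver has nothing to
	// (re-)allocate for while the claim is being freed.
	schedulingCtx = schedulingCtx.DeepCopy()
	schedulingCtx.Spec.SelectedNode = ""
	if _, err := cs.ResourceV1alpha2().PodSchedulingContexts(schedulingCtx.Namespace).Update(ctx, schedulingCtx, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// Step 2: only now ask the driver to deallocate.
	claim = claim.DeepCopy()
	claim.Status.DeallocationRequested = true
	_, err := cs.ResourceV1alpha2().ResourceClaims(claim.Namespace).UpdateStatus(ctx, claim, metav1.UpdateOptions{})
	return err
}
```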
When some plugin was recorded as "unschedulable" for a pod in some previous
scheduling attempt, the pod kept that attribute forever. When that plugin later
failed with an error that requires backoff, the pod was incorrectly moved to the
"unschedulable" queue, where it got stuck until the periodic flushing because
there was no event that the plugin was waiting for.
Here's an example where that happened:
framework.go:1280: E0831 20:03:47.184243] Reserve/DynamicResources: Plugin failed err="Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" node="scheduler-perf-dra-7l2v2" plugin="DynamicResources" pod="test/test-dragxd5c"
schedule_one.go:1001: E0831 20:03:47.184345] Error scheduling pod; retrying err="running Reserve plugin \"DynamicResources\": Operation cannot be fulfilled on podschedulingcontexts.resource.k8s.io \"test-dragxd5c\": the object has been modified; please apply your changes to the latest version and try again" pod="test/test-dragxd5c"
...
scheduling_queue.go:745: I0831 20:03:47.198968] Pod moved to an internal scheduling queue pod="test/test-dragxd5c" event="ScheduleAttemptFailure" queue="Unschedulable" schedulingCycle=9576 hint="QueueSkip"
Pop still needs the information about unschedulable plugins to update the
UnschedulableReason metric. It can reset that information before returning the
PodInfo for the next scheduling attempt.
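A minimal sketch of that idea, with a hypothetical helper and metric callback (not the actual queue code):
```
package example

import (
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// resetAfterPop first uses the plugins recorded by the previous attempt to
// update the UnschedulableReason metric, then clears them so the outcome of
// the next attempt is judged on its own.
func resetAfterPop(pInfo *framework.QueuedPodInfo, recordUnschedulableReason func(plugin string)) *framework.QueuedPodInfo {
	for plugin := range pInfo.UnschedulablePlugins {
		recordUnschedulableReason(plugin)
	}
	pInfo.UnschedulablePlugins = sets.New[string]()
	return pInfo
}
```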
This uses the generic ptr.To in k8s.io/utils to replace functions and
code constructs which only serve to return pointers to intstr
values. Other uses of the deprecated pointer package are updated in
modified files.
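Illustrative before/after (not taken verbatim from the diff):
```
package example

import (
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/utils/ptr"
)

// Before: a helper whose only purpose is to return a pointer to an
// intstr value.
func intStrPtr(i int) *intstr.IntOrString {
	v := intstr.FromInt(i)
	return &v
}

// After: the generic helper covers this and any other pointer-to-value case.
var maxUnavailable = ptr.To(intstr.FromInt(1))
```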
Signed-off-by: Stephen Kitt <skitt@redhat.com>
Instead of modifying the PodSchedulingContext and then creating or updating it,
the required changes (selected node, potential nodes) are now tracked and the
actual input for an API call is created at the end, if (and only if) it is needed.
This makes the code easier to read and change. In particular, replacing the
Update call with Patch or Apply becomes easy.
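A simplified sketch of the pattern, with illustrative names that may differ from the plugin code:
```
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type podSchedulingState struct {
	// schedulingCtx is the object as retrieved from the informer cache,
	// nil if it does not exist yet.
	schedulingCtx *resourcev1alpha2.PodSchedulingContext

	// Pending changes; nil means "leave unchanged".
	selectedNode   *string
	potentialNodes *[]string
}

func (s *podSchedulingState) isDirty() bool {
	return s.selectedNode != nil || s.potentialNodes != nil
}

// publish builds and sends the API request only if something changed.
func (s *podSchedulingState) publish(ctx context.Context, pod *v1.Pod, cs kubernetes.Interface) error {
	if !s.isDirty() {
		return nil
	}
	var schedulingCtx *resourcev1alpha2.PodSchedulingContext
	if s.schedulingCtx != nil {
		schedulingCtx = s.schedulingCtx.DeepCopy()
	} else {
		schedulingCtx = &resourcev1alpha2.PodSchedulingContext{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name, Namespace: pod.Namespace},
		}
	}
	if s.selectedNode != nil {
		schedulingCtx.Spec.SelectedNode = *s.selectedNode
	}
	if s.potentialNodes != nil {
		schedulingCtx.Spec.PotentialNodes = *s.potentialNodes
	}
	var err error
	if s.schedulingCtx != nil {
		_, err = cs.ResourceV1alpha2().PodSchedulingContexts(pod.Namespace).Update(ctx, schedulingCtx, metav1.UpdateOptions{})
	} else {
		_, err = cs.ResourceV1alpha2().PodSchedulingContexts(pod.Namespace).Create(ctx, schedulingCtx, metav1.CreateOptions{})
	}
	return err
}
```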
Add more facilities to the *internal* podresources client and use them.
Checking e2e test runs, we see quite a few errors like
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
```
This is likely caused by kubelet restarts, which we do plenty of in e2e tests,
combined with the fact that gRPC connects lazily AND that we don't really
check the errors in the client code - we just bubble them up.
While it's arguably bad that we don't check error codes properly, it's also
true that in the main use case, e2e tests, these functions should just never
fail apart from a few well-known cases; after all, we're connecting over a
super-reliable unix domain socket.
So we centralize the fix by adding a function (alongside minor cleanups)
which triggers and ensures that the connection is established,
localizing the changes just here. The main advantage of this approach
is that it is opt-in, composable, and doesn't leak gRPC details into the
client code.
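A sketch of such a helper, assuming plain gRPC connectivity APIs (the name is hypothetical):
```
package example

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// waitForConnection kicks the lazily created connection out of the idle
// state and blocks until it is actually Ready (or the context expires),
// so a kubelet restart surfaces here instead of as a confusing
// "connection refused" from some later RPC.
func waitForConnection(ctx context.Context, conn *grpc.ClientConn) error {
	conn.Connect()
	for {
		state := conn.GetState()
		if state == connectivity.Ready {
			return nil
		}
		if !conn.WaitForStateChange(ctx, state) {
			return fmt.Errorf("podresources connection never became ready: %w", ctx.Err())
		}
	}
}
```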
Signed-off-by: Francesco Romani <fromani@redhat.com>
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
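A simplified sketch of what registering such an event looks like (the receiver, GVK string, and event/hint types are illustrative, not the exact plugin source):
```
package example

import (
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

type dynamicResources struct{}

func (pl *dynamicResources) EventsToRegister() []framework.ClusterEventWithHint {
	return []framework.ClusterEventWithHint{
		// A ResourceClass being added (or updated) may make pods schedulable
		// that previously failed because the class was missing.
		{Event: framework.ClusterEvent{Resource: "ResourceClass", ActionType: framework.Add | framework.Update}},
	}
}
```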
The problematic scenario was having one pod in flight, one event in the list,
and then detecting a concurrent event for a second pod after the first pod is
done. The new test case covers that.
To make it work without assumptions about the implementation, the QueuedPodInfo
returned by Pop must be the one passed to AddUnschedulableIfNotPresent
after (potentially) populating UnschedulablePlugins. This is done via callback
functions which bind to the same shared variable.
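A simplified sketch of that wiring (hypothetical helper; the real queue method signatures are passed in as plain functions):
```
package example

import (
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// buildCallbacks returns two closures that capture the same variable, so
// whatever Pop returned is exactly what gets re-queued, and the test makes
// no assumption about how the queue stores pods internally.
func buildCallbacks(pop func() (*framework.QueuedPodInfo, error),
	addUnschedulable func(*framework.QueuedPodInfo) error) (func() error, func() error) {

	var pInfo *framework.QueuedPodInfo

	popPod := func() error {
		var err error
		pInfo, err = pop()
		if err != nil {
			return err
		}
		// Mark the plugin that "rejected" the pod in this attempt.
		pInfo.UnschedulablePlugins = sets.New("some-plugin")
		return nil
	}
	requeuePod := func() error {
		return addUnschedulable(pInfo)
	}
	return popPod, requeuePod
}
```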
The previous approach was based on the assumption that an in-flight pod can use
the head of the received event list as a marker for identifying all events that
occur while the pod is in flight. That assumption is incorrect: when that
existing element gets removed from the list because all pods that were
in-flight when it was received are done, the marker's Next method returns nil
and the code which should have seen several concurrent events (if there were
any) missed all of them.
As a result, a pod with concurrent events could incorrectly get moved to the
unschedulable queue, where it could get stuck until the next periodic purging
after 5 minutes if there was no other event for it.
The approach of maintaining a single list of concurrent events can be fixed
by inserting each in-flight pod into the list and using that element to
identify "more recent" events for the pod.
Packets that conntrack marks as invalid may cause unexpected and subtle bugs
on established connections; because of that, we install by default an
iptables rule that drops packets in this conntrack state.
However, there are network scenarios, especially those that use multihomed
nodes, that may have legitimate traffic which conntrack detects as
invalid, so this iptables rule causes problems by dropping that
traffic.
An alternative for solving the spurious problems caused by the invalid
conntrack packets is to set the sysctl nf_conntrack_tcp_be_liberal
option, but this is a system-wide setting and we don't want kube-proxy
to be opinionated about the whole node's networking configuration.
Kube-proxy will now only install the DROP rule for the invalid conntrack
state if nf_conntrack_tcp_be_liberal is not set.
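A simplified sketch of the check, assuming the sysctl helper from k8s.io/component-helpers (not the exact proxier code):
```
package example

import (
	"k8s.io/component-helpers/node/util/sysctl"
)

// shouldDropInvalidPackets decides whether the
// "-m conntrack --ctstate INVALID -j DROP" rule is wanted.
func shouldDropInvalidPackets() bool {
	val, err := sysctl.New().GetSysctl("net/netfilter/nf_conntrack_tcp_be_liberal")
	if err != nil {
		// Cannot read the sysctl: keep the historical behavior.
		return true
	}
	// With nf_conntrack_tcp_be_liberal set, conntrack does not mark such
	// packets INVALID in the first place, so kube-proxy stays out of the way.
	return val == 0
}
```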
Change-Id: I5eb326931ed915f5ae74d210f0a375842b6a790e
When the resource claim name inside the pod had some suffix like "1a" in
"resource-1a", the generated name suffix got added directly after that, leading
to "my-pod-resource-1ax6zgt".
Adding another hyphen makes the result more readable: "my-pod-resource-1a-x6zgt".
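A simplified sketch of the construction (not the exact generator code):
```
package example

import (
	v1 "k8s.io/api/core/v1"
	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// claimTemplateInstance ends GenerateName with a hyphen so that the random
// suffix added by the API server stays visually separate.
func claimTemplateInstance(pod *v1.Pod, podClaimName string) *resourcev1alpha2.ResourceClaim {
	return &resourcev1alpha2.ResourceClaim{
		ObjectMeta: metav1.ObjectMeta{
			// Before: pod.Name + "-" + podClaimName  -> "my-pod-resource-1ax6zgt"
			// After:  trailing "-"                   -> "my-pod-resource-1a-x6zgt"
			GenerateName: pod.Name + "-" + podClaimName + "-",
			Namespace:    pod.Namespace,
		},
	}
}
```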