kubernetes

mirror of https://github.com/outbackdingo/kubernetes.git synced 2026-01-28 10:19:31 +00:00

Author	SHA1	Message	Date
Kubernetes Prow Robot	a26f3fd5c6	Merge pull request #132109 from linxiulei/jobdelay Clean backoff record earlier	2025-06-06 13:38:38 -07:00
Eric Lin	1f46b3fdbf	Clean backoff record earlier Once received job deletion event, it cleans the backoff records for that job before enqueueing this job so that we can avoid a race condition that the syncJob() may incorrect use stale backoff records for a newly created job with same key. Co-authored-by: Michal Wozniak <michalwozniak@google.com>	2025-06-06 18:31:38 +00:00
Kubernetes Prow Robot	a883be6e36	Merge pull request #132031 from atiratree/update-getRSPods add orphanedPods parameter to getRSPods and improve code flow in syncReplicaSet	2025-06-03 12:10:39 -07:00
Kubernetes Prow Robot	62f72addf2	Merge pull request #120816 from tnqn/fix-unreachable-taint-delay NoExecute taint should be added when a Node's ready condition becomes Unknown	2025-06-03 00:54:44 -07:00
Filip Křepinský	aac00c1f0e	add orphanedPods parameter to getRSPods and improve code flow in syncReplicaSet	2025-05-29 10:50:32 +02:00
Antonio Ojea	b9fec8bf4f	fix scheme import Change-Id: I9a94c06b931031a1c2391184342fd5ffa79e3128	2025-05-15 13:46:48 +00:00
Kubernetes Prow Robot	b587977f7c	Merge pull request #131445 from natasha41575/renameObservedGenHelperFns update godoc for and rename observedGeneration helpers	2025-05-14 11:39:19 -07:00
Kubernetes Prow Robot	1325262b5f	Merge pull request #130961 from hakuna-matatah/rs Optimize RS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking	2025-05-13 08:43:15 -07:00
Kubernetes Prow Robot	b8d9c12d1b	Merge pull request #131330 from aojea/servicecidr_fixes servicecidr: only patch status if necessary	2025-05-12 17:53:16 -07:00
Harish Kuna	e42aba6c0c	Optimize RS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking	2025-05-12 19:56:46 +00:00
Quan Tian	f718096b74	NoExecute taint should be added when a Node's ready condition becomes Unknown After a Node has stopped posting heartbeats for nodeMonitorGracePeriod, it will be considered unreachable, its ready condition will be set to Unknown, NoSchedule taint will be added, all Pods on it will be set to NotReady, but there is always a delay of 5s before NoExecute taint is added to the Node, adding 5s to the recovery time of Pods which are supposed to be evicted by the taint and recreated on other Nodes sooner. The delay is because processTaintBaseEviction() uses the last observed ready condition of the Node instead of the current one to determine whether it should add the Node to the taint queue. When a Node is set to unreachable due to missing heartbeats, the last observed ready condition is still true and the current ready condition is unknown, we should use the latter for processTaintBaseEviction(). Signed-off-by: Quan Tian <qtian@vmware.com>	2025-05-10 17:22:11 +08:00
Kubernetes Prow Robot	fa10ea63a6	Merge pull request #127050 from omerap12/podautoscaler-ExternalPerpodMetricReplicas-intmax HPA: Fix int overflow in GetExternalPerPodMetricReplicas	2025-05-09 13:37:14 -07:00
Omer Aplatony	af1d60f30b	Add hpa reviewers Signed-off-by: Omer Aplatony <omerap12@gmail.com>	2025-05-07 18:16:15 +00:00
Omer Aplatony	0acc7bd4dc	HPA: Fix int overflow in GetExternalPerPodMetricReplicas Signed-off-by: Omer Aplatony <omerap12@gmail.com>	2025-05-07 16:26:27 +00:00
Kubernetes Prow Robot	d2507bb01a	Merge pull request #130806 from hakuna-matatah/master Optimize Statefulset Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.	2025-05-06 06:03:13 -07:00
Kubernetes Prow Robot	0b8133816b	Merge pull request #131477 from pohly/golangci-lint@v2 golangci-lint v2	2025-05-02 23:03:55 -07:00
Jordan Liggitt	6bb6c99342	Drop null creationTimestamp from test fixtures	2025-05-02 15:38:40 -04:00
Matthieu MOREL	4adb58565c	chore: bump golangci-lint to v2 Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2025-05-02 12:51:02 +02:00
Antonio Ojea	56e533f4a0	servicecidr: only patch status if necessary Change-Id: I1fadec3e48bd3cb734658b8bfca58bb80ab911b9	2025-05-02 08:26:17 +00:00
Kubernetes Prow Robot	fe5afa919b	Merge pull request #130333 from kmala/job handle job complete update delayed event	2025-04-25 17:55:22 -07:00
Natasha Sarkar	92359cdc69	update godoc for and rename observedGeneration helpers	2025-04-24 16:05:01 +00:00
Kubernetes Prow Robot	c59203e051	Merge pull request #121967 from torredil/update-logging Update log verbosity for node health and taint checks	2025-04-24 06:22:34 -07:00
Keerthan Reddy Mala	d4fd41285b	update the log message to reflect success and failed jobs	2025-04-08 10:21:02 -07:00
Keerthan Reddy Mala	551f3c7824	merge the integration tests into a single one	2025-04-07 17:37:19 -07:00
Keerthan Reddy Mala	c7d0ed5c48	add integration test for job failure event delay and remove the unit test	2025-04-01 12:38:15 -07:00
Filip Křepinský	8db1426554	rename DeploymentPodReplacementPolicy FG to DeploymentReplicaSetTerminatingReplicas	2025-03-27 20:27:44 +01:00
Jean-Marc François	2dd9eda47f	Add configurable tolerance logic.	2025-03-21 18:48:37 -04:00
Harish Kuna	c005b85d4d	Reduce locking duration on cache to fetch data from Cache	2025-03-21 15:23:08 +00:00
Keerthan Reddy Mala	1b8bbcac44	Add integration test	2025-03-20 15:04:44 -07:00
Kubernetes Prow Robot	b0d6079ddc	Merge pull request #130947 from pohly/dra-device-taints-flake DRA device taints: fix some race conditions	2025-03-20 14:16:55 -07:00
Kubernetes Prow Robot	dca334e350	Merge pull request #130859 from hakuna-matatah/optimize-ds Optimize DS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.	2025-03-20 14:16:39 -07:00
Patrick Ohly	cfb9486417	DRA taint eviction: avoid nil panic The timed worker queue actually can have nil entries in its map if the work was kicked off immediately. This looks like an unnecessary special case (it would be fine to call AfterFunc with a duration <= 0 and it would do the right thing), but to avoid more sweeping changes the fix consists of documenting this special behavior and adding a nil check.	2025-03-20 19:49:54 +01:00
Patrick Ohly	56adcd06f3	DRA device eviction: fix eviction triggered by pod scheduling Normally the scheduler shouldn't schedule when there is a taint, but perhaps it didn't know yet. The TestEviction/update test covered this, but only failed under the right timing conditions. The new event handler test case covers it reliably.	2025-03-20 19:49:54 +01:00
Patrick Ohly	5856d3ee6f	DRA taint eviction: fix waiting in unit test Events get recorded in the apiserver asynchronously, so even if the test knows that the event has been evicted because the pod is deleted, it still has to also check for the event to be recorded. This caused a flake in the "Consistently" check of events.	2025-03-20 17:59:48 +01:00
Patrick Ohly	ac6e47cb14	DRA taint eviction: improve error handling There was one error path that led to a "controller has shut down" log message. Other errors caused different log entries or are so unlikely (event handler registration failure!) that they weren't checked at all. It's clearer to let Run return an error in all cases and then log the "controller has shut down" error at the call site. This also enables tests to mark themselves as failed, should that ever happen.	2025-03-20 17:59:06 +01:00
Harish Kuna	a67cc3aac1	Reduce locking duration on cache to fetch data in DaemonSet Controller	2025-03-20 16:00:27 +00:00
Kubernetes Prow Robot	68ba091fca	Merge pull request #130844 from danwinship/improved-traffic-distribution KEP-3015 PreferSameZone/PreferSameNode traffic distribution	2025-03-19 13:00:48 -07:00
Kubernetes Prow Robot	ab3cec0701	Merge pull request #130447 from pohly/dra-device-taints device taints and tolerations (KEP 5055)	2025-03-19 13:00:32 -07:00
Kubernetes Prow Robot	2b79593ece	Merge pull request #130225 from ritazh/dra-admin-access-namespace DRA: AdminAccess validate based on namespace label	2025-03-19 10:18:50 -07:00
Dan Winship	19952a2b7b	Implement the EndpointSlice controller side of PreferSameZone/PreferSameNode	2025-03-19 08:39:13 -04:00
Patrick Ohly	9f161590be	metrics testing: add type aliases to avoid direct prometheus imports In tests it is sometimes unavoidable to use the Prometheus types directly, for example when writing a custom gatherer which needs to normalize data before testing it. device_taint_eviction_test.go does this to strip out unpredictable data in a histogram. With type aliases in a package that is explicitly meant for tests we can avoid adding exceptions for such tests to the global exception list.	2025-03-19 09:18:38 +01:00
Patrick Ohly	a027b439e5	DRA: add device taint eviction controller The controller is derived from the node taint eviction controller. In contrast to that controller it tracks the UID of pods to prevent deleting the wrong pod when it got replaced.	2025-03-19 09:18:38 +01:00
Rita Zhang	0301e5a9f8	DRA: AdminAccess validate based on namespace label Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>	2025-03-18 22:56:54 -07:00
Kubernetes Prow Robot	a6227695ab	Merge pull request #128402 from richabanker/mvp-agg-discovery KEP 4020: Replace StorageVersionAPI with aggregated discovery to fetch served resources by a peer apiserver	2025-03-18 21:43:49 -07:00
Kubernetes Prow Robot	9f8a84930d	Merge pull request #130573 from natasha41575/pod-conditions [FG:PodObservedGenerationTracking] kubelet sets observedGeneration on pod conditions	2025-03-18 20:34:08 -07:00
Kubernetes Prow Robot	fe60c4316e	Merge pull request #130514 from xigang/daemonset Add workqueue for node updates in DaemonSetController	2025-03-18 13:52:04 -07:00
Richa Banker	8b2cee83c1	Replace StorageVersion API with aggregated discovery to fetch served resources by a peer for MVP Co-authored-by: Joe Betz <jpbetz@google.com> Co-authored-by: Jordan Liggitt <jordan@liggitt.net>	2025-03-18 13:27:27 -07:00
Patrick Ohly	13d04d4a92	DRA device taints: copy taintseviction controller This is a verbatim copy of the current pkg/controller/taintseviction code, revision `fc268ecd09` (v1.33.0 plus one commit), minus the TimedWorker helper. The intent is to modify the code such that it enforces eviction of pods which use tainted devices.	2025-03-18 20:52:54 +01:00
Eddie Torres	c766a52356	Implement KEP 4876 Mutable CSINode (#130007 ) * Implement KEP-4876 Mutable CSINode Allocatable Count Signed-off-by: torredil <torredil@amazon.com> * Update TestGetNodeAllocatableUpdatePeriod Signed-off-by: torredil <torredil@amazon.com> * Implement CSINodeUpdater Signed-off-by: torredil <torredil@amazon.com> * Use sync.Once in csiNodeUpdater Signed-off-by: torredil <torredil@amazon.com> * ImVerify driver is installed before running periodic updates Signed-off-by: torredil <torredil@amazon.com> * Update NodeAllocatableUpdatePeriodSeconds type comment Signed-off-by: torredil <torredil@amazon.com> * Leverage apivalidation.ValidateImmutableField in ValidateCSINodeUpdate Signed-off-by: torredil <torredil@amazon.com> * Update strategy functions Signed-off-by: torredil <torredil@amazon.com> * Run hack/update-openapi-spec.sh Signed-off-by: torredil <torredil@amazon.com> * Update VolumeError.ErrorCode field Signed-off-by: torredil <torredil@amazon.com> * CSINodeUpdater improvements Signed-off-by: torredil <torredil@amazon.com> * Iron out concurrency in syncDriverUpdater Signed-off-by: torredil <torredil@amazon.com> * Run hack/update-openapi-spec.sh Signed-off-by: torredil <torredil@amazon.com> * Revise logging Signed-off-by: torredil <torredil@amazon.com> * Revise log in VerifyExhaustedResource Signed-off-by: torredil <torredil@amazon.com> * Update API validation Signed-off-by: torredil <torredil@amazon.com> * Add more code coverage Signed-off-by: torredil <torredil@amazon.com> * Fix pull-kubernetes-linter-hints Signed-off-by: torredil <torredil@amazon.com> * Update API types documentation Signed-off-by: torredil <torredil@amazon.com> * Update strategy and validation for new errorCode field Signed-off-by: torredil <torredil@amazon.com> * Update validation tests after strategy changes Signed-off-by: torredil <torredil@amazon.com> * Update VA status strategy Signed-off-by: torredil <torredil@amazon.com> --------- Signed-off-by: torredil <torredil@amazon.com>	2025-03-18 12:45:49 -07:00
xigang	aa32537e9a	Add workqueue for node updates in DaemonSetController Signed-off-by: xigang <wangxigang2014@gmail.com>	2025-03-19 01:09:44 +08:00

1 2 3 4 5 ...

6834 Commits