6892 Commits

Author SHA1 Message Date
Filip Křepinský
b7d16fea7f disable terminatingReplicas reconciliation in ReplicationController 2025-05-30 21:08:12 +02:00
Filip Křepinský
aac00c1f0e add orphanedPods parameter to getRSPods
and improve code flow in syncReplicaSet
2025-05-29 10:50:32 +02:00
Antonio Ojea
b9fec8bf4f fix scheme import
Change-Id: I9a94c06b931031a1c2391184342fd5ffa79e3128
2025-05-15 13:46:48 +00:00
Kubernetes Prow Robot
b587977f7c Merge pull request #131445 from natasha41575/renameObservedGenHelperFns
update godoc for and rename observedGeneration helpers
2025-05-14 11:39:19 -07:00
carlory
fe1b1fff7c Remove unused GetHostIP method 2025-05-14 14:50:59 +08:00
Kubernetes Prow Robot
1325262b5f Merge pull request #130961 from hakuna-matatah/rs
Optimize RS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking
2025-05-13 08:43:15 -07:00
Kubernetes Prow Robot
b8d9c12d1b Merge pull request #131330 from aojea/servicecidr_fixes
servicecidr: only patch status if necessary
2025-05-12 17:53:16 -07:00
Harish Kuna
e42aba6c0c Optimize RS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking 2025-05-12 19:56:46 +00:00
Quan Tian
f718096b74 NoExecute taint should be added when a Node's ready condition becomes Unknown
After a Node has stopped posting heartbeats for nodeMonitorGracePeriod,
it will be considered unreachable, its ready condition will be set to
Unknown, NoSchedule taint will be added, all Pods on it will be set to
NotReady, but there is always a delay of 5s before NoExecute taint is
added to the Node, adding 5s to the recovery time of Pods which are
supposed to be evicted by the taint and recreated on other Nodes sooner.

The delay is because processTaintBaseEviction() uses the last observed
ready condition of the Node instead of the current one to determine
whether it should add the Node to the taint queue. When a Node is set to
unreachable due to missing heartbeats, the last observed ready condition
is still true while the current ready condition is unknown, so we should
use the latter for processTaintBaseEviction().

Signed-off-by: Quan Tian <qtian@vmware.com>
2025-05-10 17:22:11 +08:00
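The difference between the two conditions described in the commit above (f718096b74) can be illustrated with a minimal Go sketch. The type and the helper `shouldQueueForNoExecute` are hypothetical names chosen for illustration, not the node lifecycle controller's actual code; the sketch only assumes that a ready status of False or Unknown is what qualifies a Node for the taint queue.

```go
package main

import "fmt"

// ConditionStatus models the three values a Node's ready condition can take.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// shouldQueueForNoExecute is a hypothetical helper. It reports whether a
// Node with the given ready status should be added to the taint queue.
func shouldQueueForNoExecute(ready ConditionStatus) bool {
	return ready == ConditionFalse || ready == ConditionUnknown
}

func main() {
	// After heartbeats stop, the last observed condition is still True,
	// while the controller has already set the current one to Unknown.
	lastObserved := ConditionTrue
	current := ConditionUnknown

	fmt.Println("decision from last observed:", shouldQueueForNoExecute(lastObserved))
	fmt.Println("decision from current:", shouldQueueForNoExecute(current))
}
```

In this sketch, deciding from the last observed status postpones queuing until a later pass, which corresponds to the delay the commit message describes; deciding from the current status queues the Node right away.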
Kubernetes Prow Robot
fa10ea63a6 Merge pull request #127050 from omerap12/podautoscaler-ExternalPerpodMetricReplicas-intmax
HPA: Fix int overflow in GetExternalPerPodMetricReplicas
2025-05-09 13:37:14 -07:00
Omer Aplatony
af1d60f30b Add hpa reviewers
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-05-07 18:16:15 +00:00
Omer Aplatony
0acc7bd4dc HPA: Fix int overflow in GetExternalPerPodMetricReplicas
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-05-07 16:26:27 +00:00
Kubernetes Prow Robot
d2507bb01a Merge pull request #130806 from hakuna-matatah/master
Optimize Statefulset Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
2025-05-06 06:03:13 -07:00
Kubernetes Prow Robot
0b8133816b Merge pull request #131477 from pohly/golangci-lint@v2
golangci-lint v2
2025-05-02 23:03:55 -07:00
Jordan Liggitt
6bb6c99342 Drop null creationTimestamp from test fixtures 2025-05-02 15:38:40 -04:00
Matthieu MOREL
4adb58565c chore: bump golangci-lint to v2
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-05-02 12:51:02 +02:00
Antonio Ojea
56e533f4a0 servicecidr: only patch status if necessary
Change-Id: I1fadec3e48bd3cb734658b8bfca58bb80ab911b9
2025-05-02 08:26:17 +00:00
Kubernetes Prow Robot
fe5afa919b Merge pull request #130333 from kmala/job
handle job complete update delayed event
2025-04-25 17:55:22 -07:00
Natasha Sarkar
92359cdc69 update godoc for and rename observedGeneration helpers 2025-04-24 16:05:01 +00:00
Kubernetes Prow Robot
c59203e051 Merge pull request #121967 from torredil/update-logging
Update log verbosity for node health and taint checks
2025-04-24 06:22:34 -07:00
Patrick Ohly
ff108e72a5 DRA device taints: fix rare unit test flake
TestCancelEviction flaked at a rate of 0.01% because it assumed that an event had
already been created once the pod was updated, but that was only true under
some timing conditions.
2025-04-17 17:16:23 +02:00
Patrick Ohly
ff2e6dddc8 DRA device taints: work around fake.ClientSet informer race
fake.Clientset suffers from a race condition related to informers:
it does not implement resource version support in its Watch
implementation and instead assumes that watches are set up
before further changes are made.

If a test waits for caches to be synced and then immediately
adds an object, that new object will never be seen by event handlers
if the race goes wrong and the Watch call hadn't completed yet
(can be triggered by adding a sleep before b53b9fb557/staging/src/k8s.io/client-go/tools/cache/reflector.go (L431)).

To work around this, we count all watches and only proceed when
all of them are in place. This replaces the normal watch reactor
(b53b9fb557/staging/src/k8s.io/client-go/kubernetes/fake/clientset_generated.go (L161-L173)).
2025-04-17 10:57:27 +02:00
Patrick Ohly
638abf0339 DRA device taints: more logging in test 2025-04-17 10:55:13 +02:00
Patrick Ohly
40f2085d68 DRA device taint: clean up test initialization
The creation of the shared informer factory and starting it can be done all in
the same function, which makes it a bit more obvious what happens in which
order and avoids some code duplication.
2025-04-17 10:55:13 +02:00
googs1025
e8dbfc0b6f add missing Shutdown call for selinux_warning controller 2025-04-14 09:07:51 +08:00
Keerthan Reddy Mala
d4fd41285b update the log message to reflect success and failed jobs 2025-04-08 10:21:02 -07:00
Keerthan Reddy Mala
551f3c7824 merge the integration tests into a single one 2025-04-07 17:37:19 -07:00
Keerthan Reddy Mala
c7d0ed5c48 add integration test for job failure event delay and remove the unit test 2025-04-01 12:38:15 -07:00
Filip Křepinský
8db1426554 rename DeploymentPodReplacementPolicy FG to DeploymentReplicaSetTerminatingReplicas 2025-03-27 20:27:44 +01:00
Jean-Marc François
2dd9eda47f Add configurable tolerance logic. 2025-03-21 18:48:37 -04:00
Harish Kuna
c005b85d4d Reduce locking duration on cache to fetch data from Cache 2025-03-21 15:23:08 +00:00
Edwinhr716
8db5f06183 adding commits of the original PR
isHealthy -> isUnavailable, fixed comments

fixed reversed logic

changed logs from unhealthy to unavailable
2025-03-20 22:46:38 +00:00
Keerthan Reddy Mala
1b8bbcac44 Add integration test 2025-03-20 15:04:44 -07:00
Kubernetes Prow Robot
b0d6079ddc Merge pull request #130947 from pohly/dra-device-taints-flake
DRA device taints: fix some race conditions
2025-03-20 14:16:55 -07:00
Kubernetes Prow Robot
dca334e350 Merge pull request #130859 from hakuna-matatah/optimize-ds
Optimize DS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
2025-03-20 14:16:39 -07:00
Patrick Ohly
cfb9486417 DRA taint eviction: avoid nil panic
The timed worker queue actually can have nil entries in its map if the work was
kicked off immediately. This looks like an unnecessary special case (it would
be fine to call AfterFunc with a duration <= 0 and it would do the right
thing), but to avoid more sweeping changes the fix consists of documenting this
special behavior and adding a nil check.
2025-03-20 19:49:54 +01:00
Patrick Ohly
56adcd06f3 DRA device eviction: fix eviction triggered by pod scheduling
Normally the scheduler shouldn't schedule when there is a taint, but perhaps it
didn't know yet.

The TestEviction/update test covered this, but only failed under the right
timing conditions. The new event handler test case covers it reliably.
2025-03-20 19:49:54 +01:00
Patrick Ohly
5856d3ee6f DRA taint eviction: fix waiting in unit test
Events get recorded in the apiserver asynchronously, so even if the test knows
that the pod has been evicted because the pod is deleted, it still has to
also check for the event to be recorded.

This caused a flake in the "Consistently" check of events.
2025-03-20 17:59:48 +01:00
Patrick Ohly
ac6e47cb14 DRA taint eviction: improve error handling
There was one error path that led to a "controller has shut down" log
message. Other errors caused different log entries or are so unlikely (event
handler registration failure!) that they weren't checked at all.

It's clearer to let Run return an error in all cases and then log the
"controller has shut down" error at the call site. This also enables tests to
mark themselves as failed, should that ever happen.
2025-03-20 17:59:06 +01:00
Harish Kuna
a67cc3aac1 Reduce locking duration on cache to fetch data in DaemonSet Controller 2025-03-20 16:00:27 +00:00
Kubernetes Prow Robot
68ba091fca Merge pull request #130844 from danwinship/improved-traffic-distribution
KEP-3015 PreferSameZone/PreferSameNode traffic distribution
2025-03-19 13:00:48 -07:00
Kubernetes Prow Robot
ab3cec0701 Merge pull request #130447 from pohly/dra-device-taints
device taints and tolerations (KEP 5055)
2025-03-19 13:00:32 -07:00
Kubernetes Prow Robot
2b79593ece Merge pull request #130225 from ritazh/dra-admin-access-namespace
DRA: AdminAccess validate based on namespace label
2025-03-19 10:18:50 -07:00
Dan Winship
19952a2b7b Implement the EndpointSlice controller side of PreferSameZone/PreferSameNode 2025-03-19 08:39:13 -04:00
Patrick Ohly
9f161590be metrics testing: add type aliases to avoid direct prometheus imports
In tests it is sometimes unavoidable to use the Prometheus types directly,
for example when writing a custom gatherer which needs to normalize data
before testing it. device_taint_eviction_test.go does this to strip
out unpredictable data in a histogram.

With type aliases in a package that is explicitly meant for tests we
can avoid adding exceptions for such tests to the global exception list.
2025-03-19 09:18:38 +01:00
Patrick Ohly
a027b439e5 DRA: add device taint eviction controller
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
2025-03-19 09:18:38 +01:00
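Why tracking the UID matters, as described in the commit above (a027b439e5), can be shown with a small Go sketch. The `pod` and `evictionTarget` types and the `sameInstance` helper are hypothetical and greatly simplified; the sketch only assumes that a pod recreated under the same namespace and name receives a different UID.

```go
package main

import "fmt"

// pod is a hypothetical, minimal view of a pod's identity.
type pod struct {
	Namespace string
	Name      string
	UID       string
}

// evictionTarget records which pod instance was scheduled for eviction.
type evictionTarget struct {
	Namespace string
	Name      string
	UID       string
}

// sameInstance reports whether the pod currently present under the target's
// namespace and name is the very instance that was scheduled for eviction.
func sameInstance(target evictionTarget, current *pod) bool {
	if current == nil {
		return false
	}
	return current.Namespace == target.Namespace &&
		current.Name == target.Name &&
		current.UID == target.UID
}

func main() {
	target := evictionTarget{Namespace: "default", Name: "worker-0", UID: "uid-1"}

	original := &pod{Namespace: "default", Name: "worker-0", UID: "uid-1"}
	replacement := &pod{Namespace: "default", Name: "worker-0", UID: "uid-2"}

	fmt.Println("delete original:", sameInstance(target, original))
	fmt.Println("delete replacement:", sameInstance(target, replacement))
}
```

Matching on name alone would treat the replacement pod as the eviction target, which is the mistake the UID comparison guards against.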
Rita Zhang
0301e5a9f8 DRA: AdminAccess validate based on namespace label
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-18 22:56:54 -07:00
Kubernetes Prow Robot
a6227695ab Merge pull request #128402 from richabanker/mvp-agg-discovery
KEP 4020: Replace StorageVersionAPI with aggregated discovery to fetch served resources by a peer apiserver
2025-03-18 21:43:49 -07:00
Kubernetes Prow Robot
9f8a84930d Merge pull request #130573 from natasha41575/pod-conditions
[FG:PodObservedGenerationTracking] kubelet sets observedGeneration on pod conditions
2025-03-18 20:34:08 -07:00
Kubernetes Prow Robot
fe60c4316e Merge pull request #130514 from xigang/daemonset
Add workqueue for node updates in DaemonSetController
2025-03-18 13:52:04 -07:00