Kubernetes Prow Robot
c59203e051
Merge pull request #121967 from torredil/update-logging
Update log verbosity for node health and taint checks
2025-04-24 06:22:34 -07:00
Filip Křepinský
8db1426554
rename DeploymentPodReplacementPolicy FG to DeploymentReplicaSetTerminatingReplicas
2025-03-27 20:27:44 +01:00
Jean-Marc François
2dd9eda47f
Add configurable tolerance logic.
2025-03-21 18:48:37 -04:00
Kubernetes Prow Robot
b0d6079ddc
Merge pull request #130947 from pohly/dra-device-taints-flake
DRA device taints: fix some race conditions
2025-03-20 14:16:55 -07:00
Kubernetes Prow Robot
dca334e350
Merge pull request #130859 from hakuna-matatah/optimize-ds
Optimize DS Controller Performance: Reduce Work Duration Time & Minimize Cache Locking.
2025-03-20 14:16:39 -07:00
Patrick Ohly
cfb9486417
DRA taint eviction: avoid nil panic
The timed worker queue actually can have nil entries in its map if the work was
kicked off immediately. This looks like an unnecessary special case (it would
be fine to call AfterFunc with a duration <= 0 and it would do the right
thing), but to avoid more sweeping changes the fix consists of documenting this
special behavior and adding a nil check.
2025-03-20 19:49:54 +01:00
Patrick Ohly
56adcd06f3
DRA device eviction: fix eviction triggered by pod scheduling
Normally the scheduler shouldn't schedule when there is a taint, but perhaps it
didn't know yet.
The TestEviction/update test covered this, but only failed under the right
timing conditions. The new event handler test case covers it reliably.
2025-03-20 19:49:54 +01:00
Patrick Ohly
5856d3ee6f
DRA taint eviction: fix waiting in unit test
Events get recorded in the apiserver asynchronously, so even if the test knows
that the pod has been evicted because it was deleted, it still has to
check that the event has been recorded too.
This caused a flake in the "Consistently" check of events.
2025-03-20 17:59:48 +01:00
Patrick Ohly
ac6e47cb14
DRA taint eviction: improve error handling
There was one error path that led to a "controller has shut down" log
message. Other errors caused different log entries or were so unlikely (event
handler registration failure!) that they weren't checked at all.
It's clearer to let Run return an error in all cases and then log the
"controller has shut down" error at the call site. This also enables tests to
mark themselves as failed, should that ever happen.
2025-03-20 17:59:06 +01:00
Harish Kuna
a67cc3aac1
Reduce locking duration on cache to fetch data in DaemonSet Controller
2025-03-20 16:00:27 +00:00
Kubernetes Prow Robot
68ba091fca
Merge pull request #130844 from danwinship/improved-traffic-distribution
KEP-3015 PreferSameZone/PreferSameNode traffic distribution
2025-03-19 13:00:48 -07:00
Kubernetes Prow Robot
ab3cec0701
Merge pull request #130447 from pohly/dra-device-taints
device taints and tolerations (KEP 5055)
2025-03-19 13:00:32 -07:00
Kubernetes Prow Robot
2b79593ece
Merge pull request #130225 from ritazh/dra-admin-access-namespace
DRA: AdminAccess validate based on namespace label
2025-03-19 10:18:50 -07:00
Dan Winship
19952a2b7b
Implement the EndpointSlice controller side of PreferSameZone/PreferSameNode
2025-03-19 08:39:13 -04:00
Patrick Ohly
9f161590be
metrics testing: add type aliases to avoid direct prometheus imports
In tests it is sometimes unavoidable to use the Prometheus types directly,
for example when writing a custom gatherer which needs to normalize data
before testing it. device_taint_eviction_test.go does this to strip
out unpredictable data in a histogram.
With type aliases in a package that is explicitly meant for tests we
can avoid adding exceptions for such tests to the global exception list.
2025-03-19 09:18:38 +01:00
Patrick Ohly
a027b439e5
DRA: add device taint eviction controller
The controller is derived from the node taint eviction controller.
In contrast to that controller it tracks the UID of pods to prevent
deleting the wrong pod when it got replaced.
2025-03-19 09:18:38 +01:00
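Why tracking the UID matters can be shown with a small sketch. The `podRef` and `podStore` types and the `evict` function are hypothetical simplifications, not the controller's code: a pod scheduled for eviction is remembered by namespace, name, and UID, and a replacement pod that reuses the same name (but has a new UID) is left alone.

```go
package main

import "fmt"

// podRef identifies a pod by namespace/name plus UID (hypothetical type).
type podRef struct {
	Namespace, Name, UID string
}

// podStore is a hypothetical stand-in for the cluster state:
// "namespace/name" -> UID of the pod currently using that name.
type podStore map[string]string

func key(ns, name string) string { return ns + "/" + name }

// evict deletes the pod only if the object stored under that name still has
// the UID recorded when the eviction was scheduled.
func evict(store podStore, ref podRef) bool {
	k := key(ref.Namespace, ref.Name)
	uid, ok := store[k]
	if !ok || uid != ref.UID {
		return false // gone already, or replaced by a new pod with the same name
	}
	delete(store, k)
	return true
}

func main() {
	store := podStore{key("default", "worker-0"): "uid-old"}
	scheduled := podRef{Namespace: "default", Name: "worker-0", UID: "uid-old"}

	// The original pod is replaced by a new one with the same name.
	store[key("default", "worker-0")] = "uid-new"
	fmt.Println(evict(store, scheduled)) // false: the replacement is not deleted
}
```

Against a real apiserver, a similar guarantee can be obtained server-side by passing a UID precondition in `metav1.DeleteOptions` (`Preconditions.UID`); whether and how the controller uses it is not stated in this commit.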
Rita Zhang
0301e5a9f8
DRA: AdminAccess validate based on namespace label
Signed-off-by: Rita Zhang <rita.z.zhang@gmail.com>
2025-03-18 22:56:54 -07:00
Kubernetes Prow Robot
a6227695ab
Merge pull request #128402 from richabanker/mvp-agg-discovery
KEP 4020: Replace StorageVersionAPI with aggregated discovery to fetch served resources by a peer apiserver
2025-03-18 21:43:49 -07:00
Kubernetes Prow Robot
9f8a84930d
Merge pull request #130573 from natasha41575/pod-conditions
[FG:PodObservedGenerationTracking] kubelet sets observedGeneration on pod conditions
2025-03-18 20:34:08 -07:00
Kubernetes Prow Robot
fe60c4316e
Merge pull request #130514 from xigang/daemonset
Add workqueue for node updates in DaemonSetController
2025-03-18 13:52:04 -07:00
Richa Banker
8b2cee83c1
Replace StorageVersion API with aggregated discovery to fetch served resources by a peer for MVP
Co-authored-by: Joe Betz <jpbetz@google.com>
Co-authored-by: Jordan Liggitt <jordan@liggitt.net>
2025-03-18 13:27:27 -07:00
Patrick Ohly
13d04d4a92
DRA device taints: copy taintseviction controller
This is a verbatim copy of the current pkg/controller/taintseviction code,
revision fc268ecd09 (v1.33.0 plus one commit),
minus the TimedWorker helper.
The intent is to modify the code such that it enforces eviction of pods which
use tainted devices.
2025-03-18 20:52:54 +01:00
Eddie Torres
c766a52356
Implement KEP 4876 Mutable CSINode (#130007)
* Implement KEP-4876 Mutable CSINode Allocatable Count
Signed-off-by: torredil <torredil@amazon.com>
* Update TestGetNodeAllocatableUpdatePeriod
Signed-off-by: torredil <torredil@amazon.com>
* Implement CSINodeUpdater
Signed-off-by: torredil <torredil@amazon.com>
* Use sync.Once in csiNodeUpdater
Signed-off-by: torredil <torredil@amazon.com>
* Verify driver is installed before running periodic updates
Signed-off-by: torredil <torredil@amazon.com>
* Update NodeAllocatableUpdatePeriodSeconds type comment
Signed-off-by: torredil <torredil@amazon.com>
* Leverage apivalidation.ValidateImmutableField in ValidateCSINodeUpdate
Signed-off-by: torredil <torredil@amazon.com>
* Update strategy functions
Signed-off-by: torredil <torredil@amazon.com>
* Run hack/update-openapi-spec.sh
Signed-off-by: torredil <torredil@amazon.com>
* Update VolumeError.ErrorCode field
Signed-off-by: torredil <torredil@amazon.com>
* CSINodeUpdater improvements
Signed-off-by: torredil <torredil@amazon.com>
* Iron out concurrency in syncDriverUpdater
Signed-off-by: torredil <torredil@amazon.com>
* Run hack/update-openapi-spec.sh
Signed-off-by: torredil <torredil@amazon.com>
* Revise logging
Signed-off-by: torredil <torredil@amazon.com>
* Revise log in VerifyExhaustedResource
Signed-off-by: torredil <torredil@amazon.com>
* Update API validation
Signed-off-by: torredil <torredil@amazon.com>
* Add more code coverage
Signed-off-by: torredil <torredil@amazon.com>
* Fix pull-kubernetes-linter-hints
Signed-off-by: torredil <torredil@amazon.com>
* Update API types documentation
Signed-off-by: torredil <torredil@amazon.com>
* Update strategy and validation for new errorCode field
Signed-off-by: torredil <torredil@amazon.com>
* Update validation tests after strategy changes
Signed-off-by: torredil <torredil@amazon.com>
* Update VA status strategy
Signed-off-by: torredil <torredil@amazon.com>
---------
Signed-off-by: torredil <torredil@amazon.com>
2025-03-18 12:45:49 -07:00
xigang
aa32537e9a
Add workqueue for node updates in DaemonSetController
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-03-19 01:09:44 +08:00
mchtech
381ccf0f4c
Fix empty describedObject in hpa status (#124555)
* fix empty DescribedObject in hpa MetricStatus when object target type is AverageValue
Signed-off-by: mchtech <michu_an@126.com>
* add test
Signed-off-by: mchtech <michu_an@126.com>
---------
Signed-off-by: mchtech <michu_an@126.com>
2025-03-18 09:33:56 -07:00
Natasha Sarkar
4c2be4bdde
kubelet sets observedGeneration in conditions
2025-03-18 15:43:24 +00:00
xigang
5c4948ff31
controller: factor out pod node name indexer helper function
Signed-off-by: xigang <wangxigang2014@gmail.com>
2025-03-17 20:21:30 +08:00
Kubernetes Prow Robot
9fd0e20bc2
Merge pull request #129345 from pohly/log-client-go-workqueue
client-go workqueue: add optional logger
2025-03-14 06:37:53 -07:00
Kubernetes Prow Robot
af3b4cd57a
Merge pull request #130718 from kei01234kei/feature/use_generic_set
Use generic set in pkg/controller/nodelifecycle
2025-03-14 01:21:47 -07:00
Kubernetes Prow Robot
04fb7ac18b
Merge pull request #130536 from tenzen-y/promote-successpolicy-to-ga
KEP-3998: Promote JobSuccessPolicy to Stable
2025-03-13 13:27:54 -07:00
Kubernetes Prow Robot
1c756849d6
Merge pull request #130591 from fmuyassarov/devel/logging
Refine logging levels in job, IPAM, and replicaSet
2025-03-12 07:13:47 -07:00
Kubernetes Prow Robot
309c4c17fb
Merge pull request #128499 from stlaz/ctb_betav1
ClusterTrustBundles - move to beta
2025-03-11 12:47:45 -07:00
Kubernetes Prow Robot
652f681c2b
Merge pull request #130650 from natasha41575/pod-conditions-controller
[FG:PodObservedGenerationTracking] controller sets observedGeneration on pod conditions
2025-03-11 11:27:54 -07:00
Stanislav Láznička
5b3b68a3a1
KCM: CTBPublisher: use generics to handle both alpha/beta APIs
2025-03-11 18:07:29 +01:00
Stanislav Láznička
e0f536bf1f
use the ClusterTrustBundles beta API
2025-03-11 18:07:24 +01:00
Keisuke Ishigami
efac8fdea2
Delete todo comment to ignore update where 'old' is equivalent to 'cur' (#130322)
* use resource version to ignore updating pdb
* delete todo comment
2025-03-11 07:13:46 -07:00
Keisuke Ishigami
cdac61b902
use generic set in sig-node
2025-03-11 20:00:15 +09:00
Feruzjon Muyassarov
4c6971007b
Refine logging levels in Job, IPAM, and ReplicaSet controllers.
Adjust logging levels in Job, IPAM, and ReplicaSet controllers from
V(0) to V(2), V(4), and V(4) respectively to reduce noise. These logs
provide minimal value at the default, always-emitted level (V(0)), so they
have been moved to higher verbosity levels for better log clarity.
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@est.tech>
2025-03-11 10:25:16 +02:00
Kubernetes Prow Robot
3782b558a2
Merge pull request #128786 from danwinship/bad-ip-warnings
warn on bad IPs in objects
2025-03-11 00:11:47 -07:00
Natasha Sarkar
af9ac325b1
controller sets observedGeneration on pod conditions
2025-03-10 16:37:55 +00:00
Tim Hockin
e54719bb66
Use randfill, do API renames
2025-03-08 15:18:00 -08:00
Kubernetes Prow Robot
2effa5e3cf
Merge pull request #130352 from natasha41575/kubelet-pod-observedgen
[FG:PodObservedGenerationTracking] Kubelet sets pod `status.observedGeneration` when updating the pod status
2025-03-07 13:33:45 -08:00
Dan Winship
d4c55d06cf
Export endpoints, endpointslice, mirroring controller names
2025-03-07 10:52:54 -05:00
Kubernetes Prow Robot
9d45ea8b9d
Merge pull request #128586 from mortent/DRAPrioritizedList
Prioritized Alternatives in Device Requests
2025-03-06 21:01:44 -08:00
Natasha Sarkar
701b76f10d
pod gc controller sets status.observedGeneration upon pod failure
2025-03-06 22:31:15 +00:00
Yuki Iwai
749f03a49f
Graduate Job SuccessPolicy to Stable
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-03-07 07:21:12 +09:00
Cici Huang
6645022d8b
Update status before returning err
2025-03-06 10:54:45 -08:00
Kubernetes Prow Robot
50927130ff
Merge pull request #130582 from tenzen-y/use-suspended-job-util
Job: Use jobSuspended util for suspended detection
2025-03-05 15:49:51 -08:00
Kubernetes Prow Robot
8873c7e875
Merge pull request #130564 from danwinship/label-endpoints
Add "endpoints.kubernetes.io/managed-by" label to Endpoints
2025-03-05 13:29:45 -08:00
Yuki Iwai
8202b791e9
Job: Use jobSuspended util for suspended detection
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2025-03-05 18:12:59 +09:00