6771 Commits

Author SHA1 Message Date
Aldo Culquicondor
637be82479 Add mimowo to approvers of job controller and its integration tests
Change-Id: Ie834aff7070685757c55b7fbcea3bcdef223f6b8
2024-09-09 20:12:02 +00:00
guozheng-shen
686ccceba3 Update node_lifecycle_controller.go
remove 'pod-eviction-timeout' comment
2024-09-09 14:36:41 +08:00
Joe Betz
2595aa1309 generate 2024-09-03 14:26:26 -04:00
carlory
a9de9a3d07 controller-manager adds a new controller named volumeattributesclass-protection-controller which adds/removes finalizer to VAC for protection 2024-08-30 15:00:46 +08:00
Kubernetes Prow Robot
fd7642cfe4 Merge pull request #126745 from hungnguyen243/pvcScalabilityFix
Improve PVC protection controller's scalability by batch-processing PVCs by namespace & caching live pod list results [fixed dead loop issue with idle work queue]
2024-08-27 19:16:57 +01:00
Hung Nguyen
152ab36a33 update error comment 2024-08-26 16:47:39 +00:00
Hung Nguyen
39b6bd1278 address pwschuurman's comments 2024-08-23 18:53:26 +00:00
Kubernetes Prow Robot
113b12c6fb Merge pull request #124439 from bells17/csi-translation-lib-structured-and-contextual-logging
Migrate k8s.io/csi-translation-lib/.* to structured logging
2024-08-19 18:13:54 -07:00
Kubernetes Prow Robot
8db6fc7e3f Merge pull request #126567 from fusida/refactor-job-controller
support the job controller handles the orphan pod using multi workers
2024-08-19 04:57:15 -07:00
古九
a1f0fc8f72 support the job controller handles the orphan pod using multi workers 2024-08-19 14:03:27 +08:00
Hung Nguyen
f9f8b789ca fix spin loop issue with idle work queue 2024-08-16 18:17:47 +00:00
pwschuurman
dbcbdbf5fb Revert "Improve PVC protection controller's scalability by batch-processing PVCs by namespace & caching live pod list results" 2024-08-15 16:33:11 -07:00
Kubernetes Prow Robot
bb7411120a Merge pull request #126287 from devppratik/121793-update-node-monitor-grace-period
node: Update Node Monitor Grace Period default duration to 50s
2024-08-13 21:03:16 -07:00
Kubernetes Prow Robot
9d140b136c Merge pull request #125372 from hungnguyen243/pvcScalabilityFix
Improve PVC protection controller's scalability by batch-processing PVCs by namespace & caching live pod list results
2024-08-13 18:52:42 -07:00
Hung Nguyen
eb16aa1d4a improve PVC Protection Controller's processing mechanism with sample performance test 2024-08-08 16:57:55 +00:00
Omer Aplatony
18c0d6a79e chore(validatingadmissionpolicystatus): use WaitForCacheSync after sharedInformerFactory Start in unit test
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2024-08-08 14:45:26 +03:00
Dr. Stefan Schimanski
c7a1fa432a Call non-blocking informerFactory.Start synchronously to avoid races
Signed-off-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2024-07-27 18:13:09 +02:00
devppratik
f8bf6b97b8 Update Node Monitor Grace Period default duration to 50s
Update description

Improve flag comment

Update Test case value to be 50s by default

Update Description

Run make update

Minor description fix
2024-07-24 22:54:44 +05:30
lan.tian
5fec38dc9b typo: update the daemon update typo
Signed-off-by: lan.tian <lance5890@163.com>
2024-07-24 15:00:52 +08:00
carlory
c4851c64a0 remove volumeoptions from VolumePlugin and BlockVolumePlugin 2024-07-24 14:07:02 +08:00
Kubernetes Prow Robot
107f621462 Merge pull request #126108 from gnufied/changes-volume-recovery
Reduce state changes when expansion fails and mark certain failures as infeasible
2024-07-23 13:30:56 -07:00
Drew Sirenko
16c2ad5b84 Add labels to PVCollector bound/unbound PVC metrics for VolumeAttributesClass Feature (#126166)
* Add labels to PVCollector bound/unbound PVC metrics

* fixup! Add labels to PVCollector bound/unbound PVC metrics

* wip: Fix 'Unknown Decorator'

* fixup! Add labels to PVCollector bound/unbound PVC metrics
2024-07-23 12:21:29 -07:00
Kubernetes Prow Robot
a00181d4d4 Merge pull request #121902 from carlory/kep-3751-pv-controller
[kep-3751] pvc bind pv with vac
2024-07-23 11:02:13 -07:00
Kubernetes Prow Robot
1854839ff0 Merge pull request #126067 from tenzen-y/implement-job-success-policy-e2e
Graduate the JobSuccessPolicy to Beta
2024-07-23 06:14:23 -07:00
carlory
3a6a4830df pvc bind pv with vac 2024-07-23 15:04:11 +08:00
Yuki Iwai
551931c6a8 Graduate the JobSuccessPolicy to beta
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-23 09:29:06 +09:00
Yuki Iwai
6e8dc2c250 Job: Extend the jobs_finished_total metric reason label with SuccessPolicy and CompletionsReached
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-23 09:29:02 +09:00
Kubernetes Prow Robot
6e52e705d0 Merge pull request #125374 from pwschuurman/kep-3335-stable
Promote StatefulSetStartOrdinal to stable in 1.31
2024-07-22 14:25:49 -07:00
Kubernetes Prow Robot
d21b17264e Merge pull request #125488 from pohly/dra-1.31
DRA for 1.31
2024-07-22 11:45:55 -07:00
Patrick Ohly
0fc78b9bcc DRA resource claim controller: update test
The resource claim controller is completely agnostic to the claim spec. It
doesn't care about classes or devices, therefore it needs no changes in 1.31
besides the v1alpha2 -> v1alpha3 renaming from a previous commit.
2024-07-22 18:09:34 +02:00
Kubernetes Prow Robot
1f436e0fba Merge pull request #124108 from carlory/update-test-InTreePluginXXXUnregister
update unit test for adc to test volume migration
2024-07-22 06:49:49 -07:00
Yuki Iwai
594490fd77 Job: Add the CompletionsReached reason to the SuccessCriteriaMet condition
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
2024-07-22 21:24:52 +09:00
Patrick Ohly
8a629b9f15 DRA: remove "sharable" from claim allocation result
Now all claims are shareable up to the limit imposed by the size of the
"reservedFor" array.

This is one of the agreed simplifications for 1.31.
2024-07-21 17:28:14 +02:00
Patrick Ohly
de5742ae83 DRA: remove immediate allocation
As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate
allocation is one of those features which can be removed because it makes no
sense for structured parameters and the justification for classic DRA is weak.
2024-07-21 17:28:14 +02:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Kubernetes Prow Robot
892acaa6a7 Merge pull request #126107 from enj/enj/i/svm_not_found_err
svm: set UID and RV on SSA patch to cause conflict on logical create
2024-07-20 08:18:01 -07:00
googs1025
6626b9ce28 chore(Job): remove deprecated fake.NewSimpleClientset method 2024-07-19 23:46:29 +08:00
googs1025
75a4cfbd58 chore(Job): use ctx.Done() instead of stopCh 2024-07-19 23:43:36 +08:00
googs1025
af5b8bed70 chore(Job): use WaitForCacheSync method after sharedInformerFactory Start 2024-07-19 23:41:20 +08:00
Monis Khan
6a6771b514 svm: set UID and RV on SSA patch to cause conflict on logical create
When a resource gets deleted during migration, the SVM SSA patch
calls are interpreted as a logical create request.  Since the object
from storage is nil, the merged result is just a type meta object,
which lacks a name in the body.  This fails when the API server
checks that the name from the request URL and the body are the same.
Note that a create request is something that SVM controller should
never do.

Once the UID is set on the patch, the API server will fail the
request at a slightly earlier point with an "uid mismatch" conflict
error, which the SVM controller can handle gracefully.

Setting UID by itself is not sufficient.  When a resource gets
deleted and recreated, if RV is not set but UID is set, we would get
an immutable field validation error for attempting to update the
UID.  To address this, we set the resource version on the SSA patch
as well.  This will cause that update request to also fail with a
conflict error.

Added the create verb on all resources for SVM controller RBAC as
otherwise the API server will reject the request before it fails
with a conflict error.

The change addresses a host of other issues with the SVM controller:

1. Include failure message in SVM resource
2. Do not block forever on unsynced GC monitor
3. Do not immediately fail on GC monitor being missing, allow for
   a grace period since discovery may be out of sync
4. Set higher QPS and burst to handle large migrations

Test changes:

1. Clean up CRD webhook convertor logs
2. Allow SVM tests to be run multiple times to make finding flakes easier
3. Create and delete CRs during CRD test to force out any flakes
4. Add a stress test with multiple parallel migrations
5. Enable RBAC on KAS
6. Run KCM directly to exercise wiring and RBAC
7. Better logs during CRD migration
8. Scan audit logs to confirm SVM controller never creates

Signed-off-by: Monis Khan <mok@microsoft.com>
2024-07-18 17:19:11 -04:00
Michal Wozniak
1be4df6e02 Cleanup Job controller isPodFailed function 2024-07-18 09:08:23 +02:00
bells17
1298c8a5fe csi-translation-lib: Support structured and contextual logging 2024-07-18 14:01:27 +09:00
carlory
dae05f3b88 cleanup after JobPodFailurePolicy is promoted to GA 2024-07-18 10:00:56 +08:00
Kubernetes Prow Robot
5d40866fae Merge pull request #125994 from carlory/fix-job-api
clean up codes after PodDisruptionConditions was promoted to GA
2024-07-17 14:37:09 -07:00
Peter Schuurman
585971431b Remove StatefulSetStartOrdinal feature gate to target stable in 1.31 2024-07-16 08:05:09 -07:00
Michal Wozniak
370631eb30 Update the set of reasons in comment for JobFinishedNum metric 2024-07-16 12:34:29 +02:00
Kubernetes Prow Robot
68da9a6762 Merge pull request #125925 from mmorel-35/testifylint/pkg/controller
fix: enable testifylint on `pkg/controller`
2024-07-12 12:50:25 -07:00
Hemant Kumar
7a51999ddf Deprecate intree Volume Expansion controller 2024-07-12 14:42:04 -04:00
Michal Wozniak
f1233ac5e0 JobPodFailurePolicy to GA
# Conflicts:
#	pkg/controller/job/job_controller_test.go
2024-07-12 17:21:32 +02:00
Kubernetes Prow Robot
0a3330d6c9 Merge pull request #125510 from mimowo/extend-job-conditions
Delay setting terminal Job conditions until all pods are terminal
2024-07-12 08:12:46 -07:00