Graduate the image volume feature to beta by:
- Allowing `subPath`/`subPathExpr` for image volumes
- Modifying the CRI to pass down the (resolved) sub path
- Adding metrics which are outlined in the KEP
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
- Use environment variables to pass string arguments in the node log
  query PowerShell command
- Split getLoggingCmd into getLoggingCmdEnv and getLoggingCmdArgs
  for better modularization
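A minimal sketch of the idea behind the change above, not the actual kubelet code: user-supplied strings travel as environment variables, while the PowerShell script text stays fixed and only refers to them through `$Env:`. The type `logQuery`, the functions `cmdEnv` and `cmdArgs`, and the variable names `sketch_provider` and `sketch_pattern` are hypothetical and chosen for illustration only.

```go
package main

import "fmt"

// logQuery holds the user-supplied strings (hypothetical type for this sketch).
type logQuery struct {
	Provider string
	Pattern  string
}

// cmdEnv returns the user-supplied values as KEY=value pairs. The values are
// never spliced into the script text, so they cannot change its structure.
func cmdEnv(q logQuery) []string {
	var env []string
	if q.Provider != "" {
		env = append(env, "sketch_provider="+q.Provider)
	}
	if q.Pattern != "" {
		env = append(env, "sketch_pattern="+q.Pattern)
	}
	return env
}

// cmdArgs returns PowerShell arguments whose script only references the
// environment variables by name; it contains no user input.
func cmdArgs(q logQuery) []string {
	filter := "@{LogName='Application'"
	if q.Provider != "" {
		filter += "; ProviderName=$Env:sketch_provider"
	}
	filter += "}"
	script := "Get-WinEvent -FilterHashtable " + filter
	if q.Pattern != "" {
		script += " | Where-Object -Property Message -Match $Env:sketch_pattern"
	}
	return []string{"-NonInteractive", "-Command", script}
}

func main() {
	// A hostile provider string stays inert: it only ever appears in the env.
	q := logQuery{Provider: "kubelet'; Remove-Item C:\\ -Recurse; '", Pattern: "error"}
	fmt.Println(cmdArgs(q))
	fmt.Println(cmdEnv(q))
}
```

A caller would pass the two results to something like `exec.Command("powershell.exe", args...)` with `cmd.Env` extended by the returned pairs. Splitting the env and args builders also makes each half easy to test on its own.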
The original limit of 32 seemed sufficient for a single GPU on a node. But for
shared non-local resources it is too low. For example, a ResourceClaim might be
used to allocate an interconnect channel that connects all pods of a workload
running on several different nodes, in which case the number of pods can be
considerably larger.
256 is high enough for currently planned systems. If we need something even
higher in the future, an alternative approach might be needed to avoid
scalability problems.
Normally, increasing such a limit would have to be done incrementally over two
releases. In this case we decided on
Slack (https://kubernetes.slack.com/archives/CJUQN3E4T/p1734593174791519) to
make an exception and apply this change to current master for 1.33 and backport
it to the next 1.32.x patch release for production usage.
This breaks downgrades to a 1.32 release without this change if there are
ResourceClaims with more than 32 consumers in ReservedFor. In practice,
this breakage is very unlikely because there are no workloads yet which need so
many consumers and such downgrades to a previous patch release are also
unlikely. Downgrades to 1.31 already weren't supported when using DRA v1beta1.
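The downgrade problem described above can be illustrated with a small sketch. It is not the real API validation code: the constant and function names are hypothetical, and only the two limits (32 and 256) come from this message.

```go
package main

import "fmt"

const (
	oldReservedForMax = 32  // limit in releases without this change
	newReservedForMax = 256 // limit after this change
)

// validateReservedFor rejects a claim whose consumer count exceeds the limit.
func validateReservedFor(consumers, limit int) error {
	if consumers > limit {
		return fmt.Errorf("reservedFor: %d consumers exceeds limit %d", consumers, limit)
	}
	return nil
}

func main() {
	consumers := 100 // e.g. pods sharing one interconnect channel

	// Accepted by a release with the raised limit.
	fmt.Println(validateReservedFor(consumers, newReservedForMax))

	// Rejected after a downgrade to a release that still enforces 32.
	fmt.Println(validateReservedFor(consumers, oldReservedForMax))
}
```

An object that was valid when written becomes invalid for the older validation, which is why such a limit is normally raised over two releases.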
* Add feature gate, API, and conflict validation tests for enablecrashloopbackoffmax
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Handle when current base is longer than node max
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Update pkg/features/kube_features.go
Co-authored-by: Tsubasa Nagasawa <toversus2357@gmail.com>
* Fix indentation
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Follow convention for success test
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Normalize casing, and change field to Duration
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Fix json name and some other casing errors
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Another one I missed before
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Don't clobber global max function
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Change to flat value in defaults.go
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Streamline validation and defaults
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Fix typecheck
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Lint
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Tighten up validation for subsecond values
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Rename field from MaxBackOffPeriod to MaxContainerRestartPeriod
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* A few missed references to renames
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Only compare flags in flags test
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Don't mess with SetDefault signature
Nobody messes with SetDefault signature
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Fix stale signature change, and update test data
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Inspect current feature gates at defaulting time
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Don't use the global feature gate for temp usage
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Expose default error, and some comments
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
* Hint fuzzer for less arbitrary values to FeatureGates
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
---------
Signed-off-by: Laura Lorenz <lauralorenz@google.com>
Co-authored-by: Tsubasa Nagasawa <toversus2357@gmail.com>
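Two of the steps above ("Tighten up validation for subsecond values" and "Handle when current base is longer than node max") can be sketched roughly as follows. This is an illustration under assumptions, not the code from this change: the function names are hypothetical, and the 1s lower bound and 300s upper bound are assumed values for the sketch.

```go
package main

import (
	"fmt"
	"time"
)

const (
	minPeriod = time.Second       // assumed lower bound; rejects sub-second values
	maxPeriod = 300 * time.Second // assumed upper bound
)

// validateMaxRestartPeriod checks a configured per-node maximum backoff.
func validateMaxRestartPeriod(d time.Duration) error {
	if d < minPeriod || d > maxPeriod {
		return fmt.Errorf("max restart period %v must be between %v and %v", d, minPeriod, maxPeriod)
	}
	return nil
}

// effectiveBackoff clamps the initial backoff so it never exceeds the node max.
func effectiveBackoff(base, nodeMax time.Duration) (time.Duration, time.Duration) {
	if base > nodeMax {
		base = nodeMax
	}
	return base, nodeMax
}

func main() {
	fmt.Println(validateMaxRestartPeriod(500 * time.Millisecond)) // rejected: sub-second
	fmt.Println(validateMaxRestartPeriod(30 * time.Second))       // accepted

	// The default base is longer than the configured node max, so it is clamped.
	initial, limit := effectiveBackoff(10*time.Second, 5*time.Second)
	fmt.Println(initial, limit)
}
```

Clamping the base rather than rejecting the configuration keeps a small node maximum usable even when the default initial backoff is larger.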