Commit Graph

43185 Commits

Author SHA1 Message Date
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
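The difference between the two algorithms described in this commit can be sketched as follows. This is a simplified illustration only, not the CPUManager code: it models each NUMA node as a count of free CPUs, the names `packCPUs` and `distributeCPUs` are hypothetical, and the distributing variant only tries the first k nodes rather than every combination.

```go
package main

import (
	"errors"
	"fmt"
)

// packCPUs fills NUMA nodes one after another: each node gives up as many
// CPUs as it can before the next node is used.
// free[i] is the number of free CPUs on NUMA node i (hypothetical model).
func packCPUs(free []int, want int) ([]int, error) {
	taken := make([]int, len(free))
	for i, f := range free {
		n := f
		if want < n {
			n = want
		}
		taken[i] = n
		want -= n
	}
	if want > 0 {
		return nil, errors.New("not enough free CPUs")
	}
	return taken, nil
}

// distributeCPUs spreads the request as evenly as possible over the smallest
// number of NUMA nodes that can satisfy it. Simplification: only the first k
// nodes are considered for each k.
func distributeCPUs(free []int, want int) ([]int, error) {
	for k := 1; k <= len(free); k++ {
		base, rem := want/k, want%k
		taken := make([]int, len(free))
		ok := true
		for i := 0; i < k; i++ {
			share := base
			if i < rem {
				share++ // the first rem nodes absorb the remainder
			}
			if free[i] < share {
				ok = false
				break
			}
			taken[i] = share
		}
		if ok {
			return taken, nil
		}
	}
	return nil, errors.New("not enough free CPUs")
}

func main() {
	free := []int{8, 8}
	packed, _ := packCPUs(free, 10)
	spread, _ := distributeCPUs(free, 10)
	fmt.Println("packed:", packed, "distributed:", spread)
}
```

With two nodes of 8 free CPUs each and a request for 10, packing takes 8 from the first node and 2 from the second, while distributing takes 5 from each.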
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology in which each NUMA node
has multiple sockets. We have not yet found a real HW topology like this
in the wild, but we need one to fully exercise the code.

So, until we find a real HW topology, we add a fake one by flipping
the NUMA/socket config of the existing dual Xeon Gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology in which each physical socket
has multiple NUMA zones. Taken from a real dual Xeon Gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks).

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` from the cmp package in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
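The "check once, then go through an interface" idea in this commit can be sketched like this. It is an illustration under assumptions, not the code from the commit: the `cpuInfo` type, the `hierarchy` interface with its `outer`/`inner` methods, and the rule used in `newHierarchy` (NUMA nodes come first when there are no more NUMA nodes than sockets) are all hypothetical.

```go
package main

import "fmt"

// cpuInfo is a hypothetical, simplified description of one logical CPU.
type cpuInfo struct {
	NUMANode int
	Socket   int
}

// hierarchy hides whether NUMA nodes contain sockets or sockets contain
// NUMA nodes. Callers ask for the outer and inner level without checking
// the ordering themselves.
type hierarchy interface {
	outer(c cpuInfo) int
	inner(c cpuInfo) int
}

// numaFirst: each NUMA node spans one or more sockets.
type numaFirst struct{}

func (numaFirst) outer(c cpuInfo) int { return c.NUMANode }
func (numaFirst) inner(c cpuInfo) int { return c.Socket }

// socketsFirst: each socket spans one or more NUMA nodes.
type socketsFirst struct{}

func (socketsFirst) outer(c cpuInfo) int { return c.Socket }
func (socketsFirst) inner(c cpuInfo) int { return c.NUMANode }

// newHierarchy performs the ordering check exactly once.
// Assumption: the level with fewer instances is the higher one.
func newHierarchy(numNUMANodes, numSockets int) hierarchy {
	if numNUMANodes <= numSockets {
		return numaFirst{}
	}
	return socketsFirst{}
}

func main() {
	cpu := cpuInfo{NUMANode: 1, Socket: 3}
	h := newHierarchy(2, 4) // 2 NUMA nodes, 4 sockets
	fmt.Println("outer:", h.outer(cpu), "inner:", h.inner(cpu))
}
```

Code that walks the topology then calls `outer` and `inner` and never repeats the comparison.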
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Kubernetes Prow Robot
30a32a39a4 Merge pull request #105136 from astraw99/fix-csi-mount-log
Fix CSI `mounter.TearDownAt` log msg
2021-10-14 11:54:55 -07:00
Kubernetes Prow Robot
0bfa37dfcc Merge pull request #105676 from alculquicondor/job-name
Fix name for Pods of NonIndexed Jobs
2021-10-14 10:50:12 -07:00
Shivanshu Raj Shrivastava
7d9a6d1de6 Migrated pkg/proxy/ipvs to structured logging (#104932)
* migrated ipset.go

* migrated graceful_termination.go

* fixed vstring

* fixed ip set entry, made it consistent

* fixed rs logging

* resolving review comments for key graceful_termination.go

* refactoring ipset.go

* included review changes
2021-10-14 09:47:29 -07:00
Shivanshu Raj Shrivastava
daf5af2917 Migrated pkg/proxy to structured logging (#104891)
* migrated service.go to structured logging

* fixing capital letter in starting

* migrated topology.go

* migrated endpointslicecache.go

* migrated endpoints.go

* nit typo

* nit plural to singular

* fixed format

* code formatting

* resolving review comment for key ipFamily

* resolving review comment for key endpoints.go

* code formatting

* Converted Warningf to ErrorS, wherever applicable

* included review changes

* included review changes
2021-10-14 09:47:17 -07:00
Kubernetes Prow Robot
dea052ceba Merge pull request #105479 from ahg-g/ahg-mutable
Allow updating scheduling directives of suspended jobs that never started
2021-10-14 08:09:18 -07:00
Aldo Culquicondor
4ef9d18abe Fix name for Pods of NonIndexed Jobs
Change-Id: I0ea4685a82f4cdec0caab362d52144476652f95a
2021-10-14 10:55:46 -04:00
Abdullah Gharaibeh
335817cbce Allow updating node affinity, selector and tolerations for suspended jobs that never started
2021-10-14 10:04:47 -04:00
Kubernetes Prow Robot
3aafe75698 Merge pull request #105461 from damemi/wire-contexts-autoscaling
Wire contexts to Autoscaling controllers
2021-10-14 06:59:33 -07:00
Kubernetes Prow Robot
f27e4714ba Merge pull request #105377 from damemi/wire-contexts-apps
Wire contexts to Apps controllers
2021-10-14 06:59:19 -07:00
Kubernetes Prow Robot
baaa53db64 Merge pull request #105211 from xiaopingrubyist/fix-pv-controller-claim-cache-issue
fix:claim cached in pvcontroller is not the newest may cause unexpected issue
2021-10-14 05:47:18 -07:00
Kubernetes Prow Robot
a8bda48abe Merge pull request #105474 from mauriciopoppe/readd-volume-subpath-flag
Add VolumeSubpath feature gate back in preparation for its removal
2021-10-13 21:55:28 -07:00
astraw99
5e789f157c fix CSI mount log
2021-10-14 10:27:50 +08:00
Kubernetes Prow Robot
894ceb63d0 Merge pull request #105003 from swatisehgal/getallocatable-to-beta
podresource-api: getAllocatableResources to Beta
2021-10-13 17:43:27 -07:00
Mike Dame
41fcb95f2f Wire contexts to Apps controllers
2021-10-13 16:32:13 -04:00
torubylist
f28a8d7f2b fix:cached claim is not the newest will cause unexpected issue
2021-10-13 20:03:00 +08:00
Mike Dame
7780024916 Wire contexts to Autoscaling controllers
2021-10-12 14:34:05 -04:00
Maciej Szulik
8322121434 Move test-related utils to test/utils
2021-10-12 14:52:19 +02:00
Maciej Szulik
1fb6bf8a14 Wire context instead of TODO
2021-10-12 13:21:45 +02:00
Kubernetes Prow Robot
a923852ba0 Merge pull request #105215 from rphillips/add_probe_shutdown
kubelet: add probe termination to graceful shutdowns
2021-10-11 21:19:46 -07:00
Kubernetes Prow Robot
67afa05c17 Merge pull request #105531 from aojea/master_leases
improve error message on control-plane endpoint reconciler
2021-10-11 15:01:02 -07:00
Kubernetes Prow Robot
dc9c571166 Merge pull request #105569 from pohly/generic-ephemeral-kubelet-volume-stats
kubelet: also provide filesystem stats for generic ephemeral volumes
2021-10-11 07:52:39 -07:00
Kubernetes Prow Robot
1f2813368e Merge pull request #105542 from pohly/generic-ephemeral-volume-util-kubelet
kubelet: use generic ephemeral volume helper functions
2021-10-11 02:16:40 -07:00
Kubernetes Prow Robot
fb82a0d7eb Merge pull request #104873 from pohly/json-output-stream
JSON output streams
2021-10-10 17:04:37 -07:00
Patrick Ohly
b22263d835 component-base: configurable JSON output
This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.

This also ensures that the following code pattern doesn't leak info messages:
   klog.ErrorS(err, ...)
   os.Exit(1)

Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.

The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
2021-10-09 10:10:35 +02:00
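The ordering guarantee described in this commit (flush the buffered info stream before writing an error) can be illustrated with a small standalone sketch. This is not the component-base implementation: the `splitWriter` type and its methods are hypothetical, and a single in-memory buffer stands in for a consumer that merges stdout and stderr.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"sync"
)

// splitWriter sends info messages to a buffered stream and error messages
// to an unbuffered one (hypothetical type for illustration).
type splitWriter struct {
	mu   sync.Mutex
	info *bufio.Writer
	errs io.Writer
}

func newSplitWriter(infoOut, errOut io.Writer) *splitWriter {
	return &splitWriter{info: bufio.NewWriter(infoOut), errs: errOut}
}

// Info buffers the message; it may not reach the output until a flush.
func (w *splitWriter) Info(msg string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	fmt.Fprintln(w.info, msg)
}

// Error flushes pending info messages first, so that a consumer merging
// both streams still sees messages in the order they were logged.
func (w *splitWriter) Error(msg string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.info.Flush()
	fmt.Fprintln(w.errs, msg)
}

// Flush must be called before exiting, otherwise buffered info is lost.
func (w *splitWriter) Flush() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.info.Flush()
}

// mergedOutput logs to one shared buffer to mimic merged stdout/stderr.
func mergedOutput() string {
	var merged bytes.Buffer
	w := newSplitWriter(&merged, &merged)
	w.Info("starting")
	w.Error("failed")
	w.Flush()
	return merged.String()
}

func main() {
	fmt.Print(mergedOutput())
}
```

Without the flush inside `Error`, "failed" would be written before "starting", because the info line would still be sitting in the buffer.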
Kubernetes Prow Robot
835980ac67 Merge pull request #105424 from kerthcet/cleanup/remove-scheduler-policy-config
remove scheduler policy config
2021-10-08 10:57:23 -07:00
Antonio Ojea
da8ce6aa3e improve error message on control-plane endpoint reconciler
2021-10-08 19:16:46 +02:00
Kubernetes Prow Robot
76c86ce324 Merge pull request #105219 from sahilvv/ga_ttl
GA TTLAfterFinish
2021-10-08 09:38:59 -07:00
kerthcet
a6f695581b remove legacy scheduler policy config, as well as associated flags policy-config-file, policy-configmap, policy-configmap-namespace and use-legacy-policy-config
Signed-off-by: kerthcet <kerthcet@gmail.com>
2021-10-08 23:57:49 +08:00
Kubernetes Prow Robot
63f66e6c99 Merge pull request #105012 from fromanirh/cpumanager-policy-options-beta
node: graduate CPUManagerPolicyOptions to beta
2021-10-08 07:32:59 -07:00
Kubernetes Prow Robot
2face135c7 Merge pull request #97415 from AlexeyPerevalov/ExcludeSharedPoolFromPodResources
Return only isolated cpus in podresources interface
2021-10-08 05:58:58 -07:00
Patrick Ohly
b1ba381ef8 kubelet: also provide filesystem stats for generic ephemeral volumes
When checking for a reference to a PVC, the code also needs to consider that a
PVC might be referenced indirectly through an ephemeral volume source.
2021-10-08 12:11:52 +02:00
Kubernetes Prow Robot
60ab733932 Merge pull request #105546 from Huang-Wei/fix-evt-volumebinding
sched: adjust events to register for VolumeBinding plugin
2021-10-08 02:12:57 -07:00
Kubernetes Prow Robot
dd650bd41f Merge pull request #105527 from rphillips/fixes/filter_terminated_pods
kubelet: set terminated podWorker status for terminated pods
2021-10-07 22:19:51 -07:00
Sahil Vazirani
3988405c8d GA TTLAfterFinish
2021-10-07 16:58:50 -07:00
Ryan Phillips
0166d446b9 kubelet: set terminated podWorker status for terminated pods
2021-10-07 16:18:59 -05:00
Kubernetes Prow Robot
9b45983d3c Merge pull request #104251 from ravisantoshgudimetla/scheduling-v1beta3
Scheduling v1beta3
2021-10-07 10:47:32 -07:00
Wei Huang
b7d90ca991 sched: adjust events to register for VolumeBinding plugin
2021-10-07 08:51:04 -07:00
Patrick Ohly
844662e7fa kubelet: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.
2021-10-07 17:31:54 +02:00
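The two operations named in this commit, name concatenation and the ownership check, might look roughly like the sketch below. It uses hypothetical, trimmed-down `pod` and `claim` types rather than the Kubernetes API types, and the function names are illustrative. The naming rule assumed here is that the PVC of a generic ephemeral volume is named `<pod name>-<volume name>`.

```go
package main

import "fmt"

// pod and claim are hypothetical stand-ins for the API objects.
type pod struct{ Namespace, Name, UID string }
type claim struct{ Namespace, Name, OwnerUID string }

// volumeClaimName builds the PVC name for a generic ephemeral volume:
// the pod name and the volume name joined by a dash.
func volumeClaimName(p pod, volumeName string) string {
	return p.Name + "-" + volumeName
}

// volumeIsForPod checks that the PVC was created for this pod. A PVC with
// a matching name but a different owner must not be used, because it may
// belong to another pod or may have been created manually.
func volumeIsForPod(p pod, c claim) error {
	if c.Namespace != p.Namespace || c.OwnerUID != p.UID {
		return fmt.Errorf("PVC %s/%s was not created for pod %s/%s",
			c.Namespace, c.Name, p.Namespace, p.Name)
	}
	return nil
}

func main() {
	p := pod{Namespace: "default", Name: "web", UID: "uid-1"}
	name := volumeClaimName(p, "scratch")
	c := claim{Namespace: "default", Name: name, OwnerUID: "uid-1"}
	fmt.Println(name, volumeIsForPod(p, c) == nil)
}
```

Giving both steps a name makes the call sites read as intent ("is this claim ours?") rather than as string and field comparisons.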
Kubernetes Prow Robot
b0eac84937 Merge pull request #105345 from pohly/generic-ephemeral-volume-util
generic ephemeral volume util, base code and controller
2021-10-07 08:19:47 -07:00
ravisantoshgudimetla
5c7f602f48 Make v1beta3 default
2021-10-07 10:58:06 -04:00
Alexey Perevalov
5d9032007a Return only isolated cpus in podresources interface
Co-Authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-10-07 15:34:08 +01:00