After a Node has stopped posting heartbeats for nodeMonitorGracePeriod,
it is considered unreachable: its Ready condition is set to Unknown, the
NoSchedule taint is added, and all Pods on it are marked NotReady.
However, there is always a delay of 5s before the NoExecute taint is
added to the Node. This adds 5s to the recovery time of Pods that are
supposed to be evicted by the taint and recreated on other Nodes.
The delay occurs because processTaintBaseEviction() uses the last
observed Ready condition of the Node, rather than the current one, to
determine whether it should add the Node to the taint queue. When a Node
is set to unreachable due to missing heartbeats, the last observed Ready
condition is still True while the current Ready condition is Unknown, so
processTaintBaseEviction() should use the latter.
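
The difference between the two inputs can be illustrated with a minimal
sketch. The type, constants, and the helper shouldQueueForTaint are
hypothetical stand-ins for illustration only; they are not the
controller's actual code, and the real decision logic may consider more
than the Ready status.

```go
package main

import "fmt"

// ConditionStatus models the three values a Node Ready condition can take.
// It is a local stand-in, not the Kubernetes API type.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// shouldQueueForTaint is a hypothetical helper: it reports whether a Node
// with the given Ready status should be added to the taint queue.
func shouldQueueForTaint(ready ConditionStatus) bool {
	return ready == ConditionFalse || ready == ConditionUnknown
}

func main() {
	// State right after the grace period expires: the controller has just
	// set the condition to Unknown, but the last observed value is still True.
	lastObserved := ConditionTrue
	current := ConditionUnknown

	fmt.Println("using last observed:", shouldQueueForTaint(lastObserved)) // false
	fmt.Println("using current:", shouldQueueForTaint(current))            // true
}
```

With the last observed condition, the Node is not queued until a later
pass sees the updated value; with the current condition it is queued
immediately.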
Signed-off-by: Quan Tian <qtian@vmware.com>
The "// import <path>" comment has been superseded by Go modules.
We don't have to remove them, but doing so has some advantages:
- They are used inconsistently, which is confusing.
- We can then also remove the (currently broken) hack/update-vanity-imports.sh.
- Last but not least, it would be a first step towards avoiding the k8s.io domain.
This commit was generated with:
    sed -i -e 's;^package \(.*\) // import.*;package \1;' $(git grep -l '^package.*// import' | grep -v 'vendor/')
Everything was included, except for
    package labels // import k8s.io/kubernetes/pkg/util/labels
because that package is marked as "read-only".
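
For readers less familiar with sed, the per-line rewrite can be
sketched in Go. This is only an illustration of what the substitution
does to a single line; the function name stripImportComment and the
example import path are hypothetical and not part of the commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// vanityImport mirrors the sed pattern '^package \(.*\) // import.*'.
var vanityImport = regexp.MustCompile(`^package (.*) // import.*`)

// stripImportComment removes a trailing "// import ..." comment from a
// package clause and leaves every other line unchanged.
func stripImportComment(line string) string {
	return vanityImport.ReplaceAllString(line, "package ${1}")
}

func main() {
	fmt.Println(stripImportComment(`package foo // import "example.com/project/pkg/foo"`))
	fmt.Println(stripImportComment("package main"))
}
```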
- Increase the global verbosity level for the broadcaster's logging to 3 so that users can suppress event messages by lowering the logging level. This reduces noise.
- Make sure the context is properly injected into the broadcaster. This allows the -v flag value to be used in that broadcaster as well, rather than the global value above.
- test: use cancellation from ktesting
- golangci-hints: checked error return value
Most of the individual controllers were already converted earlier. Some log
calls were missed or added and then not updated during a rebase. Some of those
get updated here to fill those gaps.
This commit also consolidates how the name is added to the logger used
by each controller. By using the name under which the controller is
registered, we ensure that the names in the log are consistent.
Marking the pods not ready on a node requires looping over them and
updating each pod's status one at a time. This is performed serially,
and can take a while if we're processing each node serially as well.
Since the time is spent waiting on I/O, there's an opportunity to go
faster by processing multiple nodes concurrently. This change modifies
the loop to process nodes in parallel, using the same number of workers
as doNodeProcessingPassWorker.
This change also introduces histogram metrics to better observe
monitorNodeHealth.