Automatic merge from submit-queue
Define interfaces for kubelet pod admission and eviction
There is too much code and logic in `kubelet.go` that makes it hard to test functions in discrete pieces.
I propose an interface that an internal module can implement that will let it make an admission decision for a pod. If folks are ok with the pattern, I want to move the a) predicate checking, b) out of disk, c) eviction preventing best-effort pods being admitted into their own dedicated handlers that would be easier for us to mock test. We can then just write tests to ensure that the `Kubelet` calls a call-out, and we can write easier unit tests to ensure that dedicated handlers do the right thing.
The second interface I propose was a `PodEvictor` that is invoked in the main kubelet sync loop to know if pods should be pro-actively evicted from the machine. The current active deadline check should move into a simple evictor implementation, and I want to plug the out of resource killer code path as an implementation of the same interface.
@vishh @timothysc - if you guys can ack on this, I will add some unit testing to ensure we do the call-outs.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
kubenet: fix up CNI bridge TX queue length if needed
CNI's bridge plugin mis-handles the TxQLen when creating the bridge,
leading to a zero-length TX queue. This doesn't typically cause
problems (since virtual interfaces don't have hard queue limits)
but when adding traffic shaping, some qdiscs pull their packet
limits from the TX queue length, leading to a packet limit of 0
in some cases. Until we can depend on a new enough version of
CNI, fix up the TX queue length internally.
Closes: https://github.com/kubernetes/kubernetes/issues/25092
Automatic merge from submit-queue
Remove nodeName from predicate signature.
With this approach, I'm getting the initial throughput (in empty cluster) in 1000-node cluster of ~95pods/s.
Which is ~30% improvement.
@kubernetes/sig-scalability
Automatic merge from submit-queue
Kubelet: Cleanup with new engine api
Finish step 2 of #23563
This PR:
1) Cleanup go-dockerclient reference in the code.
2) Bump up the engine-api version.
3) Cleanup the code with new engine-api.
Fixes#24076.
Fixes#23809.
/cc @yujuhong
Automatic merge from submit-queue
Delete pod with uid as precondition.
Addressed https://github.com/kubernetes/kubernetes/issues/25169#issuecomment-217033202.
Fix#25169Fix#24937
This PR change status manager to delete pods with uid as a precondition, so that kubelet won't delete pods with different uid but the same name and namespace accidentally.
/cc @yujuhong
CNI's bridge plugin mis-handles the TxQLen when creating the bridge,
leading to a zero-length TX queue. This doesn't typically cause
problems (since virtual interfaces don't have hard queue limits)
but when adding traffic shaping, some qdiscs pull their packet
limits from the TX queue length, leading to a packet limit of 0
in some cases. Until we can depend on a new enough version of
CNI, fix up the TX queue length internally.
- Expand Attacher/Detacher interfaces to break up work more
explicitly.
- Add arguments to all functions to avoid having implementers store
the data needed for operations.
- Expand unit tests to check that Attach, Detach, WaitForAttach,
WaitForDetach, MountDevice, and UnmountDevice get call where
appropriet.
Fix the following sequence of events:
1. relist call 1 successfully inspects a pod (just has infra container)
1. relist call 2 gets an error inspecting the same pod (has infra container and a transient
container that failed to create) and doesn't update the old/new pod records
1. relist calls 3+ don't inspect the pod any more (just has infra container so it doesn't look like
anything changed)
This change adds a new list that keeps track of pods that failed inspection and retries them the
next time relist is called. Without this change, a pod in this state would never be inspected again,
its entry in the status cache would never be updated, and the pod worker would never call syncPod
again because the most recent entry in the status cache has an error associated with it. Without
this change, pods in this state would be stuck Terminating forever, unless the user issued a
deletion with a grace period value of 0.
This might also preserve fragments, for those crazy enough to pass them.
I am using url.Parse() on the path in order to get path/query/fragment
and also deliberately avoiding the addition of more fields to the API.
Automatic merge from submit-queue
add mutex for kubenet
I saw a bunch of weird cases in kubenet suite. For instance, SetUpPod return successfully, but right after that, kubelet cannot retrieve podIP from podCIDR map.
cc: @dcbw @thockin
ref: #24211
K8 uses CNI_ARGS to pass pod namespace, name and infra container
id to the CNI network plugin. CNI logic will throw an error
if these args are not known to it, unless the user specifies
IgnoreUnknown as part of CNI_ARGS. This PR sets IgnoreUnknown=1
to prevent the CNI logic from erroring and blocking pod setup.
https://github.com/appc/cni/pull/158https://github.com/appc/cni/issues/126
Automatic merge from submit-queue
Promote Pod Hostname & Subdomain to fields (were annotations)
Deprecating the podHostName, subdomain and PodHostnames annotations and created corresponding new fields for them on PodSpec and Endpoints types.
Annotation doc: #22564
Annotation code: #20688
Automatic merge from submit-queue
Store node information in NodeInfo
This is significantly improving scheduler throughput.
On 1000-node cluster:
- empty cluster: ~70pods/s
- full cluster: ~45pods/s
Drop in throughput is mostly related to priority functions, which I will be looking into next (I already have some PR #24095, but we need for more things before).
This is roughly ~40% increase.
However, we still need better understanding of predicate function, because in my opinion it should be even faster as it is now. I'm going to look into it next week.
@gmarek @hongchaodeng @xiang90
Automatic merge from submit-queue
Do not update cache with so much effort
Fixes: #24298
1. Remove automatic update
2. Every time we check if we can get valid value from cache, if not, get the value directly from api
cc @Random-Liu
Automatic merge from submit-queue
rkt: Add post-start hook support.
This adds a poll-and-timeout procedure after the pod is
started, to make sure the post-start hooks execute when the
container is actually running.
This is a temporal workaround for implementing post-hooks,
a long term solution is to use lifecycle event to trigger
those hooks, see https://github.com/kubernetes/kubernetes/issues/23084.
Also this fixes a bug of getting container ID for a non-running
container when running pre-stop hook.
cc @sjpotter @euank @kubernetes/sig-node