Commit Graph

40560 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
8a65055c2d Merge pull request #96638 from hasheddan/cadvisor-util
Fix link to CRI-O sock path
2020-12-08 18:34:31 -08:00
Kubernetes Prow Robot
984bc043d5 Merge pull request #96593 from pandaamanda/typo_fix
fix typo and format for klog
2020-12-08 17:29:43 -08:00
Kubernetes Prow Robot
d7a389ce7a Merge pull request #96582 from chenyw1990/fixSchedulerBug
don't add pod to podQueue when the NodeName of pod is not empty
2020-12-08 17:29:35 -08:00
Kubernetes Prow Robot
4a4bdb0169 Merge pull request #96581 from qingsenLi/201114-unmount
Fix typo unmount for klog
2020-12-08 17:29:27 -08:00
Kubernetes Prow Robot
125530629a Merge pull request #96572 from sjenning/dont-rerun-init
kubelet: do not rerun init containers if any main containers have status
2020-12-08 17:29:18 -08:00
Kubernetes Prow Robot
d2662b9842 Merge pull request #96488 from basantsa1989/kproxy_cleanup
Kube-proxy cleanup: Changing FilterIncorrectIP/CIDR functions to MapIPsToIPFamily that returns a map
2020-12-08 17:28:52 -08:00
Kubernetes Prow Robot
83b2c7a1bf Merge pull request #96311 from thockin/kep-1659-topology-labels
Convert users of old failure-domain labels to new
2020-12-08 17:28:27 -08:00
Kubernetes Prow Robot
9d81c4ebfa Merge pull request #96296 from aojea/extip
kube-proxy treat ExternalIPs as ClusterIPs
2020-12-08 17:28:18 -08:00
Kubernetes Prow Robot
9a175b9b2a Merge pull request #96223 from SataQiu/fix-scheduler-20201104
scheduler: parse Pod's Node affinity once in PreScore phase
2020-12-08 17:28:06 -08:00
Kubernetes Prow Robot
e40cba59e3 Merge pull request #95269 from SataQiu/kubelet-20201003
Fix panic during kubelet registration when a node object already exists with no Status.Capacity or Status.Allocatable
2020-12-08 16:29:19 -08:00
Kubernetes Prow Robot
1588d58151 Merge pull request #95099 from brianpursley/TestReadLogs
Added unit tests for ReadLogs
2020-12-08 16:29:02 -08:00
Kubernetes Prow Robot
ce7ac8442e Merge pull request #94599 from verult/adc-op-asw-race
Fixes Attach Detach Controller reconciler race reading ActualStateOfWorld and operation pending states
2020-12-08 16:28:53 -08:00
Kubernetes Prow Robot
b6e0aac05c Merge pull request #93920 from zhouya0/log_with_limited_tail
[Flaky Test] Add limited lines to log when having tail option
2020-12-08 16:28:45 -08:00
Kubernetes Prow Robot
4f2c21f9e8 Merge pull request #93549 from Dean-Coakley/fix-res-quota-comments
Fix ResourceQuota comments
2020-12-08 16:28:36 -08:00
Kubernetes Prow Robot
0ee9c391f1 Merge pull request #92827 from yuanhuaiwang/disruptionresync
Remove resync period for disruption controller
2020-12-08 16:28:26 -08:00
Seth Jennings
c8d02f703b kubelet: do not rerun init containers if any main containers have status
2020-12-01 14:59:03 -06:00
Kubernetes Prow Robot
61dc69ac2c Merge pull request #87461 from bboreham/fix-uid-gen
kubelet: ensure static pod UIDs are unique
2020-12-01 08:18:50 -08:00
SataQiu
2b38078de1 scheduler: parse Pod's Node affinity once in PreScore phase
Signed-off-by: SataQiu <1527062125@qq.com>
2020-11-26 11:19:52 +08:00
Lars Ekman
a0e613363a service.spec.AllocateLoadBalancerNodePorts followup
2020-11-24 08:10:43 +01:00
Jordan Liggitt
5c88880584 Restore beta os/arch labels on initial node registration
2020-11-23 11:23:59 -05:00
Kubernetes Prow Robot
248c116963 Merge pull request #96417 from hvenev-vmware/fix-ipam
Fix double counting of IP addresses
2020-11-23 07:45:33 -08:00
Kubernetes Prow Robot
733582456b Merge pull request #96777 from lianghao208/patch-1
fix: concurrent map writes error in VolumeBinding plugin during Filter
2020-11-23 06:51:34 -08:00
Hristo Venev
c8c81be8af range_allocator: Test (lack of) double counting
2020-11-22 23:09:09 -08:00
Hristo Venev
ee581278bd cidrset: Add test for double counting
2020-11-22 23:09:09 -08:00
Hristo Venev
4d28391c24 Fix double counting of IP addresses
The range allocator in pkg/controller/nodeipam/ipam/range_allocator.go
may call Occupy() on the same range twice:

1. Just before subscribing to the NodeInformer
2. From a callback given to the NodeInformer soon after registration
2020-11-22 23:09:09 -08:00
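The double-counting hazard described in the commit above can be illustrated with a minimal Go sketch. It assumes a hypothetical `cidrSet` type with a usage counter; it is not the Kubernetes `cidrset` implementation, and the names `cidrSet`, `newCIDRSet`, `used`, and `allocated` are invented for illustration. The idea shown is that the counter changes only when a range moves from free to occupied, so a repeated `Occupy()` on the same range is harmless.

```go
package main

import "fmt"

// cidrSet is a hypothetical, simplified allocator that tracks which
// fixed-size ranges are in use. It is not the Kubernetes cidrset type.
type cidrSet struct {
	used      []bool // used[i] is true when range i is occupied
	allocated int    // number of occupied ranges
}

func newCIDRSet(size int) *cidrSet {
	return &cidrSet{used: make([]bool, size)}
}

// Occupy marks range i as used. The counter is incremented only on the
// free -> occupied transition, which makes repeated calls idempotent.
func (s *cidrSet) Occupy(i int) error {
	if i < 0 || i >= len(s.used) {
		return fmt.Errorf("range index %d out of bounds", i)
	}
	if !s.used[i] {
		s.used[i] = true
		s.allocated++
	}
	return nil
}

func main() {
	s := newCIDRSet(4)
	_ = s.Occupy(2) // first call, e.g. before subscribing to an informer
	_ = s.Occupy(2) // second call, e.g. from the informer callback
	fmt.Println(s.allocated) // the range is counted once
}
```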
Antonio Ojea
120472032c kube-proxy: treat ExternalIPs as ClusterIP
Currently kube-proxy treats ExternalIPs differently depending on:
- the traffic origin
- whether the ExternalIP is present in the system.

It also depends on the CNI implementation to
discriminate between local and non-local traffic.

Since the ExternalIP belongs to a Service, we can avoid the round trip
of sending cluster-originated traffic outside the cluster.

Also, we leverage the new LocalTrafficDetector to detect the local
traffic and not rely on the CNI implementations for this.
2020-11-22 00:54:33 +01:00
rootlh
42c00bc523 fix bug: concurrent map writes error
2020-11-22 01:40:51 +08:00
Kubernetes Prow Robot
ece591f722 Merge pull request #96758 from msau42/revert-84206-refactor/remove-mount-volume-check-orphaned-pod-cleanup
Revert "check volume directories instead of mounts for cleanupOrphanedPodDirs"
2020-11-20 22:37:33 -08:00
Kubernetes Prow Robot
8c7cd8a8cc Merge pull request #96553 from AlexeyPerevalov/FixesKubeletCrashEmptyTopology
Fixes segfault in case of empty TopologyInfo
2020-11-20 16:03:33 -08:00
Basant Amarkhed
293d4b7c48 Avoiding double parsing of ip/cidr strings and logging bad ips/cidrs
2020-11-20 22:22:55 +00:00
Michelle Au
25edb8bc69 Revert "check volume directories instead of mounts for cleanupOrphanedPodDirs"
2020-11-20 09:06:09 -08:00
Jordan Liggitt
afd92b3b3e Revert "plumb context with request deadline"
This reverts commit 83f869ee13.
2020-11-19 18:15:04 -05:00
Kubernetes Prow Robot
18099e1ef7 Merge pull request #96495 from andrewsykim/dockershim-exec-context
kubelet: dockershim ExecSync should return context.DeadlineExceeded on timeout
2020-11-19 09:06:51 -08:00
chenyw1990
a8add50ab6 don't add pod to podQueue when the NodeName of pod is not empty
2020-11-19 08:01:59 +08:00
Kubernetes Prow Robot
160c33a6a1 Merge pull request #96533 from gnufied/reduce-vsphere-volume-name
Reduce volume name length for vsphere
2020-11-17 17:34:05 -08:00
hasheddan
97c358fe5b Fix link to cadvisor CRI-O sock path
Fixes link to point to CRI-O sock constant defined in cadvisor. We
cannot pin directly because of Linux build tags in the transitive dependency
opencontainers/runc.

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
2020-11-17 12:02:27 -06:00
Jordan Liggitt
e491c3bc70 Add GC unit tests
Adds unit tests covering the problematic scenarios identified
around conflicting data in child owner references

                      Before   After
package level         51%      68%
garbagecollector.go   60%      75%
graph_builder.go      50%      81%
graph.go              50%      68%

Added/improved coverage of key functions that lacked unit test coverage:

* attemptToDeleteWorker
* attemptToDeleteItem
* processGraphChanges (added coverage of all added code)
2020-11-17 10:49:32 -05:00
Jordan Liggitt
603a0b016e Log cluster-scoped owners referencing namespaced owners, avoid retrying lookups forever
If a cluster-scoped dependent references a namespace-scoped owner,
this is an invalid relationship, and the lookup will never succeed in attemptToDelete.

Short-circuit requeueing in attemptToDelete and log.
2020-11-17 10:49:30 -05:00
Jordan Liggitt
221e4aa2c2 Queue non-matching children for deletion when a virtual node is marked as observed
When we observe valid coordinates for a previously virtual node,
if there are dependents that do not agree with those coordinates,
add them to the attemptToDelete queue.

This queue will check the dependent's ownerReferences using the coordinates specified by the dependent.
If all of the owners can be verified absent, the dependent will be deleted.
If some are still present, or if there are errors looking them up, the dependent will not be deleted.

If the verified owner is namespaced, and the dependent is not in the same namespace,
an event will be recorded for user visibility, since cross-namespace ownerReferences are not supported.
2020-11-17 10:49:27 -05:00
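The deletion rule stated in the commit above (delete only when every owner is verified absent; keep the dependent if any owner is present or any lookup fails) can be sketched as a small Go function. The `lookupResult` type and `shouldDelete` function are hypothetical names, not the garbage collector's API, and treating an empty owner list as "do not delete" is an assumption made for this sketch.

```go
package main

import "fmt"

// lookupResult is a hypothetical outcome of a live lookup for one owner.
type lookupResult int

const (
	ownerPresent lookupResult = iota // the owner exists
	ownerAbsent                      // the owner was verified absent
	lookupFailed                     // the lookup returned an error
)

// shouldDelete reports whether a dependent may be deleted: every owner
// must be verified absent. A present owner or a failed lookup blocks
// deletion. An empty owner list also blocks deletion (an assumption).
func shouldDelete(owners []lookupResult) bool {
	if len(owners) == 0 {
		return false
	}
	for _, r := range owners {
		if r != ownerAbsent {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(shouldDelete([]lookupResult{ownerAbsent, ownerAbsent}))
	fmt.Println(shouldDelete([]lookupResult{ownerAbsent, lookupFailed}))
}
```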
Jordan Liggitt
b655f22509 Handle virtual delete events when children don't agree on owner coordinates
If a virtual delete event is received for a node whose dependents disagree on the parent's coordinates:
1. propagate the delete to children that matched the verified absent coordinates
2. if the existing node is virtual, select a new set of coordinates from the remaining dependents
3. do not delete the parent node from the graph if the parent node is non-virtual,
   or if there are dependents that do not agree with the virtual delete event coordinates
2020-11-17 10:49:07 -05:00
Jordan Liggitt
b8d7ecf73b Make node removal conditional in processGraphChanges
2020-11-17 10:49:04 -05:00
Jordan Liggitt
ac8d419b4c Enqueue dependents for deletion when their ownerReference does not match observed parent coordinates
When adding a dependent to the graph, we ensure there is a node representing each owner reference,
and add the dependent to each parent node.

If the parent node already exists, and the dependent's ownerReference
coordinates disagree with the verified coordinates, add the dependent to the attemptToDelete queue.

This queue will check the dependent's ownerReferences using the coordinates specified by the dependent.
If all of the owners can be verified absent, the dependent will be deleted.
If some are still present, or if there are errors looking them up, the dependent will not be deleted.

If the parent node has been observed via informer event (so we know the coordinates are accurate),
and the verified owner is namespaced, and the dependent is not in the same namespace,
an event will be recorded for user visibility, since cross-namespace ownerReferences are not supported.
2020-11-17 10:47:39 -05:00
Jordan Liggitt
78317edb8b Short-circuit attemptToDelete loop for virtual nodes that are removed or observed
Virtual nodes are added to the attemptToDelete queue, and continue getting requeued
until they are successfully verified absent or are observed via informer.

In the meantime, if the real object associated with that UID is observed via informer,
or is observed to be deleted via informer, the graph node for that UID can be removed
or marked as observed. In that case, we should stop retrying the lookup of the virtual node's coordinates.
2020-11-17 10:46:00 -05:00
Jordan Liggitt
cae56bea0a Replace virtual node with observed node if identity differs
If the graph contains a virtual node (because some child object referenced it in an OwnerRef),
and a real informer event is observed for that uid at different coordinates,
we want to fix the coordinates of the node in the graph to match the actual coordinates.

The safe way to do this is to clone the node, replace the identity in the clone,
then replace the node with the clone.

Modifying the identity directly is not safe because it is accessed lock-free from many code paths.

Replacing the node in the graph from processGraphChanges is safe because it is the only graph writer.
2020-11-17 10:42:48 -05:00
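The clone-and-replace approach described in the commit above can be sketched in Go as follows. The `identity`, `node`, and `graph` types here are simplified, hypothetical stand-ins rather than the garbage collector's own types, and the sketch assumes a single writer, so it omits any locking around the map. Readers that still hold the old pointer keep seeing a consistent, unmodified identity.

```go
package main

import "fmt"

// identity holds the coordinates of an object (hypothetical, simplified).
type identity struct {
	Namespace, Name, UID string
}

// node is a simplified graph node. Its identity is treated as immutable
// once the node is published, because readers access it without locks.
type node struct {
	identity   identity
	virtual    bool
	dependents []string
}

// clone returns a copy of n carrying the given identity. The original
// node is left untouched.
func (n *node) clone(id identity) *node {
	deps := make([]string, len(n.dependents))
	copy(deps, n.dependents)
	return &node{identity: id, virtual: n.virtual, dependents: deps}
}

// graph maps a UID to its node. This sketch assumes one writer goroutine.
type graph struct {
	nodes map[string]*node
}

// replaceIdentity swaps the node for uid with a clone that carries the
// observed coordinates, instead of mutating the identity in place.
func (g *graph) replaceIdentity(uid string, observed identity) {
	old, ok := g.nodes[uid]
	if !ok || old.identity == observed {
		return
	}
	g.nodes[uid] = old.clone(observed)
}

func main() {
	g := &graph{nodes: map[string]*node{}}
	g.nodes["u1"] = &node{identity: identity{"ns1", "web", "u1"}, virtual: true}
	g.replaceIdentity("u1", identity{"ns2", "web", "u1"})
	fmt.Println(g.nodes["u1"].identity.Namespace)
}
```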
Jordan Liggitt
cb7b9ed532 Refactor identityFromEvent
2020-11-17 10:42:48 -05:00
Jordan Liggitt
30eb6683e6 Avoid marking virtual nodes as observed when they haven't been
Virtual nodes can be added to the GC graph in order to represent objects
which have not been observed via an informer, but are referenced via ownerReferences.

These virtual nodes are requeued into attemptToDelete until they are observed via an informer,
or successfully verified absent via a live lookup. Previously, both of those code paths
called markObserved() to stop requeuing into attemptToDelete.

Because it is useful to know whether a particular node has been observed via
a real informer event, this commit does the following:

* adds a `virtual bool` attribute to graph events so we know which ones came from a real informer
* limits the markObserved() call to the code path where a real informer event is observed
* uses an alternative mechanism to stop requeueing into attemptToDelete when a virtual node is verified absent via a live lookup
2020-11-17 10:42:48 -05:00
Jordan Liggitt
445f20dbdb Switch GC absentOwnerCache to full reference
Before deleting an object based on absent owners, GC verifies absence of those owners with a live lookup.

The coordinates used to perform that live lookup are the ones specified in the ownerReference of the child.

In order to performantly delete multiple children from the same parent (e.g. 1000 pods from a replicaset),
a 404 response to a lookup is cached in absentOwnerCache.

Previously, the cache was a simple uid set. However, since children can disagree on the coordinates
that should be used to look up a given uid, the cache should record the exact coordinates verified absent.
This is an [apiVersion, kind, namespace, name, uid] tuple.
2020-11-17 10:42:48 -05:00
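The difference between a uid-only set and a cache keyed on the full tuple can be shown with a short Go sketch. The `ownerRef` and `absentCache` names are hypothetical, and a plain map is used here as an assumption; nothing in the commit message above specifies the cache's actual data structure or eviction policy. Because Go structs with comparable fields can be used as map keys, two references that share a uid but differ in namespace are tracked separately.

```go
package main

import "fmt"

// ownerRef holds the full coordinates used for a live lookup
// (hypothetical type for illustration).
type ownerRef struct {
	APIVersion, Kind, Namespace, Name, UID string
}

// absentCache records the exact coordinates that a live lookup
// reported as not found.
type absentCache map[ownerRef]struct{}

// Add records r as verified absent.
func (c absentCache) Add(r ownerRef) { c[r] = struct{}{} }

// Has reports whether exactly these coordinates were verified absent.
func (c absentCache) Has(r ownerRef) bool {
	_, ok := c[r]
	return ok
}

func main() {
	cache := absentCache{}
	a := ownerRef{"apps/v1", "ReplicaSet", "ns1", "web", "uid-1"}
	b := a
	b.Namespace = "ns2" // same uid, different coordinates

	cache.Add(a)
	fmt.Println(cache.Has(a), cache.Has(b))
}
```

With a uid-only set, the second lookup would have been treated as already verified absent even though those coordinates were never checked.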
Jordan Liggitt
09bdf76b8a Plumb event recorder to garbage collector controller
2020-11-17 10:42:45 -05:00
Basant Amarkhed
f11c4e9c8c Testcases for MapCIDRsByIPFamily
2020-11-17 07:35:50 +00:00
Basant Amarkhed
707073d2f9 Fixup #1 addressing review comments
2020-11-17 07:13:51 +00:00