kube-proxy needs to delete stale conntrack entries for UDP services to
avoid blackholing traffic. Instead of using the conntrack binary it
can use netlink calls directly, reducing the containers images size and
the security surface.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
Co-authored-by: Antonio Ojea <aojea@google.com>
* LocalTrafficDetector construction and test improvements
* Reorder getLocalDetector unit test fields so "input" args come before "output" args
* Don't pass DetectLocalMode as a separate arg to getLocalDetector
It's already part of `config`
* Clarify test names in preparation for merging
* Merge single-stack/dual-stack LocalTrafficDetector construction
Also, only warn if the *primary* IP family is not correctly configured
(since we don't actually know if the cluster is really dual-stack or
not), and pass the pair of detectors to the proxiers as a map rather
than an array.
* Remove the rest of Test_getDualStackLocalDetectorTuple
This reverts commit 8bccf4873b, except
for the nftables unit test changes, since we still want the "new"
results (not to mention the bugfixes), just for a different reason
now.
NFTables proxy will now drop traffic directed towards unallocated
ClusterIPs and reject traffic directed towards invalid ports of
Cluster IPs.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
And use the fake interface in the unit tests, removing the dependency
on setting up FakeExec stuff when conntrack cleanup will be invoked.
Also, remove the isIPv6 argument to CleanStaleEntries, because it can
be inferred from the other args.
safeIpset was a wrapper for thread-safely sharing an ipset.IPSet, but
this was unnecessary because ipset.IPSet is just a wrapper around exec
anyway and doesn't need any locking.
BaseEndpointInfo's fields, unlike BaseServicePortInfo's, were all
exported, which then required adding "Get" before some of the function
names in Endpoint so they wouldn't conflict.
Fix that, now that the iptables and ipvs unit tests don't need to be
able to construct BaseEndpointInfos by hand.
A new --init-only flag is added tha makes kube-proxy perform
configuration that requires privileged mode and exit. It is
intended to be executed in a privileged initContainer, while
the main container may run with a stricter securityContext
The use of "Endpoint" vs "Endpoints" in these type names is tricky
because it doesn't always make sense to use the same singular/plural
convention as the corresonding service-related types, since often the
service-related type is referring to a single service while the
endpoint-related type is referring to multiple endpoint IPs.
The "endpointsInfo" types in the iptables and winkernel proxiers are
now "endpointInfo" because they describe a single endpoint IP (and
wrap proxy.BaseEndpointInfo).
"UpdateEndpointMapResult" is now "UpdateEndpointsMapResult", because
it is the result of EndpointsMap.Update (and it's clearly correct for
EndpointsMap to have plural "Endpoints" because it's a map to an array
of proxy.Endpoint objects.)
"EndpointChangeTracker" is now "EndpointsChangeTracker" because it
tracks changes to the full set of endpoints for a particular service
(and the new name matches the existing "endpointsChange" type and
"Proxier.endpointsChanges" fields.)
TL;DR: we want to start failing the LB HC if a node is tainted with ToBeDeletedByClusterAutoscaler.
This field might need refinement, but currently is deemed our best way of understanding if
a node is about to get deleted. We want to do this only for eTP:Cluster services.
The goal is to connection draining terminating nodes