Basically all callers want dual-stack-if-possible, so simplify that.
Also, tweak the startup-time checking in kubelet to treat "no iptables
support" as interesting but not an error.
Remove a bunch of comments that are either inaccurate ("the proxier
can only be tested by e2e tests") or weirdly overspecific about
obvious details ("the proxier will not exit if an iptables call
fails").
Remove the utilexec.Interface args from the iptables/ipvs constructors
(which have been unused since the conntrack cleanup code was ported to
netlink).
Remove the EventRecorder fields from the iptables/ipvs Proxiers, which
have been unused since we removed the port-opener code in 2022.
Remove the strictARP field from the ipvs Proxier, which has apparently
always been unused (strictARP is only looked at at construct time).
KubeProxy operates with a single health server and two proxies,
one for each IP family. The use of the term 'proxier' in the
types and functions within pkg/proxy/healthcheck can be
misleading, as it may suggest the existence of two health
servers, one for each IP family.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
kube-proxy needs to delete stale conntrack entries for UDP services to
avoid blackholing traffic. Instead of using the conntrack binary it
can use netlink calls directly, reducing the containers images size and
the security surface.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
Co-authored-by: Antonio Ojea <aojea@google.com>
If the nfacct sub-system is not available in the kernel then:
1. nfacct based metrics won't be registered.
2. proxier will not attempt to ensure the counters
Signed-off-by: Daman Arora <aroradaman@gmail.com>
* LocalTrafficDetector construction and test improvements
* Reorder getLocalDetector unit test fields so "input" args come before "output" args
* Don't pass DetectLocalMode as a separate arg to getLocalDetector
It's already part of `config`
* Clarify test names in preparation for merging
* Merge single-stack/dual-stack LocalTrafficDetector construction
Also, only warn if the *primary* IP family is not correctly configured
(since we don't actually know if the cluster is really dual-stack or
not), and pass the pair of detectors to the proxiers as a map rather
than an array.
* Remove the rest of Test_getDualStackLocalDetectorTuple
Track packets dropped by proxy which were marked invalid by conntrack
using nfacct netfilter extended accounting infrastructure.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
This reverts commit 8bccf4873b, except
for the nftables unit test changes, since we still want the "new"
results (not to mention the bugfixes), just for a different reason
now.
NFTables proxy will now drop traffic directed towards unallocated
ClusterIPs and reject traffic directed towards invalid ports of
Cluster IPs.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
And use the fake interface in the unit tests, removing the dependency
on setting up FakeExec stuff when conntrack cleanup will be invoked.
Also, remove the isIPv6 argument to CleanStaleEntries, because it can
be inferred from the other args.
In the original version of "MinimizeIPTablesRestore", we skipped the
bottom half of the sync loop when we weren't re-syncing a service, so
certain things that couldn't be skipped had to be done in the top
half. But the code was later changed to always run through the whole
loop body (just not necessarily writing out rules in the bottom half),
so we can reorganize things now to put some related bits of code back
together.
(In particular, this also resolves the fact that we were accidentally
adding the endpoint chains to activeNATChains twice.)
Also change activeNATChains to be a proper generic Set type.
BaseEndpointInfo's fields, unlike BaseServicePortInfo's, were all
exported, which then required adding "Get" before some of the function
names in Endpoint so they wouldn't conflict.
Fix that, now that the iptables and ipvs unit tests don't need to be
able to construct BaseEndpointInfos by hand.
A new --init-only flag is added tha makes kube-proxy perform
configuration that requires privileged mode and exit. It is
intended to be executed in a privileged initContainer, while
the main container may run with a stricter securityContext
The use of "Endpoint" vs "Endpoints" in these type names is tricky
because it doesn't always make sense to use the same singular/plural
convention as the corresonding service-related types, since often the
service-related type is referring to a single service while the
endpoint-related type is referring to multiple endpoint IPs.
The "endpointsInfo" types in the iptables and winkernel proxiers are
now "endpointInfo" because they describe a single endpoint IP (and
wrap proxy.BaseEndpointInfo).
"UpdateEndpointMapResult" is now "UpdateEndpointsMapResult", because
it is the result of EndpointsMap.Update (and it's clearly correct for
EndpointsMap to have plural "Endpoints" because it's a map to an array
of proxy.Endpoint objects.)
"EndpointChangeTracker" is now "EndpointsChangeTracker" because it
tracks changes to the full set of endpoints for a particular service
(and the new name matches the existing "endpointsChange" type and
"Proxier.endpointsChanges" fields.)
Conntrack invalid packets may cause unexpected and subtle bugs
on esblished connections, because of that we install by default an
iptables rules that drops the packets with this conntrack state.
However, there are network scenarios, specially those that use multihoming
nodes, that may have legit traffic that is detected by conntrack as
invalid, hence these iptables rules are causing problems dropping this
traffic.
An alternative to solve the spurious problems caused by the invalid
connectrack packets is to set the sysctl nf_conntrack_tcp_be_liberal
option, but this is a system wide setting and we don't want kube-proxy
to be opinionated about the whole node networking configuration.
Kube-proxy will only install the DROP rules for invalid conntrack states
if the nf_conntrack_tcp_be_liberal is not set.
Change-Id: I5eb326931ed915f5ae74d210f0a375842b6a790e