59 Commits

Author SHA1 Message Date
Dan Winship
13f0449e4c Fix up kube-proxy import ordering/organization. 2025-03-07 10:43:43 -05:00
Nadia Pinaeva
cc0faf086d [kube-proxy:nftables] Skip EP chain updates on startup.
Endpoint chain contents are fairly predictable from their name and
existing affinity sets. Skip endpoint chain updates, when we can be sure
that rules in that chain are still correct.

Add unit test to verify first transaction is optimized.
Change baseRules ordering to make it accepted by nft.ParseDump.

Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
2025-02-27 10:07:22 +01:00
Dan Winship
f5969adb14 Clean up NewServiceChangeTracker/NewEndpointsChangeTracker args
Remove the now-unused event recorders, and put the remaining args into
a sensible order, and consistent between the two.
2024-12-14 12:12:42 -05:00
Nadia Pinaeva
90e64a57c6 kube-proxy,nftables: add debug logging for failed transaction.
Use a rate limiter to avoid large output with a high rate.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
2024-12-13 15:53:19 +01:00
Antonio Ojea
f93e6f3d3a kube-proxy implement dual stack metrics
Signed-off-by: Daman Arora <aroradaman@gmail.com>
Co-authored-by: Antonio Ojea <aojea@google.com>
2024-12-12 16:13:30 +05:30
Daman Arora
6657d220d3 proxy: cleanup UpdateServiceMapResult
Signed-off-by: Daman Arora <aroradaman@gmail.com>
2024-10-28 20:10:46 +05:30
Daman Arora
c398af07fa proxy: refactor UpdateEndpointsMapResult
Signed-off-by: Daman Arora <aroradaman@gmail.com>
2024-10-28 20:10:34 +05:30
Paco Xu
0e10a3a28c Revert "re: kube-proxy: internal config: refactor HealthzAddress and MetricsAddress " 2024-10-21 11:36:59 +08:00
Kubernetes Prow Robot
4d32d7e5ad Merge pull request #127930 from aroradaman/kube-proxy-refactor-healthz-metrics-address
re: kube-proxy: internal config: refactor HealthzAddress and MetricsAddress
2024-10-17 16:03:11 +01:00
Daman Arora
48f1356b2f pkg/proxy: refactor NodePortAddresses to NodeAddressHandler
Signed-off-by: Daman Arora <aroradaman@gmail.com>
2024-10-14 21:49:29 +05:30
Aohan Yang
da5738d9aa Set feature gate emulation version during test 2024-10-10 19:26:31 +08:00
Matthieu MOREL
f736cca0e5 fix: enable expected-actual rule from testifylint in module k8s.io/kubernetes
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-09-27 07:56:31 +02:00
Nadia Pinaeva
2ec3929134 [kube-proxy:nftables] Add partial sync unit test.
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
2024-07-23 17:32:30 +02:00
Nadia Pinaeva
3ccf5b8a55 [kube-proxy:nftables] Add partialSync mode to only transact changed
objects.
Change the order of operations to stop current iteration if no changes
to the service chains are needed.
Bump syncProxy frequency to 1 hour.
In a test kind cluster creation of 10K services, 2 endpoints each,
takes ~25m before the fix and ~9min after. Maximum memory usage
during creation is ~650MiB and 260MiB respectively.
Another important metric is the time it takes to create 1 new service
when 10K svc already exist. It used to take ~8m before the fix,
with partialSync it takes ~141ms.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
2024-07-23 17:32:30 +02:00
Dan Winship
30bc1b59d7 Add unit tests to validate "bad IP/CIDR" handling in kube-proxy
Also, fix the handling of bad EndpointSlice IPs!
2024-07-18 10:55:13 -04:00
Dan Winship
f762e5c8de Remove an unnecessary comment in nftables output
(It's redundant with the chain name.)
2024-07-18 10:54:30 -04:00
Dan Winship
11f55eae96 Reduce some duplication in nftables unit tests 2024-07-18 10:53:36 -04:00
Kubernetes Prow Robot
ae8474adcd Merge pull request #124557 from danwinship/metrics-and-stuff
kube-proxy metrics cleanup (and stuff)
2024-04-26 18:31:57 -07:00
Dan Winship
3db434d6be Remove errors from LocalTrafficDetector constructors
The constructors only return an error if you pass them invalid data,
but we only ever pass them data which has already been validated,
making the error checking just annoying. Just make them return garbage
output if you give them garbage input.
2024-04-26 11:34:37 -04:00
Dan Winship
ba57fd7c84 Merge linux and windows kube-proxy metric registration together
Windows proxy metric registration was in a separate file, which had
led to some metrics (eg the new ProxyHealthzTotal and ProxyLivezTotal)
not being registered for Windows even though they were implemented by
platform-generic code.

(A few other metrics were neither registered on, nor implemented on
Windows, and that's probably a bug.)

Also, beyond linux-vs-windows, make it clearer which metrics are
specific to individual backends.
2024-04-26 09:27:41 -04:00
Dan Winship
dc1155bd53 Move LocalTrafficDetector from pkg/proxy/util/iptables to pkg/proxy/util
Since it's used for nftables as well now.
2024-04-25 08:51:43 -04:00
Marek Siarkowicz
3ee8178768 Cleanup defer from SetFeatureGateDuringTest function call 2024-04-24 20:25:29 +02:00
Dan Winship
3ecd933276 fix/simplify an nftables unit test
The nodeport-ips value is part of the baseline, which wouldn't change
no matter what Services or EndpointSlices we added/removed.
2024-04-18 09:25:06 -04:00
Dan Winship
19b3a9e194 (Mostly) Revert "change --nodeport-addresses behavior to default to primary node ip only"
This reverts commit 8bccf4873b, except
for the nftables unit test changes, since we still want the "new"
results (not to mention the bugfixes), just for a different reason
now.
2024-04-18 09:25:06 -04:00
Dan Winship
0b599aa8e3 Add --nodeport-addresses primary
The behavior when you specify no --nodeport-addresses value in a
dual-stack cluster is terrible and we can't fix it, for
backward-compatibility reasons. Actually, the behavior when you
specify no --nodeport-addresses value in a single-stack cluster isn't
exactly awesome either...

Allow specifying `--nodeport-addresses primary` to get the
previously-nftables-backend-specific behavior of listening on only the
node's primary IP or IPs.
2024-04-18 09:25:06 -04:00
Kubernetes Prow Robot
053acbed90 Merge pull request #122724 from nayihz/feat_nft_nodeport_addr
change --nodeport-addresses behavior to default to primary node ip only
2024-01-26 16:32:03 +01:00
Kubernetes Prow Robot
e023511deb Merge pull request #122920 from danwinship/knftables-migration
Update knftables, with new sigs.k8s.io module name
2024-01-26 07:14:16 +01:00
nayihz
8bccf4873b change --nodeport-addresses behavior to default to primary node ip only 2024-01-25 13:42:30 +08:00
Kubernetes Prow Robot
55f9657e07 Merge pull request #122692 from aroradaman/reject-packets-to-invalid-port
proxy/nftables: reject packets destined for invalid ports of service ips
2024-01-24 23:17:34 +01:00
Dan Winship
09abfa46be Update knftables, with new sigs.k8s.io module name 2024-01-23 08:09:05 -05:00
Daman Arora
25a40b1c7c pkg/proxy/nftables: handle traffic to node ports with no endpoints
NFTables proxy will no longer install drop and reject rules for node
port services with no endpoints in chains associated with forward and
output hooks.

Signed-off-by: Daman Arora <aroradaman@gmail.com>
2024-01-21 20:07:56 +05:30
Daman Arora
4b40299133 pkg/proxy/nftables: handle traffic to cluster ip
NFTables proxy will now drop traffic directed towards unallocated
ClusterIPs and reject traffic directed towards invalid ports of
Cluster IPs.

Signed-off-by: Daman Arora <aroradaman@gmail.com>
2024-01-21 19:58:37 +05:30
Dan Winship
fcb51554a1 Plumb the conntrack.Interface up to the proxiers
And use the fake interface in the unit tests, removing the dependency
on setting up FakeExec stuff when conntrack cleanup will be invoked.

Also, remove the isIPv6 argument to CleanStaleEntries, because it can
be inferred from the other args.
2024-01-15 13:09:05 -05:00
Dan Winship
cdf934d5bc Remove redundant iptables/nftables conntrack cleanup tests
The iptables and nftables proxy backends had 2 unit tests
(TestDeleteEndpointConnections and TestProxierDeleteNodePortStaleUDP)
that were effectively testing that:

  - If the proxy saw various Service/EndpointSlice events this would
    result in specific changes to the service/endpoints trackers, AND

  - If the service/endpoints trackers changed in those specific ways
    this would result in specific UpdateServiceMapResult and
    UpdateEndpointsMapResult values being generated, AND

  - If you passed those specific UpdateServiceMapResult and
    UpdateEndpointsMapResult values to conntrack.CleanStaleEntries it
    would make specific calls to the lower-level conntrack methods,
    AND

  - If you called the lower-level conntrack methods with those
    specific arguments, it would result in specific executions of the
    conntrack binary, mixed with a specific number of klog
    invocations.

This... is not a good unit test. We already test the change tracker
behavior in other unit tests, and we already tested the
Update{Service,Endpoints}MapResult behavior in the pkg/proxy unit
tests, and we already tested the conntrack exec behavior in
pkg/proxy/conntrack/conntrack_test.go, and we now test the
CleanStaleEntries behavior in pkg/proxy/conntrack/cleanup_test.go. So
there is no need to try to test the top-to-bottom behavior as a "unit
test".
2024-01-15 13:08:52 -05:00
Daman Arora
4ffa12b9d9 pkg/proxy/nftables: drop ct-state-invalid rule
Signed-off-by: Daman Arora <aroradaman@gmail.com>
2024-01-10 22:53:09 +05:30
Kubernetes Prow Robot
95a159299b Merge pull request #122614 from tnqn/nftables-firewall
kube-proxy: fix LoadBalancerSourceRanges not working for nftables mode
2024-01-09 22:27:16 +01:00
Quan Tian
f21f8d9984 kube-proxy: fix LoadBalancerSourceRanges not working for nftables mode
Previously, the firewall-check chain was run in input, forward, and
output hook but not prerouting hook. When the LoadBalancer traffic
arrived at input or forward hook, it had been DNATed to endpoint IP and
port, so the firewall-check chain didn't take effect, traffic from out
of LoadBalancerSourceRanges was not dropped.

It was not detected by unit test because the chains were sorted by
priority only, while hook should be taken into consideration.

The commit links the firewall-check chain to prerouting hook and unlinks
it from input and forward hook to ensure the traffic is filtered before
DNAT. The priorities of filter chains are updated from "DNATPriority-1"
to "DNATPriority-10" to allow third parties to insert something else
between them.

Signed-off-by: Quan Tian <qtian@vmware.com>
2024-01-09 17:34:16 +08:00
Lars Ekman
d2294007b0 kube-proxy: store LoadBalancerVIPs as net.IP
They were stored as strings which could be non-canonical
and cause problems
2024-01-09 09:17:43 +01:00
Kubernetes Prow Robot
2cf7465755 Merge pull request #122605 from tnqn/stale-chain-cleanup
kube-proxy: do not delete previously stale but currently active chains
2024-01-08 17:30:53 +01:00
Kubernetes Prow Robot
f538feed8c Merge pull request #122296 from tnqn/nftables-kernel-requirement
kube-proxy: change implementation of LoadBalancerSourceRanges for wider kernel support
2024-01-08 17:30:27 +01:00
Quan Tian
377f521038 kube-proxy: change implementation of LoadBalancerSourceRanges for wider kernel support
The nftables implementation made use of concatenation of ranges when
creating the set "firewall-allow", but the support was not available
before kernel 5.6. Therefore, nftables mode couldn't run on earlier
kernels, while 5.4 is still widely used.

An alternative of concatenation of ranges is to create a separate
firewall chain for every service port that needs firewalling, and jump
to the service's firewall chain from the common firewall chain via a
rule with vmap.

Renaming from "firewall" to "firewall-ips" is required when changing the
set to the map to support existing clusters to upgrade, otherwise it
would fail to create the map. Besides, "firewall-ips" corresponds to the
"service-ips" map, later we can add use "firewall-nodeports" if it's
determined that NodePort traffic should be subject to
LoadBalancerSourceRanges.

Signed-off-by: Quan Tian <qtian@vmware.com>
2024-01-08 19:26:38 +08:00
Quan Tian
ca8c27c480 kube-proxy: do not delete previously stale but currently active chains
In some cases a chain could change from stale to active, but once it's
added to staleChains it would always be deleted once. When the proxier
tries to delete a previously stale but currently active chain, it would
fail and lead to errors, though it won't cause real problem thanks to
kernel's validation.

The commit removes a chain from staleChains if it becomes active.

Signed-off-by: Quan Tian <qtian@vmware.com>
2024-01-08 17:53:52 +08:00
Dan Winship
c1ce1e00ee Properly build-tag the Linux kube-proxy backend code
This had to be able to build on OS X before to make verify-typecheck
pass, but now that that's fixed we can tag the code properly as being
linux-only.
2023-12-18 20:20:51 -05:00
Dan Winship
0993bb78ef Redo service dispatch with maps 2023-10-31 17:54:53 -04:00
Dan Winship
9d71513ac1 Redo no-endpoint handling with maps 2023-10-31 17:54:53 -04:00
Dan Winship
4128631d0f Redo LoadBalancerSourceRanges firewall using sets 2023-10-31 17:54:53 -04:00
Dan Winship
ef1347b06d Port NAT rules to nftables (and backend is now functional) 2023-10-31 17:54:51 -04:00
Dan Winship
0c5c620b4f Port filter rules to nftables 2023-10-31 17:40:45 -04:00
Dan Winship
6cff415305 Port service/endpoint chain creation/cleanup to nftables 2023-10-31 17:40:45 -04:00
Dan Winship
2735ad541e Port table setup/cleanup code to nftables 2023-10-31 17:40:30 -04:00