kubeproxy_conntrack_reconciler_deleted_entries_total can be used
to track the total number of entries deleted during conntrack
reconciliation.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
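As a rough sketch of the shape of such a counter, using
k8s.io/component-base/metrics the way the rest of kube-proxy's metrics
code does (variable name and help text here are illustrative):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // Total conntrack flows removed by the conntrack reconciler since
    // kube-proxy started.
    var reconcilerDeletedEntriesTotal = metrics.NewCounter(
        &metrics.CounterOpts{
            Subsystem:      "kubeproxy",
            Name:           "conntrack_reconciler_deleted_entries_total",
            Help:           "Cumulative conntrack entries deleted by conntrack reconciliation",
            StabilityLevel: metrics.ALPHA,
        },
    )

    func init() {
        legacyregistry.MustRegister(reconcilerDeletedEntriesTotal)
    }

    // After deleting a batch of stale flows, the reconciler would do:
    //   reconcilerDeletedEntriesTotal.Add(float64(len(deleted)))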
kube_proxy_conntrack_reconciler_sync_duration_seconds can be used
to track the latency of conntrack flow reconciliation.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
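A sketch of the corresponding histogram definition (again using
k8s.io/component-base/metrics; the bucket layout shown is illustrative,
not necessarily what was merged):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // Duration of one conntrack reconciliation pass.
    var reconcilerSyncDuration = metrics.NewHistogram(
        &metrics.HistogramOpts{
            Subsystem:      "kubeproxy",
            Name:           "conntrack_reconciler_sync_duration_seconds",
            Help:           "Duration of conntrack flow reconciliation in seconds",
            Buckets:        metrics.ExponentialBuckets(0.001, 2, 15), // illustrative layout
            StabilityLevel: metrics.ALPHA,
        },
    )

    func init() {
        legacyregistry.MustRegister(reconcilerSyncDuration)
    }

    // Typical usage at the top of the reconcile loop:
    //   start := time.Now()
    //   defer func() { reconcilerSyncDuration.Observe(time.Since(start).Seconds()) }()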
objects.
Change the order of operations to stop the current iteration if no
changes to the service chains are needed.
Bump syncProxy frequency to 1 hour.
In a test kind cluster, creating 10K services with 2 endpoints each
takes ~25m before the fix and ~9m after. Maximum memory usage
during creation is ~650MiB and ~260MiB respectively.
Another important metric is the time it takes to create 1 new service
when 10K services already exist. It used to take ~8m before the fix;
with partialSync it takes ~141ms.
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
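The reordering can be pictured roughly like this (a simplified sketch;
the service type and the "changed" flag are hypothetical stand-ins for
the real change-tracker state in the proxier):

    package main

    import "fmt"

    type service struct {
        name    string
        changed bool // set by the change tracker since the last sync
    }

    // syncServices sketches the reordered loop: decide first whether a
    // service's chains need any changes, and skip all per-service work
    // (rule generation, element updates) when they do not.
    func syncServices(services []service) {
        for _, svc := range services {
            if !svc.changed {
                // partial sync: nothing to do for this service, stop
                // this iteration before generating any objects.
                continue
            }
            fmt.Println("rewriting chains for", svc.name)
        }
    }

    func main() {
        syncServices([]service{
            {name: "a", changed: false},
            {name: "b", changed: true},
        })
    }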
If the nfacct sub-system is not available in the kernel, then:
1. nfacct-based metrics won't be registered.
2. the proxier will not attempt to ensure the counters exist.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
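In outline the gating could look like this (nfacctEnsurer and the
function below are hypothetical stand-ins, used only to illustrate the
two points above):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // nfacctEnsurer is a hypothetical stand-in for the real nfacct handle;
    // its only job here is to create the accounting objects kube-proxy
    // reads from.
    type nfacctEnsurer interface {
        Ensure(counterName string) error
    }

    // registerNFAcctMetrics wires up the nfacct-backed metrics only when
    // the kernel actually supports nfacct; otherwise neither the metrics
    // nor the counters are touched.
    func registerNFAcctMetrics(nfacct nfacctEnsurer, available bool,
        nfacctMetrics []metrics.Registerable, counterNames []string) {
        if !available {
            return // 1. metrics stay unregistered, 2. counters are never ensured
        }
        for _, m := range nfacctMetrics {
            legacyregistry.MustRegister(m)
        }
        for _, name := range counterNames {
            _ = nfacct.Ensure(name) // real code would surface the error
        }
    }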
Track packets dropped by the proxy which were marked invalid by
conntrack, using the nfacct netfilter extended accounting
infrastructure.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
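Conceptually, the accounting hook and the metric read-back fit together
something like this (the nfacctReader interface and the counter name are
hypothetical; the real change wires this through the proxier and a
metrics collector):

    package proxymetrics

    // nfacctReader is a hypothetical view of the nfacct netlink API: it
    // returns the packet count accumulated by a named accounting object.
    type nfacctReader interface {
        GetPackets(name string) (uint64, error)
    }

    // The iptables rule that both drops the packet and bumps the counter
    // looks roughly like:
    //   -m conntrack --ctstate INVALID \
    //   -m nfacct --nfacct-name ct_state_invalid_dropped_pkts -j DROP
    const invalidDroppedCounter = "ct_state_invalid_dropped_pkts" // illustrative name

    // invalidDroppedPackets reads the kernel-side counter; kube-proxy
    // would expose this value as a metric via a collector rather than
    // on a timer.
    func invalidDroppedPackets(nfacct nfacctReader) (uint64, error) {
        return nfacct.GetPackets(invalidDroppedCounter)
    }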
Windows proxy metric registration was in a separate file, which had
led to some metrics (e.g. the new ProxyHealthzTotal and ProxyLivezTotal)
not being registered on Windows even though they were implemented by
platform-generic code.
(A few other metrics were neither registered nor implemented on
Windows, and that's probably a bug.)
Also, beyond Linux-vs-Windows, make it clearer which metrics are
specific to individual backends.
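One way to picture the resulting structure (a simplified sketch; the
function and variable names are illustrative, not the actual code
layout):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // platformGenericMetrics would hold everything implemented by shared
    // code (sync duration, healthz/livez totals, ...), so registering
    // them in one place keeps Linux and Windows in sync.
    var platformGenericMetrics []metrics.Registerable

    // RegisterMetrics registers the shared metrics and then a per-backend
    // hook (iptables, ipvs, nftables or winkernel), making it explicit
    // which metrics belong to which backend.
    func RegisterMetrics(registerBackendMetrics func()) {
        for _, m := range platformGenericMetrics {
            legacyregistry.MustRegister(m)
        }
        if registerBackendMetrics != nil {
            registerBackendMetrics()
        }
    }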
Instead of using two metrics, use just one metric with labels.
Since the labels can only take 2 values, 200 or 503, there is no risk
of cardinality explosion, and the metric is simple to represent in
graphs.
Change-Id: I0e9cbd6ec2051de44d277d673dc20f02b96aa4d1
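As a sketch, the single-metric-with-a-label shape looks like this
(using k8s.io/component-base/metrics; the metric name, help text and
handler function are illustrative):

    package proxymetrics

    import (
        "strconv"

        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // One counter with a "code" label replaces a pair of per-status
    // counters. The label only ever takes the values "200" and "503",
    // so cardinality stays bounded and dashboards can sum or split on
    // it trivially.
    var proxyHealthzTotal = metrics.NewCounterVec(
        &metrics.CounterOpts{
            Subsystem:      "kubeproxy",
            Name:           "proxy_healthz_total",
            Help:           "Cumulative proxy healthz HTTP status",
            StabilityLevel: metrics.ALPHA,
        },
        []string{"code"},
    )

    func init() {
        legacyregistry.MustRegister(proxyHealthzTotal)
    }

    // recordHealthzResult is called from the healthz handler with the
    // HTTP status it is about to return.
    func recordHealthzResult(status int) {
        proxyHealthzTotal.WithLabelValues(strconv.Itoa(status)).Inc()
    }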
Historically, IptablesRulesTotal could have been interpreted as either
"the total number of iptables rules kube-proxy is responsible for" or
"the number of iptables rules kube-proxy rewrote on the last sync".
Post-MinimizeIPTablesRestore, these are very different things (and
IptablesRulesTotal unintentionally became the latter).
Fix IptablesRulesTotal (sync_proxy_rules_iptables_total) to be "the
total number of iptables rules kube-proxy is responsible for" and add
IptablesRulesLastSync (sync_proxy_rules_iptables_last) to be "the
number of iptables rules kube-proxy rewrote on the last sync".
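A sketch of the two gauges and how a sync would feed them (split by
table as the existing metric is; the update comment at the end assumes
hypothetical rulesWritten/rulesOwned counts computed by the sync loop):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // Total rules kube-proxy is responsible for, whether or not they
    // were rewritten on the last sync.
    var iptablesRulesTotal = metrics.NewGaugeVec(
        &metrics.GaugeOpts{
            Subsystem:      "kubeproxy",
            Name:           "sync_proxy_rules_iptables_total",
            Help:           "Number of iptables rules owned by kube-proxy",
            StabilityLevel: metrics.ALPHA,
        },
        []string{"table"},
    )

    // Rules actually written on the last sync; with
    // MinimizeIPTablesRestore this is usually much smaller than the
    // total.
    var iptablesRulesLastSync = metrics.NewGaugeVec(
        &metrics.GaugeOpts{
            Subsystem:      "kubeproxy",
            Name:           "sync_proxy_rules_iptables_last",
            Help:           "Number of iptables rules written by kube-proxy on the last sync",
            StabilityLevel: metrics.ALPHA,
        },
        []string{"table"},
    )

    func init() {
        legacyregistry.MustRegister(iptablesRulesTotal, iptablesRulesLastSync)
    }

    // At the end of a sync of the "filter" table, for example:
    //   iptablesRulesLastSync.WithLabelValues("filter").Set(float64(rulesWritten))
    //   iptablesRulesTotal.WithLabelValues("filter").Set(float64(rulesOwned))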
This adds a metric, kubeproxy_sync_proxy_rules_last_queued_timestamp,
that captures the last time a change was queued to be applied to the
proxy. This matches the healthz logic, which fails if a pending change
is stale.
This allows us to write alerts that mirror healthz.
Signed-off-by: Casey Callendrello <cdc@redhat.com>
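Sketch of the gauge and where it would be set (names are illustrative;
the point is that this gauge moves when a change is queued, while the
existing last-sync timestamp moves only when the change is applied):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    var syncProxyRulesLastQueuedTimestamp = metrics.NewGauge(
        &metrics.GaugeOpts{
            Subsystem:      "kubeproxy",
            Name:           "sync_proxy_rules_last_queued_timestamp_seconds",
            Help:           "The last time a change was queued to be applied to the proxy",
            StabilityLevel: metrics.ALPHA,
        },
    )

    func init() {
        legacyregistry.MustRegister(syncProxyRulesLastQueuedTimestamp)
    }

    // Called from the service/endpoints change trackers whenever a
    // pending change is queued; healthz compares this against the last
    // applied sync, and an alert can mirror the same comparison of the
    // queued timestamp against the last-sync timestamp.
    func onChangeQueued() {
        syncProxyRulesLastQueuedTimestamp.SetToCurrentTime()
    }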
As mentioned in issue #80061, in the iptables lock contention case
we can see an increasing rate of iptables-restore failures, because
each restore needs to grab the iptables file lock.
The failure metric can provide administrators more insight.
Metrics will be collected in the kube-proxy iptables and ipvs modes.
Signed-off-by: Hui Luo <luoh@vmware.com>
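A minimal sketch of the failure counter and its call site (the restore
call in the comment is illustrative):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // Bumped every time iptables-restore (or the ipvs equivalent) fails,
    // e.g. because the iptables file lock could not be grabbed in time.
    var iptablesRestoreFailuresTotal = metrics.NewCounter(
        &metrics.CounterOpts{
            Subsystem:      "kubeproxy",
            Name:           "sync_proxy_rules_iptables_restore_failures_total",
            Help:           "Cumulative proxy iptables restore failures",
            StabilityLevel: metrics.ALPHA,
        },
    )

    func init() {
        legacyregistry.MustRegister(iptablesRestoreFailuresTotal)
    }

    // In the sync loop:
    //   if err := restoreRules(lines); err != nil {
    //       iptablesRestoreFailuresTotal.Inc()
    //       // log and retry on the next sync
    //   }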
refs https://github.com/kubernetes/perf-tests/issues/640
We have too fine a bucket granularity for lower latencies, at the cost
of the higher latencies (7+ minutes). This is causing spikes in SLIs
calculated based on that metric.
I don't have a strong opinion about the actual values - these seemed
to better match our needs. But let's have a discussion about them.
Values:
0.015 s
0.030 s
0.060 s
0.120 s
0.240 s
0.480 s
0.960 s
1.920 s
3.840 s
7.680 s
15.360 s
30.720 s
61.440 s
122.880 s
245.760 s
491.520 s
983.040 s
1966.080 s
3932.160 s
7864.320 s
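For reference, the list above is a plain geometric progression (start
at 0.015s, double 20 times), so it can be generated rather than spelled
out; a sketch using the ExponentialBuckets helper from
k8s.io/component-base/metrics:

    package main

    import (
        "fmt"

        "k8s.io/component-base/metrics"
    )

    func main() {
        // 0.015 * 2^n for n = 0..19, i.e. 0.015s up to ~7864.32s,
        // matching the values listed above.
        buckets := metrics.ExponentialBuckets(0.015, 2, 20)
        fmt.Println(buckets)
        // These would go into the Buckets field of the HistogramOpts of
        // the latency histogram this change tunes.
    }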
This adds some useful metrics around pending changes and last successful
sync time.
The goal is for administrators to be able to alert on proxies that, for
whatever reason, are quite stale.
Signed-off-by: Casey Callendrello <cdc@redhat.com>
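A sketch of the two pieces (names modeled on kube-proxy's existing
metric naming; treat them and the help text as illustrative):

    package proxymetrics

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // When the proxy last successfully applied its rules.
    var syncProxyRulesLastTimestamp = metrics.NewGauge(
        &metrics.GaugeOpts{
            Subsystem:      "kubeproxy",
            Name:           "sync_proxy_rules_last_timestamp_seconds",
            Help:           "The last time proxy rules were successfully synced",
            StabilityLevel: metrics.ALPHA,
        },
    )

    // How many endpoint changes are still waiting to be applied.
    var endpointChangesPending = metrics.NewGauge(
        &metrics.GaugeOpts{
            Subsystem:      "kubeproxy",
            Name:           "sync_proxy_rules_endpoint_changes_pending",
            Help:           "Pending proxy rules Endpoint changes",
            StabilityLevel: metrics.ALPHA,
        },
    )

    func init() {
        legacyregistry.MustRegister(syncProxyRulesLastTimestamp, endpointChangesPending)
    }

    // A successful sync updates the timestamp and drains the pending
    // count:
    //   syncProxyRulesLastTimestamp.SetToCurrentTime()
    //   endpointChangesPending.Set(0)
    // A proxy that is "quite stale" shows an old timestamp together with
    // a non-zero pending count, which is easy to alert on.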