This patch introduces a whitelist-based label filtering mechanism in
cadvisor/kubelet metrics collection. By explicitly keeping only the
desired labels, we avoid noisy and high-cardinality dimensions while
retaining meaningful CPU metrics for analysis.
This improves the stability of the metrics pipeline and ensures
consistent visibility into application workloads.
```release-note
[platform] Introduce whitelist label filtering for cadvisor/kubelet
metrics to reduce noise and improve CPU metric reliability.
```
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
This patch delivers changes to the monitoring config of Kube-OVN
plunger, which were accidentally omitted in its release, leading to a
duplicate service, broken monitoring agents' helm release and not
actually scraping the plunger.
```release-note
[kubeovn-plunger] Fix the VMServiceScrape object for collecting the
plunger's metrics.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
This patch implements external monitoring of the Kube-OVN cluster. A new
reconciler timed to run its reconcile loop at a fixed interval execs
into the ovn-central pods and collects their cluster info. If the
members' opinions about the cluster disagree, an alert is raised. Other
issues with the distributed consensus are also highlighted.
```release-note
[kubeovn,cozystack-controller] Implement the KubeOVN plunger, an
external monitoring agent for the ovn-central cluster.
```
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
* Reuse the vmagent's serviceaccount
* Mount the serviceaccount token instead of manually creating secrets
* Give the kube-rbac-proxy a unique labelset to avoid targeting wrong
pods
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
This commit enables Cilium's host firewall feature and makes use of it
to deny external connections to two exporters running as daemonset pods
in the host network namespace.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Host firewall is now enabled by default, adding an extra layer of
security.
- Enhanced network traffic management with new policies:
- One policy tightens access to critical service ports.
- Another secures monitoring endpoints by restricting unauthorized
external access.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Updated the remote write configuration to support multiple endpoints,
allowing data ingestion from both short-term and long-term services for
improved flexibility.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: kklinch0 <kklinch0@gmail.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Enhanced system monitoring with a new configuration option to collect
etcd metrics. Users can now enable the scraping of etcd metrics via
updated settings, which improves observability.
- Introduced a secure proxy mechanism that conditionally routes metrics
data from etcd, offering administrators greater control over monitoring
capabilities.
- New configuration sections added to various bundles to support etcd
metrics scraping.
- **Bug Fixes**
- Removed outdated configuration for VMNodeScrape resource, ensuring
clarity and accuracy in monitoring configurations.
- **Chores**
- Added new service accounts, roles, and bindings to facilitate secure
access for monitoring etcd metrics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Resource Configuration**
- Updated VMAgent memory limits from 500Mi to 1024Mi.
- Increased VMAgent memory requests from 200Mi to 768Mi.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced a HelmRelease configuration for monitoring agents in
Kubernetes.
- Added a new section for `fluent-bit` with configurations for readiness
probes, volumes, and log processing.
- **Bug Fixes**
- Enhanced monitoring capabilities with detailed configurations for log
management and external integrations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Streamlined metadata for monitoring agents by removing specific
Helm-related annotations and labels.
- Updated service scrape configuration to enhance target pod
identification with a new relabeling entry.
- **Bug Fixes**
- Adjusted label selection in the `VMServiceScrape` resource to improve
service scrape functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced new HelmRelease configurations for cert-manager, monitoring
agents, and Victoria Metrics Operator in Kubernetes.
- Added resource specifications for `vmselect` in the VMCluster
configuration.
- Enhanced resource management for `vmselect` with defined limits and
requests for memory and CPU.
- **Bug Fixes**
- Adjusted resource limits for Redis failover memory allocation.
- **Documentation**
- Updated README and release notes for various components, enhancing
clarity and usability.
- **Chores**
- Updated image versions across multiple components for consistency and
performance improvements.
- Modified migration scripts to facilitate transitions and manage
resources effectively.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>