19 Commits

Author SHA1 Message Date
IvanHunters
8601299a91 [platform] Add whitelist for labels in cadvisor/kubelet metrics
This patch introduces a whitelist-based label filtering mechanism in
cadvisor/kubelet metrics collection. By explicitly keeping only the
desired labels, we avoid noisy and high-cardinality dimensions while
retaining meaningful CPU metrics for analysis.

This improves the stability of the metrics pipeline and ensures
consistent visibility into application workloads.

```release-note
[platform] Introduce whitelist label filtering for cadvisor/kubelet
metrics to reduce noise and improve CPU metric reliability.
```

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
2025-09-30 11:04:19 +03:00
Timofei Larkin
08d2d61f1a [kubeovn] Fix service scrape for plunger
This patch delivers changes to the monitoring config of Kube-OVN
plunger, which were accidentally omitted in its release, leading to a
duplicate service, broken monitoring agents' helm release and not
actually scraping the plunger.

```release-note
[kubeovn-plunger] Fix the VMServiceScrape object for collecting the
plunger's metrics.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-09-15 10:50:16 +03:00
Timofei Larkin
382a9787f4 [kubeovn] Implement the KubeOVN plunger
This patch implements external monitoring of the Kube-OVN cluster. A new
reconciler timed to run its reconcile loop at a fixed interval execs
into the ovn-central pods and collects their cluster info. If the
members' opinions about the cluster disagree, an alert is raised. Other
issues with the distributed consensus are also highlighted.

```release-note
[kubeovn,cozystack-controller] Implement the KubeOVN plunger, an
external monitoring agent for the ovn-central cluster.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-09-11 02:11:58 +03:00
Timofei Larkin
62a6da0063 Make VMAgent extraArgs tunable
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-06-23 11:15:42 +03:00
Andrei Kvapil
dfd01ff118 [platform] Fix deps for paas-hosted bundle
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-06-09 10:12:06 +02:00
kevin880202
fc8b52d73d reset and add audit/event monitoring in fluentbit values
Signed-off-by: kevin880202 <dytoponts11@gmail.com>
2025-05-21 22:07:27 +08:00
kklinch0
4e9446d934 [monitoring] fix vpa for vmagent delete resources
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-17 21:38:28 +03:00
Timofei Larkin
60b96e0a62 Refactor management etcd monitoring config
* Reuse the vmagent's serviceaccount
* Mount the serviceaccount token instead of manually creating secrets
* Give the kube-rbac-proxy a unique labelset to avoid targeting wrong
  pods

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-04-10 16:59:43 +03:00
kklinch0
8e2e77da56 [monitoring] add vpa for vmagent
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-08 16:40:39 +03:00
Timofei Larkin
d9c6fb7625 Enable Cilium host firewall (#736)
This commit enables Cilium's host firewall feature and makes use of it
to deny external connections to two exporters running as daemonset pods
in the host network namespace.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Host firewall is now enabled by default, adding an extra layer of
security.
  - Enhanced network traffic management with new policies:
    - One policy tightens access to critical service ports.
- Another secures monitoring endpoints by restricting unauthorized
external access.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-04-02 13:16:15 +02:00
klinch0
a2af07d1dc bugfix/fix-longterm (#697)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Updated the remote write configuration to support multiple endpoints,
allowing data ingestion from both short-term and long-term services for
improved flexibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-14 10:58:16 +01:00
kklinch0
a6a0752feb add metric-labels-allowlist 2025-03-10 12:35:58 +03:00
kklinch0
6354b564b4 update monitoring-agents stack 2025-03-10 12:04:50 +03:00
kklinch0
554d5dbbca feature/change-severity-for-kube-client-certificate-expiration 2025-03-05 12:41:26 +03:00
klinch0
5a47754a92 feature/add-etcd-vm-node-scrape (#614)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced system monitoring with a new configuration option to collect
etcd metrics. Users can now enable the scraping of etcd metrics via
updated settings, which improves observability.
- Introduced a secure proxy mechanism that conditionally routes metrics
data from etcd, offering administrators greater control over monitoring
capabilities.
- New configuration sections added to various bundles to support etcd
metrics scraping.
  
- **Bug Fixes**
- Removed outdated configuration for VMNodeScrape resource, ensuring
clarity and accuracy in monitoring configurations.

- **Chores**
- Added new service accounts, roles, and bindings to facilitate secure
access for monitoring etcd metrics.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2025-02-06 13:40:30 +01:00
klinch0
26388c7757 up vmagent limit (#555)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Resource Configuration**
	- Updated VMAgent memory limits from 500Mi to 1024Mi.
	- Increased VMAgent memory requests from 200Mi to 768Mi.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-02 12:29:15 +01:00
klinch0
97d006e99f fix logs (#537)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced a HelmRelease configuration for monitoring agents in
Kubernetes.
- Added a new section for `fluent-bit` with configurations for readiness
probes, volumes, and log processing.

- **Bug Fixes**
- Enhanced monitoring capabilities with detailed configurations for log
management and external integrations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-23 23:42:00 +01:00
Andrei Kvapil
49df7e24a3 Fix kube-state-mterics and flux alerts labels (#520)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Streamlined metadata for monitoring agents by removing specific
Helm-related annotations and labels.
- Updated service scrape configuration to enhance target pod
identification with a new relabeling entry.

- **Bug Fixes**
- Adjusted label selection in the `VMServiceScrape` resource to improve
service scrape functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-12-09 14:00:59 +01:00
klinch0
3c27a1e9bf add metrics agents (#461)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new HelmRelease configurations for cert-manager, monitoring
agents, and Victoria Metrics Operator in Kubernetes.
- Added resource specifications for `vmselect` in the VMCluster
configuration.
- Enhanced resource management for `vmselect` with defined limits and
requests for memory and CPU.

- **Bug Fixes**
	- Adjusted resource limits for Redis failover memory allocation.

- **Documentation**
- Updated README and release notes for various components, enhancing
clarity and usability.

- **Chores**
- Updated image versions across multiple components for consistency and
performance improvements.
- Modified migration scripts to facilitate transitions and manage
resources effectively.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2024-11-04 19:01:33 +01:00