Commit Graph

44 Commits

Author SHA1 Message Date
kklinch0
2d294f0546 monitoring disable alerta sign up
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-06-29 01:18:38 +03:00
Andrei Kvapil
3b8a9f9d2c Configure all apps to use new function to generate subjects
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-06-16 20:32:11 +02:00
kklinch0
0fa70d9d38 [platform] cut resources
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-06-13 01:06:05 +03:00
kklinch0
225d103509 [k8s] add topologySpreadConstraints for client k8s cluster
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-05-28 23:17:00 +02:00
kklinch0
1d639fda0d [tests] fix tests and add retry for monitoring hr tests
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-10 15:23:31 +03:00
kklinch0
5729666e72 fix cpu
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-20 14:59:47 +03:00
kklinch0
85ec09b8de bugfix/add-limits-for-extra-packages
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-20 14:59:47 +03:00
kklinch0
e0a63c32b0 bugfix/fix-monitoring-resources
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-20 14:59:47 +03:00
kklinch0
aa084b4635 feature/add-vpa-for-monitoring 2025-03-10 10:02:12 +03:00
xy2
063439ac94 Raise maxLabelsPerTimeseries for VictoriaMetrics vminsert. (#677) 2025-03-06 14:44:58 +01:00
Andrei Kvapil
d4452ea708 Merge pull request #660 from xy2/169-victoria-limits
Increase VMSelect default cpu limit
2025-03-05 14:13:32 +01:00
kklinch0
88729e4124 rename globalAppTopologySpreadConstraints 2025-03-05 11:39:41 +03:00
kklinch0
4cce138d31 feature/add-topologyspreadconstraints-pg 2025-03-05 10:41:43 +03:00
Denis Seleznev
3e273c03b6 Increase the default cpu limit for vminsert. 2025-03-03 19:31:27 +01:00
Denis Seleznev
da0437a774 Make it possible to set cpu limit too. 2025-03-03 19:31:05 +01:00
Denis Seleznev
78cff8c223 Change defaults calculation logic. 2025-03-03 19:18:24 +01:00
Andrei Kvapil
ef2e065c77 Update Grafana, build plugins (#607)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Updated Grafana to version 11.4.0
- Added new Grafana plugins: VictoriaMetrics logs datasource, Natel
Discrete Panel, and Worldmap Panel

- **Improvements**
  - Enhanced Grafana image build process
  - Dynamically manage Grafana image versioning
  - Updated plugin installation method

- **Version Update**
  - Monitoring package version bumped from 1.7.0 to 1.8.0

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-01-27 18:45:48 +01:00
klinch0
e037cb0e3e feature/add-tg-severity-setting (#585)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added option to disable Telegram alerts for specific severity levels
in the Monitoring Hub.

- **Documentation**
- Updated README with new parameter
`alerta.alerts.telegram.disabledSeverity`.

- **Chores**
  - Bumped monitoring package version from 1.6.1 to 1.7.0.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2025-01-17 14:49:33 +01:00
klinch0
59b4a0fb91 bugfix/monitoring add nil checker (#587)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Version Update**
  - Monitoring application version updated from 1.6.1 to 1.6.2
- **Configuration Improvements**
  - Enhanced resource configuration checks for VM cluster components
- Improved handling of resource definitions to prevent potential errors

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-17 14:47:29 +01:00
klinch0
3bb975965d enablePodMonitor (#572)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enabled pod monitoring for multiple database clusters (Alerta,
Keycloak, SeaweedFS, Grafana)

- **Chores**
	- Updated monitoring package version from 1.6.0 to 1.6.1
- Updated version mapping with specific commit hash for monitoring
package
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-15 10:53:16 +01:00
Andrei Kvapil
107f390ae8 workloadmonitor (#563)
- upd redis
- update kubernetes app to use workloadmonitors
- upd kubernetes
- fix version


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added `WorkloadMonitor` resources for various components including
Kubernetes clusters, Redis, Sentinel, and SeaweedFS.
- Introduced monitoring capabilities for `alerta`, `alertmanager`,
`grafana`, and `vlogs` services.
- Enhanced RBAC configurations to support new monitoring resources
across multiple API groups.

- **Improvements**
	- Updated metadata and labeling for virtual machine templates.
	- Added dynamic resource naming based on release and group names.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-01-09 13:25:12 +01:00
klinch0
d4634797f3 feature/add resources to vmcluster (#556)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **Version Updates**
  - Tenant application version bumped from 1.6.5 to 1.6.6
  - Monitoring application version updated from 1.5.3 to 1.5.4

- **Monitoring Configuration**
- Adjusted metrics storage deduplication interval: shortterm from 5
minutes to 15 seconds, longterm from 15 seconds to 5 minutes
- Updated resource configurations for VM components, including new
resource specifications for vminsert, vmselect, and vmstorage
- Increased memory limits and requests for VMAgent from 500Mi to 1024Mi
and from 200Mi to 768Mi, respectively

- **Performance Improvements**
  - Enhanced resource allocation for monitoring services
  - More flexible configuration options for metrics storage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-09 13:18:46 +01:00
Andrei Kvapil
c48aed0aa8 hardcode vlogs version (#545) 2024-12-27 14:33:32 +01:00
klinch0
17fbda6e12 fix-vm-logs-url (#538)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
	- Updated monitoring application version to 1.5.3.
- Changed the data source type in Grafana configuration to
`victoriametrics-logs-datasource`.
- **Bug Fixes**
	- Corrected plugin loading configuration in Grafana.
- **Chores**
- Updated version mapping for the monitoring package in the versions
map.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-23 23:40:52 +01:00
klinch0
c1ca19dc18 add grafana size configure (#536)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced a new parameter for Grafana's database size with a default
value of 10Gi.
  
- **Bug Fixes**
- Updated default values for `alerta.alerts.telegram.token` and
`alerta.alerts.telegram.chatID` to empty strings.

- **Documentation**
- Revised the README to reflect changes in default parameter values and
added new parameters for Grafana.

- **Chores**
  - Updated the monitoring application's version from 1.5.2 to 1.5.3.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-20 11:21:54 +01:00
klinch0
cfe86c0815 delete-cpu-limit (#535)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Enhanced resource management for the VMCluster resource, specifically
for the `vmstorage` component.
- Added resource specifications including memory limits and CPU
requests.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-19 21:48:11 +01:00
Andrei Kvapil
66d9b17525 fix monitoring: show alerts only from first instance (#521)
We don't need to show alerts from longterm instance, because the alerts
have shorter timeout than metrics collection interval


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Updated the `VMAlert` YAML template to generate only the first
`VMAlert` resource based on metrics storage values.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-12-09 14:00:40 +01:00
klinch0
3c27a1e9bf add metrics agents (#461)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new HelmRelease configurations for cert-manager, monitoring
agents, and Victoria Metrics Operator in Kubernetes.
- Added resource specifications for `vmselect` in the VMCluster
configuration.
- Enhanced resource management for `vmselect` with defined limits and
requests for memory and CPU.

- **Bug Fixes**
	- Adjusted resource limits for Redis failover memory allocation.

- **Documentation**
- Updated README and release notes for various components, enhancing
clarity and usability.

- **Chores**
- Updated image versions across multiple components for consistency and
performance improvements.
- Modified migration scripts to facilitate transitions and manage
resources effectively.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2024-11-04 19:01:33 +01:00
klinch0
57e90b700f fix alerta webhook url (#457)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Updated monitoring application version to 1.5.1, enhancing overall
stability.
	- Introduced a new API key handling mechanism for improved security.
- Enhanced Ingress resource configurability with dynamic URL
construction.

- **Bug Fixes**
- Adjusted formatting for Ingress annotations to ensure proper
functionality.

- **Chores**
- Updated version mappings for the monitoring package in the versions
map.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-10-28 15:59:28 +01:00
Andrei Kvapil
4812874389 fix uploading vm images using virtctl (#422)
Upstream fix:
https://github.com/kubevirt/containerized-data-importer/pull/3461

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new version (`v1beta1`) for the CDI operator alongside
the existing version, enhancing configuration options.
- Expanded `spec` section with detailed descriptions for various
configurations including data volume management and TLS security
profiles.
- Added a new Ingress resource for the `cdi-uploadproxy` service,
improving traffic routing capabilities.
- Introduced new configuration parameters for dynamic upload proxy URL
management.

- **Improvements**
- Updated permissions for the CDI operator to manage additional
resources, improving its data handling capabilities.
- Refined deployment configuration with updated container image
references and environment variables for better operational control.
- Enhanced network policy definitions by adding specific rules for new
services while maintaining existing policies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-10-16 18:37:13 +02:00
Andrei Kvapil
15001dc6ad Fix ingress for grafana and alerta (#401)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-10-07 21:12:53 +02:00
Mr Khachaturov
eda62ff77b External-dns and new clusterissuer dns01 Cloudflare (#374)
Overview

This pull request introduces the integration of External-DNS into the
full bundles and adds support for a dns01 ClusterIssuer using
Cloudflare. It enhances the DNS management capabilities for our
deployments by allowing dynamic DNS record management directly from
Kubernetes resources.

Changes Made

 1. **External-DNS Integration:**
 
   - Added External-DNS to the full deployment bundles.

- Configured External-DNS to automatically manage DNS records for
services within the Kubernetes cluster ( we must discuss how to
configure external-dns via configmap or create an application in tenant
`external-dns` where we can define values).

We must define some additional annotations for ingresses in order to
make external-dns work , so we must discuss this also which is best
method to configure it ( from configmap or dashboard ).

**2. dns01 ClusterIssuer for Cloudflare:**

- Implemented support for a dns01 ClusterIssuer using Cloudflare.
- This allows for automated certificate issuance via DNS challenge,
leveraging Cloudflare as the DNS provider.
- The configuration can be defined in the Cozystack ConfigMap

3. Default Ingress Configuration: 

- Updated the default Ingress resources to use Cloudflare for DNS
challenges.
- Ensured that if the Cloudflare issuer is defined in the Cozystack
ConfigMap, it will be utilized for all default Ingresses, streamlining
the deployment process and improving reliability.

**Benefits**

- Automated DNS Management: With External-DNS, DNS entries will be
created and updated automatically based on the state of Kubernetes
resources, reducing manual overhead.
- Seamless Certificate Management: The dns01 ClusterIssuer integration
allows for automated SSL/TLS certificate issuance, enhancing security
for deployed applications.
- Flexibility in Configuration: Users can easily switch between
different issuers by updating the Cozystack ConfigMap, providing
flexibility in the choice of DNS and certificate management solutions.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
- Introduced a new `external-dns` release with support for managing DNS
records in Kubernetes.
- Added configuration options for DNS synchronization policies and
provider settings.
  - Implemented a new lookup for issuer types in Ingress configurations.
- Expanded configuration with new entries for `external-dns` in multiple
deployment files, enhancing deployment flexibility.

- **Documentation**
- Comprehensive README and configuration schema for the `external-dns`
Helm chart added, detailing installation and customization options.

- **Improvements**
  - Enhanced RBAC configuration for flexible permissions management.
- Updated annotations and health check configurations for better service
monitoring.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-10-04 12:56:39 +02:00
Andrei Kvapil
910a9e5378 Grafana remove flant-statusmap-panel plugin (#360)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-09-26 11:44:25 +02:00
Andrei Kvapil
ec27a19afb Add basic alerting system (#355)
![alerta](https://github.com/user-attachments/assets/87f792c1-0e1f-4070-84b1-7335cc0e7810)


- Remove grafana-oncall
- Add Alerta
- Configure basic alerts
- Update grafana 10 --> 11

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added new configuration options for the Alerta service, enhancing user
customization.
- Introduced a new Helm chart for the VictoriaMetrics Kubernetes stack,
enabling comprehensive monitoring solutions.
- Added VMAuth feature for enhanced authentication in the Kubernetes
stack.

- **Bug Fixes**
- Fixed issues with the ETCD dashboard and improved ingress path prefix
handling.

- **Documentation**
- Updated README and release guide for the VictoriaMetrics stack with
installation and configuration instructions.
	- Introduced a changelog for organized tracking of changes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-09-26 10:56:53 +02:00
Andrei Kvapil
b8e33d194d Prepare release v0.13.0 (#321)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced monitoring capabilities for Kubernetes deployments, including
checks for `vmalert`, `vlogs`, and `vmcluster`.

- **Updates**
- Updated container images for `cozystack` and `darkhttpd` to version
`v0.13.0`.
- Version mapping updates for `ferretdb`, `kubernetes`, and
`virtual-machine` packages.
- Updated image tags and digests for Kubeapps components to version
`v0.13.0`.
	- Updated image tag for Kamaji to version `v0.13.0`.
	- Added new pod metadata labels to the `vmalertmanager` configuration.

- **Bug Fixes**
- Improved operational status checks for Kubernetes resources using
JSONPath expressions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-31 09:04:40 +02:00
Andrei Kvapil
adaf603bc2 Add fluent-bit and VictoriaLogs (#305)
![Screenshot 2024-08-28 at 15-10-20 Explore - vlog-generic -
Grafana](https://github.com/user-attachments/assets/4ba926d3-fb56-411b-88d5-a00d5d17b3dc)

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-29 12:46:46 +02:00
Andrei Kvapil
c07c4bbdab Introduce stroageClass option for all applications (#290)
Provide the oportunity to specify StroageClass in applications

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-20 17:19:10 +02:00
Andrei Kvapil
4471b4ba2a Fix vmrules to process memory metrics (#289)
This PR fixes memory charts,

fixes https://github.com/aenix-io/cozystack/issues/285


![image](https://github.com/user-attachments/assets/3ceb8a4d-6fdf-49d3-80be-ff83567ba61c)

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-16 10:26:23 +02:00
Andrei Kvapil
c56e576906 fix network-policies (#272)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-12 10:10:18 +02:00
Andrei Kvapil
00f7c3647b Upd dashboard and handle ResourceView (#262)
- Patch Dashboard to use specific role for resourceview
- Update kubeapps v2.11.0

partially fixes https://github.com/aenix-io/cozystack/issues/259

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-07 12:35:45 +02:00
Andrei Kvapil
c9e0d63b77 Rename system releases to have -system suffix
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-07-19 12:26:17 +02:00
Kingdon Barrett
931e39fb5c Upgrade to Flux 2.3.x (#167)
Signed-off-by: Kingdon Barrett <kingdon+github@tuesdaystudios.com>
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2024-06-17 16:02:32 +02:00
Andrei Kvapil
2d21ed6ac9 fix: grafana ingress class (#85)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-04-17 22:51:54 +02:00
Andrei Kvapil
f642698921 Preapare release v0.0.1
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-02-08 12:04:32 +01:00