48 Commits

Author SHA1 Message Date
Alexey Artamonov
82cebe3ad7 add settings alert for slack
Signed-off-by: Alexey Artamonov <scooby18@yandex.ru>
2025-10-22 17:22:18 +03:00
Timofei Larkin
c16e37e079 [controller,api] Refactor tenant resource label
This patch refactors the secret selectors to use the
`internal.cozystack.io/tenantresource` label for managing secret
visibility and removes any selectors based on it or the previous
`apps.cozystack.io/tenantresource` label, the idea being that this label
will only ever be set by the controller.

```
[controller,api] Refactor labels for the secret selector.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-10-01 13:24:40 +03:00
Timofei Larkin
21ca1349c4 [monitoring] Add lineage labels to VM components
Since the VictoriaMetrics operator aggressively manages the metadata on
all owned components, the addition of labels by the lineage webhook
causes non-stop updates sent to the k8s API server. We mitigate this by
modifying the Monitoring Helm chart to set the `managedMetadata` field
on all VictoriaMetrics custom resources, where applicable.

```release-note
[monitoring] Explicitly set lineage labels on VictoriaMetrics' resources
known not to play nice when something modifies their owned resources in
flight.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-09-25 10:53:34 +03:00
Timofei Larkin
7e4883dfcc [platform] Mark some secrets as non-user-facing
Some k8s secrets created when deploying managed applications are
unhelpful to the end user or are outright not meant to be shown, because
they contain internal credentials not meant to be presented to the user.
This patch adds an `apps.cozystack.io/tenantresource=false` label to
such resources which will be later used to filter out such secrets in
the web UI.

```release-note
[platform] Mark non-user-facing secrets as such to avoid clutter in the
dashboard and leaking internal credentials.
```

Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-09-23 15:09:18 +03:00
kklinch0
2d294f0546 monitoring disable alerta sign up
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-06-29 01:18:38 +03:00
Andrei Kvapil
3b8a9f9d2c Configure all apps to use new function to generate subjects
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-06-16 20:32:11 +02:00
kklinch0
0fa70d9d38 [platform] cut resources
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-06-13 01:06:05 +03:00
kklinch0
225d103509 [k8s] add topologySpreadConstraints for client k8s cluster
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-05-28 23:17:00 +02:00
kklinch0
1d639fda0d [tests] fix tests and add retry for monitoring hr tests
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-10 15:23:31 +03:00
kklinch0
5729666e72 fix cpu
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-20 14:59:47 +03:00
kklinch0
85ec09b8de bugfix/add-limits-for-extra-packages
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-20 14:59:47 +03:00
kklinch0
e0a63c32b0 bugfix/fix-monitoring-resources
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-03-20 14:59:47 +03:00
kklinch0
aa084b4635 feature/add-vpa-for-monitoring 2025-03-10 10:02:12 +03:00
xy2
063439ac94 Raise maxLabelsPerTimeseries for VictoriaMetrics vminsert. (#677) 2025-03-06 14:44:58 +01:00
Andrei Kvapil
d4452ea708 Merge pull request #660 from xy2/169-victoria-limits
Increase VMSelect default cpu limit
2025-03-05 14:13:32 +01:00
kklinch0
88729e4124 rename globalAppTopologySpreadConstraints 2025-03-05 11:39:41 +03:00
kklinch0
4cce138d31 feature/add-topologyspreadconstraints-pg 2025-03-05 10:41:43 +03:00
Denis Seleznev
3e273c03b6 Increase the default cpu limit for vminsert. 2025-03-03 19:31:27 +01:00
Denis Seleznev
da0437a774 Make it possible to set cpu limit too. 2025-03-03 19:31:05 +01:00
Denis Seleznev
78cff8c223 Change defaults calculation logic. 2025-03-03 19:18:24 +01:00
Andrei Kvapil
ef2e065c77 Update Grafana, build plugins (#607)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Updated Grafana to version 11.4.0
- Added new Grafana plugins: VictoriaMetrics logs datasource, Natel
Discrete Panel, and Worldmap Panel

- **Improvements**
  - Enhanced Grafana image build process
  - Dynamically manage Grafana image versioning
  - Updated plugin installation method

- **Version Update**
  - Monitoring package version bumped from 1.7.0 to 1.8.0

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-01-27 18:45:48 +01:00
klinch0
e037cb0e3e feature/add-tg-severity-setting (#585)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added option to disable Telegram alerts for specific severity levels
in the Monitoring Hub.

- **Documentation**
- Updated README with new parameter
`alerta.alerts.telegram.disabledSeverity`.

- **Chores**
  - Bumped monitoring package version from 1.6.1 to 1.7.0.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2025-01-17 14:49:33 +01:00
klinch0
59b4a0fb91 bugfix/monitoring add nil checker (#587)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Version Update**
  - Monitoring application version updated from 1.6.1 to 1.6.2
- **Configuration Improvements**
  - Enhanced resource configuration checks for VM cluster components
- Improved handling of resource definitions to prevent potential errors

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-17 14:47:29 +01:00
klinch0
3bb975965d enablePodMonitor (#572)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enabled pod monitoring for multiple database clusters (Alerta,
Keycloak, SeaweedFS, Grafana)

- **Chores**
	- Updated monitoring package version from 1.6.0 to 1.6.1
- Updated version mapping with specific commit hash for monitoring
package
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-15 10:53:16 +01:00
Andrei Kvapil
107f390ae8 workloadmonitor (#563)
- upd redis
- update kubernetes app to use workloadmonitors
- upd kubernetes
- fix version


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Added `WorkloadMonitor` resources for various components including
Kubernetes clusters, Redis, Sentinel, and SeaweedFS.
- Introduced monitoring capabilities for `alerta`, `alertmanager`,
`grafana`, and `vlogs` services.
- Enhanced RBAC configurations to support new monitoring resources
across multiple API groups.

- **Improvements**
	- Updated metadata and labeling for virtual machine templates.
	- Added dynamic resource naming based on release and group names.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-01-09 13:25:12 +01:00
klinch0
d4634797f3 feature/add resources to vmcluster (#556)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **Version Updates**
  - Tenant application version bumped from 1.6.5 to 1.6.6
  - Monitoring application version updated from 1.5.3 to 1.5.4

- **Monitoring Configuration**
- Adjusted metrics storage deduplication interval: shortterm from 5
minutes to 15 seconds, longterm from 15 seconds to 5 minutes
- Updated resource configurations for VM components, including new
resource specifications for vminsert, vmselect, and vmstorage
- Increased memory limits and requests for VMAgent from 500Mi to 1024Mi
and from 200Mi to 768Mi, respectively

- **Performance Improvements**
  - Enhanced resource allocation for monitoring services
  - More flexible configuration options for metrics storage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-01-09 13:18:46 +01:00
Andrei Kvapil
c48aed0aa8 hardcode vlogs version (#545) 2024-12-27 14:33:32 +01:00
klinch0
17fbda6e12 fix-vm-logs-url (#538)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
	- Updated monitoring application version to 1.5.3.
- Changed the data source type in Grafana configuration to
`victoriametrics-logs-datasource`.
- **Bug Fixes**
	- Corrected plugin loading configuration in Grafana.
- **Chores**
- Updated version mapping for the monitoring package in the versions
map.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-23 23:40:52 +01:00
klinch0
c1ca19dc18 add grafana size configure (#536)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced a new parameter for Grafana's database size with a default
value of 10Gi.
  
- **Bug Fixes**
- Updated default values for `alerta.alerts.telegram.token` and
`alerta.alerts.telegram.chatID` to empty strings.

- **Documentation**
- Revised the README to reflect changes in default parameter values and
added new parameters for Grafana.

- **Chores**
  - Updated the monitoring application's version from 1.5.2 to 1.5.3.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-20 11:21:54 +01:00
klinch0
cfe86c0815 delete-cpu-limit (#535)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Enhanced resource management for the VMCluster resource, specifically
for the `vmstorage` component.
- Added resource specifications including memory limits and CPU
requests.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-12-19 21:48:11 +01:00
Andrei Kvapil
66d9b17525 fix monitoring: show alerts only from first instance (#521)
We don't need to show alerts from longterm instance, because the alerts
have shorter timeout than metrics collection interval


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Updated the `VMAlert` YAML template to generate only the first
`VMAlert` resource based on metrics storage values.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-12-09 14:00:40 +01:00
klinch0
3c27a1e9bf add metrics agents (#461)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new HelmRelease configurations for cert-manager, monitoring
agents, and Victoria Metrics Operator in Kubernetes.
- Added resource specifications for `vmselect` in the VMCluster
configuration.
- Enhanced resource management for `vmselect` with defined limits and
requests for memory and CPU.

- **Bug Fixes**
	- Adjusted resource limits for Redis failover memory allocation.

- **Documentation**
- Updated README and release notes for various components, enhancing
clarity and usability.

- **Chores**
- Updated image versions across multiple components for consistency and
performance improvements.
- Modified migration scripts to facilitate transitions and manage
resources effectively.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2024-11-04 19:01:33 +01:00
klinch0
57e90b700f fix alerta webhook url (#457)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Updated monitoring application version to 1.5.1, enhancing overall
stability.
	- Introduced a new API key handling mechanism for improved security.
- Enhanced Ingress resource configurability with dynamic URL
construction.

- **Bug Fixes**
- Adjusted formatting for Ingress annotations to ensure proper
functionality.

- **Chores**
- Updated version mappings for the monitoring package in the versions
map.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2024-10-28 15:59:28 +01:00
Andrei Kvapil
4812874389 fix uploading vm images using virtctl (#422)
Upstream fix:
https://github.com/kubevirt/containerized-data-importer/pull/3461

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced a new version (`v1beta1`) for the CDI operator alongside
the existing version, enhancing configuration options.
- Expanded `spec` section with detailed descriptions for various
configurations including data volume management and TLS security
profiles.
- Added a new Ingress resource for the `cdi-uploadproxy` service,
improving traffic routing capabilities.
- Introduced new configuration parameters for dynamic upload proxy URL
management.

- **Improvements**
- Updated permissions for the CDI operator to manage additional
resources, improving its data handling capabilities.
- Refined deployment configuration with updated container image
references and environment variables for better operational control.
- Enhanced network policy definitions by adding specific rules for new
services while maintaining existing policies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-10-16 18:37:13 +02:00
Andrei Kvapil
15001dc6ad Fix ingress for grafana and alerta (#401)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-10-07 21:12:53 +02:00
Mr Khachaturov
eda62ff77b External-dns and new clusterissuer dns01 Cloudflare (#374)
Overview

This pull request introduces the integration of External-DNS into the
full bundles and adds support for a dns01 ClusterIssuer using
Cloudflare. It enhances the DNS management capabilities for our
deployments by allowing dynamic DNS record management directly from
Kubernetes resources.

Changes Made

 1. **External-DNS Integration:**
 
   - Added External-DNS to the full deployment bundles.

- Configured External-DNS to automatically manage DNS records for
services within the Kubernetes cluster ( we must discuss how to
configure external-dns via configmap or create an application in tenant
`external-dns` where we can define values).

We must define some additional annotations for ingresses in order to
make external-dns work , so we must discuss this also which is best
method to configure it ( from configmap or dashboard ).

**2. dns01 ClusterIssuer for Cloudflare:**

- Implemented support for a dns01 ClusterIssuer using Cloudflare.
- This allows for automated certificate issuance via DNS challenge,
leveraging Cloudflare as the DNS provider.
- The configuration can be defined in the Cozystack ConfigMap

3. Default Ingress Configuration: 

- Updated the default Ingress resources to use Cloudflare for DNS
challenges.
- Ensured that if the Cloudflare issuer is defined in the Cozystack
ConfigMap, it will be utilized for all default Ingresses, streamlining
the deployment process and improving reliability.

**Benefits**

- Automated DNS Management: With External-DNS, DNS entries will be
created and updated automatically based on the state of Kubernetes
resources, reducing manual overhead.
- Seamless Certificate Management: The dns01 ClusterIssuer integration
allows for automated SSL/TLS certificate issuance, enhancing security
for deployed applications.
- Flexibility in Configuration: Users can easily switch between
different issuers by updating the Cozystack ConfigMap, providing
flexibility in the choice of DNS and certificate management solutions.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
- Introduced a new `external-dns` release with support for managing DNS
records in Kubernetes.
- Added configuration options for DNS synchronization policies and
provider settings.
  - Implemented a new lookup for issuer types in Ingress configurations.
- Expanded configuration with new entries for `external-dns` in multiple
deployment files, enhancing deployment flexibility.

- **Documentation**
- Comprehensive README and configuration schema for the `external-dns`
Helm chart added, detailing installation and customization options.

- **Improvements**
  - Enhanced RBAC configuration for flexible permissions management.
- Updated annotations and health check configurations for better service
monitoring.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-10-04 12:56:39 +02:00
Andrei Kvapil
910a9e5378 Grafana remove flant-statusmap-panel plugin (#360)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-09-26 11:44:25 +02:00
Andrei Kvapil
ec27a19afb Add basic alerting system (#355)
![alerta](https://github.com/user-attachments/assets/87f792c1-0e1f-4070-84b1-7335cc0e7810)


- Remove grafana-oncall
- Add Alerta
- Configure basic alerts
- Update grafana 10 --> 11

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Added new configuration options for the Alerta service, enhancing user
customization.
- Introduced a new Helm chart for the VictoriaMetrics Kubernetes stack,
enabling comprehensive monitoring solutions.
- Added VMAuth feature for enhanced authentication in the Kubernetes
stack.

- **Bug Fixes**
- Fixed issues with the ETCD dashboard and improved ingress path prefix
handling.

- **Documentation**
- Updated README and release guide for the VictoriaMetrics stack with
installation and configuration instructions.
	- Introduced a changelog for organized tracking of changes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-09-26 10:56:53 +02:00
Andrei Kvapil
b8e33d194d Prepare release v0.13.0 (#321)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced monitoring capabilities for Kubernetes deployments, including
checks for `vmalert`, `vlogs`, and `vmcluster`.

- **Updates**
- Updated container images for `cozystack` and `darkhttpd` to version
`v0.13.0`.
- Version mapping updates for `ferretdb`, `kubernetes`, and
`virtual-machine` packages.
- Updated image tags and digests for Kubeapps components to version
`v0.13.0`.
	- Updated image tag for Kamaji to version `v0.13.0`.
	- Added new pod metadata labels to the `vmalertmanager` configuration.

- **Bug Fixes**
- Improved operational status checks for Kubernetes resources using
JSONPath expressions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-31 09:04:40 +02:00
Andrei Kvapil
adaf603bc2 Add fluent-bit and VictoriaLogs (#305)
![Screenshot 2024-08-28 at 15-10-20 Explore - vlog-generic -
Grafana](https://github.com/user-attachments/assets/4ba926d3-fb56-411b-88d5-a00d5d17b3dc)

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-29 12:46:46 +02:00
Andrei Kvapil
c07c4bbdab Introduce stroageClass option for all applications (#290)
Provide the oportunity to specify StroageClass in applications

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-20 17:19:10 +02:00
Andrei Kvapil
4471b4ba2a Fix vmrules to process memory metrics (#289)
This PR fixes memory charts,

fixes https://github.com/aenix-io/cozystack/issues/285


![image](https://github.com/user-attachments/assets/3ceb8a4d-6fdf-49d3-80be-ff83567ba61c)

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-16 10:26:23 +02:00
Andrei Kvapil
c56e576906 fix network-policies (#272)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-12 10:10:18 +02:00
Andrei Kvapil
00f7c3647b Upd dashboard and handle ResourceView (#262)
- Patch Dashboard to use specific role for resourceview
- Update kubeapps v2.11.0

partially fixes https://github.com/aenix-io/cozystack/issues/259

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-08-07 12:35:45 +02:00
Andrei Kvapil
c9e0d63b77 Rename system releases to have -system suffix
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-07-19 12:26:17 +02:00
Kingdon Barrett
931e39fb5c Upgrade to Flux 2.3.x (#167)
Signed-off-by: Kingdon Barrett <kingdon+github@tuesdaystudios.com>
Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
2024-06-17 16:02:32 +02:00
Andrei Kvapil
2d21ed6ac9 fix: grafana ingress class (#85)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-04-17 22:51:54 +02:00
Andrei Kvapil
f642698921 Preapare release v0.0.1
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2024-02-08 12:04:32 +01:00